Question

I am trying to get all restaurant names from https://www.yemeksepeti.com/en/istanbul/zeytinburnu-merkezefendi-mah-cevizlibag using HtmlAgilityPack:

Uri url = new Uri("https://www.yemeksepeti.com/en/istanbul/zeytinburnu-merkezefendi-mah-cevizlibag");
WebClient client = new WebClient();
string downloadString = client.DownloadString(url);

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(downloadString);

HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//a[@class='restaurantName withTooltip']");

foreach(var node in nodes) {
  listBox1.Items.Add(node.InnerText);
}

This works fine!

However, what I really want to do is dig deeper and get the MainCuisineName:

<a class="restaurantName withTooltip" href="/en/meshur-merkez-kofte-zeytinburnu-merkezefendi-mah-istanbul" target="_parent" data-hasqtip="1">
  <span data-tooltip="{&quot;MainCuisineName&quot;:&quot;Meatball&quot;&quot;cc_genel.gif}">Meşhur Merkez Köfte, Zeytinburnu (Merkezefendi Mah.)</span>
</a>

How can I get the MainCuisineName, i.e. "Meatball", from the same URL? I tried:

HtmlNodeCollection nameNodes = doc.DocumentNode.SelectNodes("//*[@class='restaurantName withTooltip']/span='MainCuisineLabelName'");

foreach(var node in nameNodes) {
  listBox1.Items.Add(node.InnerText);
}

But it obviously doesn't work.

Any suggestions?

Answer 1

This is what I came up with:

Uri filteredurl = new Uri("https://www.yemeksepeti.com/en/istanbul/zeytinburnu-merkezefendi-mah-cevizlibag#kt:b5ceacf5-9724-4751-a600-78d35cfcf72b,24ef27f9-32d5-44ff-993c-21e59b0f6f83");

HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//a[@class='restaurantName withTooltip']");

foreach(var node in nodes) {
   listBox1.Items.Add(node.InnerText);
}

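Then apply the filter #kt:b5ceacf5-9724-4751-a600-78d35cfcf72b,24ef27f9-32d5-44ff-993c-21e59b0f6f83 and run the search again. Each category (Meatball, Fast Food, etc.) has its own unique filter. Also take a look at this link and you'll see what I mean.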
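
As an alternative to filtering by category, the cuisine name could probably be read straight from the span's data-tooltip attribute shown in the question. The sketch below is only an illustration under a few assumptions: the attribute holds HTML-entity-encoded, JSON-like text containing a "MainCuisineName" key, and the value has no embedded quotes. TooltipParser and ExtractMainCuisineName are hypothetical names, not part of HtmlAgilityPack, and the sample string is copied from the question's markup rather than fetched from the site. A regular expression is used instead of a JSON parser because the tooltip text in the question does not look like complete JSON.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HtmlAgilityPack).
static class TooltipParser
{
    // Matches "MainCuisineName":"<value>" in the decoded tooltip text.
    // Assumes the value itself contains no double quotes.
    private static readonly Regex CuisinePattern =
        new Regex("\"MainCuisineName\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the cuisine name, or null when the key is not present.
    public static string ExtractMainCuisineName(string rawTooltip)
    {
        if (string.IsNullOrEmpty(rawTooltip))
        {
            return null;
        }

        // Turn &quot; back into "; for this sample, text that is already
        // decoded passes through unchanged.
        string decoded = WebUtility.HtmlDecode(rawTooltip);

        Match match = CuisinePattern.Match(decoded);
        return match.Success ? match.Groups[1].Value : null;
    }
}

static class Program
{
    static void Main()
    {
        // Sample attribute value copied from the question's markup.
        string raw = "{&quot;MainCuisineName&quot;:&quot;Meatball&quot;&quot;cc_genel.gif}";
        Console.WriteLine(TooltipParser.ExtractMainCuisineName(raw));
    }
}
```

With HtmlAgilityPack, the raw value would come from node.GetAttributeValue("data-tooltip", "") on the nodes selected by //a[@class='restaurantName withTooltip']/span. Since SelectNodes returns null when nothing matches, it is worth checking for null before the foreach loop.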
