Parsing a table presented in a WebControl

Date: 2012-08-21 18:55:55

Tags: c# linq html-parsing dom

The HTML is displayed in a WinForm using the built-in WebControl.

I decided to try HtmlAgilityPack.

// Note: SelectNodes returns null when nothing matches, so this query assumes every XPath finds at least one node.
var query = from table in doc.DocumentNode.SelectNodes("//table[@class='TABLEBORDER']").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new {Table = table.Id, CellText = cell.InnerText};

foreach(var cell in query) { 
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText); 
} 

I updated the code based on @L.B's suggestion, and I get the following output:

The thread '<No Name>' (0x1e94) has exited with code 0 (0x0).
: 
Target

: 
Triggerenabled?

: 
Account

: 
Passwordchanged?


: 
Error message(if any)

The thread '<No Name>' (0x2564) has exited with code 0 (0x0).

The rest of the data is clearly visible in the web control.

1 answer:

Answer 0 (score: 1)

  

The page has other tables, but I am only interested in the table with the class "TABLEBORDER".

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var table = doc.DocumentNode.SelectSingleNode("//table[@class='TABLEBORDER']");
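
One thing to watch with the lookup above: `SelectSingleNode` returns null when no node matches, and `@class='TABLEBORDER'` matches only when the attribute value is exactly that string. A minimal sketch of a guard follows. The `FindTable` helper and the sample markup are hypothetical and are not part of the original answer.

```csharp
using System;
using HtmlAgilityPack;

static class TableLookup
{
    // Hypothetical helper: returns the first table whose class attribute is
    // exactly cssClass, or null when the document has no such table.
    public static HtmlNode FindTable(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//table[@class='" + cssClass + "']");
    }

    static void Main()
    {
        // Made-up markup that deliberately contains no TABLEBORDER table.
        string html = "<table class='OTHER'><tr><td>x</td></tr></table>";

        HtmlNode table = FindTable(html, "TABLEBORDER");
        Console.WriteLine(table == null ? "No matching table found" : "Table found");
    }
}
```

Checking for null before calling `Descendants` avoids a `NullReferenceException` when the page layout changes or the HTML has not finished loading.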

EDIT

var res = table.Descendants("tr")
               .Select(tr => tr.Descendants("td")
                               .Select(td => td.InnerText)
                               .ToList())
               .ToList();

EDIT2

foreach (List<string> tr in res)
{
    foreach (string td in tr)
    {
        Console.Write("[{0}] ", td);
    }
    Console.WriteLine();
}
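
The snippets above collect only `td` cells, while the query in the question also read `th` cells. The sketch below puts the pieces together and keeps both kinds of cell. It rests on a few assumptions: the `ExtractRows` helper and the sample markup are made up for illustration, and in the WinForms application the `html` string would presumably come from the browser control (for example its `DocumentText` property). Note that `Descendants("tr")` also reaches rows nested inside a `tbody`, which the relative XPath `"tr"` used in the question does not. That may be why the question's query printed only the header row, although the page source is not shown, so this is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class TableDump
{
    // Hypothetical helper: returns the trimmed text of every th/td cell,
    // row by row, for the first table whose class is exactly cssClass.
    public static List<List<string>> ExtractRows(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@class='" + cssClass + "']");
        if (table == null)
            return new List<List<string>>();   // no matching table: empty result

        return table.Descendants("tr")
                    .Select(tr => tr.ChildNodes
                                    // keep header and data cells in document order
                                    .Where(n => n.Name == "th" || n.Name == "td")
                                    // decode entities such as &amp; and trim whitespace
                                    .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                                    .ToList())
                    .ToList();
    }

    static void Main()
    {
        // Made-up sample markup; the real page is not shown in the question.
        string html = @"<table class='TABLEBORDER'>
                          <tr><th>Target</th><th>Account</th></tr>
                          <tbody>
                            <tr><td>server01</td><td>admin</td></tr>
                          </tbody>
                        </table>";

        foreach (List<string> row in ExtractRows(html, "TABLEBORDER"))
        {
            foreach (string cell in row)
            {
                Console.Write("[{0}] ", cell);
            }
            Console.WriteLine();
        }
    }
}
```

Filtering `ChildNodes` by name, rather than calling `Descendants("td")`, keeps cells that belong to a nested table out of the outer row.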