WebBrowser truncates DocumentText

Time: 2013-08-14 22:11:26

Tags: c# browser html-agility-pack

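I have a simple (and strange) problem. When I manually set the WebBrowser.DocumentText property to an HTML string, the string gets cut off after a seemingly random character. The HTML is just another page's markup, downloaded with HtmlAgilityPack (the real application processes it a little, but the bug occurs even with no processing at all). When I load the same page in Internet Explorer, the whole page renders correctly.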

Here is a minimal example:

const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);

HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);

string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}

var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);
        Application.ExitThread();
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();

Output:

35259
20477

2 Answers:

Answer 0 (score: 3)

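I tried the code without Application.ExitThread(), and it turned out that DocumentCompleted fires twice; the second time, the lengths look correct. So the site you are loading probably has some dynamic content or refreshes itself. Rather than dig into what it does, I went ahead and removed all scripts, styles, and iframes: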

    const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
    var doc = new HtmlWeb().Load(url);

    doc.DocumentNode.Descendants()
                    .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "iframe")
                    .ToList()
                    .ForEach(n => n.Remove());

Now DocumentCompleted fires only once, and the document lengths match.
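As a rough illustration of the stripping step on its own, here is a minimal sketch that runs the same filter against an in-memory string instead of the live page. The sample markup and the class name StripDemo are made up for the example; only the HtmlAgilityPack calls (LoadHtml, Descendants, Remove, OuterHtml) come from the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class StripDemo
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the downloaded page.
        const string sample =
            "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>" +
            "<body><p>Cook County</p><iframe src=\"ad.html\"></iframe></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(sample);

        // ToList() takes a snapshot first, so nodes are not removed
        // while the Descendants() sequence is still being enumerated.
        doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "iframe")
            .ToList()
            .ForEach(n => n.Remove());

        // Print the remaining markup to confirm the three elements are gone.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Note that this changes what the page looks like and does, so it probably only fits cases where the static markup is all you need.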

Answer 1 (score: 0)

I solved it this way:

const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);

HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);

string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}

var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);
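        // Application.ExitThread();   // the question's unconditional exit

        // Exit only once the browser reports the document as fully loaded.
        if (browser.ReadyState == WebBrowserReadyState.Complete)
        {
            Application.ExitThread();   // Stops the thread
        }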
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
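
To check how often DocumentCompleted actually fires for a given page before deciding when to exit, a small diagnostic along these lines may help. It is only a sketch: the class name CompletedLog, the placeholder HTML, and the 5-second timeout are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Windows.Forms;

static class CompletedLog
{
    [STAThread]
    static void Main()
    {
        // Placeholder markup; substitute the HTML string produced by HtmlAgilityPack.
        const string html = "<html><body><p>hello</p></body></html>";

        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        int fired = 0;

        // Log every DocumentCompleted event instead of exiting on the first one.
        browser.DocumentCompleted += (sender, e) =>
        {
            fired++;
            Console.WriteLine("#{0} url={1} state={2} length={3}",
                fired, e.Url, browser.ReadyState, browser.DocumentText.Length);
        };
        browser.DocumentText = html;

        // Arbitrary 5-second window (an assumption), after which the message loop stops.
        var timer = new System.Windows.Forms.Timer { Interval = 5000 };
        timer.Tick += (s, e) => { timer.Stop(); Application.ExitThread(); };
        timer.Start();

        Application.Run();
    }
}
```

Comparing the logged lengths against html.Length should show which event, if any, corresponds to the complete document.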