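I have a simple (and strange) problem. When I manually set the WebBrowser.DocumentText property to an HTML string, the text gets cut off after a seemingly random character. The HTML I use is plain HTML from another page, downloaded via HtmlAgilityPack (in the real application I process it a bit, but the bug occurs even without any processing). When I load the same page in Internet Explorer, the whole page renders correctly.
Here is a minimal example: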
const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);
HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);
string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}
var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);
        Application.ExitThread();
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
Output:
35259
20477
Answer 0 (score: 3)
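I tried the code without Application.ExitThread(), and it turned out that DocumentCompleted is fired twice; the second time, the document looks correct. So the site you are trying to load probably has some dynamic content or refreshes itself. Rather than digging into what it does, I went ahead and removed all scripts, styles, and iframes: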
const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);
doc.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "iframe")
    .ToList()
    .ForEach(n => n.Remove());
Now DocumentCompleted is fired only once, and the document lengths match.
Answer 1 (score: 0)
const string url = "http://www.zip-codes.com/county/IL-COOK.asp";
var doc = new HtmlWeb().Load(url);
HtmlNode basehref = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "base" };
basehref.Attributes.Add("href", url.Substring(0, url.LastIndexOf("/") + 1));
doc.DocumentNode.SelectSingleNode("//head").ChildNodes.Insert(0, basehref);
string html;
using (var writer = new StringWriter()) {
    doc.Save(writer);
    html = writer.ToString();
}
var thread = new Thread(() => {
    var browser = new WebBrowser {
        Location = new Point(0, 0),
        Size = new Size(1920, 1080),
        ScriptErrorsSuppressed = true,
        AllowNavigation = true,
        DocumentText = html
    };
    browser.DocumentCompleted += (sender, e) => {
        Console.WriteLine(html.Length);
        Console.WriteLine(browser.DocumentText.Length);
        //Application.ExitThread();
        // DocumentCompleted can fire more than once; exit only when the browser reports Complete.
        if (browser.ReadyState == WebBrowserReadyState.Complete)
        {
            Application.ExitThread(); // Ends the message loop on this thread
        }
    };
    Application.Run();
});
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();