Question

I ran into something strange and would like your opinion.

A web page contains a span element whose InnerText and InnerHtml properties hold some Greek text.

The page's encoding is Greek (Windows).

My if statement is:

if (mySpan != null && mySpan.InnerText.Contains(greekText))

This line works 100% of the time, but the earlier, non-working code was:

if (mySpan != null && browser.DocumentText.Contains(greekText))

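This line does not work. When I looked at the value in the debugger's preview, I noticed that the Greek text was unreadable (strange symbols instead of Greek characters). However, the application reads every other element containing Greek text successfully, i.e. I can store their properties in variables and use them. Is there an explanation for why DocumentText fails while InnerText succeeds?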

Answer 1

Looking at the source of WebBrowser.DocumentText, it decodes the document as UTF-8 by default:

public string DocumentText
{
  get
  {
    Stream documentStream = this.DocumentStream;
    if (documentStream == null)
      return "";
    // No encoding argument: StreamReader falls back to UTF-8.
    StreamReader streamReader = new StreamReader(documentStream);
    documentStream.Position = 0L;
    return streamReader.ReadToEnd();
  }
}

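In other words, a StreamReader created without an explicit encoding assumes UTF-8 (unless the stream starts with a byte order mark), so bytes stored as Windows-1253 are not decoded into the right Greek characters.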
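To see the effect in isolation, here is a minimal sketch (plain console code, not WebBrowser code) that encodes a Greek word as Windows-1253, the "Greek (Windows)" code page, and then decodes the bytes both the way DocumentText does and with the correct code page. It assumes .NET Core 3.0+ / .NET 5+, where code page 1253 must first be registered through CodePagesEncodingProvider; on .NET Framework that registration is not needed (and the provider type comes from the System.Text.Encoding.CodePages package). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class: reproduces the DocumentText symptom without a WebBrowser.
public static class GreekDecodingDemo
{
    // Windows code page for "Greek (Windows)".
    public const int GreekWindowsCodePage = 1253;

    static GreekDecodingDemo()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages
        // such as 1253 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Produces the raw bytes a page saved as Greek (Windows) would contain.
    public static byte[] EncodeAsGreekWindows(string text) =>
        Encoding.GetEncoding(GreekWindowsCodePage).GetBytes(text);

    // Mirrors DocumentText: no encoding argument, so StreamReader uses UTF-8.
    public static string DecodeWithDefaultReader(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            return reader.ReadToEnd();
        }
    }

    // Decodes the same bytes with the code page the page actually uses.
    public static string DecodeWithCodePage(byte[] bytes, int codePage)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.GetEncoding(codePage)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With these helpers, DecodeWithDefaultReader(EncodeAsGreekWindows("Καλημέρα")) should yield U+FFFD replacement characters instead of Greek letters, which is consistent with the "strange symbols" seen in the debugger, while DecodeWithCodePage(bytes, 1253) returns the original word.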

See this link for more about this issue.

I can only assume that browser.Document.GetElementById(mySpanId) respects the page's declared encoding, which is why the text looks correct when you use this call.
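If you need the whole document text rather than a single element, one possible workaround is to repeat the DocumentText logic with an explicit encoding. This is a sketch under two assumptions: that DocumentStream holds the bytes in the page's own encoding, and that browser.Document.Encoding reports a name Encoding.GetEncoding understands (for example "windows-1253"). DocumentTextReader and ReadAll are hypothetical names, not part of the WebBrowser API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: a variant of the DocumentText getter that takes the
// encoding name instead of silently assuming UTF-8.
public static class DocumentTextReader
{
    static DocumentTextReader()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+; makes names like "windows-1253" resolvable.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string ReadAll(Stream documentStream, string encodingName)
    {
        if (documentStream == null)
            return "";

        // Fall back to UTF-8 (the original behavior) when no encoding name is known.
        Encoding encoding = string.IsNullOrEmpty(encodingName)
            ? Encoding.UTF8
            : Encoding.GetEncoding(encodingName);

        documentStream.Position = 0L;
        // leaveOpen: true so the caller's stream is not disposed by the reader.
        using (var reader = new StreamReader(documentStream, encoding, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the WebBrowser case the call would look like DocumentTextReader.ReadAll(browser.DocumentStream, browser.Document.Encoding), after which Contains(greekText) compares decoded Greek characters on both sides. Keeping the stream open is a design choice: the helper only reads, so it leaves disposal to the caller.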

WebBrowser DocumentText encoding
