Can't download UTF-8 web content

Time: 2013-02-22 23:12:21

Tags: c# utf-8 webclient

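I have a simple piece of code that fetches the response from a Vietnamese website, http://vnexpress.net, but there is a small problem. The first time, the download is fine, but after that the content contains unknown symbols like this: \b\0\0\0\0\0\0\a`I%&/m... What is wrong?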

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);

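3 answers:

Answer 0: (score: 9)

You'll find that the response is GZipped. There doesn't seem to be a way to download it with WebClient unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.

Here's how you do it: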

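    using System;
    using System.Net;

    public class MyWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest req = base.GetWebRequest(address);
            // Only HTTP(S) requests support automatic decompression.
            var httpReq = req as HttpWebRequest;
            if (httpReq != null)
            {
                // Send Accept-Encoding: gzip and transparently unzip the response body.
                httpReq.AutomaticDecompression = DecompressionMethods.GZip;
            }
            return req;
        }
    }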

Using it:

    string address = "http://vnexpress.net";
    MyWebClient webClient = new MyWebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);
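The `\b\0\0\0...` run in the question is consistent with a gzip header, which starts with the magic bytes 0x1F 0x8B followed by the compression method byte 0x08. As a rough sketch of what the automatic decompression saves you from doing by hand, the following checks for those magic bytes and gunzips in memory. The class and method names (`GzipSketch`, `LooksGzipped`, `Compress`, `DecodeBody`) are hypothetical, and the sample body is fabricated locally rather than fetched from the site.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // True when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Gzips UTF-8 text; used here only to fabricate a sample response body.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray is still valid after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    // Decodes a body as UTF-8, gunzipping first when the magic bytes are present.
    public static string DecodeBody(byte[] body)
    {
        if (!LooksGzipped(body))
        {
            return Encoding.UTF8.GetString(body);
        }
        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] body = Compress("Xin chào");   // assumed sample text, not real site content
        Console.WriteLine(LooksGzipped(body));
        Console.WriteLine(DecodeBody(body));
    }
}
```

With a plain WebClient, the bytes returned by `DownloadData` could be passed to a helper like `DecodeBody` instead of calling `DownloadString`.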

Answer 1: (score: 1)

Try this code and you'll be fine:

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    // Turn the downloaded string back into bytes with the system default encoding,
    // then decode those bytes as UTF-8.
    return Encoding.UTF8.GetString(Encoding.Default.GetBytes(webClient.DownloadString(address)));
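To illustrate what this re-encoding round trip does, here is a minimal sketch that garbles a UTF-8 string by decoding it with the wrong encoding and then repairs it. It uses ISO-8859-1 as a stand-in for `Encoding.Default` (an assumption, chosen because every byte maps to exactly one character and back); `MojibakeSketch` and `Repair` are hypothetical names. The trick only helps when the underlying bytes were valid UTF-8 to begin with.

```csharp
using System;
using System.Text;

public static class MojibakeSketch
{
    // Reverses a wrong decode: turn the chars back into the bytes they came from,
    // then decode those bytes as UTF-8.
    public static string Repair(string garbled, Encoding wrongEncoding)
    {
        byte[] originalBytes = wrongEncoding.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 plays the role of the wrongly applied encoding.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("Việt Nam");

        // Decoding UTF-8 bytes as Latin-1 yields one char per byte, i.e. garbled text.
        string garbled = latin1.GetString(utf8Bytes);

        Console.WriteLine(garbled);
        Console.WriteLine(Repair(garbled, latin1));
    }
}
```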

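Answer 2: (score: 0)

DownloadString requires the server to indicate the charset correctly in the Content-Type response header. If you watch the traffic in Fiddler, you'll see that the server instead sends the charset inside a META tag in the HTML response body:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

If you need to handle responses like this, you need to either parse the HTML yourself or use a library like FiddlerCore to do it for you.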
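As a rough idea of what "parse the HTML yourself" could look like, the sketch below scans the first few kilobytes of the raw bytes for a `charset=` value inside a META tag and decodes with that encoding. `CharsetSniffer`, `DetectEncoding`, and `Decode` are hypothetical names, the regular expression is a simplification rather than a full HTML parser, and the sample markup is made up for the demo.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Matches charset=... inside a <meta ...> tag, covering both the http-equiv
    // form and the shorter <meta charset="utf-8"> form.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    public static Encoding DetectEncoding(byte[] body, Encoding fallback)
    {
        // Probe as ISO-8859-1: each byte maps to one char, so ASCII markup stays readable.
        int probeLength = Math.Min(body.Length, 4096);
        string probe = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, probeLength);

        Match match = MetaCharset.Match(probe);
        if (!match.Success)
        {
            return fallback;
        }
        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // The declared charset name is not known to this runtime.
            return fallback;
        }
    }

    public static string Decode(byte[] body, Encoding fallback)
    {
        return DetectEncoding(body, fallback).GetString(body);
    }

    public static void Main()
    {
        // Assumed sample page; a real body would come from WebClient.DownloadData.
        string html = "<html><head><meta http-equiv=\"Content-Type\" " +
                      "content=\"text/html; charset=UTF-8\" /></head>" +
                      "<body>Tiếng Việt</body></html>";
        byte[] body = Encoding.UTF8.GetBytes(html);

        Console.WriteLine(DetectEncoding(body, Encoding.ASCII).WebName);
        Console.WriteLine(Decode(body, Encoding.ASCII));
    }
}
```

The fallback parameter lets the caller decide what to do when no META charset is found, for example using the charset from the Content-Type header if one was sent.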