Unable to scrape content from a web page

Time: 2018-02-16 19:31:20

Tags: c# web-scraping html-agility-pack

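I am new to web scraping. I am trying to scrape a web page, but I have not been able to. I have tried adding different headers. Here is my code: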

    // Required namespaces (HtmlAgilityPack is a NuGet package).
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    // Returns Task rather than void so callers can await it and observe exceptions.
    private static async Task GetPersonHtmlAsync()
    {
        var url = "http://www.lrs.lt/sip/portal.show?p_r=8801&p_k=1&p_a=498&p_asm_id=51970";

        string html = await GetPageAsStringAsync(url);

        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // Dump the raw markup that was received.
        Console.WriteLine(htmlDocument.ParsedText);

        // The <title> can be missing (as in the blocked response), so guard against null.
        var titleNode = htmlDocument.DocumentNode.SelectSingleNode("//head/title");
        var name = titleNode?.InnerText;

        Console.WriteLine(name ?? "(no <title> found)");
    }

    public static async Task<string> GetPageAsStringAsync(string url)
    {
        // Dispose the client once the request has completed.
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("User-Agent",
                "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");
            HttpResponseMessage response = await client.GetAsync(url);
            return await response.Content.ReadAsStringAsync();
        }
    }

Here is the response:

<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=10-4507639-0%202CNN%20RT%281518808565894%2010%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B12%284%2c316%2c0%29&incident_id=723000330015808054-22736350026596890&edet=12&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 723000330015808054-22736350026596890</iframe></body></html>

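The web page appears to use a service that blocks requests from bots. I have searched for a solution, and the only suggestion I could find was to change the headers so that my request appears to come from a browser. That did not work.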
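
To tell a blocked request apart from a real page before parsing it, one option might be to look for the markers visible in the response above. This is only a minimal sketch, assuming the block page always contains `_Incapsula_Resource` or `Incapsula incident ID` (seen in this one response, not verified in general); `BlockPageDetector` and `LooksBlocked` are hypothetical names, not part of any library:

```csharp
using System;

public static class BlockPageDetector
{
    // Markers copied from the response shown above; assumed, not guaranteed, to be stable.
    private static readonly string[] Markers =
    {
        "_Incapsula_Resource",
        "Incapsula incident ID"
    };

    // Returns true when the HTML contains one of the block-page markers.
    public static bool LooksBlocked(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return false;
        }

        foreach (var marker in Markers)
        {
            // Case-insensitive substring search for the marker.
            if (html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

Calling `BlockPageDetector.LooksBlocked(html)` on the string returned by `GetPageAsStringAsync` would at least let the program report the block instead of trying to read a title from it. It does not get past the block itself.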

0 answers:

No answers