Scraping a website after logging in

Time: 2014-03-07 19:12:18

Tags: c# web-scraping webrequest

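I am trying to scrape a website that requires a login. I am getting an error that I have not seen before, even though I have successfully reused code from other forums in the past:

Exception details: System.Net.ProtocolViolationException: Cannot send a content-body with this verb-type.

Code: Stream newStream = http.GetRequestStream(); //open connection

Here is the entire code: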

@{
    var strUserId = "userName";
    var strPassword = "password";
    var url = "formSubmitLandingSite";
    var url2 = "pageToScrape";
    HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
    http.KeepAlive = true;
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    string postData = "email=" + strUserId + "&password=" + strPassword;
    byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
    http.ContentLength = dataBytes.Length;
    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(dataBytes, 0, dataBytes.Length);
    }
    HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
    // Probably want to inspect the http.Headers here first
    http = WebRequest.Create(url2) as HttpWebRequest;
    http.CookieContainer = new CookieContainer();
    http.CookieContainer.Add(httpResponse.Cookies);
    HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;

    Stream newStream = http.GetRequestStream(); //open connection
    newStream.Write(dataBytes, 0, dataBytes.Length); // Send the data.
    newStream.Close();

    string sourceCode;
    HttpWebResponse getResponse = (HttpWebResponse)http.GetResponse();
    using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
    {
        sourceCode = sr.ReadToEnd();
    }
    Response.Write(sourceCode);
}

1 Answer:

Answer 0: (score: 0)

You create a new request object here:

http = WebRequest.Create(url2) as HttpWebRequest;

Note that the default HTTP verb is GET. You then try to open the request stream here:

Stream newStream = http.GetRequestStream();

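This method is used to enable writing data to the request's content. However, a GET request has no content. You need to use a different HTTP verb, as you did in the code above the error. POST is the most common, and it is what you used above: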

http.Method = "POST";
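The verb check can be reproduced on its own. The following is a minimal sketch, not part of the original answer: the class and method names (`VerbDemo`, `RejectsRequestBody`) and the `http://localhost/` address are placeholders chosen for illustration. It only exercises verbs that do not allow a body, for which `GetRequestStream()` is expected to fail before any connection is attempted.

```csharp
using System;
using System.Net;

public static class VerbDemo
{
    // Hypothetical helper: returns true when HttpWebRequest refuses to
    // open a request stream for the given HTTP verb.
    public static bool RejectsRequestBody(string method)
    {
        // Placeholder URL; for GET/HEAD the verb check fails first,
        // so no server needs to be listening here.
        var request = (HttpWebRequest)WebRequest.Create("http://localhost/");
        request.Method = method;
        try
        {
            using (request.GetRequestStream()) { }
            return false;
        }
        catch (ProtocolViolationException)
        {
            // "Cannot send a content-body with this verb-type."
            return true;
        }
    }

    public static void Main()
    {
        // A freshly created request defaults to GET, which is the case
        // that triggers the exception in the question.
        Console.WriteLine(RejectsRequestBody("GET"));
    }
}
```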

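So use a POST request again. (Assuming, of course, that this is what the server expects. Either way, if the server expects content, then it is definitely not expecting a GET request.)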
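For comparison, here is a rough sketch of the whole login-then-fetch flow. It rests on assumptions that the question does not confirm: that the login endpoint accepts a form POST with the `email` and `password` fields used above, and that the page to scrape is served by a plain GET once the session cookie is present. Under that second assumption the request stream is never opened on the second request. If the server does expect a POST there, set `Method = "POST"` before calling `GetRequestStream()`, as described above. The names `LoginScraper`, `BuildLoginBody`, and `FetchAfterLogin` are hypothetical, and the shared `CookieContainer` is an addition of this sketch: `HttpWebResponse.Cookies` is only populated when a container was assigned to the request beforehand.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class LoginScraper
{
    // Builds the form body; values are URL-escaped so characters such
    // as '&' or '@' in the credentials do not corrupt the payload.
    public static string BuildLoginBody(string email, string password)
    {
        return "email=" + Uri.EscapeDataString(email) +
               "&password=" + Uri.EscapeDataString(password);
    }

    // Assumed flow: POST the credentials, then GET the target page
    // with the cookies collected during login.
    public static string FetchAfterLogin(string loginUrl, string pageUrl,
                                         string email, string password)
    {
        // One container shared by both requests carries the session.
        var cookies = new CookieContainer();

        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;

        byte[] body = Encoding.UTF8.GetBytes(BuildLoginBody(email, password));
        login.ContentLength = body.Length;
        using (Stream requestStream = login.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        // Completing the response stores any Set-Cookie values in the container.
        using (login.GetResponse()) { }

        // Default verb is GET, so no request stream is opened here.
        var page = (HttpWebRequest)WebRequest.Create(pageUrl);
        page.CookieContainer = cookies;
        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call might look like `LoginScraper.FetchAfterLogin(url, url2, strUserId, strPassword)`, using the variables from the question.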