Logging in to a website with C# (login, password, and a city combo box)

Time: 2018-01-05 18:48:02

Tags: c# web httpwebrequest

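I have an application that needs to read nodes from an HTML page on a website. The problem is that the page requires the user to be logged in. I searched for topics about logging in to a website, but the people asking mostly have just two fields: login and password.
 In my case there is also a combo box with a list of cities (login form screenshot). My current code: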

class Program
{
    static void Main(string[] args)
    {
        var client = new CookieAwareWebClient();
        client.BaseAddress = @"https://mystat.itstep.org/ru/login";
        var loginData = new NameValueCollection();
        loginData.Add("login", "login");
        loginData.Add("password", "password");
        client.UploadValues("login.php", "POST", loginData);

        string htmlSource = client.DownloadString("index.php");
        Console.WriteLine("Logged in!");
    }
}

public class CookieAwareWebClient : WebClient
{
    private CookieContainer cookie = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = cookie;
        }
        return request;
    }
}

How can I select a city from this list in C#?

1 answer:

Answer 0: (score: 1)

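You have to do an initial GET first, so that you obtain the cookies and the CSRF token required for the first POST. The CSRF token has to be parsed out of that first HTML response so that you can send it along with the username and password.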

This is what your main flow should look like:

var client = new CookieAwareWebClient();
client.BaseAddress = @"https://mystat.itstep.org/en/login";

// do an initial GET so the server sends us its cookies
// and starts a server-side session;
// we also need to find the csrf token in the response
var login = client.DownloadString("/");

string csrf;   
// parse the file and go looking for the csrf token
ParseLogin(login, out csrf);

var loginData = new NameValueCollection();
loginData.Add("login", "someusername");
loginData.Add("password", "somepassword");
loginData.Add("city_id", "29"); // I picked this value from the raw html
loginData.Add("_csrf", csrf);
var loginResult = client.UploadValues("login.php", "POST", loginData);
// get the string from the received bytes
Console.WriteLine(Encoding.UTF8.GetString(loginResult));
// your task is to make sense of this result
Console.WriteLine("Logged in!");

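The parsing only needs to be as complex as your needs require. I have implemented just enough to get you the CSRF token. I leave parsing the cities (hint: they start with <select, followed by an <option on each line, until you reach </select>) as an advanced exercise. Please don't ask me for that part.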

Here is the CSRF parsing logic:

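// requires: using System.IO; using System.Linq;
// static so that it can be called from the static Main method
static void ParseLogin(string html, out string csrf)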
{
    csrf = null;
    // read each line of the html
    using(var sr = new StringReader(html))
    {
        string line;
        while((line = sr.ReadLine()) != null)   
        {
            // parse for csrf by looking for the input tag      
            if (line.StartsWith(@"<input type=""hidden"" name=""_csrf""") && csrf == null) 
            {
                // string split by space
                csrf = line
                    .Split(' ')  // split to array of strings
                    .Where(s => s.StartsWith("value"))  // value="what we need is here">
                    .Select(s => s.Substring(7,s.Length -9)) // remove value=" and the last ">
                    .First();
            }
        }
    }
}
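
The line-based parser above only matches when the <input tag starts at the very beginning of a line and the attributes appear in exactly that order. As a rough alternative, here is a sketch that uses a regular expression and tolerates indentation. It assumes the token still lives in a hidden input named `_csrf` with `name` written before `value`; the class and method names (`CsrfSketch`, `ExtractCsrf`) are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsrfSketch
{
    // Assumption: the markup looks like
    //   <input type="hidden" name="_csrf" value="...">
    // with the name attribute appearing before the value attribute.
    private static readonly Regex TokenPattern = new Regex(
        @"<input\b[^>]*\bname=""_csrf""[^>]*\bvalue=""([^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns the token text, or null when no matching input tag is found.
    public static string ExtractCsrf(string html)
    {
        Match m = TokenPattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

In the main flow this could stand in for the ParseLogin call, for example `string csrf = CsrfSketch.ExtractCsrf(login);`, followed by a null check before posting the form.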

If you feel adventurous, you can write your own HTML parser, go wild with string methods, try some regular expressions, or use a library.
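
For the city list hinted at earlier (a <select followed by <option lines), a regular-expression sketch might look like the following. It assumes the select element is named `city_id` (the field name posted in the login data) and that each option has the form `<option value="29">Name</option>`; the real markup may differ, and `CityOptionsSketch` and `ParseCities` are hypothetical names. A proper HTML parsing library would be more robust than this.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CityOptionsSketch
{
    // Assumption: the cities live in <select name="city_id"> ... </select>
    // and each option looks like <option value="29">Name</option>.
    // Returns a map from option value (the id to post) to the visible name.
    public static Dictionary<string, string> ParseCities(string html)
    {
        var cities = new Dictionary<string, string>();

        // Singleline lets '.' span line breaks inside the select element.
        Match select = Regex.Match(
            html,
            @"<select\b[^>]*\bname=""city_id""[^>]*>(.*?)</select>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!select.Success)
        {
            return cities; // no city list found, return an empty map
        }

        MatchCollection options = Regex.Matches(
            select.Groups[1].Value,
            @"<option\b[^>]*\bvalue=""([^""]*)""[^>]*>([^<]*)",
            RegexOptions.IgnoreCase);
        foreach (Match option in options)
        {
            cities[option.Groups[1].Value] = option.Groups[2].Value.Trim();
        }
        return cities;
    }
}
```

The key of the chosen entry is what would be sent as `city_id` in the login data, instead of a value picked by hand from the raw HTML.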

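Note that scraping a website may violate its terms of service. Check that what you are doing is allowed and does not interfere with their operations.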
