Logging in to tumblr.com with Python 3

Date: 2016-08-14 12:58:23

Tags: python python-3.x web-crawler

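How can I log in to tumblr using requests in Python 3? Here is my code, but it does not work and returns the login page. I submit the login form data with a requests POST, but it fails.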

import requests
from bs4 import BeautifulSoup

start_url = 'https://www.tumblr.com'

# set a session for request
s = requests.Session()
s.headers.update({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0', 'accept-language': 'zh-CN,zh;'}
                 )

# get the form_key for login_in
r = s.get(start_url)
login_soup = BeautifulSoup(r.text, 'lxml')
hidden_div = login_soup.find('div', class_='form_row_hidden').find_all('input')
key_dict = {}

for input_tag in hidden_div:
    tmp_dict = input_tag.attrs
    key_dict.update({tmp_dict['name']: tmp_dict['value']})

user_data_dict = {'determine_email': '×××××××××',
                  'user[email]': '××××××××',
                  'user[password]': '××××××××',
                  'user[age]': '',
                  'tumblelog[name]': ''}

key_dict.update(user_data_dict)


# log in tumblr
r_login=s.post(start_url, headers=headers, data=key_dict)

home_soup=BeautifulSoup(r.text, 'lxml')
print(home_soup)
# the output is still the log-in page.

1 answer:

Answer 0 (score: 5)

You are almost on target.

First, you have to make a request to the tumblr login page (https://tumblr.com/login). (You did this.)

Then, you have to parse the HTML page and get the form_key value. This value is needed for the real login.

Finally, make a POST request with this payload:

{'user[email]': your_mail,
'user[password]': your_pass,
'form_key': form_key
}
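The three steps above can be sketched in Python 3 roughly as follows. This is a minimal sketch, not a tested login against the live site. It assumes the form_key sits in a meta tag named tumblr-form-key (the name used in the XPath of the transcript below) and that the first request is a plain GET. The names FormKeyParser, extract_form_key and login are hypothetical helpers introduced for illustration, and the session argument is assumed to behave like requests.Session.

```python
from html.parser import HTMLParser

LOGIN_URL = 'https://www.tumblr.com/login'


class FormKeyParser(HTMLParser):
    """Remember the content of <meta name="tumblr-form-key"> (assumed location)."""

    def __init__(self):
        super().__init__()
        self.form_key = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'meta' and attrs.get('name') == 'tumblr-form-key':
            self.form_key = attrs.get('content')


def extract_form_key(page_html):
    """Step 2: pull the form_key out of the login page HTML."""
    parser = FormKeyParser()
    parser.feed(page_html)
    if parser.form_key is None:
        raise ValueError('form_key not found in the login page')
    return parser.form_key


def login(session, email, password):
    """Run the three steps with an object that behaves like requests.Session."""
    page = session.get(LOGIN_URL)            # step 1: fetch the login page
    form_key = extract_form_key(page.text)   # step 2: parse out form_key
    payload = {'user[email]': email,
               'user[password]': password,
               'form_key': form_key}
    return session.post(LOGIN_URL, data=payload)  # step 3: the real login
```

With the requests library installed, a call would look like login(requests.Session(), 'your_mail', 'your_pass'). Setting a User-Agent header on the session first, as the transcript below does, is left to the caller.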

Below is sample code in Python 2, though I did not use BeautifulSoup (you asked about using requests only ;)

In [1]: import requests

In [2]: from lxml import html

In [3]: url = 'https://www.tumblr.com/login'

In [4]: ua = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36'

In [5]: headers = {'User-Agent': ua}

In [6]: s = requests.session()

In [7]: lg = s.post(url, headers=headers)

In [8]: lg_html = html.fromstring(str(lg.text))

In [9]: form_key = lg_html.xpath("//meta[@name='tumblr-form-key']/@content")[0]

In [10]: payload = {'user[email]': 'your_mail',
   ....:            'user[password]': 'your_pass',
   ....:            'form_key': form_key}

In [11]: # real login

In [12]: s.post(url, headers=headers, data=payload)
Out[12]: <Response [200]>

In [13]: print s.get('https://www.tumblr.com/svc/post/get_post_form_builder_data').text
{"meta":{"status":200,"msg":"OK"},"response":{"channels":[{"name":"your_name","tags":[]}],"limits":{"videoSecondsRemaining":300,"preuploadPhotoUsed":0,"preuploadAudioUsed":0,"inlineEmbedsPerPost":5}}}