Logging in to Instagram with Scrapy.
I use FormRequest to post the username and password, and I have set COOKIES_ENABLED = True.
My Scrapy code:
import scrapy
from scrapy.http import Request, FormRequest


class InsSpider(scrapy.Spider):
    name = 'InsVideo'
    allowed_domains = ['instagram.com']
    url = 'https://www.instagram.com/'
    url_login = 'https://www.instagram.com/accounts/login/ajax/'

    def start_requests(self):
        return [Request(self.url_login, callback=self.login)]

    def login(self, response):
        login_post = {'username': 'username',
                      'password': 'password'}
        return [FormRequest.from_response(response,
                                          formdata=login_post,
                                          # callback=self.start_requests,
                                          dont_filter=True
                                          )]
I run scrapy crawl InsVideo and it returns this error message:
2017-03-18 12:15:49 [scrapy.core.engine] INFO: Spider opened
2017-03-18 12:15:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-18 12:15:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 https://www.instagram.com/robots.txt>
Set-Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; expires=Fri, 13-Mar-2037 04:15:51 GMT; Max-Age=630720000; Path=/
Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:51 GMT; Max-Age=31449600; Path=/; Secure
2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.instagram.com/robots.txt> (referer: None)
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://www.instagram.com/accounts/login/ajax/>
Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <405 https://www.instagram.com/accounts/login/ajax/>
Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:52 GMT; Max-Age=31449600; Path=/; Secure
2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.instagram.com/accounts/login/ajax/> (referer: None)
2017-03-18 12:15:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.instagram.com/accounts/login/ajax/>: HTTP status code is not handled or not allowed
2017-03-18 12:15:52 [scrapy.core.engine] INFO: Closing spider (finished)
I don't know what is wrong with the code. Thanks.
Answer 0 (score: 0)
Your url_login is wrong; it should be https://www.instagram.com/accounts/login/.
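As a small illustration of what the log in the question is reporting, the sketch below decodes the "Crawled" line with the standard library only. The helper name describe_crawl is hypothetical; the log line is copied from the question. Status 405 means "Method Not Allowed", which suggests that this endpoint does not accept the GET request the spider sent.

```python
import re
from http import HTTPStatus

# Log line copied verbatim from the question's output.
log_line = (
    "2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) "
    "<GET https://www.instagram.com/accounts/login/ajax/> (referer: None)"
)


def describe_crawl(line):
    """Return (method, url, status, reason) from a Scrapy 'Crawled' log line.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    match = re.search(r"Crawled \((\d{3})\) <(\w+) (\S+)>", line)
    if match is None:
        raise ValueError("not a 'Crawled' log line")
    status = HTTPStatus(int(match.group(1)))
    return match.group(2), match.group(3), status.value, status.phrase


print(describe_crawl(log_line))
```

Because Scrapy's HttpError middleware drops non-2xx responses by default, the login callback never runs for this response, which matches the "Ignoring response" line in the log.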
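In any case, the Instagram login page generates its login form with JavaScript. You can check this with your browser's "View Page Source" feature: the HTML source contains no <form> tag, and that is exactly what Scrapy sees. You will need something that can run the JavaScript, probably a headless browser.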
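The "View Page Source" check can also be done in code. The following is a minimal sketch using only the standard library; FormFinder, has_form, and both sample pages are hypothetical and are not Instagram's real markup. It reports whether raw HTML, as a non-JavaScript client receives it, contains a <form> start tag.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records whether any <form> start tag appears in the markup."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.found = True


def has_form(html):
    """Return True if the static HTML contains a <form> element."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical page whose form would only be built later by a script.
js_rendered = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)
# Hypothetical page that ships its form in the static HTML.
static_form = (
    '<html><body><form method="post">'
    '<input name="username"></form></body></html>'
)

print(has_form(js_rendered))
print(has_form(static_form))
```

When a page looks like the first sample, FormRequest.from_response has no form to fill in and raises a ValueError, so running a similar check on response.text before calling it can make the failure easier to diagnose.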
[Edit: corrected a sentence]