Question

Logging in to Instagram with Scrapy. I post the username and password with FormRequest, and I have set COOKIES_ENABLED = True.

My Scrapy code:

import scrapy
from scrapy.http import Request, FormRequest


class InsSpider(scrapy.Spider):
    name = 'InsVideo'
    allowed_domains = ['instagram.com']

    url = 'https://www.instagram.com/'
    url_login = 'https://www.instagram.com/accounts/login/ajax/'

    def start_requests(self):
        return [Request(self.url_login, callback=self.login)]

    def login(self, response):
        login_post = {'username': 'username',
                      'password': 'password'}
        return [FormRequest.from_response(response,
                                          formdata=login_post,
                                          # callback=self.start_requests,
                                          dont_filter=True
                                          )]

I run scrapy crawl InsVideo and get this error message:

2017-03-18 12:15:49 [scrapy.core.engine] INFO: Spider opened
2017-03-18 12:15:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-18 12:15:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 https://www.instagram.com/robots.txt>
Set-Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; expires=Fri, 13-Mar-2037 04:15:51 GMT; Max-Age=630720000; Path=/

Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:51 GMT; Max-Age=31449600; Path=/; Secure

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.instagram.com/robots.txt> (referer: None)
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://www.instagram.com/accounts/login/ajax/>
Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi

2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <405 https://www.instagram.com/accounts/login/ajax/>
Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:52 GMT; Max-Age=31449600; Path=/; Secure

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.instagram.com/accounts/login/ajax/> (referer: None)
2017-03-18 12:15:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.instagram.com/accounts/login/ajax/>: HTTP status code is not handled or not allowed
2017-03-18 12:15:52 [scrapy.core.engine] INFO: Closing spider (finished)

I don't know what is wrong with the code. Thanks.

Answer 1

Your url_login is wrong; it should be https://www.instagram.com/accounts/login/.
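The log in the question points the same way: the request to url_login went out as a GET and came back with status 405, which is HTTP "Method Not Allowed", so Scrapy dropped the response before login() ever ran. Below is a minimal standard-library sketch for pulling the status, reason phrase, method and URL out of such log lines. The regular expression assumes the "Crawled (status) <METHOD url>" format seen in the log above, and the function name explain_crawled is hypothetical.

```python
import re
from http import HTTPStatus

# Matches Scrapy engine lines such as:
#   ... DEBUG: Crawled (405) <GET https://example.com/path> (referer: None)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) ([^>\s]+)>")


def explain_crawled(log_text):
    """Return (status, phrase, method, url) for every 'Crawled' line in a log."""
    results = []
    for match in CRAWLED.finditer(log_text):
        status = int(match.group(1))
        try:
            phrase = HTTPStatus(status).phrase  # e.g. 405 -> "Method Not Allowed"
        except ValueError:
            phrase = "Unknown"  # status code not known to the http module
        results.append((status, phrase, match.group(2), match.group(3)))
    return results


if __name__ == "__main__":
    line = ("2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) "
            "<GET https://www.instagram.com/accounts/login/ajax/> (referer: None)")
    for entry in explain_crawled(line):
        print(entry)
```

Lines such as "Ignoring response <405 ...>" are deliberately not matched, so each request is reported once.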

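In any case, the Instagram login page generates its login form with JavaScript. You can check this with your browser's "View Page Source" feature: the HTML source contains no <form> tag, and that is exactly what Scrapy sees. You need something that actually executes the JavaScript, for example a headless browser.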
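FormRequest.from_response works by locating a <form> in the downloaded HTML and filling in its fields; when the page has no <form> at all, Scrapy raises a ValueError instead. The sketch below is a simplified, standard-library model of that lookup (not Scrapy's actual implementation) that you could run against a saved page source to see whether there is any form to fill. The names FirstFormParser and form_fields, and the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class FirstFormParser(HTMLParser):
    """Collects name/value pairs of <input> fields inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.found_form = False  # becomes True at the first <form> start tag
        self.in_form = False     # True only while inside that first <form>
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.found_form:
            self.found_form = True
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Valueless inputs are recorded as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False  # later forms are ignored


def form_fields(html, formdata=None):
    """Return the first form's fields merged with formdata, like a login POST body."""
    parser = FirstFormParser()
    parser.feed(html)
    parser.close()
    if not parser.found_form:
        raise ValueError("No <form> element found in page source")
    fields = dict(parser.fields)
    fields.update(formdata or {})
    return fields


if __name__ == "__main__":
    # Hypothetical JavaScript-rendered page: only a mount point and a script.
    js_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
    try:
        form_fields(js_page, {"username": "u", "password": "p"})
    except ValueError as err:
        print("cannot log in from this HTML:", err)
```

A static page with a real login form would return its hidden fields plus the supplied credentials, which is the situation from_response is designed for.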

[EDIT] Corrected a sentence.
