Getting a 401 response from a Scrapy request

Time: 2021-03-03 07:19:05

Tags: python-3.x api python-requests scrapy http-headers

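I am trying to extract table data from this page. Looking through the browser's network tools, I found an API call that returns the table data I need, so I tried to replicate that request with Scrapy. Here are the code and the response message.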

In [27]: url                                                                    
Out[27]: 'https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1'

In [28]: headers                                                                
Out[28]: {'X-XSRF-TOKEN': 'eyJpdiI6Ims2ZVJxT3pRRUplSCtLZXRVZXA3cXc9PSIsInZhbHVlIjoiaDJaQ0hhVWQwUU9zMEQ2S1FqVEVxR3hPYTJYRzd3d0VWWkZzMUhYQmRPSGVoaWVtTnBNUXZzdkJhTngvS2xNLyIsIm1hYyI6Ijc3MzY1N2M4ZDljMWQ4MDY4OTA5ZGQwNmUzYThiNDNkMDNlZDUyZmQ1Mjc4ZTU0MzkwMjA3ZDFmMDAwMTdkYTMifQ=='}

In [29]: fetch(scrapy.Request(url,headers=headers))                             
2021-03-03 12:12:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1> (referer: None)

Am I missing something in the headers or somewhere else?

1 answer:

Answer 0 (score: 1)

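When you visit https://www.barchart.com/stocks/quotes/MSFT/competitors, you receive response headers with set-cookie=laravel-token... and a few other cookies. I tried all of the cookies, and laravel-token is the one used for authentication. You also need the x-xsrf-token, which you are already extracting.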

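To solve this in Scrapy, first make sure cookies are enabled in settings.py. Then send a request to https://www.barchart.com/stocks/quotes/MSFT/competitors. In that request's parse method, send the next request to the URL you used above. Scrapy then handles the cookies automatically.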
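As a rough sketch of the settings side, the fragment below shows the two cookie-related options in settings.py. COOKIES_ENABLED already defaults to True in Scrapy, so it only matters if your project turned it off. COOKIES_DEBUG is optional and is included here only as an assumption about what helps when checking which cookies are sent and received.

```python
# settings.py (sketch): cookie-related options only

# Enables Scrapy's cookie middleware so cookies set by the first response
# are sent with later requests. This is Scrapy's default value.
COOKIES_ENABLED = True

# Optional: log the Cookie and Set-Cookie headers of every request and
# response, which makes it easier to confirm the auth cookie is passed on.
COOKIES_DEBUG = True
```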

Here is an example spider that worked for me (I extracted the XSRF token rather sloppily; you may have a better way):

import re
from urllib.parse import unquote
import scrapy

class TestSpider(scrapy.Spider):
    name='testspider'
    
    def start_requests(self):
        yield scrapy.Request(
            url='https://www.barchart.com/stocks/quotes/MSFT/competitors',
        )

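    def parse(self, response):
        # The cookie middleware stores the session cookies automatically;
        # only the XSRF token has to be copied into a request header by hand.
        xsrf_token = None
        for set_cookie in response.headers.getlist('Set-Cookie'):
            match = re.search(r'XSRF-TOKEN=([^;]+)', set_cookie.decode('utf-8'))
            if match:
                # The cookie value is URL-encoded (e.g. %3D for "=").
                xsrf_token = unquote(match.group(1))
                break
        if xsrf_token is None:
            # Without the token the API request would be rejected, so stop here.
            self.logger.error('No XSRF-TOKEN cookie found in the response')
            return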

        yield scrapy.Request(
            url='https://www.barchart.com/proxies/core-api/v1/quotes/get?'\
                'symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symb'\
                'ol,symbolName,weightedAlpha,lastPrice,priceChange,percen'\
                'tChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime'\
                ',symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&'\
                'orderDir=desc&meta=field.shortName,field.type,field.desc'\
                'ription&hasOptions=true&page=1&limit=100&raw=1',
            callback=self.parse_data,
            headers={
                'x-xsrf-token': xsrf_token
            }
        )
    
    def parse_data(self, response):
        pass

Output

2021-03-03 12:26:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.barchart.com/stocks/quotes/MSFT/competitors> (referer: None)
2021-03-03 12:26:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.barchart.com/proxies/core-api/v1/quotes/get?symbol=MSFT&lists=stocks.inSector.all(-COSO)&fields=symbol,symbolName,weightedAlpha,lastPrice,priceChange,percentChange,highPrice1y,lowPrice1y,percentChange1y,tradeTime,symbolCode,symbolType,hasOptions&orderBy=weightedAlpha&orderDir=desc&meta=field.shortName,field.type,field.description&hasOptions=true&page=1&limit=100&raw=1> (referer: https://www.barchart.com/stocks/quotes/MSFT/competitors)
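If you want to check the token extraction on its own, outside a running spider, something like the following might do. It is only a sketch: extract_xsrf_token is a hypothetical helper name, and the sample header values are made up for illustration, not real responses from the site.

```python
import re
from urllib.parse import unquote


def extract_xsrf_token(set_cookie_headers):
    """Return the URL-decoded XSRF-TOKEN value from raw Set-Cookie
    header values (bytes or str), or None if no such cookie is present."""
    for raw in set_cookie_headers:
        text = raw.decode('utf-8') if isinstance(raw, bytes) else raw
        # Capture everything up to the first ';', which ends the cookie value.
        match = re.search(r'XSRF-TOKEN=([^;]+)', text)
        if match:
            return unquote(match.group(1))
    return None


# Hypothetical header values, shaped like what response.headers.getlist('Set-Cookie') returns.
sample_headers = [
    b'laravel_session=abc123; path=/; httponly',
    b'XSRF-TOKEN=eyJpdiI6%3D%3D; path=/',
]
print(extract_xsrf_token(sample_headers))  # eyJpdiI6==
```

Inside the spider, the same function could be called with response.headers.getlist('Set-Cookie'), and the result passed as the x-xsrf-token header.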