Why is my Scrapy response empty?

Time: 2018-11-11 15:47:33

Tags: xpath scrapy

I start the shell with:

scrapy shell -s USER_AGENT='Mozilla/5.0' https://www.gumtree.com/p/property-to-rent/brand-new-modern-studio-flat-%C2%A31056pcm-all-bills-included-in-willesden-green-area/1303463798

Then:

In [5]: response
Out[5]: <405 https://www.gumtree.com/p/property-to-rent/brand-new-modern-studio-flat-%C2%A31056pcm-all-bills-included-in-willesden-green-area/1303463798>

After inspecting the element in the page, I copied its XPath:

In [6]: response.xpath('//*[@id="ad-title"]').extract()
Out[6]: []

Its outerHTML, copied from the browser:

<h1 itemprop="name" id="ad-title">Brand New Modern Studio Flat £1056pcm | All Bills Included | In Willesden Green area</h1>

Screenshot of view(response) (image not preserved)

Why?

1 answer:

Answer 0 (score: 1)

Try setting the user agent to something more realistic, for example: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0

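Some sites do basic validation of the user agent and, if it looks odd, redirect you to a special page instead of serving the real content. A bare Mozilla/5.0 is such an odd value, and the 405 status in your output suggests you got that special page rather than the ad.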

scrapy shell -s USER_AGENT='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0' https://www.gumtree.com/p/property-to-rent/brand-new-modern-studio-flat-%C2%A31056pcm-all-bills-included-in-willesden-green-area/1303463798
>>> response.xpath('//*[@id="ad-title"]').extract()
['<h1 itemprop="name" id="ad-title">Brand New Modern Studio Flat £1056pcm | All Bills Included | In Willesden Green area</h1>']
>>>