Question

When I scroll down, the URL that returns the next batch of results is:

https://dir.dummymart.com/impcat/next?mcatId=20467&prod_serv=P&mcatName=laser-cutting-machines&srt=97&end=116&ims_flag=&cityID=&fcilp=0&pr=0&pg=5&frsc=28

The response data is AJAX like this:

{"page_var":"<div id=\"page_variables................

My spider code is:

import scrapy


class DummymartSpider(scrapy.Spider):
    name = 'dummymart'
    allowed_domains = ['dir.dummymart.com']
    start_urls = ['https://dir.dummymart.com/impcat/industrial-machinery.html']

    def parse(self, response):
        # Each XPath returns a list of text nodes, one per matching element.
        company = response.xpath('//*[@class="lcname"]/text()').extract()
        product = response.xpath('//*[@class="pnm ldf cur"]/text()').extract()
        address = response.xpath('//*[@class="clg"]/text()').extract()
        phone = response.xpath('//*[@class="ls_co phn bo"]/text()').extract()

        # zip() pairs the lists by position and stops at the shortest one.
        for item in zip(company, product, address, phone):
            scraped_info = {
                'Company': item[0],
                'Product': item[1],
                'Address': item[2],
                'phone': item[3],
            }
            yield scraped_info

How can I scrape the data that is loaded after the page is scrolled down? Also, the data is in AJAX, not JSON. Thanks.

Answer 1

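You can handle this in two ways:

1. Use a headless browser such as Selenium, or, if you are working in Scrapy, try Splash, which lets you run JS functions through Scrapy.
2. Simply scroll the page down to the point you want to scrape, save the page as HTML, and then run your regular code on it.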

The second approach involves some manual work, but if you only want to scrape a few pages, I would suggest going with it.
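
For the first approach, here is a minimal sketch of scrolling with Selenium until no more content loads. The helper names `scroll_to_bottom` and `fetch_scrolled_html`, the fixed `pause`, and the `max_rounds` cap are my own assumptions for illustration, not something taken from the site; a fixed sleep is a crude way to wait for the AJAX call and may need tuning. It also assumes Chrome and a matching driver are available.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing.

    Returns how many scrolls actually loaded new content.
    `driver` is anything with a Selenium-style execute_script() method.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the bottom, which triggers the next AJAX request.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the new items to be appended
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # height unchanged, so assume nothing more was loaded
        last_height = new_height
        loaded += 1
    return loaded


def fetch_scrolled_html(url, pause=1.0):
    """Open `url` in headless Chrome, scroll to the end, return the HTML."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        scroll_to_bottom(driver, pause=pause)
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then be wrapped with `scrapy.Selector(text=html)` and queried with the same XPath expressions as in the spider from the question.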

Scraping data from an infinite-scroll page using Scrapy?
