What is the best way to click an li button?

Time: 2019-05-02 12:56:50

Tags: python scrapy

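I am trying to work out the best way to click the next-page button in a www.booking.com hotel listing and keep the spider running.

When I inspect the button: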

<li class="nextpage">
   <a href="/bigcity/offset=15" class="gotopage_2"></a>
</li>

Working code for a single page:

import scrapy
from ..items import BookItem


class BookSpiderSpider(scrapy.Spider):
    name = "book_spider"
    start_urls = (
        'https://www.booking.com/smallcity/offset=10',
    )

    def parse(self, response):
        items = BookItem()

        # Collect the text of every hotel name on the page.
        title_name = response.css('span.sr-hotel__name::text').extract()

        items['title_name'] = title_name

        yield items

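Each time the button is clicked, the href and the class change.

So my guess is that the Python code should find the button, take the new href, combine it with the current URL, and go to that page.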

2 Answers:

Answer 0 (score: 0)

Hello, please use this code snippet in your application

[{name:1,6:'',7:'',8:'',9:''},{name:2,6:'',7:'',8:'',9:''},{name:3,6:'',7:'',8:'',9:''},{name:4,6:'',7:'',8:'',9:''},{name:5,6:'',7:'',8:'',9:''}]

Answer 1 (score: 0)

Use .urljoin to avoid any URL pattern issues:

next_page_url = response.urljoin(next_href)