Scrapy: getting values from multiple sites

Date: 2017-09-16 20:52:41

Tags: python web-scraping scrapy web-crawler

I'm trying to pass values from one function (callback) to another.

I've looked through the documentation and just don't understand it. Reference:

cat(echo("today"),date()) | ...

Here is pseudocode for what I'm trying to achieve:

cat /etc/passwd | (read line ; cat)

1 answer:

Answer 0 (score: 0)

Here is how you can pass any value (a name, a link, etc.) on to another callback method:

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

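    def parse(self, response):
        # Fill in the XPath expressions for your pages; .extract_first() returns a string (or None)
        name = response.xpath(...).extract_first()
        link = response.xpath(...).extract_first()  # link to the second.com page where the price is
        # urljoin() turns a relative href into an absolute URL
        request = scrapy.Request(url=response.urljoin(link), callback=self.parse_check)
        request.meta['name'] = name  # carried along to the next callback
        yield request

    def parse_check(self, response):
        name = response.meta['name']  # the value set in parse()
        price = response.xpath(...).extract_first()
        # A plain dict is a valid item; if items.py declares an Item with name and price fields, you can yield that instead
        yield {"name": name, "price": price}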