How to keep cmdline in a for loop from breaking after the first spider?

Time: 2017-02-07 08:30:08

Tags: python scrapy scrapy-spider

Is it possible to run cmdline.execute() inside a for loop in Scrapy? An example is below. When I run it, the script stops after the first iteration of for link in links, reporting only INFO: Closing spider (finished). How can I get the script to return to the loop instead of exiting?

Execute.py:

from scrapy import cmdline

links = ["http://quotes.toscrape.com/page/1/", "http://quotes.toscrape.com/page/2/"]

for link in links:
    # build "scrapy crawl quotes1 -a source_url=<link>" and run it
    command = "scrapy crawl quotes1 -a source_url=" + link
    cmdline.execute(command.split())

Spider.py:

import scrapy

class QuotesSpiderS(scrapy.Spider):
    name = "quotes1"

    def start_requests(self):
        # source_url is passed in from the command line via -a source_url=<link>
        urls = [self.source_url]
        print(urls)
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # derive a filename from the page number in the URL and save the body
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

1 Answer:

Answer 0 (score: 0)

Move the list of links into the spider's start_urls attribute. cmdline.execute() is designed to be called once per process: it runs the command and then calls sys.exit(), so control never returns to your for loop after the first crawl.

class QuotesSpiderS(scrapy.Spider):
    name = "quotes1"
    # the default start_requests() issues one request per entry in start_urls
    start_urls = ["http://quotes.toscrape.com/page/1/",
                  "http://quotes.toscrape.com/page/2/"]