Question

I want to combine APScheduler with Scrapy, but my code is wrong. How should I modify it?

settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl():
    reactor.run()
    yield runner.crawl(Jobaispider)#this is my spider
    yield runner.crawl(Jobpythonspider)#this is my spider
    reactor.stop()

sched = BlockingScheduler()
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()

Error: builtins.ValueError: signal only works in main thread

Answer 1

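This question has already been answered in detail at: How to integrate Flask & Scrapy?, which covers a variety of use cases and ideas. I also found one of the links in that thread very useful: https://github.com/notoriousno/scrapy-flask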

To answer your question directly, try the following. It uses the approach from the two links above, specifically the crochet library.

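from datetime import datetime

import crochet
from apscheduler.schedulers.blocking import BlockingScheduler
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

# Jobaispider and Jobpythonspider must also be imported from your own project.

# Start the Twisted reactor in a background thread managed by crochet.
crochet.setup()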

settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)

# Note: inlineCallbacks is removed for this example, so the two crawls
# below are started together rather than one after the other.
#@defer.inlineCallbacks

@crochet.run_in_reactor
def crawl():
    runner.crawl(Jobaispider)#this is my spider
    runner.crawl(Jobpythonspider)#this is my spider

sched = BlockingScheduler()
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()
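
For context on why the original code fails: the scheduler runs the job in a worker thread, and reactor.run() tries to install signal handlers by default, which Python only allows in the main thread. The sketch below reproduces that restriction with the standard library only. It is an illustration under assumptions, not the Scrapy or APScheduler internals: install_handler is a hypothetical stand-in for what the reactor does at startup, and the ThreadPoolExecutor stands in for the scheduler's worker threads.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


def install_handler():
    """Try to register a SIGINT handler, roughly what a reactor does on startup.

    Returns a tuple: (running in main thread?, error message or None).
    """
    in_main_thread = threading.current_thread() is threading.main_thread()
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        # Python refuses to install signal handlers outside the main thread.
        return in_main_thread, str(exc)
    return in_main_thread, None


# Hypothetical stand-in for a scheduler that runs its jobs in a thread pool.
with ThreadPoolExecutor(max_workers=1) as pool:
    in_main, error = pool.submit(install_handler).result()

print("ran in main thread:", in_main)
print("error:", error)
```

The job runs outside the main thread, so the handler registration is rejected with a ValueError like the one in the question. crochet sidesteps this by starting the reactor itself in a background thread, so the scheduled job only has to hand work over to it.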

APScheduler + Scrapy: signal only works in main thread
