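Creating a generic Scrapy spider and multiple specific ones

Time: 2017-06-17 13:03:31

Tags: python scrapy scrapy-spider

I am trying to create a generic spider that handles the most common tasks, plus specific spiders that inherit from the generic one and declare the site-specific variables.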

genericspider.py

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider

class GenericProductSpider(scrapy.Spider):
    def __init__(self, start_urls=[], finditemprop='', keywords='', **kwargs):
        CrawlSpider.__init__(self, **kwargs)
        print ( "\n\n Init Generic \n" )

Then I put specificspider.py in the same directory as the generic spider.

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import Spider, CrawlSpider
from .genericfabric import GenericFabricsSpider

class SpecificSpider(GenericProductSpider):

    def __init__(self, **kwargs):
        print ( "\n init specific \n" )
        name = "specific1"
        start_urls = ['http://www.specificdomian.com',]

        super(SpecificSpider, self).__init__(name, start_urls, **kwargs)

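I seem to be misunderstanding how to call the superclass initializer correctly. I get various error messages, but the generic spider's __init__ method is never executed.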

1 Answer:

Answer 0 (score: 0)

Actually... it seems to work fine. It was probably just a problem with the arguments.
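To illustrate the kind of argument problem that may be involved, here is a minimal sketch in plain Python. BaseSpider is a hypothetical stand-in that only mimics the name=None, **kwargs signature of Scrapy's Spider.__init__; MismatchedSpider and KeywordSpider are illustrative names, not part of the question. It assumes the generic spider's signature from the question and shows what happens when name and start_urls are passed positionally.

```python
class BaseSpider(object):
    """Hypothetical stand-in for scrapy.Spider (same __init__ signature)."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        self.__dict__.update(kwargs)


class GenericProductSpider(BaseSpider):
    # None instead of [] avoids sharing one mutable default between instances.
    def __init__(self, start_urls=None, finditemprop='', keywords='', **kwargs):
        super(GenericProductSpider, self).__init__(**kwargs)
        self.start_urls = list(start_urls or [])
        self.finditemprop = finditemprop
        self.keywords = keywords


class MismatchedSpider(GenericProductSpider):
    def __init__(self, **kwargs):
        name = "specific1"
        start_urls = ["http://www.specificdomian.com"]
        # Positional call as in the question: name fills the start_urls
        # parameter and start_urls fills finditemprop.
        super(MismatchedSpider, self).__init__(name, start_urls, **kwargs)


class KeywordSpider(GenericProductSpider):
    def __init__(self, **kwargs):
        # Keyword arguments reach the parameters they are meant for;
        # name travels through **kwargs up to BaseSpider.
        super(KeywordSpider, self).__init__(
            name="specific1",
            start_urls=["http://www.specificdomian.com"],
            **kwargs)


bad = MismatchedSpider()
good = KeywordSpider()
print(bad.start_urls)    # the string "specific1" split into characters
print(good.start_urls)   # the intended URL list
```

Passing the values by keyword keeps the call correct even if the generic spider's parameter order changes later.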

Working code for the superclass:

# -*- coding: utf-8 -*-
from scrapy.spiders import Spider
from test.items import TestItem


class TestsuperSpider(Spider):
    name = "testsuper"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    supervar = "meine super var"

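    def __init__(self, *args, **kwargs):
        # Forward to Spider.__init__ so spider arguments are still handled
        super(TestsuperSpider, self).__init__(*args, **kwargs)
        print("super init")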

    def parse(self, response):
        print("super Parse")

    def supermethod(self, subvar):
        print("\n\n Supermethod \n\n ")
        print(self.supervar + " - " + subvar)

Subclass:

# -*- coding: utf-8 -*-
from scrapy.spiders import Spider
from test.items import TestItem
from test.spiders.testsuper import TestsuperSpider


class TestsubSpider(TestsuperSpider):
    name = "testsub"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    subvar = "subvar"

    def __init__(self, *args, **kwargs):
        print("sub init")
        # Pass any spider arguments up the inheritance chain
        super(TestsubSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        # The inherited method is available directly on self
        self.supermethod(self.subvar)
        print("sub Parse")

It still needs to be cleaned up and adapted to its actual purpose, but at least the code runs as expected.
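As one possible direction for that cleanup, the sketch below keeps all shared logic in the generic class and lets each specific spider declare only class attributes, as the working example above does with name and start_urls. It is plain Python so it runs without Scrapy: in a real project GenericProductSpider would presumably derive from scrapy.Spider. The matches helper, the keywords attribute, and Specific1Spider are hypothetical names chosen for illustration.

```python
class GenericProductSpider(object):
    """Stand-in for a scrapy.Spider subclass that holds the shared logic."""

    # Site-specific values; each subclass overrides these.
    name = None
    start_urls = ()
    keywords = ()

    def __init__(self, **kwargs):
        if not self.name:
            raise ValueError("%s must define a name" % type(self).__name__)
        # As in Scrapy's Spider, extra keyword arguments become attributes.
        self.__dict__.update(kwargs)

    def matches(self, text):
        """Shared helper: True if any of the site's keywords occurs in text."""
        lowered = text.lower()
        return any(k.lower() in lowered for k in self.keywords)


class Specific1Spider(GenericProductSpider):
    # Only declarations here; no __init__ override is needed.
    name = "specific1"
    start_urls = ("http://www.specificdomian.com",)
    keywords = ("fabric",)


spider = Specific1Spider(category="demo")
print(spider.name, spider.start_urls, spider.category)
```

With this layout the subclass never calls the superclass initializer itself, so there is no argument order to get wrong.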