Using a downloader middleware

Date: 2016-06-04 03:32:53

Tags: python proxy scrapy

I'm new to Scrapy, and I'm trying to build my own Downloader Middleware so that I can crawl through a proxy. I'm getting this error:

Traceback (most recent call last):
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/core/engine.py", line 68, in __init__
    self.downloader = downloader_cls(crawler)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/Users/bli1/Development/projects/hinwin/chisel/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named downloaders.downloader_middlewares.proxy_connect

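The error means Scrapy cannot import my middleware module. I'm not sure whether that's because I haven't set the correct path or because something is wrong in the middleware itself.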
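Scrapy turns each key of DOWNLOADER_MIDDLEWARES into an object with load_object. As the traceback shows, load_object calls import_module on everything before the last dot and then looks up the class name. The sketch below is a rough stand-alone approximation of that step, not Scrapy's own code. You can use it to test a path in a Python shell started from the directory that contains scrapy.cfg. The helper name check_object_path is hypothetical. If it reports an ImportError naming the dotted path, the path is the problem. If a SyntaxError or other exception is raised from inside the file, the middleware itself is the problem.

```python
import importlib


def check_object_path(path):
    """Roughly mimic how Scrapy resolves 'package.module.ClassName'.

    Returns None if the object can be loaded, otherwise a short error message.
    Only ImportError is caught: a SyntaxError or any other exception raised
    while executing the module propagates, which usually means the module
    was found but contains a bug.
    """
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        return "not a dotted path: %r" % path
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return "cannot import module %r: %s" % (module_name, exc)
    if not hasattr(module, attr_name):
        return "module %r has no attribute %r" % (module_name, attr_name)
    return None


# Usage (run from the directory that contains scrapy.cfg):
# print(check_object_path("downloaders.downloader_middlewares.proxy_connect.ProxyConnect"))
# print(check_object_path("chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect"))
```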

Here is my project structure:

/chisel
    __init__.py
    pipelines.py
    items.py
    settings.py
    /downloaders
        __init__.py
        /downloader_middlewares
            __init__.py
        proxy_connect.py
        /resources
          config.json
    /spiders
        __init__.py
        craiglist_spider.py
        /spider_middlewares
            __init__.py
        /resources
          craigslist.json
scrapy.cfg
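Python maps a dotted module path onto directories one component at a time. Under Python 2.7, the version in the traceback, every directory along the way must contain an __init__.py. The sketch below lists the files a given path needs and reports which of them are missing. required_files and missing_files are hypothetical helper names. As the tree is drawn, proxy_connect.py sits directly in /downloaders rather than in /downloader_middlewares. If that reflects the real layout, the module path has to follow the file's actual location.

```python
import os


def required_files(dotted_module, project_root="."):
    """List the files needed to import `dotted_module` as a plain .py module.

    Each package directory along the path needs an __init__.py (mandatory in
    Python 2), and the last component must exist as <name>.py inside the
    innermost package.
    """
    parts = dotted_module.split(".")
    files = []
    for i in range(1, len(parts)):
        package_dir = os.path.join(project_root, *parts[:i])
        files.append(os.path.join(package_dir, "__init__.py"))
    files.append(os.path.join(project_root, *parts) + ".py")
    return files


def missing_files(dotted_module, project_root="."):
    """Return the entries of required_files() that do not exist on disk."""
    return [path for path in required_files(dotted_module, project_root)
            if not os.path.isfile(path)]


# Usage (project_root is the directory that contains scrapy.cfg):
# print(missing_files("chisel.downloaders.downloader_middlewares.proxy_connect"))
```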

In my settings.py I have:

DOWNLOADER_MIDDLEWARES = {
    'downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100,
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110
}
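The question does not show proxy_connect.py, so the following is only a hypothetical sketch of what a ProxyConnect downloader middleware could look like. It uses two documented Scrapy hooks, the from_crawler class method and process_request(request, spider). It sets request.meta['proxy'], the meta key Scrapy uses to route a request through a proxy. PROXY_URL is a made-up custom setting name. Scrapy calls process_request in increasing priority order, so ProxyConnect at 100 runs before HttpProxyMiddleware at 110.

```python
class ProxyConnect(object):
    """Hypothetical downloader middleware that sends every request through one proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is an assumed custom setting, e.g. 'http://host:port' in settings.py.
        return cls(crawler.settings.get("PROXY_URL"))

    def process_request(self, request, spider):
        # Leave the request alone if no proxy is configured or a spider already set one.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```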

1 answer:

Answer 0 (score: 1)

According to the docs, the path should include the project name (e.g. 'myproject.middlewares.CustomDownloaderMiddleware'), so I think it should be:

'chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100
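Put together, settings.py would look roughly like the sketch below. The first key assumes two things: proxy_connect.py lives inside chisel/downloaders/downloader_middlewares/, as the answer's path implies, and Scrapy is run from the directory containing scrapy.cfg, so the chisel package is importable. The second key uses scrapy.downloadermiddlewares.httpproxy, where this middleware has lived since Scrapy 1.0. The scrapy.contrib path in the question is the older, deprecated location. HttpProxyMiddleware is also enabled by default, so listing it here mainly changes its order.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    # Full dotted path, starting at the project package "chisel".
    'chisel.downloaders.downloader_middlewares.proxy_connect.ProxyConnect': 100,
    # Same middleware as in the question, under its Scrapy >= 1.0 module path.
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
```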