proxylist fails to load on Scrapy Cloud

Time: 2017-05-28 18:00:13

Tags: python scrapy

I'm using the module https://github.com/aivarsk/scrapy-proxies. With the specified settings it works perfectly on my PC when PROXY_LIST points to an existing txt file on the machine.

I have tried several different approaches in settings.py to make it work on Scrapy Cloud.

I added the file proxylist.txt to the same folder as the project's settings file, and I also uploaded it to https://dl.dropboxusercontent.com/s/esdm19mnvz2yguf/proxylist.txt

I tried each of the following values for PROXY_LIST:

PROXY_LIST = 'https://dl.dropboxusercontent.com/s/esdm19mnvz2yguf/proxylist.txt'
PROXY_LIST = 'proxylist.txt'
PROXY_LIST = '/proxylist.txt'
PROXY_LIST = '../proxylist.txt'

If I set PROXY_LIST = 'proxylist.txt' on my PC, it works like a charm, but not once I deploy it to Scrapy Cloud.

I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 36, in from_settings
    mw = mwcls.from_crawler(crawler)
  File "/app/python/lib/python2.7/site-packages/scrapy_proxies/randomproxy.py", line 55, in from_crawler
    return cls(crawler.settings)
  File "/app/python/lib/python2.7/site-packages/scrapy_proxies/randomproxy.py", line 35, in __init__
    fin = open(self.proxy_list)
IOError: [Errno 2] No such file or directory: '../proxylist.txt'

Any help would be appreciated.

1 answer:

Answer 0 (score: 0)

You are most likely not including this file in your setup.py.
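The last frame of the traceback is fin = open(self.proxy_list), so the PROXY_LIST value is handed straight to open(). A relative path is therefore resolved against the process's current working directory, not against the folder that holds settings.py, and a URL is treated as a local file name rather than downloaded. The standalone sketch below (plain Python, no Scrapy; the myproject folder name and the can_open helper are hypothetical) illustrates both points:

```python
import os
import shutil
import tempfile


def can_open(path):
    """Mimic scrapy_proxies' open(self.proxy_list): return True if the
    path opens relative to the current working directory, else False."""
    try:
        with open(path) as fin:
            fin.read()
        return True
    except IOError:  # IOError is an alias of OSError on Python 3
        return False


# Throwaway layout: <root>/myproject/proxylist.txt ("myproject" is hypothetical)
root = tempfile.mkdtemp()
project_dir = os.path.join(root, "myproject")
os.makedirs(project_dir)
with open(os.path.join(project_dir, "proxylist.txt"), "w") as f:
    f.write("http://host1:8080\n")

old_cwd = os.getcwd()
try:
    os.chdir(project_dir)  # like running locally from the folder holding the file
    found_here = can_open("proxylist.txt")
    os.chdir(root)  # same relative path, different working directory
    found_elsewhere = can_open("proxylist.txt")
    # open() treats a URL as a local path, so it never downloads anything
    found_url = can_open(
        "https://dl.dropboxusercontent.com/s/esdm19mnvz2yguf/proxylist.txt")
finally:
    os.chdir(old_cwd)
    shutil.rmtree(root)

print(found_here, found_elsewhere, found_url)  # True False False
```

This is why the same setting can work on a PC, where the crawl happens to start next to the file, and fail on Scrapy Cloud, where the working directory and the deployed files are different.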

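The mechanism for including non-code files is the MANIFEST.in file. It is fairly simple: MANIFEST.in is just a list of relative file paths or glob patterns specifying the files to include: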

include README.rst
include docs/*.txt
include funniest/data.json

To have these files copied into the package folder in site-packages at install time, you need to pass include_package_data=True to the setup() function.
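One further caveat, stated here as an assumption rather than a confirmed detail of Scrapy Cloud: shub deploy typically builds the project into an egg, and if that egg is installed zipped, proxylist.txt may not exist as a regular file that open() can read even after it is packaged. A possible workaround is to read the packaged file with pkgutil.get_data(), which also works inside zip archives, and copy it to a temporary file whose path is then used as PROXY_LIST. The package name myproject and the helper materialize_package_file are hypothetical, and the file still has to be declared in setup.py, for example with package_data={'myproject': ['proxylist.txt']} or via MANIFEST.in plus include_package_data=True.

```python
import os
import pkgutil
import tempfile


def materialize_package_file(package, resource):
    """Copy a file shipped as package data to a real path on disk.

    pkgutil.get_data() reads the resource whether the package is a plain
    directory or a zipped egg, whereas open() needs a filesystem path.
    Returns the path of a temporary copy (hypothetical helper).
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise IOError("resource %r not found in package %r" % (resource, package))
    fd, path = tempfile.mkstemp(suffix="-" + os.path.basename(resource))
    with os.fdopen(fd, "wb") as out:
        out.write(data)  # get_data() returns bytes, so write in binary mode
    return path


# Hypothetical use in myproject/settings.py:
# PROXY_LIST = materialize_package_file("myproject", "proxylist.txt")
```

The middleware itself stays untouched; the trade-off is one small temporary file written each time the settings module is loaded.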

See http://python-packaging.readthedocs.io/en/latest/non-code-files.html