Scrapy works locally but not in production

Posted: 2018-05-22 14:50:40

Tags: python django deployment web-scraping scrapy

I have deployed my Django + Scrapy project and scrapyd is already running. But when I try to run the spider, it finishes without actually scraping any data, producing these stats:

{'memusage/startup': 92348416,
 'scheduler/enqueued': 1,
 'scheduler/dequeued': 1,
 'downloader/request_bytes': 628, 
 'httperror/response_ignored_status_count/403': 1,
 'finish_time': datetime.datetime(2018, 5, 21, 22, 6, 38, 333018), 
 'downloader/response_status_count/403': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 4992,
 'downloader/response_count': 2,
 'start_time': datetime.datetime(2018, 5, 21, 22, 6, 33, 894037), 
 'response_received_count': 2,
 'memusage/max': 92348416,
 'scheduler/dequeued/disk': 1,
 'httperror/response_ignored_count': 1,
 'downloader/request_count': 2,
 'finish_reason': 'finished',
 'scheduler/enqueued/disk': 1}

Does this mean my first request was rejected with a 403 error? Why might it work locally but not in production? From what I have read, this can be caused by a wrong USER_AGENT setting, but I have already set it to:
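Reading the stats: `'downloader/response_status_count/403': 2` means both requests came back as 403, and there is no `item_scraped_count` at all, so the server refused every request. A minimal sketch of that check (the `run_was_blocked` helper is hypothetical, not part of Scrapy; it just inspects the stats dict Scrapy prints):

```python
def run_was_blocked(stats):
    """Heuristic: every response was a 403 and no items were scraped."""
    responses = stats.get('downloader/response_count', 0)
    forbidden = stats.get('downloader/response_status_count/403', 0)
    scraped = stats.get('item_scraped_count', 0)
    return responses > 0 and forbidden == responses and scraped == 0


# The production stats from the question (abridged): both requests got 403.
production_stats = {
    'downloader/response_count': 2,
    'downloader/response_status_count/403': 2,
}
print(run_was_blocked(production_stats))  # -> True
```

The local run below would return False here, since all 101 responses were 200 and 88 items were scraped.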

USER_AGENT='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/'

As I said, it works locally but not in production.
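For reference, some sites check more than just the User-Agent before returning 403 (e.g. Accept and Accept-Language headers). A `settings.py` sketch that sends browser-like defaults in addition to USER_AGENT (the header values are illustrative, not something the question confirms will fix this site):

```python
# settings.py (sketch): send browser-like headers, since some servers
# return 403 when the Accept/Accept-Language headers look non-human.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/91.0.4472.124 Safari/537.36')

DEFAULT_REQUEST_HEADERS = {
    'Accept': ('text/html,application/xhtml+xml,application/xml;'
               'q=0.9,*/*;q=0.8'),
    'Accept-Language': 'en-US,en;q=0.9',
}
```

Scrapy merges `DEFAULT_REQUEST_HEADERS` into every request, so this applies project-wide; note the production server's outgoing IP may also be blocked, which no header change will fix.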


P.S. Here are the stats for the same spider run locally:

{
"start_time": "2018-05-22 14:25:45.857504",
"scheduler/enqueued/disk": "89",
"scheduler/enqueued": "89",
"scheduler/dequeued/disk": "89",
"scheduler/dequeued": "89",
"downloader/request_count": "101",
"downloader/request_method_count/GET": "101",
"downloader/request_bytes": "39724",
"downloader/response_count": "101",
"downloader/response_status_count/200": "101",
"downloader/response_bytes": "1901849",
"response_received_count": "101",
"request_depth_max": "1",
"file_count": "88",
"file_status_count/uptodate": "77",
"item_scraped_count": "88",
"file_status_count/downloaded": "11",
"finish_time": "2018-05-22 14:26:31.596248",
"finish_reason": "finished",
}

0 Answers:

No answers