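I am using Scrapy to scrape some images from a website. When I handle the download logic inside the callback function, it works fine, with code like this: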
# imports needed by this fragment (it runs inside the spider callback)
import os
import urllib.request

typeName = response.meta["typeName"]
current_page = response.meta["current_page"]
category = response.meta["category"]
mess = category + " " + current_page
print(current_page + mess)
b = selector.xpath('//*[@id="postmessage"]/img/@src').extract()
i = 0
imgName = typeName[0]
path = "D:\\pho\\" + imgName
os.mkdir(path)
for imgUrl in b:
    fileName = path + "\\a_" + category + "____" + str(i) + ".jpg"
    print('===================')
    print('===================')
    print('===================')
    print('===================')
    print('===================')
    print(mess + " : " + imgUrl)
    print('===================')
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
    req = urllib.request.Request(url=imgUrl, headers=headers)
    try:
        res = urllib.request.urlopen(req, timeout=180)
        if str(res.status) != '200':
            print('Download failed:', imgUrl)
            continue
        with open(fileName, 'wb') as f:
            f.write(res.read())
        print('Download finished\n')
    except Exception as e:
        print("Exception occurred: " + str(e))
    i += 1
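But when I move the download logic into pipelines.py, the item never seems to reach pipelines.py, and at the point where the item should be processed by the pipeline, the terminal prints "None".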
The setting that registers the pipeline in pipelines.py:
ITEM_PIPELINES = {
    'tutorial.pipelines.TutorialPipeline': 1,
}
The item class is defined as follows:
class TutorialItem(scrapy.Item):
    imgName = scrapy.Field()
    imgUrl = scrapy.Field()
    pass
pipelines.py contains the following:
import urllib.request

class TutorialPipeline(object):
    def process_item(self, item, spider):
        i = 0
        imgName = item["imgName"]
        for imgUrl in item["imgUrl"]:
            try:
                path = "c://photo/" + imgName + "/_" + str(i) + ".jpg"
                print('===================')
                print('===================')
                print('===================')
                print('===================')
                print('===================')
                print(imgUrl)
                print('===================')
                urllib.request.urlretrieve(imgUrl, path)
                i += 1
            except:
                pass
The code that yields the item looks like this:
selector = Selector(response)
item = TutorialItem()
item["imgName"] = response.meta["typeName"]
item["imgUrl"] = selector.xpath('//*[@id="postmessage"]/img/@src').extract()
yield item
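
One possible explanation, offered as a guess and not a confirmed diagnosis: the process_item shown above has no return statement, so it returns None, while Scrapy expects process_item to return the item. On top of that, the bare except hides any error raised inside the loop. For example, imgName may be a list here, since the callback version indexes typeName[0], and the target folder may not exist yet. A minimal sketch of a pipeline that avoids those pitfalls is below. The base_dir and fetch parameters are hypothetical additions for illustration and are not part of the original code.

```python
import logging
import os
import urllib.request

logger = logging.getLogger(__name__)


class TutorialPipeline:
    """Sketch of an image-saving pipeline; a Scrapy pipeline is a plain class."""

    def __init__(self, base_dir="photo", fetch=urllib.request.urlretrieve):
        # base_dir and fetch are hypothetical parameters added for illustration.
        # fetch(url, path) defaults to the urlretrieve call used in the post.
        self.base_dir = base_dir
        self.fetch = fetch

    def process_item(self, item, spider):
        img_name = item["imgName"]
        # Assumption: response.meta["typeName"] may be a list, because the
        # callback version uses typeName[0]; take the first element if so.
        if isinstance(img_name, (list, tuple)):
            img_name = img_name[0]
        folder = os.path.join(self.base_dir, img_name)
        # urlretrieve does not create missing folders, so create it first.
        os.makedirs(folder, exist_ok=True)
        for i, img_url in enumerate(item["imgUrl"]):
            path = os.path.join(folder, "_%d.jpg" % i)
            try:
                self.fetch(img_url, path)
            except Exception:
                # Log instead of `pass`, so failed downloads stay visible.
                logger.exception("could not download %s", img_url)
        # Return the item so Scrapy can pass it on instead of seeing None.
        return item
```

If the pipeline still appears not to run after a change like this, the log output from logger.exception should at least show which step fails.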