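Scrapy pipeline settings

Time: 2018-07-02 03:49:48

Tags: scrapy

I am using Scrapy to scrape some images from a website, and it works well when I handle the download logic in the callback function. The code looks like this: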

    typeName = response.meta["typeName"]
    current_page = response.meta["current_page"]
    category = response.meta["category"]
    mess = category + "   " + current_page
    print(current_page + mess)
    b = selector.xpath( '//*[@id="postmessage"]/img/@src').extract()
    i = 0
    imgName = typeName[0]
    path = "D:\\pho\\" + imgName
    os.mkdir(path)
    for imgUrl in b:
        fileName = path + "\\a_" + category + "____" + str(i) + ".jpg"
        print('===================')
        print('===================')
        print('===================')
        print('===================')
        print('===================')
        print(mess + "    :    " + imgUrl)
        print('===================')
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
        req = urllib.request.Request(url=imgUrl, headers=headers)
        try:
            res = urllib.request.urlopen(req, timeout=180 )
            if str(res.status) != '200':
                print('Download failed:', imgUrl)
                continue
            with open(fileName, 'wb') as f:
                f.write(res.read())
                print('Download complete\n')
        except Exception as e:
            print("Exception occurred: " + str(e))

        i += 1

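But when I handle the download logic in pipelines.py, the item does not seem to reach pipelines.py, and at the point where the item should be processed in pipelines.py the terminal prints "None".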

The setting configured for pipelines.py:

ITEM_PIPELINES = {
    'tutorial.pipelines.TutorialPipeline': 1,
}
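For context, in a standard Scrapy project this dict is read from settings.py, and the integer values decide the order in which pipelines run (lower values first, conventionally in the 0 to 1000 range). A minimal sketch of that ordering rule follows; `LoggingPipeline` and `pipeline_order` are hypothetical names added only for illustration and are not part of the project above.

```python
# Sketch of how ITEM_PIPELINES priorities translate into run order.
ITEM_PIPELINES = {
    "tutorial.pipelines.LoggingPipeline": 300,   # hypothetical second pipeline
    "tutorial.pipelines.TutorialPipeline": 1,    # the pipeline from the question
}


def pipeline_order(setting):
    """Return pipeline paths sorted by priority, lowest value first."""
    return [path for path, _ in sorted(setting.items(), key=lambda kv: kv[1])]


for path in pipeline_order(ITEM_PIPELINES):
    print(path)
```

With only one pipeline registered, as in the question, the value 1 has no practical effect beyond enabling the class.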

The item class is defined as follows:

class TutorialItem(scrapy.Item):
    imgName = scrapy.Field()
    imgUrl = scrapy.Field()
    pass

pipelines.py contains the following:

class TutorialPipeline(object):
    def process_item(self, item, spider):
        i = 0
        imgName = item["imgName"]
        for imgUrl in item["imgUrl"]:
            try:
                path = "c://photo/" + imgName + "/_" + str(i) + ".jpg"
                print('===================')
                print('===================')
                print('===================')
                print('===================')
                print('===================')
                print(imgUrl)
                print('===================')
                urllib.request.urlretrieve(imgUrl, path)
                i += 1
            except:
                pass
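A few things in this pipeline may be hiding the real failure: the bare `except: pass` swallows every error, the target directory is never created, and `imgName` may be a list (the callback version indexes `typeName[0]`), in which case the string concatenation would raise inside the `try`. The sketch below is one possible variant under those assumptions, not a confirmed fix; `ImageDownloadPipeline`, `base_dir`, and `fetch` are hypothetical names, and `fetch` exists only so the download call can be swapped out.

```python
import os
import urllib.request


class ImageDownloadPipeline:
    """Hypothetical variant of TutorialPipeline that reports errors and returns the item."""

    def __init__(self, base_dir="photo", fetch=urllib.request.urlretrieve):
        # base_dir and fetch are illustrative parameters (assumptions).
        self.base_dir = base_dir
        self.fetch = fetch

    def process_item(self, item, spider):
        name = item["imgName"]
        # Assumption: imgName may arrive as a list, as typeName[0] suggests.
        if isinstance(name, (list, tuple)):
            name = name[0]
        target_dir = os.path.join(self.base_dir, name)
        os.makedirs(target_dir, exist_ok=True)  # create the folder before writing
        for i, img_url in enumerate(item["imgUrl"]):
            path = os.path.join(target_dir, "_%d.jpg" % i)
            try:
                self.fetch(img_url, path)
            except Exception as e:
                # Report the failure instead of silently ignoring it.
                print("download failed:", img_url, e)
        # Scrapy passes the return value on, so the item must be returned.
        return item
```

Scrapy instantiates pipeline classes without arguments unless `from_crawler` is defined, so the defaults above would be what a real run uses.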

The code that yields the item looks like this:

    selector = Selector(response)
    item = TutorialItem()
    item["imgName"] = response.meta["typeName"]
    item["imgUrl"] = selector.xpath( '//*[@id="postmessage"]/img/@src').extract()
    yield item
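The "None" in the terminal is consistent with how Scrapy treats the return value of `process_item`: whatever the method returns is handed to the next stage and is what gets logged as the scraped item, and a Python method with no `return` statement returns `None`. The following is a simplified plain-Python imitation of that hand-off, not Scrapy's actual implementation; `SilentPipeline`, `ReturningPipeline`, and `run_pipelines` are hypothetical names used only for illustration.

```python
class SilentPipeline:
    """Mimics a process_item that does its work but has no return statement."""

    def process_item(self, item, spider):
        item["seen"] = True  # nothing is returned, so the caller gets None


class ReturningPipeline:
    """Mimics a process_item that returns the item."""

    def process_item(self, item, spider):
        item["seen"] = True
        return item


def run_pipelines(pipelines, item, spider=None):
    # Each stage receives whatever the previous stage returned.
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


print(run_pipelines([SilentPipeline()], {"imgName": "cats"}))     # prints None
print(run_pipelines([ReturningPipeline()], {"imgName": "cats"}))  # prints {'imgName': 'cats', 'seen': True}
```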

0 answers:

No answers