How to get double quotes in Scrapy .csv results

Time: 2017-03-07 21:42:02

Tags: python csv web-scraping scrapy scrapy-spider

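I have a problem with quoting in Scrapy's output. I am trying to scrape data that contains commas, and this results in double quotes around some columns, like this: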

TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Only the columns that contain commas get double quotes. How can I double-quote all of my data columns?

I want Scrapy to output:

"TEST","TEST","TEST","ON","TEST","TEST","$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
"TEST","TEST","TEST","ON","TEST","TEST","$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Is there a setting I can change?

1 answer:

Answer 0 (score: 5)

By default, for CSV output, Scrapy uses csv.writer() with its default settings.

For field quoting, the default is csv.QUOTE_MINIMAL, which the Python documentation describes as:

  

Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.
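The difference between the two quoting modes can be seen with the standard library alone. The following is a minimal sketch, not Scrapy code; the helper name `render` and the sample row are made up for illustration.

```python
import csv
import io


def render(row, quoting):
    """Serialize one row with csv.writer using the given quoting mode."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting)
    writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample row: only the last field contains the delimiter.
row = ["TEST", "ON", "$2,449,000, 6 Bed"]

minimal = render(row, csv.QUOTE_MINIMAL)   # quotes only the field with commas
quote_all = render(row, csv.QUOTE_ALL)     # quotes every field

print(minimal, end="")
print(quote_all, end="")
```

With csv.QUOTE_MINIMAL only the last field is wrapped in quotes, while csv.QUOTE_ALL wraps all three, which is the behavior the question asks for.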

However, you can build your own CSV item exporter and give it a new dialect that builds on the default 'excel' dialect.

For example, in an exporters.py module, define the following:

import csv

from scrapy.exporters import CsvItemExporter


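class QuoteAllDialect(csv.excel):
    # Same as the default 'excel' dialect, except that every field is quoted.
    quoting = csv.QUOTE_ALL


class QuoteAllCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # Leftover keyword arguments are passed on to csv.writer(),
        # so the dialect set here is the one used for the output.
        kwargs.update({'dialect': QuoteAllDialect})
        super(QuoteAllCsvItemExporter, self).__init__(*args, **kwargs)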
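Before wiring the exporter into a project, the dialect itself can be checked without Scrapy. This is a minimal sketch that assumes only the standard library; the dialect class is repeated so the snippet runs on its own, and the sample rows are made up for illustration.

```python
import csv
import io


class QuoteAllDialect(csv.excel):
    # 'excel' dialect with every field quoted.
    quoting = csv.QUOTE_ALL


buffer = io.StringIO()
writer = csv.writer(buffer, dialect=QuoteAllDialect)

# Hypothetical header and data row; the second field of the data row
# contains both commas and a double quote.
writer.writerow(["name", "description"])
writer.writerow(["Some name", 'with commas, quotes (") and all'])

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Every field is quoted, and the embedded double quote is escaped by doubling it, which the 'excel' dialect does by default.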

Then you just need to reference this item exporter in your settings for CSV output, for example:

FEED_EXPORTERS = {
    'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
}

A simple spider like this:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        yield {
            "name": "Some name",
            "title": "Some title, baby!",
            "description": "Some description, with commas, quotes (\") and all"
        }

will output:

"description","name","title"
"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"