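I'm having trouble with quoting in Scrapy's output. I'm trying to scrape data that contains commas, which results in double quotes around some columns, like this: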
TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"
Only the columns that contain commas get double-quoted. How can I double-quote all of my data columns?
I want Scrapy to output:
"TEST","TEST","TEST","ON","TEST","TEST","$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
"TEST","TEST","TEST","ON","TEST","TEST","$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"
Is there a setting I can change?
Answer 0: (score: 5)
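By default, for CSV output, Scrapy uses csv.writer() with the defaults.
For field quoting, the default is csv.QUOTE_MINIMAL:
Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.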
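The difference between the two quoting modes can be seen with the standard library alone, outside Scrapy. This is a minimal sketch: the `render` helper and the sample row are made up for illustration and are not part of Scrapy or of the original question.

```python
import csv
import io


def render(rows, quoting):
    """Write rows to an in-memory CSV using the given quoting mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting)
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative row: only the last field contains the delimiter.
rows = [["TEST", "ON", "$2,449,000, 4,735 Sq Ft"]]

minimal = render(rows, csv.QUOTE_MINIMAL)  # quotes only the field with commas
quote_all = render(rows, csv.QUOTE_ALL)    # quotes every field

print(minimal)
print(quote_all)
```

With csv.QUOTE_MINIMAL only the last field is wrapped in quotes; with csv.QUOTE_ALL every field is.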
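However, you can build your own CSV item exporter and set a new dialect, building on the default 'excel' dialect.
For example, in an exporters.py module, define the following: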
import csv

from scrapy.exporters import CsvItemExporter


class QuoteAllDialect(csv.excel):
    quoting = csv.QUOTE_ALL


class QuoteAllCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        kwargs.update({'dialect': QuoteAllDialect})
        super(QuoteAllCsvItemExporter, self).__init__(*args, **kwargs)
Then you only need to reference this item exporter in your settings for CSV output, for example:
FEED_EXPORTERS = {
    'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
}
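The dialect itself can be sanity-checked without running a crawl, since it is an ordinary csv dialect. The sketch below passes an equivalent QuoteAllDialect class straight to csv.writer; the sample row is made up for illustration.

```python
import csv
import io


class QuoteAllDialect(csv.excel):
    # Inherit delimiter, quotechar, etc. from 'excel'; change only the quoting.
    quoting = csv.QUOTE_ALL


buf = io.StringIO()
writer = csv.writer(buf, dialect=QuoteAllDialect)
# Illustrative row with a comma in one field and a quote character in another.
writer.writerow(["Some name", "Some title, baby!", 'quotes (") and all'])

line = buf.getvalue()
print(line)
```

Every field is quoted, and the embedded quote character is escaped by doubling it, which is the 'excel' dialect's behavior.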
A simple spider like this:
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        yield {
            "name": "Some name",
            "title": "Some title, baby!",
            "description": "Some description, with commas, quotes (\") and all"
        }
will output:
"description","name","title"
"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"
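To check that fully quoted output still parses back to the original values, it can be read with the standard csv module. This is a minimal sketch that hard-codes the two output lines above as a string and assumes the usual \r\n line terminator; in practice you would open the exported file instead.

```python
import csv
import io

# The exporter output shown above, assuming "\r\n" line endings.
text = (
    '"description","name","title"\r\n'
    '"Some description, with commas, quotes ("") and all",'
    '"Some name","Some title, baby!"\r\n'
)

# newline="" lets the csv module handle line endings itself.
records = list(csv.DictReader(io.StringIO(text, newline="")))

for record in records:
    print(record["name"], "|", record["title"], "|", record["description"])
```

The doubled quote ("") in the file is read back as a single quote character, and the commas inside quoted fields do not split the columns.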