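Writing to separate columns in Scrapy instead of comma-separated values in a CSV file

Time: 2012-05-31 07:38:41

Tags: python csv scrapy

I am using Scrapy and writing the data fetched from web pages into a CSV file.

My pipeline code: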

def __init__(self):
    self.file_name = csv.writer(open('example.csv', 'wb'))
    self.file_name.writerow(['Title', 'Release Date','Director'])

def process_item(self, item, spider):
    self.file_name.writerow([item['Title'].encode('utf-8'),
                                item['Release Date'].encode('utf-8'),
                                item['Director'].encode('utf-8'),
                                ])
    return item 

The output format in my CSV file is:

Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....

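Is it possible, though, to write Title and its values into one column, Release Date and its values into the next column, and Director and its values into the column after that (since CSV is comma-separated values), as in the format below?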

        Title,                                 Release Date,            Director
And Now For Something Completely Different,      1971,              Ian MacNaughton
Monty Python And The Holy Grail,                 1975,     Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                    1979,              Terry Jones

Any help would be appreciated. Thanks in advance.

2 answers:

Answer 0 (score: 1)

TSV (tab-separated values) may get you what you want, but it often looks ugly when the rows have very different lengths.
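As a minimal sketch of the TSV idea, assuming Python 3 and the standard `csv` module, passing `delimiter='\t'` to `csv.writer` is enough. The helper name `rows_to_tsv` is hypothetical and only serves the illustration.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize an iterable of rows as tab-separated text (hypothetical helper)."""
    buffer = io.StringIO()
    # lineterminator='\n' keeps the output identical across platforms
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ['Title', 'Release Date', 'Director'],
    ["Monty Python's Life Of Brian", '1979', 'Terry Jones'],
]
tsv_text = rows_to_tsv(rows)
print(tsv_text)
```

A spreadsheet program will place each field in its own column, but in a plain text editor the columns only line up when the fields have similar lengths.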

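You can easily write some code to generate a table like this. The trick is that you need to have all the rows before you output anything, so that you can compute the width of each column.

You can find plenty of snippets for this on the internet; here is one I used before.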
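The width calculation described above might look like the following sketch. It assumes Python 3 and that every row has the same number of fields; `format_aligned` is a hypothetical name, not a function from any library.

```python
def format_aligned(rows, separator=','):
    """Pad every field so the columns line up (hypothetical helper).

    All rows must be available up front, because the width of a column
    is only known once its longest field has been seen.
    """
    rows = [[str(field) for field in row] for row in rows]
    last = len(rows[0]) - 1
    # Append the separator to every field except the last one in each row
    cells = [[f + separator if i < last else f for i, f in enumerate(row)]
             for row in rows]
    # The width of a column is the length of its longest cell
    widths = [max(len(row[i]) for row in cells) for i in range(last + 1)]
    lines = []
    for row in cells:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append(' '.join(padded).rstrip())
    return '\n'.join(lines)


movies = [
    ['Title', 'Release Date', 'Director'],
    ['And Now For Something Completely Different', '1971', 'Ian MacNaughton'],
    ['Monty Python And The Holy Grail', '1975', 'Terry Gilliam and Terry Jones'],
    ["Monty Python's Life Of Brian", '1979', 'Terry Jones'],
]
print(format_aligned(movies))
```

This sketch left-aligns every column; centering, as in the layout shown in the question, would use `str.center` in place of `str.ljust`.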

Answer 1 (score: 1)

  

Update: refactored the code in order to:

  1. use a generator function, as suggested by @madjar, and
  2. stay closer to the code snippet provided by the OP.

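Target output

I am trying out an alternative that uses texttable. It produces the same output as shown in the question. This output can be written to a CSV file (the records need to be massaged to fit the appropriate CSV dialect; I could not find a way to keep using csv.writer and still get the padding spaces in each field).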

                  Title,                      Release Date,             Director            
And Now For Something Completely Different,       1971,              Ian MacNaughton        
Monty Python And The Holy Grail,                  1975,       Terry Gilliam and Terry Jones 
Monty Python's Life Of Brian,                     1979,                Terry Jones    

The code

Here is a sketch of the code needed to produce the result above:

from texttable import Texttable

# ----------------------------------------------------------------
# Imagine the data being generated by Scrapy: each record is
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function

def process_item(item):
    # This massages each record in preparation for writing to csv
    item['Title'] = item['Title'].encode('utf-8') + ','
    item['Release Date'] = item['Release Date'].encode('utf-8') + ','
    item['Director'] = item['Director'].encode('utf-8')
    return item

def initialise_dataset():
    data = [{'Title' : 'Title',
         'Release Date' : 'Release Date',
         'Director' : 'Director'
         }, # first item holds the table header
            {'Title' : 'And Now For Something Completely Different',
         'Release Date' : '1971',
         'Director' : 'Ian MacNaughton'
         },
        {'Title' : 'Monty Python And The Holy Grail',
         'Release Date' : '1975',
         'Director' : 'Terry Gilliam and Terry Jones'
         },
        {'Title' : "Monty Python's Life Of Brian",
         'Release Date' : '1979',
         'Director' : 'Terry Jones'
         }
        ]

    data = [ process_item(item) for item in data ]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    table.add_rows( records(data) )

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print table
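To connect this back to the pipeline in the question, here is a hedged sketch of how the same idea could live inside a Scrapy item pipeline, using plain string padding instead of texttable. It assumes Python 3 (the code above is Python 2) and relies on the `open_spider`, `process_item` and `close_spider` hooks that Scrapy calls on item pipelines. The class name `AlignedTextPipeline`, its `path` parameter and the demo at the bottom are hypothetical.

```python
import os
import tempfile


class AlignedTextPipeline:
    """Hypothetical pipeline: buffer every item, write an aligned table on close."""

    fields = ['Title', 'Release Date', 'Director']

    def __init__(self, path='example.csv'):
        self.path = path
        self.rows = []

    def open_spider(self, spider):
        # Start with the header row
        self.rows = [list(self.fields)]

    def process_item(self, item, spider):
        # Nothing is written yet: widths are unknown until all rows are seen
        self.rows.append([str(item[name]) for name in self.fields])
        return item

    def close_spider(self, spider):
        last = len(self.fields) - 1
        # Put the comma on every field except the last, then pad to column width
        cells = [[f + ',' if i < last else f for i, f in enumerate(row)]
                 for row in self.rows]
        widths = [max(len(row[i]) for row in cells) for i in range(last + 1)]
        with open(self.path, 'w', encoding='utf-8') as handle:
            for row in cells:
                line = ' '.join(c.ljust(w) for c, w in zip(row, widths))
                handle.write(line.rstrip() + '\n')


# Simulate the calls Scrapy would make, writing to a temporary file
demo_path = os.path.join(tempfile.mkdtemp(), 'example.csv')
pipeline = AlignedTextPipeline(demo_path)
pipeline.open_spider(spider=None)
pipeline.process_item({'Title': "Monty Python's Life Of Brian",
                       'Release Date': '1979',
                       'Director': 'Terry Jones'}, spider=None)
pipeline.close_spider(spider=None)
with open(demo_path, encoding='utf-8') as handle:
    written = handle.read()
print(written)
```

Buffering all rows costs memory in proportion to the number of scraped items, which is the price of knowing the column widths before the first line is written.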