Creating a dictionary from a CSV file

Asked: 2012-12-30 14:10:26

Tags: python csv dictionary

I am trying to write a Python script that takes input from a CSV file and puts it into a dictionary (I am using Python 3.x).

I am using the code below to read the CSV file, and it works:

import csv

reader = csv.reader(open('C:\\Users\\Chris\\Desktop\\test.csv'), delimiter=',', quotechar='|')

for row in reader:
    print(', '.join(row))

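But now I want to put the results into a dictionary. I would like the first row of the CSV file to be used as the dictionary's "key" fields, with the subsequent rows of the CSV file filling in the data.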

Sample data:

     Date        First Name     Last Name     Score
12/28/2012 15:15        John          Smith        20
12/29/2012 15:15        Alex          Jones        38
12/30/2012 15:15      Michael       Carpenter      25

I want to do some additional things with this code, but for now getting the dictionary to work is all I'm looking for.

Can anyone help me with this?

EDITED Version 2:

import csv
reader = csv.DictReader(open('C:\\Users\\Chris\\Desktop\\test.csv'))

result = {}

for row in reader:
    for column, value in row.items():
        result.setdefault(column, []).append(value)
        print('Column -> ', column, '\nValue -> ', value)
print(result)

fieldnames = result.keys()

csvwriter = csv.DictWriter(open('C:\\Users\\Chris\\Desktop\\test_out.csv', 'w'), delimiter=',', fieldnames=result.keys())

csvwriter.writerow(dict((fn,fn) for fn in fieldnames))

for row in result.items():
    print('Values -> ', row)
    #csvwriter.writerow(row)

'''
Test output

'''
test_array = []
test_array.append({'fruit': 'apple', 'quantity': 5, 'color': 'red'});
test_array.append({'fruit': 'pear', 'quantity': 8, 'color': 'green'});
test_array.append({'fruit': 'banana', 'quantity': 3, 'color': 'yellow'});
test_array.append({'fruit': 'orange', 'quantity': 11, 'color': 'orange'});
fieldnames = ['fruit', 'quantity', 'color']
test_file = open('C:\\Users\\Chris\\Desktop\\test_out.csv','w')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in test_array:
    print(row)
    csvwriter.writerow(row)
test_file.close()

5 answers:

Answer 0 (score: 54)

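Create a dictionary, then iterate over the reader and fill the dictionary row by row. Note that if you encounter a row with a duplicate date, you have to decide what to do (raise an exception, replace the previous row, discard the later row, etc.).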

Here is test.csv:

Date,Foo,Bar
123,456,789
abc,def,ghi

And the corresponding program:

import csv
reader = csv.reader(open('test.csv'))

result = {}
for row in reader:
    key = row[0]
    if key in result:
        # implement your duplicate row handling here
        pass
    result[key] = row[1:]
print(result)

This yields:

{'Date': ['Foo', 'Bar'], '123': ['456', '789'], 'abc': ['def', 'ghi']}
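
The `pass` placeholder in the program above leaves the duplicate policy open. As a minimal sketch of one of the options mentioned (raising an exception), assuming Python 3; the function name `rows_by_first_column` is made up for illustration, and the data is read from an in-memory string instead of test.csv so the snippet runs on its own:

```python
import csv
import io

def rows_by_first_column(lines):
    """Map the first field of each row to the remaining fields.

    Raises ValueError on a repeated key instead of silently overwriting.
    """
    result = {}
    for row in csv.reader(lines):
        if not row:  # skip completely blank lines
            continue
        key = row[0]
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = row[1:]
    return result

# Same content as test.csv, held in memory for the example.
sample = io.StringIO("Date,Foo,Bar\n123,456,789\nabc,def,ghi\n")
print(rows_by_first_column(sample))
```

To keep the first row instead, replace the `raise` with `continue`; to keep the last row, drop the check entirely.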

Alternatively, using DictReader:

import csv
reader = csv.DictReader(open('test.csv'))

result = {}
for row in reader:
    key = row.pop('Date')
    if key in result:
        # implement your duplicate row handling here
        pass
    result[key] = row
print(result)

Result:

{'123': {'Foo': '456', 'Bar': '789'}, 'abc': {'Foo': 'def', 'Bar': 'ghi'}}

Or perhaps you want to map each column header to the list of values in that column:

import csv
reader = csv.DictReader(open('test.csv'))

result = {}
for row in reader:
    for column, value in row.items():
        result.setdefault(column, []).append(value)
print(result)

This yields:

{'Date': ['123', 'abc'], 'Foo': ['456', 'def'], 'Bar': ['789', 'ghi']}
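
The question's second version also tries to write such a column-oriented dictionary back out as CSV. A minimal sketch of one way to do that, assuming every column list has the same length (`zip` stops at the shortest one); `write_columns` is a hypothetical helper name, and an in-memory buffer stands in for the output file:

```python
import csv
import io

columns = {'Date': ['123', 'abc'], 'Foo': ['456', 'def'], 'Bar': ['789', 'ghi']}

def write_columns(columns, out):
    """Write a {header: [values, ...]} mapping as CSV rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(columns.keys())  # header row
    # zip(*...) transposes the column lists into row tuples.
    writer.writerows(zip(*columns.values()))

buffer = io.StringIO()
write_columns(columns, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, open it with `newline=''` so the csv module controls the line endings.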

Answer 1 (score: 10)

You need Python's DictReader class. More help can be found in the documentation for the csv module.

import csv

with open('file_name.csv', 'rt') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
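
Applied to the sample data from the question, this might look like the sketch below. It assumes the file is actually comma-separated (the question shows the data in aligned columns, so the exact layout is a guess) and reads from an in-memory string so it is self-contained:

```python
import csv
import io

# Assumed comma-separated form of the sample data from the question.
data = io.StringIO(
    "Date,First Name,Last Name,Score\n"
    "12/28/2012 15:15,John,Smith,20\n"
    "12/29/2012 15:15,Alex,Jones,38\n"
    "12/30/2012 15:15,Michael,Carpenter,25\n"
)

# One dictionary per data row, keyed by the header row.
rows = [dict(row) for row in csv.DictReader(data)]
for row in rows:
    row['Score'] = int(row['Score'])  # csv yields strings; convert as needed

print(rows[0])
```

Each row becomes its own dictionary, so the header names are repeated per row; that is usually the most convenient shape when rows are processed one at a time.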

Answer 2 (score: 1)

The help from @phil-frost was very useful and exactly what I was looking for.

I made a few tweaks afterwards, so I thought I would share them here:

def csv_as_dict(file, ref_header, delimiter=None):

    import csv
    if not delimiter:
        delimiter = ';'
    reader = csv.DictReader(open(file), delimiter=delimiter)
    result = {}
    for row in reader:
        print(row)
        key = row.pop(ref_header)
        if key in result:
            # implement your duplicate row handling here
            pass
        result[key] = row
    return result

You can call it like this:

myvar = csv_as_dict(csv_file, 'ref_column')

ref_column will become the primary key for each row.

Answer 3 (score: 0)

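Have you considered using Apache Solr? It supports search scoring and can ingest CSV file data easily. You will find that it scales impressively, and it offers many other options for analyzing data, such as support for multiple languages or misspelled queries.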

Answer 4 (score: 0)

import csv
def parser_csv(PATH):
    list_dict = []
    first_row = None
    # newline='' lets the csv module handle line endings itself
    with open("{}.csv".format(PATH), 'r', newline='') as f:
        for row in csv.reader(f):
            if first_row is None:
                # The first row supplies the dictionary keys
                first_row = row
            else:
                # Build a fresh dict for every row; reusing a single dict
                # would make every list entry refer to the same (last) row
                list_dict.append(dict(zip(first_row, row)))
    return list_dict
print(len(parser_csv("path")))
# The list has one entry fewer than the CSV file has rows (the first row supplies the dict keys)