Reading a CSV file with multiple values per label

Time: 2017-10-04 15:06:49

Tags: python pandas csv

I have a CSV file with the following structure:

{'2012-01-01 01:01:55.000000': {'P1': [1, 2, 3], 'P2': [4, 5, 6], 'P3': [7, 8, 9]},
 '2012-01-01 01:01:56.000000': {'P1': [4, 9, 2], 'P2': [0, 2, 1], 'P3': [1, 6, 8]}}

How can I read it with Python (and optionally Pandas) to get the following result:

{{1}}

Thanks!

2 answers:

Answer 0: (score: 1)

Use a csv.reader object and the itertools.islice() function:

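import csv
import itertools

with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',', skipinitialspace=True)
    header = next(reader)[1:]   # the `P<number>` keys, without the first (timestamp) column
    d = {}
    for row in reader:
        # row[0] is the timestamp; the remaining cells are taken in groups of 3, one group per key
        d[row[0]] = {header[i]: list(itertools.islice(row[1:], i*3, i*3+3)) for i in range(len(header))}

print(d)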

Output (for 3 input lines):

{'2012-01-01 01:01:55.000000': {'P2': ['4', '5', '6'], 'P1': ['1', '2', '3'], 'P3': ['6', '8', '9']}, '2012-01-01 01:01:56.000000': {'P2': ['0', '2', '1'], 'P1': ['4', '9', '2'], 'P3': ['1', '6', '8']}}
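
The values above are strings, while the question asks for integers. A minimal sketch of the same csv.reader approach with an int conversion follows. It assumes the file has one header row followed by rows holding a timestamp and an equal number of values per label; the sample text and the helper name parse_grouped_csv are hypothetical and only serve the illustration.

```python
import csv
import io

# Hypothetical sample data; the real file layout is an assumption.
SAMPLE = """Time, P1, P2, P3
2012-01-01 01:01:55.000000, 1, 2, 3, 4, 5, 6, 7, 8, 9
2012-01-01 01:01:56.000000, 4, 9, 2, 0, 2, 1, 1, 6, 8
"""


def parse_grouped_csv(f):
    """Return {timestamp: {label: [int, ...]}} from an open text file."""
    reader = csv.reader(f, skipinitialspace=True)
    labels = next(reader)[1:]               # label names from the header row
    result = {}
    for row in reader:
        if not row:                         # skip blank lines
            continue
        values = [int(x) for x in row[1:]]  # convert every cell after the timestamp
        size = len(values) // len(labels)   # number of values per label
        result[row[0]] = {
            label: values[i * size:(i + 1) * size]
            for i, label in enumerate(labels)
        }
    return result


parsed = parse_grouped_csv(io.StringIO(SAMPLE))
print(parsed)
```

Computing the group size from the row length, rather than hard-coding 3, lets the same function handle files with a different number of values per label.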

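Note that a dict in Python is an unordered structure (insertion order is only guaranteed from Python 3.7 onward).
To get an ordered structure, define the resulting dict as an OrderedDict object (from the collections module).

In that case:

...
d = collections.OrderedDict()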
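
As a self-contained sketch of the ordered variant, the snippet below builds both the outer and the inner mappings as OrderedDict objects. The sample text is hypothetical, and the group size of 3 is taken from the answer's code rather than derived from the file.

```python
import collections
import csv
import io

# Hypothetical sample data; the real file layout is an assumption.
SAMPLE = """Time, P1, P2, P3
2012-01-01 01:01:55.000000, 1, 2, 3, 4, 5, 6, 7, 8, 9
2012-01-01 01:01:56.000000, 4, 9, 2, 0, 2, 1, 1, 6, 8
"""

reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
header = next(reader)[1:]            # label names without the timestamp column

d = collections.OrderedDict()
for row in reader:
    values = row[1:]
    # keys are inserted in header order, so iteration yields P1, P2, P3
    d[row[0]] = collections.OrderedDict(
        (header[i], values[i * 3:i * 3 + 3]) for i in range(len(header))
    )

first_key = next(iter(d))
print(list(d))              # timestamps in file order
print(list(d[first_key]))   # labels in header order
```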

Answer 1: (score: 0)

Using pandas and numpy

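import pandas as pd

# read the parameter names (P1, P2, ...) from the first line, dropping the first field
with open('tst.csv') as f:
    _, *params = map(str.strip, f.readline().split(','))

d1 = pd.read_csv(
    'tst.csv', comment='#', header=None,
    index_col=0, parse_dates=True)

i = d1.index.rename(None)
v = d1.values
# regroup each row into len(params) blocks, then move the parameter axis to the front
t = v.reshape(v.shape[0], -1, v.shape[1] // len(params)).transpose(1, 0, 2)

pd.DataFrame(dict(zip(params, t.tolist())), i)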

                            P1         P2         P3
2012-01-01 01:01:55  [1, 2, 3]  [4, 5, 6]  [6, 8, 9]
2012-01-01 01:01:56  [4, 9, 2]  [0, 2, 1]  [1, 6, 8]
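
The reshape and transpose step is the core of this approach, so it may help to look at it in isolation. The sketch below uses a small hypothetical array in place of d1.values and assumes 3 labels with 3 values each; the names n_labels, grouped and per_label are introduced here for illustration only.

```python
import numpy as np

# Hypothetical values standing in for d1.values: 2 rows, 3 labels, 3 values per label.
v = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
              [4, 9, 2, 0, 2, 1, 1, 6, 8]])
n_labels = 3

# (rows, labels * size) -> (rows, labels, size)
grouped = v.reshape(v.shape[0], -1, v.shape[1] // n_labels)

# (rows, labels, size) -> (labels, rows, size): one block of rows per label
per_label = grouped.transpose(1, 0, 2)

print(per_label.shape)        # (3, 2, 3)
print(per_label[0].tolist())  # [[1, 2, 3], [4, 9, 2]]
```

After the transpose, per_label[j] holds one list per row for label j, which is the shape that dict(zip(params, t.tolist())) needs to build one DataFrame column per label.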

Without pandas

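with open('tst.csv') as f:
    # the first line holds the parameter names; the first field is dropped
    _, *params = map(str.strip, f.readline().split(','))
    k = len(params)
    # split the values of each row into k equal consecutive slices, one per parameter
    d = {ts: dict(zip(
        params,
        (data[i*len(data)//k:(i+1)*len(data)//k] for i in range(k))
    )) for ts, *data in map(lambda x: x.strip().split(','), f.readlines())}

d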

{'2012-01-01 01:01:55.000000': {'P1': ['1', '2', '3'],
                                'P2': ['4', '5', '6'],
                                'P3': ['6', '8', '9']},
 '2012-01-01 01:01:56.000000': {'P1': ['4', '9', '2'],
                                'P2': ['0', '2', '1'],
                                'P3': ['1', '6', '8']}}
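
The slice bounds i*len(data)//k and (i+1)*len(data)//k cut a row into k consecutive chunks. A minimal sketch of that arithmetic on its own, with an int conversion added, is shown below; the helper name split_evenly and the sample row are hypothetical.

```python
def split_evenly(data, k):
    """Split data into k consecutive chunks using the bounds i*n//k."""
    n = len(data)
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]


# Hypothetical sample row: a timestamp followed by 9 values.
row = "2012-01-01 01:01:55.000000,1,2,3,4,5,6,7,8,9".split(",")
ts, *data = row
chunks = split_evenly([int(x) for x in data], 3)

print(ts)
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

When the row length is not a multiple of k, the chunks differ in size by at most one element, so a malformed row produces uneven lists rather than an error.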