Collect all values from a CSV in Python for each unique id

Time: 2018-10-24 14:50:50

Tags: python python-2.7 pandas

I have a dataset like this:

id,LON,LAT
00x1,2.17105,41.31353
00x1,1.935983,41.04712
00x2,-5.381285,36.11647
00x2,0.830717,42.19835
00x1,10.21912,43.51599

For each unique id, I want to collect its LON, LAT values into lists like this (expected output):

[00x1, [2.17105,41.31353], [1.935983,41.04712], [10.21912,43.51599]]
[00x2, [-5.381285,36.11647], [0.830717,42.19835]]

My code so far:

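 import numpy as np
 import pandas as pd

 df = pd.read_csv('/home/repos/master/testdat.csv')
 # as_matrix() was deprecated in pandas 0.23 and removed in 1.0; .values works in both old and new versions
 ids = df['id'].values
 # find unique ids
 unique_ids = np.unique(ids)
 coordinates = df[['LON', 'LAT']].values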

This gives me access to all the ids and coordinates, but I can't figure out how to produce the expected output.
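One way the question's own approach could be finished (a sketch, not taken from either answer below): keep the `unique_ids` and `coordinates` arrays and, for each id, select the coordinate rows with a boolean mask. The sample data is inlined as `CSV_TEXT` (an illustrative name) instead of reading `testdat.csv`, and the code is written for Python 3.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, inlined so the sketch runs without testdat.csv
CSV_TEXT = """id,LON,LAT
00x1,2.17105,41.31353
00x1,1.935983,41.04712
00x2,-5.381285,36.11647
00x2,0.830717,42.19835
00x1,10.21912,43.51599
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

ids = df['id'].to_numpy()                     # one id per row (object array of str)
unique_ids = np.unique(ids)                   # sorted unique ids
coordinates = df[['LON', 'LAT']].to_numpy()   # float array of shape (n_rows, 2)

# For each id, keep only the coordinate rows whose id matches (boolean mask),
# convert them to plain Python lists, and put the id in front.
res = [[uid] + coordinates[ids == uid].tolist() for uid in unique_ids]
print(res)
```

Each mask scans all rows, so this is O(rows × ids); that is fine for small files, while the groupby-based answers below do a single grouping pass.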

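2 answers:

Answer 0 (score: 1)

With Pandas, you can combine the 2 series into a single series of lists, aggregate with GroupBy, and then use a list comprehension. Given the dataframe df: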

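df['LON-LAT'] = list(map(list, zip(df['LON'], df['LAT'])))  # pair LON/LAT per row
grouped = df.groupby('id')['LON-LAT'].apply(list)  # collect the pairs per id

# [k] + v (rather than [k, *v], which needs Python 3.5+) also works on Python 2.7
res = [[k] + v for k, v in grouped.items()]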

Result:

[['00x1',
  [2.1710500000000001, 41.31353],
  [1.9359830000000002, 41.04712],
  [10.21912, 43.515990000000002]],
 ['00x2',
  [-5.3812850000000001, 36.11647],
  [0.83071700000000004, 42.198349999999998]]]
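A variant of this approach (not part of the answer) that skips the helper `LON-LAT` column: iterating over `df.groupby('id')` yields `(id, sub-DataFrame)` pairs, and each sub-DataFrame's `LON`/`LAT` columns can be converted to a list of pairs directly. This is a sketch for Python 3 that inlines the sample data as `CSV_TEXT` (an illustrative name) instead of reading `testdat.csv`.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained
CSV_TEXT = """id,LON,LAT
00x1,2.17105,41.31353
00x1,1.935983,41.04712
00x2,-5.381285,36.11647
00x2,0.830717,42.19835
00x1,10.21912,43.51599
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Each group is the subset of rows for one id, in original file order;
# .to_numpy().tolist() turns its LON/LAT columns into [[lon, lat], ...].
res = [
    [key] + group[['LON', 'LAT']].to_numpy().tolist()
    for key, group in df.groupby('id')
]
print(res)
```

`groupby` sorts the ids by default, so `00x1` comes before `00x2` regardless of where each id first appears in the file.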

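Answer 1 (score: 0)

You can use csv.DictReader to process the file and a separate dictionary to hold the data, then convert the dictionary to a list at the end. I think this is much simpler than using pandas.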

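import csv

d = {}
with open('testdat.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # csv yields strings, so convert LON/LAT to float to get numbers as in the expected output
        d.setdefault(row['id'], []).append([float(row['LON']), float(row['LAT'])])
# note: before Python 3.7 a plain dict does not guarantee the ids stay in file order
res = [[key] + value for key, value in d.items()]
print(res)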
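A standard-library alternative without a dictionary accumulator (not from the answer) is to sort the rows by id and use `itertools.groupby`. `groupby` only merges *adjacent* rows with equal keys, which is why the sort is needed; because Python's sort is stable, each id keeps its rows in file order. This is a sketch for Python 3 that inlines the sample data as `CSV_TEXT` (an illustrative name) instead of opening `testdat.csv`.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Sample data from the question, inlined so the sketch is self-contained
CSV_TEXT = """id,LON,LAT
00x1,2.17105,41.31353
00x1,1.935983,41.04712
00x2,-5.381285,36.11647
00x2,0.830717,42.19835
00x1,10.21912,43.51599
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Sort by id so rows with the same id become adjacent; the sort is stable,
# so rows sharing an id keep their original relative order.
rows.sort(key=itemgetter('id'))

res = [
    [key] + [[float(r['LON']), float(r['LAT'])] for r in group]
    for key, group in groupby(rows, key=itemgetter('id'))
]
print(res)
# [['00x1', [2.17105, 41.31353], [1.935983, 41.04712], [10.21912, 43.51599]], ['00x2', [-5.381285, 36.11647], [0.830717, 42.19835]]]
```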