I want to change the output format of the following code.
import pandas as pd

x = pd.read_csv('x.csv')
y = pd.read_csv('y.csv')
z = pd.read_csv('z.csv')

# Left-join the three files on the shared key column 'xx'
list = pd.merge(x, y, how='left', on=['xx'])
list = pd.merge(list, z, how='left', on=['xx'])

columns_to_keep = ['yy', 'zz', 'uu']
list = list.set_index(['xx'])
list = list[columns_to_keep]
list = list.sort_index(axis=0, ascending=True)

with open('write.csv', 'w') as f:
    list.to_csv(f, header=True, index=True, index_label='xx')
From:
id date user_id user_name
1 8/13/2007 1 a1
2 1/8/2007 2 a2
2 1/8/2007 3 a3
3 12/14/2007 4 a4
4 3/6/2008 5 a5
4 4/14/2009 6 a6
4 5/30/2008 7 a7
4 5/30/2008 8 a8
5 6/17/2007 9 a9
To:
id date user_id user_name
1 8/13/2007 1 a1
2 1/8/2007 2; 3 a2; a3
3 12/14/2007 4 a4
4 3/6/2008 5; 6; 7; 8 a5; a6; a7; a8
5 6/17/2007 9 a9
Answer 0 (score: 0)
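I think the following should work on your final DataFrame (list). However, I suggest not using "list" as a variable name: it shadows the Python built-in of the same name, which you may want to use elsewhere. In my code I therefore use "df" instead of "list":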
ind = df.index.unique()
finaldf = pd.DataFrame(columns=list(df.columns))
for val in ind:
    # Double brackets so a single matching row still comes back as a DataFrame
    tempDF = df.loc[[val]]
    print(tempDF)
    for i in range(tempDF.shape[0]):
        for jloc, j in enumerate(df.columns):
            if i == 0:
                # First row of the group: start the cell
                finaldf.loc[val, j] = str(tempDF.iloc[i, jloc])
            elif j != 'date':
                # Later rows: append every column except 'date'
                finaldf.loc[val, j] += "; " + str(tempDF.iloc[i, jloc])
print(finaldf)
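As an alternative to the explicit loops above, the same collapse can probably be expressed with a groupby on the index. This is only a sketch under assumptions: the sample frame is typed in by hand from the "From" table in the question, the helper name join_values is made up for illustration, and keeping the first date of each group is a guess based on the desired output.

```python
import pandas as pd

# Sample data copied from the question's "From" table. In the question's
# code this frame would instead come from the merged CSV files.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 4, 4, 4, 4, 5],
        "date": ["8/13/2007", "1/8/2007", "1/8/2007", "12/14/2007", "3/6/2008",
                 "4/14/2009", "5/30/2008", "5/30/2008", "6/17/2007"],
        "user_id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "user_name": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"],
    }
).set_index("id")


def join_values(values):
    """Hypothetical helper: join a group's values into one '; '-separated string."""
    return "; ".join(str(v) for v in values)


# Group rows that share an index value; keep the first date of each group
# (an assumption) and concatenate the other columns.
result = df.groupby(level=0).agg(
    {"date": "first", "user_id": join_values, "user_name": join_values}
)
print(result)
```

The result can then be written out with result.to_csv('write.csv', index_label='id'), as in the question's last step.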