I have the following dataset:
>>topic_article_Dists = pandas.DataFrame(topicDists)
>>topic_article_Dists.head(10)
0 ... 19
0 (0, 0.00012594461) ... (19, 0.00012594461)
1 (0, 0.00013192612) ... (19, 0.00013192612)
2 (0, 0.00018656717) ... (19, 0.004974284)
3 (0, 0.00012594466) ... (19, 0.00012594466)
4 (0, 0.024151485) ... (19, 9.2936825e-05)
5 (0, 0.00013262601) ... (19, 0.00013262601)
6 (0, 0.00018796993) ... (19, 0.0050261705)
7 (0, 0.00026737968) ... (19, 0.00026737968)
8 (0, 0.00013698627) ... (19, 0.00013698627)
9 (0, 0.00029239763) ... (19, 0.00029239766)
For each column, I only want to save (to a CSV file) the number after the comma, so that I get the following result:
0 ... 19
0 0.00012594461 ... 0.00012594461
1 0.00013192612 ... 0.00013192612
2 0.00018656717 ... 0.004974284
3 0.00012594466 ... 0.00012594466
4 0.024151485 ... 9.2936825e-05
5 0.00013262601 ... 0.00013262601
6 0.00018796993 ... 0.0050261705
7 0.00026737968 ... 0.00026737968
8 0.00013698627 ... 0.00013698627
9 0.00029239763 ... 0.00029239766
I tried the command below, and I'm wondering whether I should use a regular expression to do this:
topic_article_Dists.to_csv("Article-Topic-Distri.csv")
Answer 0 (score: 1)
Use concat with a list comprehension, and select the second value of each tuple by index with .str[1]:
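# If the cells are strings rather than tuples, convert them to tuples first:
#import ast
#print (type(df.iloc[0,0]))
#<class 'str'>
#df = df.applymap(ast.literal_eval)   # in pandas >= 2.1, use df.map instead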
print (type(df.iloc[0,0]))
<class 'tuple'>
df = pd.concat([df[x].str[1] for x in df.columns], axis=1)
print (df)
0 19
0 0.000126 0.000126
1 0.000132 0.000132
2 0.000187 0.004974
3 0.000126 0.000126
4 0.024151 0.000093
5 0.000133 0.000133
6 0.000188 0.005026
7 0.000267 0.000267
8 0.000137 0.000137
9 0.000292 0.000292
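A minimal self-contained sketch of the tuple case above, including the final CSV step the question asks for. The sample data is hypothetical (three rows, two topic columns) and only mimics the layout shown in the question; the file name is taken from the question's own to_csv call.

```python
import pandas as pd

# Hypothetical sample data: each cell is a (topic_id, probability) tuple,
# mimicking the layout in the question (only 3 rows and 2 topics shown).
df = pd.DataFrame([
    [(0, 0.00012594461), (19, 0.00012594461)],
    [(0, 0.00013192612), (19, 0.00013192612)],
    [(0, 0.024151485), (19, 9.2936825e-05)],
], columns=[0, 19])

# .str[1] indexes into each tuple, keeping only the probability;
# astype(float) makes sure the resulting columns are numeric.
probs = pd.concat([df[x].str[1].astype(float) for x in df.columns], axis=1)

# Save only the numbers, not the tuples.
probs.to_csv("Article-Topic-Distri.csv")
print(probs)
```

Because the tuples are real Python objects here, no regular expression is needed; indexing is enough.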
If the values are strings instead:
print (type(df.iloc[0,0]))
<class 'str'>
# strip(' )') removes both the leading space and the closing parenthesis
df = pd.concat([df[x].str.split(',').str[1].str.strip(' )').astype(float) for x in df.columns], axis=1)
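Since the question also asks about regular expressions: for the string case, a regex with Series.str.extract is one option. This is a sketch on hypothetical sample strings shaped like "(0, 0.00012594461)"; the pattern assumes exactly one comma per cell, followed by the number and a closing parenthesis.

```python
import pandas as pd

# Hypothetical sample data: cells are strings such as "(0, 0.00012594461)".
df = pd.DataFrame([
    ["(0, 0.00012594461)", "(19, 0.00012594461)"],
    ["(0, 0.024151485)", "(19, 9.2936825e-05)"],
], columns=[0, 19])

# Capture everything after the comma (and optional spaces) up to the ")".
pattern = r",\s*([^)]+)\)"

# With one capture group and expand=False, extract returns a Series per column.
probs = df.apply(lambda col: col.str.extract(pattern, expand=False).astype(float))
print(probs)
```

The split/strip version above does the same job without a regex; the regex mainly helps if the cell format is less regular.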