Question

I have the following dataset:

>>> topic_article_Dists = pandas.DataFrame(topicDists)
>>> topic_article_Dists.head(10)

                   0          ...                            19
0  (0, 0.00012594461)         ...           (19, 0.00012594461)
1  (0, 0.00013192612)         ...           (19, 0.00013192612)
2  (0, 0.00018656717)         ...             (19, 0.004974284)
3  (0, 0.00012594466)         ...           (19, 0.00012594466)
4    (0, 0.024151485)         ...           (19, 9.2936825e-05)
5  (0, 0.00013262601)         ...           (19, 0.00013262601)
6  (0, 0.00018796993)         ...            (19, 0.0050261705)
7  (0, 0.00026737968)         ...           (19, 0.00026737968)
8  (0, 0.00013698627)         ...           (19, 0.00013698627)
9  (0, 0.00029239763)         ...           (19, 0.00029239766)

For each column, I want to save (to a CSV file) only the number after the comma, to get the following result:

              0          ...                      19
0  0.00012594461         ...           0.00012594461
1  0.00013192612         ...           0.00013192612
2  0.00018656717         ...           0.004974284
3  0.00012594466         ...           0.00012594466
4  0.024151485           ...           9.2936825e-05
5  0.00013262601         ...           0.00013262601
6  0.00018796993         ...           0.0050261705
7  0.00026737968         ...           0.00026737968
8  0.00013698627         ...           0.00013698627
9  0.00029239763         ...           0.00029239766

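I have already tried the command below, but it writes the whole tuples. I am wondering whether I should use a regular expression for this.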

topic_article_Dists.to_csv("Article-Topic-Distri.csv")

Answer 1

Use concat with a list comprehension and select the second value of each tuple by indexing:

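import pandas as pd
# import ast

# df is the question's topic_article_Dists.
# If the cells are strings rather than tuples, parse them first:
# print(type(df.iloc[0, 0]))
# <class 'str'>
# df = df.applymap(ast.literal_eval)  # DataFrame.map in pandas >= 2.1

print(type(df.iloc[0, 0]))
<class 'tuple'>

# Take element 1 (the probability) of each tuple, column by column
df = pd.concat([df[x].str[1] for x in df.columns], axis=1)
print(df)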
          0        19
0  0.000126  0.000126
1  0.000132  0.000132
2  0.000187  0.004974
3  0.000126  0.000126
4  0.024151  0.000093
5  0.000133  0.000133
6  0.000188  0.005026
7  0.000267  0.000267
8  0.000137  0.000137
9  0.000292  0.000292
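To connect this back to the CSV file the question asks for, here is a minimal end-to-end sketch. The sample rows and the names `topic_dists`, `probs`, and `buffer` are made up for illustration, and the frame is written to an in-memory buffer so the example has no side effects.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the question's data:
# every cell is a (topic_id, probability) tuple.
topic_dists = [
    [(0, 0.00012594461), (1, 0.5), (19, 0.00012594461)],
    [(0, 0.024151485), (1, 0.25), (19, 9.2936825e-05)],
]
df = pd.DataFrame(topic_dists)

# Keep only the second element of each tuple, column by column,
# then convert the resulting object columns to floats.
probs = pd.concat([df[col].str[1] for col in df.columns], axis=1).astype(float)

# Write the CSV into a string buffer; pass a file path such as
# "Article-Topic-Distri.csv" instead to write to disk.
buffer = io.StringIO()
probs.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the column labels in the output are the positional labels of the frame (0, 1, 2 here), not the topic ids stored inside the tuples.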

If the values are strings:

print(type(df.iloc[0, 0]))
<class 'str'>

# Split on the comma, keep the second part, and strip the leading space and closing parenthesis
df = pd.concat([df[x].str.split(',').str[1].str.strip(' )') for x in df.columns], axis=1)
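On the question of whether a regular expression is needed: it is not required, but for string cells it is one workable option. The sketch below uses `Series.str.extract` with an assumed pattern that captures whatever sits between the comma and the closing parenthesis. The sample data and the names `pattern` and `probs` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: cells hold the string form of (topic_id, probability).
df = pd.DataFrame({
    0: ["(0, 0.00012594461)", "(0, 0.024151485)"],
    19: ["(19, 0.00012594461)", "(19, 9.2936825e-05)"],
})

# One capture group: everything after the comma (and optional spaces)
# up to the closing parenthesis.
pattern = r",\s*([^)]+)\)"

# expand=False returns a Series for a single capture group;
# astype(float) turns the captured text into numbers.
probs = pd.DataFrame({
    col: df[col].str.extract(pattern, expand=False).astype(float)
    for col in df.columns
})
print(probs)
```

Converting to float here means scientific notation such as 9.2936825e-05 is parsed as a number rather than kept as text, which matters if the CSV is read back later.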
