Question

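I have a pandas DataFrame. One of its columns may hold either a null value or an array of string values, but I'm having trouble working out how to store values in that column.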

This is my current code:

df_completed = df[df.completed]
df['links'] = None
for i, row in df_completed.iterrows():
    results = get_links(row['nct_id'])
    if results:
        df[df.nct_id == row['nct_id']].links = results
        print df[df.nct_id == row['nct_id']].links

But there are two problems:

When results is an array of length 1, the printed output is None rather than the array, so I think I must be saving the value incorrectly.
When results is a longer array, the line where I save the value raises an error.

What am I doing wrong?

Answer 1

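I'm not sure it's wise to try to store arrays in pandas like this. Have you considered serializing the array contents and storing the serialized form instead?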
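
One way to read the serialization suggestion is to encode each list as a JSON string, so the column holds plain strings or nulls. The following is only a minimal sketch under that assumption: the `links_by_id` dictionary and the example URLs are made up for illustration and stand in for whatever the question's get_links() returns.

```python
import json

import pandas as pd

# Hypothetical sample data standing in for the question's frame and links.
df = pd.DataFrame({"nct_id": ["NCT001", "NCT002", "NCT003"]})
links_by_id = {
    "NCT001": ["http://example.com/a"],
    "NCT003": ["http://example.com/b", "http://example.com/c"],
}

# Encode each list as a JSON string; rows without links stay null.
df["links"] = df["nct_id"].map(
    lambda key: json.dumps(links_by_id[key]) if key in links_by_id else None
)

# Decode back to Python lists when the values are needed again.
decoded = df["links"].map(
    lambda cell: json.loads(cell) if isinstance(cell, str) else None
)
```

The cost is a decode step on every read; the benefit is that the column stays a simple string column that survives round trips through CSV or a database.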

If storing the array itself is what you're after, then you can try the set_value() method, like this (make sure you take care of the dtype of the column nct_id):

In [35]: df = pd.DataFrame(data=np.random.rand(5,5), columns=list('ABCDE'))

In [36]: df
Out[36]: 
          A         B         C         D         E
0  0.741268  0.482689  0.742200  0.210650  0.351758
1  0.798070  0.929576  0.522227  0.280713  0.168999
2  0.413417  0.481230  0.304180  0.894934  0.327243
3  0.797061  0.561387  0.247033  0.330608  0.294618
4  0.494038  0.065731  0.538588  0.095435  0.397751

In [38]: df.dtypes
Out[38]: 
A    float64
B    float64
C    float64
D    float64
E    float64
dtype: object

In [39]: df.A = df.A.astype(object)

In [40]: df.dtypes
Out[40]: 
A     object
B    float64
C    float64
D    float64
E    float64
dtype: object

In [41]: df.set_value(0, 'A', ['some','values','here'])
Out[41]: 
                      A         B         C         D         E
0  [some, values, here]  0.482689  0.742200  0.210650  0.351758
1               0.79807  0.929576  0.522227  0.280713  0.168999
2              0.413417  0.481230  0.304180  0.894934  0.327243
3              0.797061  0.561387  0.247033  0.330608  0.294618
4              0.494038  0.065731  0.538588  0.095435  0.397751
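
Note that DataFrame.set_value() was deprecated in pandas 0.21 and removed in pandas 1.0; in current versions the scalar accessor .at covers the same use. The sketch below applies it to the loop from the question. It assumes a unique index, and its get_links() and sample rows are hypothetical stand-ins, not the asker's real function or data. It also sidesteps what is probably the cause of the original symptoms: df[df.nct_id == ...].links = results assigns to a temporary copy returned by the boolean selection, so the original frame is left unchanged.

```python
import pandas as pd


def get_links(nct_id):
    # Hypothetical stand-in for the question's get_links().
    fake_links = {
        "NCT001": ["http://example.com/a"],
        "NCT003": ["http://example.com/b", "http://example.com/c"],
    }
    return fake_links.get(nct_id, [])


df = pd.DataFrame({
    "nct_id": ["NCT001", "NCT002", "NCT003"],
    "completed": [True, False, True],
})

# Object dtype lets a single cell hold an arbitrary Python object such as a list.
df["links"] = pd.Series([None] * len(df), index=df.index, dtype=object)

for idx, row in df[df.completed].iterrows():
    results = get_links(row["nct_id"])
    if results:
        # .at addresses exactly one cell by (row label, column label),
        # so the whole list is stored as one value in the original frame.
        df.at[idx, "links"] = results
```

Because the write goes through the row label from iterrows() rather than a second boolean selection, it lands in df itself and works the same for lists of any length.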

I hope this helps!
