Question

I know I can add a new column in Pandas simply like this:

df
=====
  A
1 5
2 6
3 7

df['new_col'] = "text"

df
====
  A    new_col
1 5    text
2 6    text
3 7    text

I can also set a new column based on an operation on an existing column.

def times_two(x):
    return x * 2

df['newer_col'] = times_two(df.A)
df
====
  A    new_col   newer_col
1 5    text      10
2 6    text      12
3 7    text      14

However, when I try to operate on a text column, I get an unexpected AttributeError.

df['new_text'] = df['new_col'].upper()
AttributeError: 'Series' object has no attribute 'upper'

Here it treats the value as a Series rather than as the value in that "cell".

Why does this happen with text but not with numbers? How can I update my DF with a new column based on an existing text column?

Answer 1

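This is because the * operator is implemented for a Series (as the mul operator), whereas upper is not defined for a Series. You have to use str.upper, which is implemented for a Series that holds strings:

In [53]: df['new_text'] = df['new_col'].str.upper()
         df

Out[53]:
   A new_col new_text
1  5    text     TEXT
2  6    text     TEXT
3  7    text     TEXT

There is no magic here.

For df['new_col'] = "text", this is just assigning a scalar value and conforming to broadcasting rules, where the scalar is broadcast to the length of the df along the minor axis. For an explanation, see: What does the term "broadcasting" mean in Pandas documentation?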
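
To make the three cases concrete, here is a minimal sketch that rebuilds the DataFrame from the question and runs each assignment. The Series.map alternative and the column name new_text_map are my own additions for illustration, not part of the original question.

```python
import pandas as pd

# Rebuild the frame from the question (index 1..3, single column A).
df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])

# Scalar assignment: the single value is broadcast to every row.
df["new_col"] = "text"

# `*` is defined on Series and applied element-wise, same as Series.mul.
df["newer_col"] = df["A"] * 2
assert df["newer_col"].equals(df["A"].mul(2))

# A Series itself has no `upper` attribute, hence the AttributeError.
assert not hasattr(df["new_col"], "upper")

# The `.str` accessor exposes element-wise string methods.
df["new_text"] = df["new_col"].str.upper()

# Illustrative alternative (hypothetical column name): call the plain
# Python string method on each value via Series.map.
df["new_text_map"] = df["new_col"].map(str.upper)

print(df)
```

The .str accessor is usually the more idiomatic choice for string columns. map with a plain Python function is a general fallback when no vectorized method exists.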

Inconsistent results when adding a new column to a Pandas DataFrame. Is it a Series or a value?
