pandas - Change values in a column based on another column

Time: 2017-11-02 15:47:37

Tags: python pandas dataframe

Suppose I have a DataFrame all_data, for example:

Id  Zone        Neighb
1   NaN         IDOTRR
2   RL          Veenker
3   NaN         IDOTRR
4   RM          Crawfor
5   NaN         Mitchel

I want to fill in the missing values in the 'Zone' column, so that where 'Neighb' is 'IDOTRR' I set 'Zone' to 'RM', and where 'Neighb' is 'Mitchel' I set it to 'RL'.

all_data.loc[all_data.MSZoning.isnull() 
             & all_data.Neighborhood == "IDOTRR", "MSZoning"] = "RM"
all_data.loc[all_data.MSZoning.isnull() 
             & all_data.Neighborhood == "Mitchel", "MSZoning"] = "RL"

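I get:

TypeError: invalid type comparison

C:\Users\pprun\Anaconda3\lib\site-packages\pandas\core\ops.py:798: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)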

I'm sure this should be simple, but I've been fiddling with it for far too long. Please help.

3 Answers:

Answer 0: (score: 3)

Use np.select, i.e.

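import numpy as np

# First matching condition wins; rows matching neither keep their current Zone.
df['Zone'] = np.select([df['Neighb'] == 'IDOTRR', df['Neighb'] == 'Mitchel'],
                       ['RM', 'RL'],
                       df['Zone'])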
   Id Zone   Neighb
0   1   RM   IDOTRR
1   2   RL  Veenker
2   3   RM   IDOTRR
3   4   RM  Crawfor
4   5   RL  Mitchel

With the conditions from your case, you can use

# Boolean mask of condition 1 
m1 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "IDOTRR")
# Boolean mask of condition 2
m2 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel")

# np.select returns a new array, so assign it back to the column
all_data["MSZoning"] = np.select([m1, m2], ['RM', 'RL'], all_data["MSZoning"])
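
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the two masks. It assumes the short column names shown in the table (Zone, Neighb) rather than MSZoning/Neighborhood, and the variable name df is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Zone": [np.nan, "RL", np.nan, "RM", np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel"],
})

# Each comparison is parenthesized so that & combines two boolean Series.
m1 = df["Zone"].isnull() & (df["Neighb"] == "IDOTRR")
m2 = df["Zone"].isnull() & (df["Neighb"] == "Mitchel")

# Rows matching neither mask fall back to their existing Zone value.
df["Zone"] = np.select([m1, m2], ["RM", "RL"], default=df["Zone"])
print(df)
```

Because both masks include the isnull() test, rows that already have a Zone are never overwritten, even if their Neighb happens to be IDOTRR or Mitchel.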

Answer 1: (score: 2)

df.Zone=df.Zone.fillna(df.Neighb.replace({'IDOTRR':'RM','Mitchel':'RL'}))
df
Out[784]: 
   Id Zone   Neighb
0   1   RM   IDOTRR
1   2   RL  Veenker
2   3   RM   IDOTRR
3   4   RM  Crawfor
4   5   RL  Mitchel
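
One thing to be aware of with this approach: replace leaves unmatched values as they are, so a missing Zone in a neighborhood that is not in the dictionary would be filled with the neighborhood name itself. A possible variation, sketched below, uses map instead, which yields NaN for unmapped keys. The sixth row ("Sawyer") and the name zone_by_neighb are hypothetical, added only to show the difference.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus one hypothetical extra row ("Sawyer")
# whose neighborhood is not covered by the mapping.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Zone": [np.nan, "RL", np.nan, "RM", np.nan, np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel", "Sawyer"],
})

zone_by_neighb = {"IDOTRR": "RM", "Mitchel": "RL"}

# map() returns NaN for neighborhoods missing from the dict,
# so those rows stay NaN instead of receiving the neighborhood name.
df["Zone"] = df["Zone"].fillna(df["Neighb"].map(zone_by_neighb))
print(df)
```

For the five rows in the question the result is the same as with replace; only the uncovered row behaves differently.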

Answer 2: (score: 1)

In Python, & has higher precedence than ==

http://www.annedawson.net/Python_Precedence.htm

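So when you write all_data.MSZoning.isnull() & all_data.Neighborhood == "Mitchel", it is interpreted as (all_data.MSZoning.isnull() & all_data.Neighborhood) == "Mitchel". Python then tries to AND a boolean Series with a Series of str and check whether the result equals the single str "Mitchel". The solution is to wrap the tests in parentheses: (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel"). Sometimes, if I have a lot of selectors, I assign them to variables and then AND them, for example: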

null_zoning = all_data.MSZoning.isnull()
Mitchel_neighb = all_data.Neighborhood == "Mitchel"
all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL"

Not only does this solve the order-of-operations problem, it also means that all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL" fits on one line.
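
If you want to check the precedence claim yourself without touching pandas, one way is to ask Python's own parser how it groups the expression. This is just an illustrative sketch; the names a and b are placeholders and are never evaluated.

```python
import ast

# Parse the unparenthesized expression; nothing is executed here.
tree = ast.parse('a & b == "Mitchel"', mode="eval")
top = tree.body

# The outermost node is the comparison, and its left operand is the & operation,
# i.e. the expression is grouped as (a & b) == "Mitchel".
print(type(top).__name__)
print(type(top.left).__name__)
print(type(top.left.op).__name__)
```

The same check on '(a) & (b == "Mitchel")' puts the & at the top level instead, which is the grouping the .loc selection needs.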