Add a new column based on other columns and rows

Asked: 2020-08-23 23:03:19

Tags: python pandas numpy dataframe keyerror

I have a large DataFrame. Here is a small sample DataFrame to illustrate my problem.

A      B      C     
car    red    15
car    blue   20
car    grey   14
bike   red    6
bike   blue   8
phone  red    9
phone  blue   11
phone  grey   10

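Suppose column C holds the price. I want to add a column named "D". This column should answer: "Is the red car more expensive than the average price of all cars?", and likewise for the other values of A. That is essentially my question. I want to see this: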

A      B      C    D    
car    red    15   cheap
car    blue   20   expensive
car    grey   14   cheap
bike   red    6    cheap
bike   blue   8    expensive
phone  red    9    cheap
phone  blue   11   expensive
phone  grey   10   cheap

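I have tried many approaches to this task. I thought the code below would finally solve my problem, but it did not. I also tried the same thing with a while loop, but I keep getting KeyError: 0. What should I do? Here is the code I tried: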

df["D"] = "cheap"
A.values = df.A.unique()
for b in A.values:
    for i in range(len(df.loc[data.A== b])):
        if df.loc[df.A== b, "C"][i] >= df.loc[df.A== b, "C"].mean():
            df.loc[df.A== b, "D"][i] = "expensive"

2 answers:

Answer 0 (score: 1)

Use transform to get the mean of each group, then apply np.where:

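import numpy as np
s = df.groupby('A').C.transform('mean')  # each row gets the mean price of its A group
df['D'] = np.where(df.C > s, 'expensive', 'cheap')  # strictly above the group mean -> 'expensive'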
df
Out[158]: 
       A     B   C          D
0    car   red  15      cheap
1    car  blue  20  expensive
2    car  grey  14      cheap
3   bike   red   6      cheap
4   bike  blue   8  expensive
5  phone   red   9      cheap
6  phone  blue  11  expensive
7  phone  grey  10      cheap
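
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same transform plus np.where idea. It assumes the sample data from the question; the variable name `group_mean` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "B": ["red", "blue", "grey", "red", "blue", "red", "blue", "grey"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# Per-row group mean: transform keeps the original index and row order,
# so the result can be compared with df["C"] row by row
group_mean = df.groupby("A")["C"].transform("mean")

# Strictly above the group mean -> 'expensive'; otherwise (ties included) -> 'cheap'
df["D"] = np.where(df["C"] > group_mean, "expensive", "cheap")

print(df)
```

The strict `>` matters for the grey phone, whose price equals the phone mean of 10: it stays 'cheap', as in the desired output.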

Answer 1 (score: 0)

df['D'] = np.where(df.groupby('A')['C'].transform('mean') >= df['C'], 'cheap', 'expensive')


       A     B   C          D
0    car   red  15      cheap
1    car  blue  20  expensive
2    car  grey  14      cheap
3   bike   red   6      cheap
4   bike  blue   8  expensive
5  phone   red   9      cheap
6  phone  blue  11  expensive
7  phone  grey  10      cheap

How it works

np.where(condition, value_if_true, value_if_false)


# Build the boolean condition: True where the group's mean price is greater than or equal to the row's price.
# transform('mean') is used here instead of apply because its result keeps the original row order; apply can return rows grouped by key (bike, car, phone), which np.where would then match to the wrong rows.

condition = df.groupby('A')['C'].transform('mean') >= df['C']


value_if_true = 'cheap'

value_if_false = 'expensive'
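
As for the KeyError: 0 in the question's loop, a likely cause is that `[i]` on a Series with an integer index looks up a label, not a position. The sketch below assumes the sample data and the default integer index from the question; names such as `bike_prices` and `mean_price` are illustrative only. It also shows one way to write the loop so that each assignment goes through a single `.loc` call instead of chained indexing.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "B": ["red", "blue", "grey", "red", "blue", "red", "blue", "grey"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# The bike rows keep their original labels 3 and 4 after filtering
bike_prices = df.loc[df["A"] == "bike", "C"]

# [0] is a label lookup here, and no row is labelled 0, so pandas raises KeyError
try:
    bike_prices[0]
    raised = False
except KeyError:
    raised = True

# .iloc is positional, so position 0 is the first bike row
first_bike_price = bike_prices.iloc[0]

# Loop version that iterates over row labels and writes with one .loc call
df["D"] = "cheap"
for group in df["A"].unique():
    mask = df["A"] == group
    mean_price = df.loc[mask, "C"].mean()
    for label in df.index[mask]:
        # Strict > so a price equal to the group mean stays 'cheap'
        if df.loc[label, "C"] > mean_price:
            df.loc[label, "D"] = "expensive"

print(df)
```

The loop is shown only to explain the error; the vectorized transform approach does the same work without iterating over rows.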