Pandas groupby mode and missing values

Time: 2019-06-30 16:29:01

Tags: python pandas pandas-groupby

My data looks like the table below:

Type  Size    Color   Color2
cat   small   white   white
cat   small   white   white
cat   large   brown   #N/A
cat   large   black   #N/A
dog   large   white   white
dog   small   black   black
cat   small   white   white
dog   small   brown   brown
dog   small   brown   brown
dog   small   brown   brown
cat   large   brown   #N/A
cat   large   brown   #N/A
dog   large   #N/A    brown
dog   large   white   white
dog   large   black   black
cat   large   white   #N/A
dog   large   brown   brown
cat   small   white   white
cat   small   white   white
dog   large   brown   brown
dog   large   white   white
dog   large   #N/A    brown
dog   small   black   black
cat   small   white   white
dog   small   white   white
dog   small   white   white
cat   small   white   white
dog   small   black   black
dog   small   black   black
dog   large   brown   brown
dog   large   brown   brown
cat   large   black   #N/A
cat   small   white   white

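The goal is to fill the missing values in Color and Color2 with the mode of the respective column, conditional on Type and Size.

The snippet below works well for the Color column and ignores the missing values in that column: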

df.groupby(['Type','Size'])['Color'].transform(lambda x: x.mode()[0])

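However, my actual data behaves like the Color2 column. In that column, every Color2 value for the cat/large group is missing, so when I apply the snippet below I get an index out of range error: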

df.groupby(['Type','Size'])['Color2'].transform(lambda x: x.mode()[0]) 

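If a particular group contains only missing values, I would like to get NaN/#N/A back; if the group contains any non-missing values, I would like the mode, computed while ignoring the missing values.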
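The failure can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical all-missing Series `s` that stands in for the cat/large group of Color2. Because `mode()` drops missing values by default, the result is empty and there is no element at position 0 to select. The exact exception type may differ between pandas versions, so the sketch catches both likely candidates.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a group whose values are all missing.
s = pd.Series([np.nan, np.nan], dtype=object)

# mode() ignores NaN by default, so nothing is left to count.
modes = s.mode()
print(len(modes))  # 0

# Selecting element 0 of an empty result is what fails inside transform.
try:
    modes[0]
except (KeyError, IndexError) as exc:
    print("lookup failed:", type(exc).__name__)
```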

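2 Answers:

Answer 0: (score: 1)

Just use .get(0,'NaN/#N/A') instead of [0] in your command. For a group whose values are all missing, mode() returns an empty Series, so key 0 is not found and .get falls back to the default value 'NaN/#N/A'.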

df['new_color'] = df.groupby(['Type','Size'])['Color2'] \
                    .transform(lambda x: x.mode().get(0,'NaN/#N/A'))

Out[1246]:
   Type   Size  Color Color2 new_color
0   cat  small  white  white     white
1   cat  small  white  white     white
2   cat  large  brown    NaN  NaN/#N/A
3   cat  large  black    NaN  NaN/#N/A
4   dog  large  white  white     brown
5   dog  small  black  black     black
6   cat  small  white  white     white
7   dog  small  brown  brown     black
8   dog  small  brown  brown     black
9   dog  small  brown  brown     black
10  cat  large  brown    NaN  NaN/#N/A
11  cat  large  brown    NaN  NaN/#N/A
12  dog  large    NaN  brown     brown
13  dog  large  white  white     brown
14  dog  large  black  black     brown
15  cat  large  white    NaN  NaN/#N/A
16  dog  large  brown  brown     brown
17  cat  small  white  white     white
18  cat  small  white  white     white
19  dog  large  brown  brown     brown
20  dog  large  white  white     brown
21  dog  large    NaN  brown     brown
22  dog  small  black  black     black
23  cat  small  white  white     white
24  dog  small  white  white     black
25  dog  small  white  white     black
26  cat  small  white  white     white
27  dog  small  black  black     black
28  dog  small  black  black     black
29  dog  large  brown  brown     brown
30  dog  large  brown  brown     brown
31  cat  large  black    NaN  NaN/#N/A
32  cat  small  white  white     white
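Note that the output above stores the literal string 'NaN/#N/A' and writes the group mode into every row. If the aim is to fill only the gaps and keep a real missing value for all-missing groups, one possible variation is sketched below. It assumes a small hypothetical frame and a hypothetical column name `Color2_filled`, uses `np.nan` as the default, and passes the per-group mode to `fillna`.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: cat/large is entirely missing,
# dog/small has one gap and "black" as its most frequent value.
df = pd.DataFrame({
    "Type": ["cat", "cat", "dog", "dog", "dog"],
    "Size": ["large", "large", "small", "small", "small"],
    "Color2": [np.nan, np.nan, "black", np.nan, "black"],
})

# Per-group mode, falling back to a real NaN when the group has no values.
group_mode = df.groupby(["Type", "Size"])["Color2"].transform(
    lambda x: x.mode().get(0, np.nan)
)

# Fill only the missing entries; existing values are left untouched.
df["Color2_filled"] = df["Color2"].fillna(group_mode)
print(df)
```

With this approach the cat/large rows stay missing, while the gap in dog/small is filled with that group's mode.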

Answer 1: (score: 0)

Check with value_counts:
df.fillna(df.groupby(['Type','Size']).transform(lambda x : x.value_counts(dropna=False).index[0]),inplace=True)

Or, as of pandas 0.24, you can also pass dropna=False to mode:

df.groupby(['Type','Size'])['Color2'].transform(lambda x: x.mode(dropna=False)[0])
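One thing to keep in mind with both dropna=False variants: NaN is then counted like any other value. The sketch below, using two hypothetical Series, suggests that an all-missing group yields NaN without an error, but a group where NaN is the most frequent entry also yields NaN even though non-missing values exist. Whether that matters depends on how sparse the real data is.

```python
import numpy as np
import pandas as pd

# Hypothetical groups: one entirely missing, one mostly missing.
all_missing = pd.Series([np.nan, np.nan], dtype=object)
mostly_missing = pd.Series([np.nan, np.nan, "brown"], dtype=object)

# All-missing group: NaN is the mode, so indexing with [0] succeeds.
print(all_missing.mode(dropna=False)[0])
print(all_missing.value_counts(dropna=False).index[0])

# Mostly-missing group: NaN outnumbers "brown", so NaN is returned.
print(mostly_missing.mode(dropna=False)[0])
print(mostly_missing.value_counts(dropna=False).index[0])

# The default mode() ignores NaN and finds the non-missing value.
print(mostly_missing.mode()[0])  # brown
```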