Question

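I am working with automotive.csv from the UCI website. I want to replace some of the NaN values in the normalized-losses attribute. I think the better approach is to compute the mean per symboling value, because symboling affects the value of normalized-losses.

So if a row with NaN has symboling 3, I want to fill it with the mean of normalized-losses taken only over the other rows whose symboling is 3. How can I do this?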

Example table:

symb  norm  other  attrs
   1   100   8017      2
   1    90   5019      2
  -1    20   8017      1
  -1    20   8870      1
   1   NaN   8305      3
   0    10   8305      3
   3   200   8221      3

So for the NaN, I want only the mean of the other rows that have the same symboling.

If I use

automobile['normalizedlosses'].fillna(automobile['normalizedlosses'].mean(axis=0), inplace=True)

this replaces every NaN with the mean of the whole column, which is not what I want.

Answer 1

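You can use GroupBy.transform with mean. It returns a Series of the same length as the original DataFrame, with each row holding the mean of its own group, so you can pass that Series straight to Series.fillna: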

s = automobile.groupby('symb')['norm'].transform('mean')
automobile['norm'] = automobile['norm'].fillna(s)

print (automobile)
   symb   norm  other  attrs
0     1  100.0   8017      2
1     1   90.0   5019      2
2    -1   20.0   8017      1
3    -1   20.0   8870      1
4     1   95.0   8305      3
5     0   10.0   8305      3
6     3  200.0   8221      3
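
For anyone who wants to try this without the CSV, here is a minimal self-contained sketch that rebuilds the sample table from the question and applies the same idea. The column names symb, norm, other and attrs are taken from the printed output above. The final fallback to the overall column mean is my own assumption for groups where every value is NaN; it is not something the question asked for, so drop it if you would rather keep those gaps.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's example table.
automobile = pd.DataFrame({
    "symb": [1, 1, -1, -1, 1, 0, 3],
    "norm": [100, 90, 20, 20, np.nan, 10, 200],
    "other": [8017, 5019, 8017, 8870, 8305, 8305, 8221],
    "attrs": [2, 2, 1, 1, 3, 3, 3],
})

# For every row, the mean of 'norm' within that row's 'symb' group.
# mean skips NaN, so the missing row does not distort its group's mean.
group_mean = automobile.groupby("symb")["norm"].transform("mean")

# Fill only the missing entries; existing values are left untouched.
automobile["norm"] = automobile["norm"].fillna(group_mean)

# Assumption (not in the question): if a whole group were NaN, its group
# mean would also be NaN, so fall back to the overall column mean.
automobile["norm"] = automobile["norm"].fillna(automobile["norm"].mean())

print(automobile)
```

With the sample data the fallback changes nothing, because the only NaN belongs to the symb == 1 group, which has two known values (100 and 90).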
