Pandas: cannot assign NaN using a boolean mask on MultiIndex columns

Time: 2016-11-03 03:34:20

Tags: pandas slice nan hierarchical-data

First, create this DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,-2,3],[4,5,-6],[-7,8,9]],
    columns=pd.MultiIndex.from_tuples(
        [('foo', 'bar'), ('foo', 'baz'), ('ignore', 'other')]))

It looks like this:

  foo     ignore
  bar baz  other
0   1  -2      3
1   4   5     -6
2  -7   8      9

Now, try to replace the negative values under foo with NaN:

df.foo[df.foo < 0] = np.nan

This does nothing except print a warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

OK, let's do that:

df.loc[:,'foo'][df.foo < 0] = np.nan

That prints no warning, but it also does nothing!

But if we use a non-NaN value, it works:

df.loc[:,'foo'][df.foo < 0] = 666

Now I have:

   foo      ignore
   bar  baz  other
0    1  666      3
1    4    5     -6
2  666    8      9

But I want to fill with NaN, not 666. Is there a simple way to do that?

1 answer:

Answer 0 (score: 0)

You can use slicers together with DataFrame.mask:

idx = pd.IndexSlice
sliced = df.loc[:, idx['foo',:]]
print(sliced)
  foo    
  bar baz
0   1  -2
1   4   5
2  -7   8

df.loc[:, idx['foo',:]] = sliced.mask(sliced < 0)
print(df)
   foo      ignore
   bar  baz  other
0  1.0  NaN      3
1  4.0  5.0     -6
2  NaN  8.0      9
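
The values under foo now print as 1.0 and 4.0 because NaN is a floating-point value, so integer columns that receive it become float. As a hedged alternative to the slicer assignment above (not the answerer's method), the sketch below replaces each column under foo one at a time with Series.mask. The name foo_cols is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, -2, 3], [4, 5, -6], [-7, 8, 9]],
    columns=pd.MultiIndex.from_tuples(
        [("foo", "bar"), ("foo", "baz"), ("ignore", "other")]
    ),
)

# Full column keys (tuples) for everything under the top-level label 'foo'.
# 'foo_cols' is a hypothetical name used only in this sketch.
foo_cols = [col for col in df.columns if col[0] == "foo"]

for col in foo_cols:
    # Series.mask replaces values where the condition is True with NaN.
    # Assigning the result back swaps in a new float column for this key.
    df[col] = df[col].mask(df[col] < 0)

print(df)
print(df.dtypes)
```

Because each assignment replaces a whole column rather than writing into the existing integer data, the column under ignore keeps its integer dtype. This variant may also be worth trying if the .loc assignment above warns or raises about incompatible dtypes in your pandas version.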

Another solution, using concat:

idx = pd.IndexSlice
df1 = df.loc[:, idx['foo',:]]
print(df1)
  foo    
  bar baz
0   1  -2
1   4   5
2  -7   8

df1 = df1.mask(df1 < 0)
print(df1)
   foo     
   bar  baz
0  1.0  NaN
1  4.0  5.0
2  NaN  8.0

print (pd.concat([df1, df.drop('foo', axis=1, level=0)], axis=1))
   foo      ignore
   bar  baz  other
0  1.0  NaN      3
1  4.0  5.0     -6
2  NaN  8.0      9
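
One thing to watch with the concat approach: the masked columns are placed first, so the result keeps the original column order only when foo already comes first, as it does here. The sketch below assumes a reordered copy of the question's data (ignore first, which is an assumption made for illustration) and restores the original order with reindex. It selects columns with a boolean array on the top level instead of slicers and drop; the names is_foo, combined and restored are hypothetical.

```python
import pandas as pd

# Same values as in the question, but with the 'ignore' group first
# (an assumption for illustration) so the reordering is visible.
df = pd.DataFrame(
    [[3, 1, -2], [-6, 4, 5], [9, -7, 8]],
    columns=pd.MultiIndex.from_tuples(
        [("ignore", "other"), ("foo", "bar"), ("foo", "baz")]
    ),
)

# Boolean array marking the columns whose top-level label is 'foo'.
is_foo = df.columns.get_level_values(0) == "foo"

foo = df.loc[:, is_foo]
foo = foo.mask(foo < 0)  # negatives become NaN; these columns become float

rest = df.loc[:, ~is_foo]

# concat puts the 'foo' columns first, regardless of where they started.
combined = pd.concat([foo, rest], axis=1)

# reindex on the original columns puts everything back in its old position.
restored = combined.reindex(columns=df.columns)
print(restored)
```

Unlike the in-place assignment, this builds a new DataFrame and leaves df itself untouched.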