Advanced cross-section with a MultiIndex in pandas

Date: 2014-06-17 08:09:46

Tags: python pandas

I have the following DataFrame:

import numpy as np
import pandas as pd

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1), ('B','a',2), ('B','a',3), ('B','b',1), ('B','b',2), ('B','b',3)]
col = pd.MultiIndex.from_tuples(lb, names=['first','second','third'])
df = pd.DataFrame(np.random.randn(5,12), columns=col)

first          A                                                           B  \
second         a                             b                             a   
third          1         2         3         1         2         3         1   
0       1.597958  2.054695  0.449745 -0.990393  0.780978 -0.590558 -0.691706   
1      -0.093841 -1.203769  1.779555 -0.299931 -0.411360  0.122852 -0.250156   
2       0.025183  0.514480 -0.420666  1.574669  0.962010  1.278237 -0.976286   
3      -1.028288 -0.506581  0.880370  1.513487 -0.066479 -0.100231  0.785042   
4      -1.635642  0.464074 -0.335941 -0.034194  0.412519 -0.672058  0.113886   

first                                                     
second                             b                      
third          2         3         1         2         3  
0       1.954769  0.705860 -1.712058  1.015807  1.245232  
1      -2.037299 -0.120649 -0.114652 -0.686707 -0.993540  
2       0.918084 -0.892378 -0.741131 -2.547121  0.797637  
3       0.000077  2.123063  0.903571  1.972190 -1.179325  
4      -1.145241 -1.773182  0.407046 -0.301640 -0.173261  

I want to get all the columns whose third level is 2 or 3, i.e. something like:
df.xs([2,3], level='third', axis=1, drop_level=False)

but this doesn't work. What should I do?

2 answers:

Answer 0 (score: 4)

This is new in 0.14.0, see the whatsnew here. It effectively replaces the need for .xs:
In [8]: idx = pd.IndexSlice

In [9]: df.loc[:,idx[:,:,[2,3]]]
Out[9]: 
first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
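A boolean mask over the column level is another way to get the same selection, for example on versions without IndexSlice. This is a swapped-in technique, not part of the answer; a minimal sketch, assuming the same column layout as in the question:

```python
import numpy as np
import pandas as pd

# Rebuild the question's column layout: first/second/third levels
tuples = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
cols = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])
df = pd.DataFrame(np.random.randn(5, 12), columns=cols)

# Selection via IndexSlice, as in the answer
idx = pd.IndexSlice
sliced = df.loc[:, idx[:, :, [2, 3]]]

# Same selection via a boolean mask: True where the 'third' label is 2 or 3
mask = df.columns.get_level_values("third").isin([2, 3])
masked = df.loc[:, mask]

print(masked.equals(sliced))  # True
```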

Subtracting with these selections is non-trivial.

In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)

In [108]: df
Out[108]: 
first    A                       B                    
second   a           b           a           b        
third    1   2   3   1   2   3   1   2   3   1   2   3
0        0   1   2   3   4   5   6   7   8   9  10  11
1       12  13  14  15  16  17  18  19  20  21  22  23
2       24  25  26  27  28  29  30  31  32  33  34  35
3       36  37  38  39  40  41  42  43  44  45  46  47
4       48  49  50  51  52  53  54  55  56  57  58  59

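Pandas wants to align the right-hand side (after all, you are sub-selecting different index labels), so you need to broadcast it manually. Here is an issue about it: https://github.com/pydata/pandas/issues/7475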
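To see the alignment problem concretely, here is a small sketch using the same np.arange frame: subtracting the two selections directly aligns on the full (first, second, third) column labels, and since no label appears on both sides, every cell comes out NaN.

```python
import numpy as np
import pandas as pd

tuples = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
cols = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=cols)

idx = pd.IndexSlice
left = df.loc[:, idx[:, :, [2, 3]]]  # 8 columns, third in {2, 3}
right = df.loc[:, idx[:, :, 1]]      # 4 columns, third == 1

# DataFrame arithmetic aligns on column labels first; the two label sets are
# disjoint, so the result has the union of columns and no overlapping values.
naive = left - right
print(naive.shape)                # (5, 12)
print(naive.isna().all().all())   # True
```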

In [109]: df.loc[:,idx[:,:,[2,3]]] - np.repeat(df.loc[:,idx[:,:,1]].values, 2, axis=1)
Out[109]: 
first   A           B         
second  a     b     a     b   
third   2  3  2  3  2  3  2  3
0       1  2  1  2  1  2  1  2
1       1  2  1  2  1  2  1  2
2       1  2  1  2  1  2  1  2
3       1  2  1  2  1  2  1  2
4       1  2  1  2  1  2  1  2
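A label-based alternative to positional broadcasting, sketched here under the assumption that the goal is to subtract each (first, second) group's third == 1 column from its 2 and 3 columns. It is not from the answer: it looks up the matching base column by label with reindex, so the result does not depend on column order.

```python
import numpy as np
import pandas as pd

tuples = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
cols = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=cols)

idx = pd.IndexSlice
target = df.loc[:, idx[:, :, [2, 3]]]   # columns to subtract from
base = df.xs(1, level="third", axis=1)  # third == 1 columns, labelled by (first, second)

# For every target column, fetch the base column with the same (first, second)
# label. reindex accepts repeated labels in the requested index, so each base
# column is used once for its 2-column and once for its 3-column.
base_aligned = base.reindex(columns=target.columns.droplevel("third"))

# Shapes now match column for column, so subtract the raw values
result = target - base_aligned.to_numpy()
print(result)
```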

Answer 1 (score: 0)

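It seems you can't use the xs function with multiple keys. There may be a better way to slice, but I'd keep it simple and build the partial list of MultiIndex columns that I need: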

cols = df.columns
thirdlvl = cols.get_level_values('third')

partialcols = [col for col, third in zip(cols, thirdlvl) if third in [2,3]]

With these columns you can get the partial DataFrame you want:

print(df[partialcols])

first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.103063  1.036151 -0.018996  1.436792 -0.956119  1.587688  2.262837 -1.059619
1       0.950664  1.847895 -1.172043  0.752676 -0.091956 -0.431509 -0.653317 -0.545843
2       0.165655 -0.180710 -1.844222 -0.836338  1.687806 -0.469707 -0.374222  0.132809
3      -0.275194  0.141292  1.021046 -0.010747  1.725614  0.530589  0.106327  0.138661
4       0.371840  0.455063 -2.643567  0.406322 -0.717277  0.667969  0.660701 -1.324643

Edit: of course, this simpler code finds the same columns:

partialcols = [col for col in cols if col[2] in [2,3]]
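Another workaround for xs accepting only one key at a time, offered as a sketch rather than the answerer's method: call xs once per key with drop_level=False and concatenate the pieces. The column layout below is assumed to match the question's.

```python
import numpy as np
import pandas as pd

tuples = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
cols = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])
df = pd.DataFrame(np.random.randn(5, 12), columns=cols)

# xs takes a single key per call; drop_level=False keeps the 'third' level
pieces = [df.xs(k, level="third", axis=1, drop_level=False) for k in (2, 3)]

# concat puts all the 2-columns before all the 3-columns;
# sort_index restores the grouped (first, second, third) order
partial = pd.concat(pieces, axis=1).sort_index(axis=1)
print(partial)
```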