How to reset a MultiIndex after slicing

Time: 2016-06-14 20:07:05

Tags: python pandas slice multi-index

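I often use the values of a particular index level of a DataFrame as a guide for what to do next. In this case, I slice a DataFrame with pd.IndexSlice and then refer to the index of the resulting DataFrame. The problem is that the index of the result still carries the same levels as the original index. I need it to be a subset of the original index that respects the slice I made.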

Setup

import pandas as pd

def produce_df(rows, columns, row_names=None, column_names=None):
    """rows is a list of lists that will be used to build a MultiIndex
    columns is a list of lists that will be used to build a MultiIndex"""
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    return pd.DataFrame(index=row_index, columns=col_index)

df = produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
                row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'])

print(df)

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
b      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN

The index looks like this:

print(df.index)

MultiIndex(levels=[[u'a', u'b'], [u'c', u'd']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'alpha1', u'alpha2'])

Then I slice:

islc = pd.IndexSlice[['a'], :]
df2 = df.loc[islc, :]
print(df2)

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN

This is the slice I expected. But here is what df2.index looks like:

MultiIndex(levels=[[u'a', u'b'], [u'c', u'd']],
           labels=[[0, 0], [0, 1]],
           names=[u'alpha1', u'alpha2'])

df2.index.levels[0] still contains 'b', even though no row of df2 uses it.
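To make the mismatch concrete, here is a minimal sketch that compares the levels stored on the sliced index with the values that actually appear in it. It rebuilds a frame with the same row and column labels as in the setup, but fills it with zeros instead of NaN purely for illustration. The variable names stored_levels and present_values are my own.

```python
import pandas as pd

# Same labels as the setup above; the zero fill is only for illustration.
row_index = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']],
                                       names=['alpha1', 'alpha2'])
col_index = pd.MultiIndex.from_product([['1', '2'], ['3', '4']],
                                       names=['number1', 'number2'])
df = pd.DataFrame(0, index=row_index, columns=col_index)

# Keep only the rows whose first level is 'a'.
df2 = df.loc[pd.IndexSlice[['a'], :], :]

# .levels holds every value the index was built with, used or not.
stored_levels = list(df2.index.levels[0])

# get_level_values returns one entry per row, so it reflects the slice.
present_values = list(df2.index.get_level_values('alpha1').unique())

print(stored_levels)
print(present_values)
```

If the goal is only to read the values that survive the slice, get_level_values may be enough on its own, without rebuilding the index.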

Question: how do I reset the MultiIndex after slicing?

1 Answer:

Answer 0 (score: 0)

This works, but it is clumsy. I feel like there should be a built-in way to do this that I am overlooking.

df2.index = pd.MultiIndex.from_tuples(df2.index.to_series().values, names=df.index.names)

print(df2)

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN

print(df2.index)

MultiIndex(levels=[[u'a'], [u'c', u'd']],
           labels=[[0, 0], [0, 1]],
           names=[u'alpha1', u'alpha2'])

'b' is now gone from df2.index.levels[0].
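For comparison, newer pandas versions (0.20 and later, as far as I know, which is after this answer was written) provide MultiIndex.remove_unused_levels(), which drops level values that no row refers to. The sketch below assumes such a version is installed. The single 'value' column and the name trimmed are made up for illustration and are not from the question.

```python
import pandas as pd

# A small frame with the same row index as in the question.
# The 'value' column is a hypothetical stand-in for real data.
row_index = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']],
                                       names=['alpha1', 'alpha2'])
df = pd.DataFrame({'value': list(range(4))}, index=row_index)

# Slice as in the question; copy so that assigning a new index
# does not touch the original frame.
df2 = df.loc[pd.IndexSlice[['a'], :], :].copy()

# Rebuild the index so that each level keeps only the values in use.
df2.index = df2.index.remove_unused_levels()

trimmed = list(df2.index.levels[0])
print(trimmed)
```

Level names are preserved, so there is no need to pass names= by hand as in the from_tuples version.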