Operating on the inner level of MultiIndex columns

Date: 2021-03-12 12:56:57

Tags: python pandas dataframe multi-index

Suppose I have a DataFrame with MultiIndex columns:

       TSLA                                MSFT                   
Year   revenues other_revenues expenses    revenues other_revenues   expenses
2019        851             10      110         200             13        213
2018        725             11      111         150             14        214

How can I add an inner column to get

       TSLA                                    MSFT                   
Year   revenues other_revenues expenses  sum    revenues other_revenues   expenses  sum
2019        851             10      110  971        200             13        213   426
2018        725             11      111  847        150             14        214   378

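Second, in general, which commonly used functions should I know about when working with MultiIndex columns? Is there a good mental model for working with them? I am comfortable reasoning about a regular (single-level) index, but I am not used to a MultiIndex. Thanks!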

2 Answers:

Answer 0: (score: 2)

First create a new DataFrame df1 holding the per-ticker sums, and give its columns a MultiIndex:

sub = ['revenues', 'other_revenues', 'expenses']


# df.sum(level=0, axis=1) was removed in pandas 2.0; group the transposed frame instead
df1 = df.T.groupby(level=0).sum().T
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])

Then join the two together with concat:

df = pd.concat([df, df1], axis=1)

Last, reindex to apply the custom column order:

mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
         MSFT                                  TSLA                          \
     revenues other_revenues expenses  sum revenues other_revenues expenses   
Year                                                                          
2019      200             13      213  426      851             10      110   
2018      150             14      214  378      725             11      111   

           
      sum  
Year       
2019  971  
2018  847  
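
Put together as one script, the steps above can be sketched as follows. This is a minimal sketch, not the answerer's exact code: the frame is rebuilt by hand from the question's table, the `tickers` list and the name `out` are introduced here for illustration, and the per-ticker sum uses a transpose plus `groupby(level=0)` so that it also runs on pandas 2.x.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question).
columns = pd.MultiIndex.from_product(
    [["TSLA", "MSFT"], ["revenues", "other_revenues", "expenses"]]
)
df = pd.DataFrame(
    [[851, 10, 110, 200, 13, 213],
     [725, 11, 111, 150, 14, 214]],
    index=pd.Index([2019, 2018], name="Year"),
    columns=columns,
)

sub = ["revenues", "other_revenues", "expenses"]

# Sum per ticker: group the transposed frame by the outer column level.
df1 = df.T.groupby(level=0).sum().T
df1.columns = pd.MultiIndex.from_product([df1.columns, ["sum"]])

# Append the sums, then order the columns explicitly (ticker, then item).
tickers = ["TSLA", "MSFT"]  # assumed order, taken from the question's table
out = pd.concat([df, df1], axis=1)
out = out.reindex(pd.MultiIndex.from_product([tickers, sub + ["sum"]]), axis=1)
print(out)
```

Passing an explicit ticker list instead of `df.columns.levels[0]` keeps TSLA first, as in the question; the levels themselves are stored in sorted order here, which is why the output above starts with MSFT.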

EDIT: You can use slicers for selecting (but I think the MultiIndex needs to be sorted here):

idx = pd.IndexSlice
print (df.loc[:, idx[:, ['revenues','other_revenues']]])
         TSLA     MSFT           TSLA           MSFT
     revenues revenues other_revenues other_revenues
2019      851      200             10             13
2018      725      150             11             14
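
As a self-contained illustration of the slicer step, here is a small sketch under the assumption that the frame looks like the one in the question (it is rebuilt by hand below; `df_sorted` and `picked` are names introduced here). Sorting the columns first is a precaution: label-range slices on an unsorted MultiIndex can raise `UnsortedIndexError`, even though a list selector like the one below may work without it.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question).
columns = pd.MultiIndex.from_product(
    [["TSLA", "MSFT"], ["revenues", "other_revenues", "expenses"]]
)
df = pd.DataFrame(
    [[851, 10, 110, 200, 13, 213],
     [725, 11, 111, 150, 14, 214]],
    index=pd.Index([2019, 2018], name="Year"),
    columns=columns,
)

# Lexicographically sort the column MultiIndex before slicing.
df_sorted = df.sort_index(axis=1)

# idx[:, [...]] means: every outer label, only these inner labels.
idx = pd.IndexSlice
picked = df_sorted.loc[:, idx[:, ["revenues", "other_revenues"]]]
print(picked)
```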

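# starting again from the original df (before the 'sum' columns were added)
# df.index.name = 'Year'
sub = ['revenues', 'other_revenues', 'expenses']


# .sum(level=0, axis=1) was removed in pandas 2.0; group the transposed frame instead
df1 = df.loc[:, idx[:, ['revenues','other_revenues']]].T.groupby(level=0).sum().T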
df1.columns = pd.MultiIndex.from_product([df1.columns, ['sum']])
df = pd.concat([df, df1], axis=1)

mux = pd.MultiIndex.from_product([df.columns.levels[0], sub + ['sum']])
df = df.reindex(mux, axis=1)
print (df)
         MSFT                                  TSLA                          \
     revenues other_revenues expenses  sum revenues other_revenues expenses   
2019      200             13      213  213      851             10      110   
2018      150             14      214  164      725             11      111   

           
      sum  
2019  861  
2018  736  

Answer 1: (score: 1)

You can use stack and unstack:

>>> df2 = df.stack(1).unstack(0)

>>> df2.loc['sum', :] = df2.sum()

>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
        columns=df.columns.levels[0], level=0
    )

         TSLA                                    MSFT                               
     expenses other_revenues revenues    sum expenses other_revenues revenues    sum
Year                                                                                
2019    110.0           10.0    851.0  971.0    213.0           13.0    200.0  426.0
2018    111.0           11.0    725.0  847.0    214.0           14.0    150.0  378.0
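
To see why this works, it can help to look at the intermediate shape. The sketch below rebuilds the question's frame by hand (an assumption about how `df` was created; `stacked` is a name introduced here) and prints the reshaped frame, in which each item is a row, so a per-ticker sum becomes an ordinary column sum. On pandas 2.1 and 2.2, `stack` may emit a FutureWarning about its new implementation.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question).
columns = pd.MultiIndex.from_product(
    [["TSLA", "MSFT"], ["revenues", "other_revenues", "expenses"]]
)
df = pd.DataFrame(
    [[851, 10, 110, 200, 13, 213],
     [725, 11, 111, 150, 14, 214]],
    index=pd.Index([2019, 2018], name="Year"),
    columns=columns,
)

# Move the inner column level (the items) into the row index ...
stacked = df.stack(1)
# ... then move Year up into the columns, leaving the items as the only row level.
df2 = stacked.unstack(0)
print(df2)

# Summing down the rows now yields one total per (ticker, Year) pair.
totals = df2.sum()
print(totals)
```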

To sum only specific columns:

>>> df2 = df.stack(1).unstack(0)
>>> df2.loc['sum', :] = df2.loc[['revenues', 'other_revenues'], :].sum()
>>> df2.stack(1).unstack(0).reindex(df.index).reindex(
        columns=df.columns.levels[0], level=0
    )

         TSLA                                    MSFT                               
     expenses other_revenues revenues    sum expenses other_revenues revenues    sum
Year                                                                                
2019    110.0           10.0    851.0  861.0    213.0           13.0    200.0  213.0
2018    111.0           11.0    725.0  736.0    214.0           14.0    150.0  164.0

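Or use join with the per-ticker sum (originally sum with axis=1, level=0):

>>> cols = pd.MultiIndex.from_product([df.columns.levels[0], ['sum']])
>>> df.join(
        df.T.groupby(level=0).sum().T   # replaces df.sum(axis=1, level=0), removed in pandas 2.0
          .reindex(columns=df.columns.levels[0])   # match the order of cols
          .set_axis(cols, axis=1)
    ).reindex(columns=df.columns.levels[0], level=0)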

         TSLA                                  MSFT                             
     revenues other_revenues expenses  sum revenues other_revenues expenses  sum
Year                                                                            
2019      851             10      110  971      200             13      213  426
2018      725             11      111  847      150             14      214  378

For custom columns:

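>>> df.join(
        df.loc[:, (slice(None), ['revenues', 'other_revenues'])]
          .T.groupby(level=0).sum().T   # replaces .sum(axis=1, level=0), removed in pandas 2.0
          .reindex(columns=df.columns.levels[0])   # match the order of cols
          .set_axis(cols, axis=1)
    ).reindex(columns=df.columns.levels[0], level=0)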

         TSLA                                  MSFT                             
     revenues other_revenues expenses  sum revenues other_revenues expenses  sum
Year                                                                            
2019      851             10      110  861      200             13      213  213
2018      725             11      111  736      150             14      214  164
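
On the more general part of the question, a few operations that tend to come up with MultiIndex columns can be sketched as below. This is not taken from either answer: the frame is rebuilt by hand from the question, the names `out`, `revenues` and `swapped` are illustrative, and the loop with tuple keys is simply another common way to add one inner column per outer label.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question).
columns = pd.MultiIndex.from_product(
    [["TSLA", "MSFT"], ["revenues", "other_revenues", "expenses"]]
)
df = pd.DataFrame(
    [[851, 10, 110, 200, 13, 213],
     [725, 11, 111, 150, 14, 214]],
    index=pd.Index([2019, 2018], name="Year"),
    columns=columns,
)

# Selecting an outer label returns a plain single-level frame for that ticker,
# so ordinary single-index habits apply inside the loop.
out = df.copy()
for ticker in df.columns.get_level_values(0).unique():
    # A tuple key addresses (outer, inner); assigning to a new tuple adds a column.
    out[(ticker, "sum")] = df[ticker].sum(axis=1)

# New columns are appended at the end; sorting groups them by ticker again.
out = out.sort_index(axis=1)

# Cross-section: one inner label across all tickers.
revenues = df.xs("revenues", axis=1, level=1)

# Swap the two column levels when the inner level should lead.
swapped = df.swaplevel(0, 1, axis=1)
```

One way to think about it: each outer label is a small ordinary DataFrame, and tuples, `xs`, and `swaplevel` are the tools for moving between "per ticker" and "per item" views.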