Pandas pivot table: subtotals for each of the cols

Date: 2013-04-14 09:01:59

Tags: python pandas

Can I use pivot_table in pandas to produce my desired output (shown below), or something close to it, from the following dataset? I am trying to do something like:

pivot_table(df, rows=['region'], cols=['area','distributor','salesrep'], 
            aggfunc=np.sum, margins=True).stack(['area','distributor','salesrep'])

But I only get subtotals per region, and if I move area from cols to rows, I only get subtotals per area.

Dataset:

region   area            distributor     salesrep       sales    invoice_count
Central  Butterworth     HIN MARKETING   TLS            500      25
Central  Butterworth     HIN MARKETING   TLS            500      25
Central  Butterworth     HIN MARKETING   OSE            500      25
Central  Butterworth     HIN MARKETING   OSE            500      25
Central  Butterworth     KWANG HENGG     TCS            500      25
Central  Butterworth     KWANG HENGG     TCS            500      25
Central  Butterworth     KWANG HENG      LBH            500      25
Central  Butterworth     KWANG HENG      LBH            500      25
Central  Ipoh            SGH EDERAN      CHAN           500      25
Central  Ipoh            SGH EDERAN      CHAN           500      25
Central  Ipoh            SGH EDERAN      KAMACHI        500      25
Central  Ipoh            SGH EDERAN      KAMACHI        500      25
Central  Ipoh            CORE SYN        LILIAN         500      25
Central  Ipoh            CORE SYN        LILIAN         500      25
Central  Ipoh            CORE SYN        TEOH           500      25
Central  Ipoh            CORE SYN        TEOH           500      25
East     JB              LEI WAH         NF05           500      25
East     JB              LEI WAH         NF05           500      25
East     JB              LEI WAH         NF06           500      25
East     JB              LEI WAH         NF06           500      25
East     JB              WONDER F&B      SEREN          500      25
East     JB              WONDER F&B      SEREN          500      25
East     JB              WONDER F&B      MONC           500      25
East     JB              WONDER F&B      MONC           500      25
East     PJ              PENGEDAR        NORM           500      25
East     PJ              PENGEDAR        NORM           500      25
East     PJ              PENGEDAR        SIMON          500      25
East     PJ              PENGEDAR        SIMON          500      25
East     PJ              HEBAT           OGI            500      25
East     PJ              HEBAT           OGI            500      25
East     PJ              HEBAT           MIGI           500      25
East     PJ              HEBAT           MIGI           500      25
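
For anyone who wants to reproduce this without a data file, here is a minimal sketch that rebuilds the table above in code. The names `combos` and `build_sample` are made up for this illustration; the values are copied from the rows above, including the 'KWANG HENGG' spelling on the TCS rows exactly as posted.

```python
import pandas as pd

# (region, area, distributor, salesrep) combinations copied from the table above.
# Each combination appears twice, always with sales=500 and invoice_count=25.
combos = [
    ('Central', 'Butterworth', 'HIN MARKETING', 'TLS'),
    ('Central', 'Butterworth', 'HIN MARKETING', 'OSE'),
    ('Central', 'Butterworth', 'KWANG HENGG', 'TCS'),  # spelling as posted
    ('Central', 'Butterworth', 'KWANG HENG', 'LBH'),
    ('Central', 'Ipoh', 'SGH EDERAN', 'CHAN'),
    ('Central', 'Ipoh', 'SGH EDERAN', 'KAMACHI'),
    ('Central', 'Ipoh', 'CORE SYN', 'LILIAN'),
    ('Central', 'Ipoh', 'CORE SYN', 'TEOH'),
    ('East', 'JB', 'LEI WAH', 'NF05'),
    ('East', 'JB', 'LEI WAH', 'NF06'),
    ('East', 'JB', 'WONDER F&B', 'SEREN'),
    ('East', 'JB', 'WONDER F&B', 'MONC'),
    ('East', 'PJ', 'PENGEDAR', 'NORM'),
    ('East', 'PJ', 'PENGEDAR', 'SIMON'),
    ('East', 'PJ', 'HEBAT', 'OGI'),
    ('East', 'PJ', 'HEBAT', 'MIGI'),
]


def build_sample():
    """Hypothetical helper: return the question's dataset as a DataFrame."""
    # Repeat every combination twice, in the same order as the table.
    rows = [combo for combo in combos for _ in range(2)]
    frame = pd.DataFrame(
        rows, columns=['region', 'area', 'distributor', 'salesrep'])
    frame['sales'] = 500
    frame['invoice_count'] = 25
    return frame


df = build_sample()
print(df.head())
```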

Desired output:

region       area          distributor       salesrep             invoice_count sales
Grand Total                                                                 800 16000
Central      Central Total                                                  400  8000
Central      Butterworth   Butterworth Total                                200  4000
Central      Butterworth   HIN MARKETING     HIN MARKETING Total            100  2000
Central      Butterworth   HIN MARKETING     OSE                             50  1000
Central      Butterworth   HIN MARKETING     TLS                             50  1000
Central      Butterworth   KWANG HENG        KWANG HENG Total               100  2000
Central      Butterworth   KWANG HENG        LBH                             50  1000
Central      Butterworth   KWANG HENG        TCS                             50  1000
Central      Ipoh          Ipoh Total                                       200  4000
Central      Ipoh          CORE SYN          CORE SYN Total                 100  2000
Central      Ipoh          CORE SYN          LILIAN                          50  1000
Central      Ipoh          CORE SYN          TEOH                            50  1000
Central      Ipoh          SGH EDERAN        SGH EDERAN Total               100  2000
Central      Ipoh          SGH EDERAN        CHAN                            50  1000
Central      Ipoh          SGH EDERAN        KAMACHI                         50  1000
East         East Total                                                     400  8000
East         JB            JB Total                                         200  4000
East         JB            LEI WAH           LEI WAH Total                  100  2000
East         JB            LEI WAH           NF05                            50  1000
East         JB            LEI WAH           NF06                            50  1000
East         JB            WONDER F&B        WONDER F&B Total               100  2000
East         JB            WONDER F&B        MONC                            50  1000
East         JB            WONDER F&B        SEREN                           50  1000
East         PJ            PJ Total                                         200  4000
East         PJ            HEBAT             HEBAT Total                    100  2000
East         PJ            HEBAT             MIGI                            50  1000
East         PJ            HEBAT             OGI                             50  1000
East         PJ            PENGEDAR          PENGEDAR Total                 100  2000
East         PJ            PENGEDAR          NORM                            50  1000
East         PJ            PENGEDAR          SIMON                           50  1000

2 answers:

Answer 0 (score: 1)

We can use groupby instead of pivot_table:

import numpy as np
import pandas as pd


def label(ser):
    return '{s} Total'.format(s=ser)

filename = 'data.txt'
df = pd.read_table(filename, delimiter='\t')

total = pd.DataFrame({'region': ['Grand Total'],
                      'invoice_count': df['invoice_count'].sum(),
                      'sales': df['sales'].sum()})
total['total_rank'] = 1

# Sum only the numeric columns; otherwise recent pandas versions would also
# "sum" (concatenate) the string columns.
value_cols = ['invoice_count', 'sales']

region_total = df.groupby(['region'], as_index=False)[value_cols].sum()
region_total['area'] = region_total['region'].apply(label)
region_total['region_rank'] = 1

area_total = df.groupby(['region', 'area'], as_index=False)[value_cols].sum()
area_total['distributor'] = area_total['area'].apply(label)
area_total['area_rank'] = 1

dist_total = df.groupby(
    ['region', 'area', 'distributor'], as_index=False)[value_cols].sum()
dist_total['salesrep'] = dist_total['distributor'].apply(label)

rep_total = df.groupby(
    ['region', 'area', 'distributor', 'salesrep'], as_index=False)[value_cols].sum()

# UNION the DataFrames into one DataFrame
result = pd.concat([total, region_total, area_total, dist_total, rep_total])

# Replace NaNs with empty strings
result.fillna({'region': '', 'area': '', 'distributor': '', 'salesrep':
              ''}, inplace=True)

# Reorder the rows
sorter = np.lexsort((
    result['distributor'].rank(),
    result['area_rank'].rank(),
    result['area'].rank(),
    result['region_rank'].rank(),
    result['region'].rank(),
    result['total_rank'].rank()))
result = result.take(sorter)
result = result.reindex(
    columns=['region', 'area', 'distributor', 'salesrep', 'invoice_count', 'sales'])
print(result.to_string(index=False))

This yields:

      region           area        distributor             salesrep  invoice_count  sales
 Grand Total                                                                   800  16000
     Central  Central Total                                                    400   8000
     Central    Butterworth  Butterworth Total                                 200   4000
     Central    Butterworth      HIN MARKETING  HIN MARKETING Total            100   2000
     Central    Butterworth      HIN MARKETING                  OSE             50   1000
     Central    Butterworth      HIN MARKETING                  TLS             50   1000
     Central    Butterworth         KWANG HENG     KWANG HENG Total            100   2000
     Central    Butterworth         KWANG HENG                  LBH             50   1000
     Central    Butterworth         KWANG HENG                  TCS             50   1000
     Central           Ipoh         Ipoh Total                                 200   4000
     Central           Ipoh           CORE SYN       CORE SYN Total            100   2000
     Central           Ipoh           CORE SYN               LILIAN             50   1000
     Central           Ipoh           CORE SYN                 TEOH             50   1000
     Central           Ipoh         SGH EDERAN     SGH EDERAN Total            100   2000
     Central           Ipoh         SGH EDERAN                 CHAN             50   1000
     Central           Ipoh         SGH EDERAN              KAMACHI             50   1000
        East     East Total                                                    400   8000
        East             JB           JB Total                                 200   4000
        East             JB            LEI WAH        LEI WAH Total            100   2000
        East             JB            LEI WAH                 NF05             50   1000
        East             JB            LEI WAH                 NF06             50   1000
        East             JB         WONDER F&B     WONDER F&B Total            100   2000
        East             JB         WONDER F&B                 MONC             50   1000
        East             JB         WONDER F&B                SEREN             50   1000
        East             PJ           PJ Total                                 200   4000
        East             PJ              HEBAT          HEBAT Total            100   2000
        East             PJ              HEBAT                 MIGI             50   1000
        East             PJ              HEBAT                  OGI             50   1000
        East             PJ           PENGEDAR       PENGEDAR Total            100   2000
        East             PJ           PENGEDAR                 NORM             50   1000
        East             PJ           PENGEDAR                SIMON             50   1000
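
The same groupby-and-concatenate idea can be written as a loop over the key columns, so it works for any number of levels. This is only a sketch: `with_subtotals` is a made-up helper name, not a pandas function, and it orders the rows differently from the code above. Instead of rank columns and lexsort, it leaves the deeper key columns empty on subtotal rows while sorting (an empty string sorts before any real name), then writes the "Total" labels afterwards. It assumes no real key value is an empty string.

```python
import pandas as pd


def with_subtotals(df, keys, values):
    """Hypothetical helper: detail rows plus one 'Total' row per key level."""
    pieces = []
    for depth in range(len(keys) + 1):
        if depth == 0:
            # Grand total: a single row holding the sum of every value column.
            level = df[values].sum().to_frame().T
        else:
            level = df.groupby(keys[:depth], as_index=False)[values].sum()
        # Key columns deeper than this level stay empty for now, so that a
        # subtotal row sorts directly above the rows it summarises.
        for key in keys[depth:]:
            level[key] = ''
        level['_depth'] = depth
        pieces.append(level)

    out = pd.concat(pieces, ignore_index=True)
    out = out.sort_values(keys).reset_index(drop=True)

    # Now that the order is fixed, write the labels into the subtotal rows.
    for depth in range(len(keys)):
        mask = out['_depth'] == depth
        if depth == 0:
            out.loc[mask, keys[0]] = 'Grand Total'
        else:
            out.loc[mask, keys[depth]] = out.loc[mask, keys[depth - 1]] + ' Total'
    return out[keys + values]


# Small illustration with two key levels.
sample = pd.DataFrame({
    'region': ['Central', 'Central', 'East'],
    'area': ['Ipoh', 'Butterworth', 'JB'],
    'sales': [500, 500, 500],
})
report = with_subtotals(sample, ['region', 'area'], ['sales'])
print(report.to_string(index=False))
```

For the dataset in the question, the call would presumably be `with_subtotals(df, ['region', 'area', 'distributor', 'salesrep'], ['invoice_count', 'sales'])`.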

Answer 1 (score: 0)

I don't know how to get subtotals inside the table, but if you run

# Recent pandas versions call this parameter index; older ones used rows.
df.pivot_table(index=['region','area','distributor','salesrep'],
  aggfunc=np.sum, margins=True)

you get

                                            invoice_count  sales
region  area        distributor   salesrep                      
Central Butterworth HIN MARKETING OSE                  50   1000
                                  TLS                  50   1000
                    KWANG HENG    LBH                  50   1000
                    KWANG HENGG   TCS                  50   1000
        Ipoh        CORE SYN      LILIAN               50   1000
                                  TEOH                 50   1000
                    SGH EDERAN    CHAN                 50   1000
                                  KAMACHI              50   1000
East    JB          LEI WAH       NF05                 50   1000
                                  NF06                 50   1000
                    WONDER F&B    MONC                 50   1000
                                  SEREN                50   1000
        PJ          HEBAT         MIGI                 50   1000
                                  OGI                  50   1000
                    PENGEDAR      NORM                 50   1000
                                  SIMON                50   1000
All                                                   800  16000

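If you want totals based on, say, region and area, you can run

# values is listed explicitly so that only the numeric columns are aggregated.
df.pivot_table(index=['region', 'area'], values=['invoice_count', 'sales'],
  aggfunc=np.sum, margins=True)

resulting in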

                     invoice_count  sales
region  area                             
Central Butterworth            200   4000
        Ipoh                   200   4000
East    JB                     200   4000
        PJ                     200   4000
All                            800  16000
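
One possible way to get subtotal rows into a table like this is to pivot without margins, walk over the regions, and append a summed row after each one. The sketch below is an illustration under a few assumptions: `pivot_with_region_subtotals` is a made-up name, the 'Total' and 'All' labels are arbitrary, and the small input frame only borrows a handful of rows from the question.

```python
import pandas as pd


def pivot_with_region_subtotals(df):
    """Hypothetical helper: region/area pivot with a subtotal row per region."""
    table = df.pivot_table(index=['region', 'area'],
                           values=['invoice_count', 'sales'],
                           aggfunc='sum')
    pieces = []
    for region, block in table.groupby(level='region'):
        # One extra row per region holding the sum of its areas.
        subtotal = block.sum().to_frame().T
        subtotal.index = pd.MultiIndex.from_tuples(
            [(region, 'Total')], names=table.index.names)
        pieces.extend([block, subtotal])

    # Grand total over every detail row, appended last.
    grand = table.sum().to_frame().T
    grand.index = pd.MultiIndex.from_tuples(
        [('All', '')], names=table.index.names)
    pieces.append(grand)
    return pd.concat(pieces)


# A few rows in the same shape as the question's data.
sample = pd.DataFrame({
    'region': ['Central', 'Central', 'Central', 'East'],
    'area': ['Butterworth', 'Butterworth', 'Ipoh', 'JB'],
    'sales': [500, 500, 500, 500],
    'invoice_count': [25, 25, 25, 25],
})
combined = pivot_with_region_subtotals(sample)
print(combined)
```

Because each subtotal row is appended right after its region's block, the row order does not depend on how the label 'Total' happens to sort against the area names.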