Python Pandas pivot table: sort by date

Time: 2016-02-27 18:19:26

Tags: python-3.x pandas pivot-table

I have the following code:

import numpy as np
import pandas

data_df = pandas.read_csv(filename, parse_dates=True)
groupings = np.unique(data_df[['Ind']])
for group in groupings:
  data_df2 = data_df[data_df['Ind'] == group]
  table = pandas.pivot_table(data_df2, values='Rev', index=['Ind', 'Month'], columns=['Type'], aggfunc=sum)
  table = table.sort_index(ascending=[0, 0])
  print(table)

How can I sort the pivot table 'table' by month and year (for example, when I print 'table' I want Dec-14 to be the first row of output for each group)?

Here is a sample of the data in 'data_df':

   Ind   Type   Month    Rev
0    A  Voice  Dec-14  10.00
1    A  Voice  Jan-15   8.00
2    A  Voice  Feb-15  13.00
3    A  Voice  Mar-15   9.00
4    A  Voice  Apr-15  11.00
5    A  Voice  May-15  14.00
6    A  Voice  Jun-15   6.00
7    A  Voice  Jul-15   4.00
8    A  Voice  Aug-15  12.00
9    A  Voice  Sep-15   7.00
10   A  Voice  Oct-15   5.00
11   A   Elec  Dec-14   8.04
12   A   Elec  Jan-15   6.95
13   A   Elec  Feb-15   7.58
14   A   Elec  Mar-15   8.81
15   A   Elec  Apr-15   8.33
16   A   Elec  May-15   9.96
17   A   Elec  Jun-15   7.24
18   A   Elec  Jul-15   4.26
19   A   Elec  Aug-15  10.84
20   A   Elec  Sep-15   4.82
21   A   Elec  Oct-15   5.68
22   B  Voice  Dec-14  10.00
23   B  Voice  Jan-15   8.00
24   B  Voice  Feb-15  13.00
25   B  Voice  Mar-15   9.00
26   B  Voice  Apr-15  11.00
27   B  Voice  May-15  14.00
28   B  Voice  Jun-15   6.00
29   B  Voice  Jul-15   4.00
..  ..    ...     ...    ...

The output is (I have been experimenting with ascending, but it only sorts alphabetically):

Type        Elec  Voice
Ind Month               
A   Sep-15   4.82      7
    Oct-15   5.68      5
    May-15   9.96     14
    Mar-15   8.81      9
    Jun-15   7.24      6
    Jul-15   4.26      4
    Jan-15   6.95      8
    Feb-15   7.58     13
    Dec-14   8.04     10
    Aug-15  10.84     12
    Apr-15   8.33     11

I would like the output to be sorted by date:

Type        Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    ...

3 answers:

Answer 0: (score: 1)

You need to convert your 'Month' column to datetime after creating the DataFrame from the CSV file:

df['Month'] = pd.to_datetime(df['Month'], format="%b-%y")

because at the moment it is a string, so it sorts alphabetically rather than chronologically.
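The difference between the two orderings can be seen on a handful of values. This is only a minimal sketch: the four sample months are taken from the question, the variable names are made up for illustration, and it assumes an English locale so that `%b` matches abbreviations such as `Dec`.

```python
import pandas as pd

# A few 'Mon-yy' labels in the same format as the question's Month column,
# deliberately out of calendar order.
months = pd.Series(['Feb-15', 'Dec-14', 'Mar-15', 'Jan-15'])

# As plain strings, sorting is purely alphabetical.
alphabetical = sorted(months)

# After conversion to datetime, sorting follows the calendar;
# strftime turns the sorted dates back into the original label format.
as_dates = pd.to_datetime(months, format='%b-%y')
chronological = as_dates.sort_values().dt.strftime('%b-%y').tolist()

print(alphabetical)
print(chronological)
```

The string sort puts Feb-15 ahead of Jan-15, while the datetime sort starts at Dec-14 and proceeds month by month.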

Or you can use the following trick (date_parser) to parse the dates directly in read_csv:

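from __future__ import print_function

from datetime import datetime

import pandas as pd

# pd.datetime is deprecated (and removed in newer pandas); use the standard library's datetime instead
dateparser = lambda x: datetime.strptime(x, '%b-%y')

# note: newer pandas versions deprecate date_parser in favor of date_format
df = pd.read_csv('data.csv', delimiter=r'\s+', parse_dates=['Month'], date_parser=dateparser)

print(df.sort_values(['Month']))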

P.S. I don't know your preferred output date format...
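If a custom parser is not wanted, a variant is to read the column as text and convert it afterwards. The sketch below assumes whitespace-delimited input like the sample shown in the question; the in-memory `raw` buffer is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file, whitespace-delimited like the sample data.
raw = io.StringIO(
    "Ind Type Month Rev\n"
    "A Voice Jan-15 8.00\n"
    "A Voice Dec-14 10.00\n"
    "A Elec Dec-14 8.04\n"
    "A Elec Jan-15 6.95\n"
)

df = pd.read_csv(raw, sep=r'\s+')

# Parse the Month column after loading instead of inside read_csv.
df['Month'] = pd.to_datetime(df['Month'], format='%b-%y')

# Rows now sort chronologically by Month.
print(df.sort_values(['Month', 'Type']))
```

Converting after loading keeps the read_csv call simple and makes the expected date format explicit in one place.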

Answer 1: (score: 1)

I think you can first convert the Month column with to_datetime and then with to_period:

data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')

   Ind   Type   Month    Rev
0    A  Voice 2014-12  10.00
1    A  Voice 2015-01   8.00
2    A  Voice 2015-02  13.00
3    A  Voice 2015-03   9.00
4    A  Voice 2015-04  11.00
5    A  Voice 2015-05  14.00
6    A  Voice 2015-06   6.00
7    A  Voice 2015-07   4.00
8    A  Voice 2015-08  12.00
9    A  Voice 2015-09   7.00
10   A  Voice 2015-10   5.00
11   A   Elec 2014-12   8.04
12   A   Elec 2015-01   6.95
13   A   Elec 2015-02   7.58
14   A   Elec 2015-03   8.81
15   A   Elec 2015-04   8.33
16   A   Elec 2015-05   9.96
17   A   Elec 2015-06   7.24
18   A   Elec 2015-07   4.26
19   A   Elec 2015-08  10.84
20   A   Elec 2015-09   4.82
21   A   Elec 2015-10   5.68
22   B  Voice 2014-12  10.00
23   B  Voice 2015-01   8.00
24   B  Voice 2015-02  13.00
25   B  Voice 2015-03   9.00
26   B  Voice 2015-04  11.00
27   B  Voice 2015-05  14.00
28   B  Voice 2015-06   6.00
29   B  Voice 2015-07   4.00

Then use pivot_table; no sorting is needed:

data_df = pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'], columns='Type', aggfunc=sum)
print(data_df)
Type          Elec  Voice
Ind Month                
A   2014-12   8.04     10
    2015-01   6.95      8
    2015-02   7.58     13
    2015-03   8.81      9
    2015-04   8.33     11
    2015-05   9.96     14
    2015-06   7.24      6
    2015-07   4.26      4
    2015-08  10.84     12
    2015-09   4.82      7
    2015-10   5.68      5
B   2014-12    NaN     10
    2015-01    NaN      8
    2015-02    NaN     13
    2015-03    NaN      9
    2015-04    NaN     11
    2015-05    NaN     14
    2015-06    NaN      6
    2015-07    NaN      4

Finally, you can convert the Month level of the MultiIndex back to strings with strftime:

new_index = list(zip(data_df.index.get_level_values('Ind'), data_df.index.get_level_values('Month').strftime('%b-%y')))
data_df.index = pd.MultiIndex.from_tuples(new_index, names=data_df.index.names)
print(data_df)
Type         Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    Mar-15   8.81      9
    Apr-15   8.33     11
    May-15   9.96     14
    Jun-15   7.24      6
    Jul-15   4.26      4
    Aug-15  10.84     12
    Sep-15   4.82      7
    Oct-15   5.68      5
B   Dec-14    NaN     10
    Jan-15    NaN      8
    Feb-15    NaN     13
    Mar-15    NaN      9
    Apr-15    NaN     11
    May-15    NaN     14
    Jun-15    NaN      6
    Jul-15    NaN      4

Or you can use reset_index, dt.strftime and set_index:

data_df = data_df.reset_index(level=1)
data_df['Month'] = data_df['Month'].dt.strftime('%b-%y')
data_df = data_df.set_index('Month', append=True)
print(data_df)
Type         Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    Mar-15   8.81      9
    Apr-15   8.33     11
    May-15   9.96     14
    Jun-15   7.24      6
    Jul-15   4.26      4
    Aug-15  10.84     12
    Sep-15   4.82      7
    Oct-15   5.68      5
B   Dec-14    NaN     10
    Jan-15    NaN      8
    Feb-15    NaN     13
    Mar-15    NaN      9
    Apr-15    NaN     11
    May-15    NaN     14
    Jun-15    NaN      6
    Jul-15    NaN      4
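The steps in this answer (convert to periods, pivot, then relabel the Month level) can be bundled into one function. This is a sketch under assumptions rather than the answerer's code: the helper name `pivot_by_month` and the small sample frame are hypothetical, and `MultiIndex.from_arrays` is used in place of `from_tuples` purely for brevity.

```python
import pandas as pd


def pivot_by_month(df):
    """Pivot Rev by (Ind, Month) x Type with rows in calendar order and 'Mon-yy' labels."""
    out = df.copy()
    # Monthly periods sort chronologically, unlike the original strings.
    out['Month'] = pd.to_datetime(out['Month'], format='%b-%y').dt.to_period('M')
    table = out.pivot_table(values='Rev', index=['Ind', 'Month'],
                            columns='Type', aggfunc='sum')
    # The pivoted rows are already sorted by period; only the labels are changed here.
    labels = table.index.get_level_values('Month').strftime('%b-%y')
    table.index = pd.MultiIndex.from_arrays(
        [table.index.get_level_values('Ind'), labels],
        names=table.index.names,
    )
    return table


# Hypothetical sample in the same shape as the question's data, rows shuffled.
sample = pd.DataFrame({
    'Ind': ['A'] * 6,
    'Type': ['Voice', 'Voice', 'Voice', 'Elec', 'Elec', 'Elec'],
    'Month': ['Jan-15', 'Dec-14', 'Feb-15', 'Feb-15', 'Dec-14', 'Jan-15'],
    'Rev': [8.00, 10.00, 13.00, 7.58, 8.04, 6.95],
})

table = pivot_by_month(sample)
print(table)
```

Because the labels are plain strings again after this step, calling sort_index on the result would fall back to alphabetical order, so any sorting should happen before the relabeling.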

Answer 2: (score: 0)

First reformat the Month column using @jezrael's solution, then do this to get the pivot table:

>>> df_data.pivot_table(values='Rev', index=['Ind', 'Month'], columns='Type')
Type          Elec  Voice
Ind Month                
A   2014-12   8.04     10
    2015-01   6.95      8
    2015-02   7.58     13
    2015-03   8.81      9
    2015-04   8.33     11
    2015-05   9.96     14
    2015-06   7.24      6
    2015-07   4.26      4
    2015-08  10.84     12
    2015-09   4.82      7
    2015-10   5.68      5
B   2014-12    NaN     10
    2015-01    NaN      8
    2015-02    NaN     13
    2015-03    NaN      9
    2015-04    NaN     11
    2015-05    NaN     14
    2015-06    NaN      6
    2015-07    NaN      4

Or use groupby with unstack:

df.groupby(['Ind', 'Month', 'Type']).Rev.sum().unstack('Type')
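To compare the two routes side by side, the sketch below runs both on a small hypothetical sample (four rows borrowed from the question's data). Note that pivot_table aggregates with the mean by default, so `aggfunc='sum'` is passed here to match the explicit `sum()` in the groupby version.

```python
import pandas as pd

# Hypothetical four-row sample in the same shape as the question's data.
df = pd.DataFrame({
    'Ind': ['A', 'A', 'A', 'A'],
    'Type': ['Voice', 'Voice', 'Elec', 'Elec'],
    'Month': ['Jan-15', 'Dec-14', 'Dec-14', 'Jan-15'],
    'Rev': [8.00, 10.00, 8.04, 6.95],
})
df['Month'] = pd.to_datetime(df['Month'], format='%b-%y').dt.to_period('M')

# Route 1: pivot_table with an explicit sum.
pivoted = df.pivot_table(values='Rev', index=['Ind', 'Month'],
                         columns='Type', aggfunc='sum')

# Route 2: group on all three keys, sum, then move Type into the columns.
unstacked = df.groupby(['Ind', 'Month', 'Type'])['Rev'].sum().unstack('Type')

print(pivoted)
print(unstacked)
```

Both tables come out with rows in chronological order because the Month keys are periods rather than strings.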