How to bin and group efficiently in pandas?

Asked: 2018-09-26 21:56:36

Tags: python pandas-groupby binning

I have the following DataFrame:

date  = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00','2015-02-04 01:00:00','2015-02-04 01:30:00','2015-02-04 02:00:00','2015-02-04 02:30:00','2015-02-04 03:00:00','2015-02-04 03:30:00','2015-02-04 04:00:00','2015-02-04 04:30:00','2015-02-04 05:00:00','2015-02-04 05:30:00','2015-02-04 06:00:00','2015-02-04 06:30:00','2015-02-04 07:00:00','2015-02-04 07:30:00','2015-02-04 08:00:00','2015-02-04 08:30:00','2015-02-04 09:00:00','2015-02-04 09:30:00','2015-02-04 10:00:00','2015-02-04 10:30:00','2015-02-04 11:00:00','2015-02-04 11:30:00','2015-02-04 12:00:00','2015-02-04 12:30:00','2015-02-04 13:00:00','2015-02-04 13:30:00','2015-02-04 14:00:00','2015-02-04 14:30:00','2015-02-04 15:00:00','2015-02-04 15:30:00','2015-02-04 16:00:00','2015-02-04 16:30:00','2015-02-04 17:00:00','2015-02-04 17:30:00','2015-02-04 18:00:00','2015-02-04 18:30:00','2015-02-04 19:00:00','2015-02-04 19:30:00','2015-02-04 20:00:00','2015-02-04 20:30:00','2015-02-04 21:00:00','2015-02-04 21:30:00','2015-02-04 22:00:00','2015-02-04 22:30:00','2015-02-04 23:00:00','2015-02-04 23:30:00']
value = [33.24  , 31.71  , 34.39  , 34.49  , 34.67  , 34.46  , 34.59  , 34.83  , 35.78  , 33.03  , 35.49  , 33.79  , 36.12  , 37.09  , 39.54  , 41.19  , 45.99  , 50.23  , 46.72  , 47.47  , 48.46  , 48.38  , 48.40  , 48.13  , 38.35  , 38.19  , 38.12  , 38.05  , 38.06  , 37.83  , 37.49  , 37.41 , 41.84  , 42.26 , 44.09  , 48.85  , 50.07 , 50.94  , 51.09  , 50.60  , 47.39  , 45.57  , 45.03  , 44.98  , 41.32  , 40.37  , 41.12  , 39.33  , 35.38  , 33.44  ]
import pandas as pd

df = pd.DataFrame({'value': value, 'index': date})
# the timestamps include seconds, so the format needs %S as well
df.index = pd.to_datetime(df['index'], format='%Y-%m-%d %H:%M:%S')
df.drop(['index'], axis=1, inplace=True)
print(df)

                     value
index                     
2015-02-03 23:00:00  33.24
2015-02-03 23:30:00  31.71
2015-02-04 00:00:00  34.39
2015-02-04 00:30:00  34.49
2015-02-04 01:00:00  34.67
2015-02-04 01:30:00  34.46

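I would like to do the following efficiently:

  • For each year, compute the percentage of values that fall strictly below 0, from 0 (inclusive) to strictly below 20, and at 20 or above

I know about the cut and groupby functions, but I can't figure out an elegant way to combine the two.

The expected result looks like this: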

      inf0  supequal0_inf20  supequal20
2015   0.2              0.6         0.2
2016   0.7              0.1         0.2
2017   0.1              0.8         0.1

Thank you very much for your help.

1 answer:

Answer 0: (score: 1)

Given your df, this should work, though I can't vouch for its elegance:

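import numpy as np
# altered bins for demonstration purposes; right=False makes each bin left-closed, e.g. [40, 50)
binned = pd.cut(x=df.value, bins=[-np.inf, 40, 50, np.inf], right=False, labels=['low', 'mid', 'high'])
# per-year share of each bin: count per (year, bin) divided by count per year (newer pandas prefers freq='YE' over 'Y')
grouped = binned.groupby([pd.Grouper(freq='Y'), binned]).count() / binned.groupby(pd.Grouper(freq='Y')).count()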

Output of print(grouped):

index       value
2015-12-31  low      0.520000
            mid      0.380000
            high     0.100000
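
The result above is in long form (one row per year and bin), while the question asks for one row per year and one column per bin. A minimal sketch of one way to get that layout with the bin edges from the question is pd.crosstab with normalize='index'. This swaps in crosstab for the groupby division above and is not part of the original answer. The sample series s and its values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample spanning two years (not the data from the question)
idx = pd.to_datetime(['2015-01-01', '2015-06-01', '2015-09-01', '2015-12-01',
                      '2016-03-01', '2016-07-01'])
s = pd.Series([-5.0, 0.0, 19.9, 20.0, -1.0, 35.0], index=idx, name='value')

# right=False makes the bins left-closed: [-inf, 0), [0, 20), [20, inf)
binned = pd.cut(s, bins=[-np.inf, 0, 20, np.inf], right=False,
                labels=['inf0', 'supequal0_inf20', 'supequal20'])

# Year of each observation, aligned on the same index as the binned values
years = pd.Series(s.index.year, index=s.index, name='year')

# One row per year, one column per bin; normalize='index' turns the
# counts in each row into fractions of that year's total
result = pd.crosstab(years, binned, normalize='index')
print(result)
```

Each row of result sums to 1, and a bin that never occurs in a given year shows up as 0 rather than being dropped, as long as it occurs somewhere in the data.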