Calculate total count, total NAs, mean and median

Time: 2018-06-29 20:32:47

Tags: python python-3.x pandas pandas-groupby

Let's say I have a data frame with a column called values, and for this column I want to compute, per group, the total number of observations, the total number of null observations, the mean, and the median.

{{1}}

If I use groupby and agg, I get the following output:

mydf.groupby(['date_ym','category']).agg(['count', 'mean', 'median']).reset_index()
Out[135]: 
   date_ym category values             
                     count  mean median
0  2018-01        A      2  4.55   4.55
1  2018-01        B      0   NaN    NaN
2  2018-02        A      1  6.20   6.20
3  2018-02        B      0   NaN    NaN
4  2018-03        B      0   NaN    NaN

But the output I really want is the following:

   date_ym category values                      
                     count countNAs  mean median
0  2018-01        A      2        1  4.55   4.55
1  2018-01        B      0        1   NaN    NaN
2  2018-02        A      1        0  6.20   6.20
3  2018-02        B      0        1   NaN    NaN
4  2018-03        B      0        1   NaN    NaN

2 answers:

Answer 0: (score: 1)

You can pass a custom function to agg alongside the built-in aggregations:

def countNAs(x): return x.isnull().sum()
mydf.groupby(['date_ym','category']).agg(['count',countNAs, 'mean', 'median']).reset_index()
Out[647]: 
   date_ym category values                      
                     count countNAs  mean median
0  2018-01        A      2      1.0  4.55   4.55
1  2018-01        B      0      1.0   NaN    NaN
2  2018-02        A      1      0.0  6.20   6.20
3  2018-02        B      0      1.0   NaN    NaN
4  2018-03        B      0      1.0   NaN    NaN
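
For readers who want to try this end to end, here is a minimal sketch of the same idea using named aggregation on the single values column (available in pandas 0.25 and later). The sample rows and the helper name count_nas are assumptions made up for illustration; they are chosen only so that the group statistics resemble the output above, and they are not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original question), built so that
# the per-group counts and means resemble the output shown above.
mydf = pd.DataFrame({
    "date_ym": ["2018-01", "2018-01", "2018-01", "2018-01",
                "2018-02", "2018-02", "2018-03"],
    "category": ["A", "A", "A", "B", "A", "B", "B"],
    "values": [4.0, 5.1, np.nan, np.nan, 6.2, np.nan, np.nan],
})


def count_nas(s):
    """Number of missing entries in one group's Series."""
    return s.isna().sum()


# Named aggregation: each keyword becomes a flat output column,
# so no MultiIndex header is produced.
summary = (
    mydf.groupby(["date_ym", "category"])["values"]
    .agg(count="count", countNAs=count_nas, mean="mean", median="median")
    .reset_index()
)

print(summary)
```

Because 'count' skips missing values, the number of nulls in each group can also be obtained as the group size minus the count, which avoids the custom function if that is preferred.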

Answer 1: (score: 0)

This is not a straightforward approach, but it can be done.

data2 <- data.frame('population by age' = seq(5, 11, by = 1),
                    '2008' = c(145391, 140621, 136150, 131944, 198933, 182182, 159103),
                    '2009' = c(148566, 143943, 139367, 135083, 212196, 196398, 155033),
                    '2010' = c(152330, 147261, 142555, 138172, 218701, 161330, 142190),
                    '2011' = c(156630, 151387, 146491, 141905, 119397, 116093, 112666),
                    '2012' = c(133545, 129737, 126124, 122678, 120213, 116826, 113381),
                    '2013' = c(119397, 116093, 112666, 109174, 106871, 103659, 100398))

data1 <- data.frame('2008' = c(7, 8, 9, 10),
                    '2009' = c(7, 8, 9, 10),
                    '2010' = c(7, 8, 9, 10),
                    '2011' = c(6, 7, 8, 9),
                    '2012' = c(6, 7, 8, 9),
                    '2013' = c(6, 7, 8, 9))