熊猫群累计总和

时间:2014-03-26 03:17:43

标签: python pandas

我想在我的Pandas数据框中添加累积和列,以便:

name | day       | no
-----|-----------|----
Jack | Monday    | 10
Jack | Tuesday   | 20
Jack | Tuesday   | 10
Jack | Wednesday | 50
Jill | Monday    | 40
Jill | Wednesday | 110

变为:

Jack | Monday     | 10  | 10
Jack | Tuesday    | 30  | 40
Jack | Wednesday  | 50  | 90
Jill | Monday     | 40  | 40
Jill | Wednesday  | 110 | 150

我尝试了df.groupbydf.agg(lambda x: cumsum(x))的各种组合无济于事。

6 个答案:

答案 0 :(得分:53)

这应该这样做,需要groupby()两次。

In [52]:

print df
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
In [53]:

print df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()
                 no
name day           
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150

注意,生成的DataFrameMultiIndex

答案 1 :(得分:35)

这适用于pandas 0.16.2

In[23]: print df
        name          day   no
0      Jack       Monday    10
1      Jack      Tuesday    20
2      Jack      Tuesday    10
3      Jack    Wednesday    50
4      Jill       Monday    40
5      Jill    Wednesday   110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print df
        name          day   no  no_cumulative
0      Jack       Monday    10             10
1      Jack      Tuesday    20             30
2      Jack      Tuesday    10             40
3      Jack    Wednesday    50             90
4      Jill       Monday    40             40
5      Jill    Wednesday   110            150

答案 2 :(得分:15)

修改@ Dmitry的回答。这更简单,适用于pandas 0.19.0:

workingDir = 'frames';
filename = [sprintf('%d',1) '.jpg'];
fullname = fullfile(workingDir,'imagesV',filename);
a2=imread(fullname); 
a2(1,1)=1;
imwrite(a2,fullname);
a1=imread(fullname);
if a1==a2
    disp(23)
else
    disp(45)
end

答案 3 :(得分:5)

而不是df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum() (见上文)你也可以做df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()

  • df.groupby(by=['name','day']).sum()实际上只是将两列都移动到MultiIndex
  • as_index=False表示您之后无需再调用reset_index

答案 4 :(得分:3)

你应该使用

df['cum_no'] = df.no.cumsum()

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html

另一种方法

import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
           'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df

enter image description here

答案 5 :(得分:0)

data.csv:

name,day,no
Jack,Monday,10
Jack,Tuesday,20
Jack,Tuesday,10
Jack,Wednesday,50
Jill,Monday,40
Jill,Wednesday,110

代码:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
df = df.groupby(['name', 'day'])['no'].sum().reset_index()
print(df)
df['cumsum'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
print(df)

输出:

   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   30
2  Jack  Wednesday   50
3  Jill     Monday   40
4  Jill  Wednesday  110
   name        day   no  cumsum
0  Jack     Monday   10      10
1  Jack    Tuesday   30      40
2  Jack  Wednesday   50      90
3  Jill     Monday   40      40
4  Jill  Wednesday  110     150