Group by with sum into a new column name

Posted: 2017-07-16 04:29:35

Tags: python pandas

I am running a function that groups by ID and sums the $ values associated with each ID, using the following code:

df = df.groupby([' Id'], as_index=False, sort=False)[["Amount"]].sum();

But it doesn't rename the column, so I tried this:

df = df.groupby([' Id'], as_index=False, sort=False)[["Amount"]].sum().reset_index(name='Total Amount')

But it gave me an error: TypeError: reset_index() got an unexpected keyword argument 'name'

So finally, following this post: Python Pandas Create New Column with Groupby().Sum(), I tried this:

df = df.groupby(['Id'])[["Amount"]].transform('sum'); 

But it still didn't work.

What am I doing wrong?

2 answers:

Answer 0 (score: 7)

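I think you need to remove the as_index=False parameter, select the column with single brackets (["Amount"] instead of [["Amount"]]) so the sum is a Series, and then use Series.reset_index. With as_index=False (or with double brackets) the result is a DataFrame, and DataFrame.reset_index has no name parameter, so the call fails: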

df = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
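A minimal sketch of the type difference this relies on, using a small made-up frame (the variable names as_frame, as_series, result and raised are illustrative only). It shows that double-bracket selection yields a DataFrame, single-bracket selection yields a Series, and only the Series version accepts name= in reset_index:

```python
import pandas as pd

# Small illustrative frame (same shape as the sample data in this answer)
df = pd.DataFrame({'Id': [1, 2, 2], 'Amount': [10, 30, 50]})

# Double brackets select a one-column DataFrame, so .sum() returns a DataFrame
as_frame = df.groupby('Id', sort=False)[["Amount"]].sum()
print(type(as_frame).__name__)   # DataFrame

# Single brackets select one column, so .sum() returns a Series indexed by Id
as_series = df.groupby('Id', sort=False)["Amount"].sum()
print(type(as_series).__name__)  # Series

# Series.reset_index accepts name=, which labels the value column
result = as_series.reset_index(name='Total Amount')
print(result.columns.tolist())   # ['Id', 'Total Amount']

# DataFrame.reset_index has no name= parameter, reproducing the question's error
raised = False
try:
    as_frame.reset_index(name='Total Amount')
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```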

Or rename the column first:

d = {'Amount':'Total Amount'}
df = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()

Sample:

df = pd.DataFrame({'Id':[1,2,2],'Amount':[10, 30,50]})
print (df)
   Amount  Id
0      10   1
1      30   2
2      50   2

df1 = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
print (df1)
   Id  Total Amount
0   1            10
1   2            80

d = {'Amount':'Total Amount'}
df1 = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()
print (df1)
   Id  Total Amount
0   1            10
1   2            80
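On pandas 0.25 or later (an assumption about your version), named aggregation can name the output column in the same step, without a separate reset_index or rename. This is a swapped-in technique, not the one used in this answer; a minimal sketch with the sample data above:

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 2], 'Amount': [10, 30, 50]})

# Named aggregation (assumes pandas >= 0.25): output name -> (source column, aggfunc).
# Dict unpacking is used because 'Total Amount' contains a space and so
# cannot be written as a plain keyword argument.
df1 = df.groupby('Id', sort=False, as_index=False).agg(
    **{'Total Amount': ('Amount', 'sum')}
)
print(df1)
```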

But if you need a new column with the sum in the original df, use transform and assign its output to a new column:

df['Total Amount'] = df.groupby('Id', sort=False)["Amount"].transform('sum')
print (df)
   Amount  Id  Total Amount
0      10   1            10
1      30   2            80
2      50   2            80
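The key difference is the shape of the result: sum collapses each group to one row, while transform broadcasts the group total back to every original row, which is why its output can be assigned as a new column. One likely reason the question's last attempt seemed not to work is that it rebinds df to the transform output, which keeps only the summed column (still named Amount) and drops Id. A small sketch with the same sample data (summed, totals and replaced are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 2], 'Amount': [10, 30, 50]})

# sum(): one row per group
summed = df.groupby('Id', sort=False)["Amount"].sum()

# transform('sum'): same length and index as df, group total repeated on each row
totals = df.groupby('Id', sort=False)["Amount"].transform('sum')

print(len(summed), len(totals))  # 2 3

# Assigning to a new column keeps the original data alongside the totals
df['Total Amount'] = totals

# Rebinding to the transform output (as in the question) keeps only the totals
replaced = df.groupby(['Id'])[["Amount"]].transform('sum')
print(replaced.columns.tolist())  # ['Amount']
```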

Answer 1 (score: 0)

import pandas as pd

# set up dataframe
df = pd.DataFrame({'colA':['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'], 
                   'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC':[1,2,3,4,4,5,6,7], })

print(df)

  colA colB  colC
0    a  cat     1
1    a  cat     2
2    a  dog     3
3    b  cat     4
4    b  dog     4
5    c  cat     5
6    c  cat     6
7    d  dog     7 



# group on vals in column A
# get min (within groups) for column B 
# get avg (within groups) for column C
df_agg = ( df.groupby(by=['colA'])
          .agg({'colB':'min', 'colC':'mean'})
          .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_avg'})
          )

print(df_agg)

     colB_grp_min  colC_grp_avg
colA                           
a             cat           2.0
b             cat           4.0
c             cat           5.5
d             dog           7.0
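For comparison, the same table can be sketched with named aggregation, assuming pandas 0.25 or later. This is an alternative to the agg-then-rename pattern above, not part of the original answer; the output names are given directly inside agg, so no rename step is needed:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB': ['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC': [1, 2, 3, 4, 4, 5, 6, 7]})

# Named aggregation (assumes pandas >= 0.25): new_name=(source_column, aggfunc)
df_agg = df.groupby('colA').agg(
    colB_grp_min=('colB', 'min'),
    colC_grp_avg=('colC', 'mean'),
)
print(df_agg)
```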



# if you want multiple aggregations on the same column, pass a list
#   this returns columns with a MultiIndex (column, aggregation)
# group on vals in column A
# get min (within groups) for column B 
# get avg and max (within groups) for column C
df_agg2 = ( df.groupby(by=['colA'])
          .agg({'colB':'min', 'colC':['mean', 'max']})
          .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_multi_index'})
          )
print(df_agg2)

     colB_grp_min colC_grp_multi_index    
              min                 mean max
colA                                      
a             cat                  2.0   3
b             cat                  4.0   4
c             cat                  5.5   6
d             dog                  7.0   7
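If flat column names are preferred over the MultiIndex, one common approach (a sketch, not part of the answer above) is to join the two levels of each column tuple, such as ('colC', 'mean'), into a single string:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB': ['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC': [1, 2, 3, 4, 4, 5, 6, 7]})

# Mixing a single function and a list of functions gives MultiIndex columns
df_agg2 = df.groupby('colA').agg({'colB': 'min', 'colC': ['mean', 'max']})

# Each column label is a tuple like ('colC', 'mean'); join its levels with '_'
df_agg2.columns = ['_'.join(col) for col in df_agg2.columns]
print(df_agg2.columns.tolist())  # ['colB_min', 'colC_mean', 'colC_max']
```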