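Adding a summary row to a pandas DataFrame

Time: 2021-01-25 21:26:32

Tags: pandas summary

I'm trying to add a summary row that totals the rainfall, but I also need it to show the average temperature: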

import pandas as pd

data = [{'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
        {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
        {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
        {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
        {'year': 2021, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
        {'year': 2019, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
        {'year': 2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
        {'year': 2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
        {'year': 2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38}]

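The code below works for the rainfall total, but I also need to show the average temperature, and I don't know how to add it while keeping it in the same summary row. This is only an example, but I need the same arrangement in a real-world project.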

df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    _df.loc['summary'] = _df[['rainfall']].sum() # <-How do I add 2nd column that's not another 'sum' 
    container.append(_df)

df_summary = pd.concat(container)
df = (df_summary.fillna(''))
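
For context, here is a minimal sketch of why the loop above fills only the rainfall cell of the summary row. It assumes a single group (the three new-hills rows) and uses the illustrative names `group` and `row`, which are not part of the original code. The value being assigned is a Series whose only label is 'rainfall', and the assignment aligns on column labels, so every other column of the new row ends up as NaN.

```python
import pandas as pd

# One group only, to keep the example small (values taken from the question).
group = pd.DataFrame({
    'year': [2020, 2021, 2019],
    'area': ['new-hills'] * 3,
    'rainfall': [100, 110, 111],
    'temperature': [20, 20, 19],
})

# Summing a one-column frame gives a Series labelled only by 'rainfall'.
row = group[['rainfall']].sum()
print(row.index.tolist())

# Assigning it as a new row aligns on column labels; the rest becomes NaN.
group.loc['summary'] = row
print(group.loc['summary'])
```

Any column that should appear in the summary row therefore has to be present as a label in the assigned Series.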

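Example image of what I need (I filled in the values in green to show what I need the code to do).

Thanks.

If you'd like to use it, my code is available as a Jupyter notebook on GitHub: Pandas Summary Jupyter Notebook

1 Answer:

Answer 0: (score: 3)

You can try this: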

import pandas as pd

data = [{'year':2020,'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
        {'year':2021,'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
        {'year':2019,'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
        {'year':2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
        {'year':2021,'area': 'cape-town',  'rainfall': 80, 'temperature': 23},
        {'year':2019,'area': 'cape-town',  'rainfall': 75, 'temperature': 24},
        {'year':2019, 'area': 'mumbai',  'rainfall': 200,  'temperature': 37},
        {'year':2020, 'area': 'mumbai',  'rainfall': 170,  'temperature': 39},
        {'year':2021, 'area': 'mumbai',  'rainfall': 180,  'temperature': 38 }]

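df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    _df = _df.copy()  # work on a copy so the group can be modified safely
    # Summary row: total rainfall and mean temperature; other columns become NaN
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': 'mean'})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')  # blank out the NaN cells in the summary rows

df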

Output:

pandas dataframe with summary rows
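
As a cross-check on the values the summary rows should contain, the same two aggregations can be computed in a single grouped call. This is only a sketch, not part of the original answer: it produces one row per area rather than interleaving summary rows with the data, and the name `summary` is illustrative.

```python
import pandas as pd

data = [{'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
        {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
        {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
        {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
        {'year': 2021, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
        {'year': 2019, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
        {'year': 2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
        {'year': 2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
        {'year': 2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38}]

df = pd.DataFrame(data)

# One row per area: total rainfall and mean temperature.
summary = df.groupby('area').agg(
    rainfall=('rainfall', 'sum'),
    temperature=('temperature', 'mean'),
)
print(summary)
```

The figures in this table should match the 'summary' rows produced by the loop: each rainfall value is the group total and each temperature value is the group mean.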

Edit

Following a follow-up request to replace the mean temperature with a constant, here is the modified code:

import pandas as pd

data = [{'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
        {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
        {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
        {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
        {'year': 2021, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
        {'year': 2019, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
        {'year': 2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
        {'year': 2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
        {'year': 2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38}]

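# One constant per group, in the order groupby yields them (sorted by area)
my_constants = [10, 20, 30]

def map_constant(x, v):
    # The result of x.mean() is discarded. The call appears to be there so the
    # function fails on scalar values, which makes agg call it on the whole column.
    x.mean()
    return v

df = pd.DataFrame.from_dict(data)
container = []
for i, (label, _df) in enumerate(df.groupby(['area'])):
    _df = _df.copy()  # work on a copy so the group can be modified safely
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': (lambda x: map_constant(x, my_constants[i]))})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')

df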
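
An alternative sketch for the constant case, under the assumption that the constants belong to specific areas: build the summary row directly as a Series instead of routing the constant through `agg`. The `constants` dictionary and its area-to-value mapping are hypothetical; keying by area name means the result does not depend on the order in which `groupby` yields the groups.

```python
import pandas as pd

data = [{'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
        {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': 20},
        {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
        {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
        {'year': 2021, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
        {'year': 2019, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
        {'year': 2019, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
        {'year': 2020, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
        {'year': 2021, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38}]

df = pd.DataFrame(data)

# Hypothetical constants, keyed by area so they do not depend on group order.
constants = {'cape-town': 10, 'mumbai': 20, 'new-hills': 30}

container = []
for area, group in df.groupby('area'):
    group = group.copy()
    # Build the summary row directly; columns not listed here are left as NaN.
    group.loc['summary'] = pd.Series(
        {'rainfall': group['rainfall'].sum(), 'temperature': constants[area]}
    )
    container.append(group)

df_summary = pd.concat(container).fillna('')
print(df_summary)
```

This avoids the helper function entirely, at the cost of spelling out each summary value by hand.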
