Pandas: multiple calculations per row of a DataFrame

Time: 2018-05-29 12:08:51

Tags: python pandas iteration rows

I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['31/12/2015','31/12/2016','31/12/2017','31/12/2018',
                            '31/12/2019','31/12/2020','31/12/2015','31/12/2016',
                            '31/12/2017','31/12/2018','31/12/2019','31/12/2020'],
                   'season': ['S1','S1','S1','S1','S1','S1','S2','S2','S2','S2','S2','S2'],
                   'total' : [1,0,0,0,0.022313421,0.053791041,0,0,0.307783314,0,0,0] })
# The dates are day/month/year, so state the format explicitly.
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
print(df)

         date season         total
0  2015-12-31     S1      1.000000
1  2016-12-31     S1      0.000000
2  2017-12-31     S1      0.000000
3  2018-12-31     S1      0.000000
4  2019-12-31     S1      0.022313
5  2020-12-31     S1      0.053791
6  2015-12-31     S2      0.000000
7  2016-12-31     S2      0.000000
8  2017-12-31     S2      0.307783
9  2018-12-31     S2      0.000000
10 2019-12-31     S2      0.000000
11 2020-12-31     S2      0.000000

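I would like to run several calculations for each row, based on the value in the column 'total', and get a DataFrame in the following format (example for the first row):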

         date season         total   calculation id       result
0  2015-12-31     S1      1.000000                1           x1
0  2015-12-31     S1      1.000000                2           x2
0  2015-12-31     S1      1.000000                3           x3  
0  2015-12-31     S1      1.000000                4           x4
0  2015-12-31     S1      1.000000                5           x5   

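Basically something like:

for index, row in df.iterrows():
    for i, a in enumerate(np.linspace(0, row['total'], 6)):
        pass  # assign the result of the calculation to the column 'result'

Any ideas on how I could do this? For the sake of the example, the result column can be computed as a*5 inside the loop.

Thanks for your help,

Pierre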

2 Answers:

Answer 0: (score: 0)

One way to do this is to "duplicate" the rows. First create a column list_result that holds, for each row of df, the list of results:

df['list_result'] = df['total'].apply(lambda a: np.linspace(0,a,6)*5)

From this column, you can use stack to build a Series with one row per value in each list. If you set the index first, you can work directly on the Series:

df_output = (df.set_index(['date', 'season','total'])['list_result'] 
               # set index and work on the column list_result
                 .apply(pd.Series).stack() #will expand the lists of results as rows
                 .reset_index()) # to get back the column 'date', 'season','total'
#you can rename the column
df_output.columns = ['date', 'season','total', 'calculation_id', 'result']

The first rows of df_output are:

         date season     total  calculation_id    result
0  2015-12-31     S1  1.000000               0  0.000000
1  2015-12-31     S1  1.000000               1  1.000000
2  2015-12-31     S1  1.000000               2  2.000000
3  2015-12-31     S1  1.000000               3  3.000000
4  2015-12-31     S1  1.000000               4  4.000000
5  2015-12-31     S1  1.000000               5  5.000000

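Note that this is not exactly the output you asked for (calculation_id runs from 0 to 5 rather than 1 to 5), but it is what np.linspace(0, getattr(row, 'total'), 6) produces; you can change this function when creating list_result.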
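As a side note, on pandas 0.25 or later the same "one list per row, then one row per value" idea can probably be written with DataFrame.explode instead of apply(pd.Series).stack(). This is only a sketch, not the method used above: the two-row frame is a cut-down stand-in for the question's data, and the name n_steps is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (assumption: two rows are enough to show the idea).
df = pd.DataFrame({
    'date': pd.to_datetime(['31/12/2015', '31/12/2017'], format='%d/%m/%Y'),
    'season': ['S1', 'S2'],
    'total': [1.0, 0.307783314],
})

n_steps = 6  # hypothetical name: number of linspace points per row

# One list of results per row, same calculation as in the answer (a * 5).
df['result'] = df['total'].apply(lambda a: list(np.linspace(0, a, n_steps) * 5))

# explode() gives each list element its own row and repeats the other columns.
df_output = df.explode('result').reset_index(drop=True)
df_output['result'] = df_output['result'].astype(float)

# explode() keeps the row order, so the ids simply repeat 0..n_steps-1 for each original row.
df_output['calculation_id'] = np.tile(np.arange(n_steps), len(df))

df_output = df_output[['date', 'season', 'total', 'calculation_id', 'result']]
print(df_output)
```

Because every list has the same length here, np.tile is enough to number the calculations; with lists of different lengths a groupby-based counter would be needed instead.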

Answer 1: (score: 0)

You can try:

import pandas as pd

df = pd.DataFrame({'date' : ['31/12/2015','31/12/2016','31/12/2017','31/12/2018','31/12/2019','31/12/2020', '31/12/2015','31/12/2016','31/12/2017','31/12/2018','31/12/2019','31/12/2020'], 'season':['S1','S1','S1','S1','S1','S1','S2','S2','S2','S2','S2','S2'], 'total' : [1,0,0,0,0.022313421,0.053791041,0,0,0.307783314,0,0,0]  })

df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')  # dates are day/month/year

df['key'] = 1 #add key for merge

ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5], 'key': 1})

df = pd.merge(df, ids, on='key').drop(columns='key')  # Cartesian product

df['result'] = df['total']*df['calculation_id']

print(df)

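The idea is to create another DataFrame containing the calculation IDs and then "cross join" it with your original DataFrame. Finally, multiply total by the calculation ID to get the result.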
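If you are on pandas 1.2 or later, the helper key column should not be needed, since merge accepts how='cross' directly. A minimal sketch of the same cross join, using a shortened two-row frame as a stand-in for the question's data (the variable name out is just for illustration):

```python
import pandas as pd

# Shortened stand-in for the question's frame (assumption: two rows only).
df = pd.DataFrame({'season': ['S1', 'S2'], 'total': [1.0, 0.307783314]})
ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5]})

# how='cross' pairs every row of df with every row of ids, no key column required.
out = df.merge(ids, how='cross')

# Same example calculation as in the answer: total times the calculation id.
out['result'] = out['total'] * out['calculation_id']
print(out)
```

Each original row appears five times, once per calculation ID, which is the layout asked for in the question.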