Question

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['31/12/2015','31/12/2016','31/12/2017','31/12/2018',
                            '31/12/2019','31/12/2020','31/12/2015','31/12/2016',
                            '31/12/2017','31/12/2018','31/12/2019','31/12/2020'],
                   'season': ['S1','S1','S1','S1','S1','S1','S2','S2','S2','S2','S2','S2'],
                   'total': [1,0,0,0,0.022313421,0.053791041,0,0,0.307783314,0,0,0]})
# The dates are day-first (dd/mm/yyyy), so pass the format explicitly
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
print(df)

         date season         total
0  2015-12-31     S1      1.000000
1  2016-12-31     S1      0.000000
2  2017-12-31     S1      0.000000
3  2018-12-31     S1      0.000000
4  2019-12-31     S1      0.022313
5  2020-12-31     S1      0.053791
6  2015-12-31     S2      0.000000
7  2016-12-31     S2      0.000000
8  2017-12-31     S2      0.307783
9  2018-12-31     S2      0.000000
10 2019-12-31     S2      0.000000
11 2020-12-31     S2      0.000000

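I would like to perform several calculations for each row, based on the value in the column 'total', and get a dataframe in the following format (example for the first row):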

         date season         total   calculation id       result
0  2015-12-31     S1      1.000000                1           x1
0  2015-12-31     S1      1.000000                2           x2
0  2015-12-31     S1      1.000000                3           x3  
0  2015-12-31     S1      1.000000                4           x4
0  2015-12-31     S1      1.000000                5           x5

Basically something like:

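rows = []
for index, row in df.iterrows():
    for i, a in enumerate(np.linspace(0, getattr(row, 'total'), 6)):
        # assign the result of the calculation (here a*5) to the column 'result'
        rows.append({**row.to_dict(), 'calculation_id': i, 'result': a * 5})
result_df = pd.DataFrame(rows)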

Any ideas on how I could do this? For the sake of the example, the result column can be computed as a*5 in the loop.

Thanks for your help,

Pierre

Answer 1

One way to do this is to "duplicate" the rows. First, create a column list_result that holds, for each row of df, the list of results:

import numpy as np
df['list_result'] = df['total'].apply(lambda a: np.linspace(0, a, 6) * 5)  # one array of 6 results per row
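To make it concrete what this step stores, here is a small sketch on a two-row subset of the question's data (the subset is chosen only for brevity). Each cell of list_result holds a whole NumPy array rather than a single number, and that is what the next step expands into rows.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's data, enough to show the shape of the new column
df = pd.DataFrame({'season': ['S1', 'S2'], 'total': [1.0, 0.307783314]})

# Same expression as the answer: 6 evenly spaced points from 0 to total, each times 5
df['list_result'] = df['total'].apply(lambda a: np.linspace(0, a, 6) * 5)

# The first cell is a NumPy array of 6 results, not a scalar
print(type(df['list_result'][0]), df['list_result'][0])
```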

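From this column, you can use apply(pd.Series) and stack to create a Series with one row for each value in the lists. By setting the index first, you can work directly on that Series: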

df_output = (df.set_index(['date', 'season', 'total'])['list_result']  # set the index and work on the column list_result
               .apply(pd.Series).stack()  # expand the lists of results into rows
               .reset_index())            # get back the columns 'date', 'season', 'total'
# rename the columns
df_output.columns = ['date', 'season', 'total', 'calculation_id', 'result']

The first rows of df_output are:

         date season     total  calculation_id    result
0  2015-12-31     S1  1.000000               0  0.000000
1  2015-12-31     S1  1.000000               1  1.000000
2  2015-12-31     S1  1.000000               2  2.000000
3  2015-12-31     S1  1.000000               3  3.000000
4  2015-12-31     S1  1.000000               4  4.000000
5  2015-12-31     S1  1.000000               5  5.000000

Note that this is not exactly the result you expected, but it is what np.linspace(0,getattr(row,'total'),6) produces. You can change this function when creating list_result.
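The calculation lives entirely in the lambda used to build list_result, so changing it is enough. The sketch below assumes you want five calculations numbered 1 to 5, as in the table from the question. To get that, it drops the leading 0 returned by np.linspace with the `[1:]` slice and shifts the 0-based positions by one. Both tweaks are illustrative assumptions, not part of the original answer. The stacking pipeline itself is unchanged.

```python
import numpy as np
import pandas as pd

# Two-row subset of the question's data
df = pd.DataFrame({
    'date': pd.to_datetime(['31/12/2015', '31/12/2016'], format='%d/%m/%Y'),
    'season': ['S1', 'S1'],
    'total': [1.0, 0.0],
})

# Assumption: drop the leading 0 from linspace so that 5 results remain per row
df['list_result'] = df['total'].apply(lambda a: np.linspace(0, a, 6)[1:] * 5)

df_output = (df.set_index(['date', 'season', 'total'])['list_result']
               .apply(pd.Series)   # one column per list element (0..4)
               .stack()            # one row per element
               .reset_index())
df_output.columns = ['date', 'season', 'total', 'calculation_id', 'result']

# Assumption: number the calculations 1..5 instead of 0..4
df_output['calculation_id'] += 1
print(df_output)
```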

Answer 2

You can try:

import pandas as pd

df = pd.DataFrame({'date': ['31/12/2015','31/12/2016','31/12/2017','31/12/2018','31/12/2019','31/12/2020',
                            '31/12/2015','31/12/2016','31/12/2017','31/12/2018','31/12/2019','31/12/2020'],
                   'season': ['S1','S1','S1','S1','S1','S1','S2','S2','S2','S2','S2','S2'],
                   'total': [1,0,0,0,0.022313421,0.053791041,0,0,0.307783314,0,0,0]})

df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')  # day-first dates

df['key'] = 1  # add a constant key for the merge

ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5], 'key': 1})

df = pd.merge(df, ids, on='key').drop(columns='key')  # cartesian product

df['result'] = df['total'] * df['calculation_id']

print(df)

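The idea is to create another dataframe containing the calculation IDs and "cross join" it with your original dataframe. Finally, multiply total by the calculation ID to get the result.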
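Since pandas 1.2, merge also accepts how='cross', which builds the Cartesian product directly, so the helper key column is not needed. Here is a minimal sketch on a two-row subset of the question's data, assuming pandas 1.2 or later. It keeps the same result = total * calculation_id rule as the answer.

```python
import pandas as pd

# Two-row subset of the question's data
df = pd.DataFrame({
    'date': pd.to_datetime(['31/12/2015', '31/12/2016'], format='%d/%m/%Y'),
    'season': ['S1', 'S1'],
    'total': [1.0, 0.0],
})
ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5]})

# Cartesian product: every row of df is paired with every calculation_id
out = df.merge(ids, how='cross')
out['result'] = out['total'] * out['calculation_id']
print(out)
```

The output keeps the left frame's row order, with each original row followed by its five calculations.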

Pandas, multiple calculations for each row of a dataframe
