How to calculate a running total in a pandas DataFrame

Time: 2020-06-12 14:01:55

Tags: python pandas

I have a DataFrame of precipitation data that looks like this:

Date Time, Raw Measurement, Site ID, Previous Raw Measurement, Raw - Previous
2020-05-06 14:15:00,12.56,8085,12.56,0.0
2020-05-06 14:30:00,12.56,8085,12.56,0.0
2020-05-06 14:45:00,12.56,8085,12.56,0.0
2020-05-06 15:00:00,2.48,8085,12.56,-10.08
2020-05-06 15:30:00,2.48,8085,2.47,0.01
2020-05-06 15:45:00,2.48,8085,2.48,0.0
2020-05-06 16:00:00,2.50,8085,2.48,0.02
2020-05-06 16:15:00,2.50,8085,2.50,0.0
2020-05-06 16:30:00,2.50,8085,2.50,0.0
2020-05-06 16:45:00,2.51,8085,2.50,0.01
2020-05-06 17:00:00,2.51,8085,2.51,0.0

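I want to use the last column, "Raw - Previous", which is simply the difference between the latest observation and the one before it, to build a running total of the positive changes as a cumulative column. I empty the rain gauge from time to time, so "Raw - Previous" goes negative whenever that happens. I want to filter those negative values out of the df while keeping the total accumulating. I have come across solutions using df.sum(), but as far as I can tell they only give the sum of the whole column, not a running sum after each row.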
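To make the distinction concrete: sum() reduces a column to a single number, whereas cumsum() returns a running total with one value per row. A minimal sketch, assuming the "Raw - Previous" values from the sample above are typed in by hand as a plain Series (the names diffs, column_total and running_total are illustrative only):

```python
import pandas as pd

# 'Raw - Previous' values copied by hand from the question's sample
diffs = pd.Series([0.0, 0.0, 0.0, -10.08, 0.01, 0.0, 0.02, 0.0, 0.0, 0.01, 0.0])

# sum() collapses the whole column into one number
column_total = diffs.sum()

# cumsum() keeps one value per row: the total of everything up to and including that row
running_total = diffs.cumsum()

print(round(column_total, 2))
print(running_total.round(2).tolist())
```

Because the -10.08 emptying event is included unfiltered, both the column total (about -10.04) and every running value from row 3 onward are pulled down by it, which is why negative steps have to be removed before accumulating.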

My overall goal is to end up with something like this:

Date Time, Raw Measurement, Site ID, Previous Raw Measurement, Raw - Previous, Total Accumulation
2020-05-06 14:15:00,12.56,8085,12.56,0.0,12.56
2020-05-06 14:30:00,12.56,8085,12.56,0.0,12.56
2020-05-06 14:45:00,12.56,8085,12.56,0.0,12.56
2020-05-06 15:00:00,2.48,8085,12.56,-10.08,12.56
2020-05-06 15:15:00,2.47,8085,2.48,-0.01,12.56
2020-05-06 15:30:00,2.48,8085,2.47,0.01,12.57
2020-05-06 15:45:00,2.48,8085,2.48,0.0,12.57
2020-05-06 16:00:00,2.50,8085,2.48,0.02,12.59
2020-05-06 16:15:00,2.50,8085,2.50,0.0,12.59
2020-05-06 16:30:00,2.50,8085,2.50,0.0,12.59
2020-05-06 16:45:00,2.51,8085,2.50,0.01,12.60
2020-05-06 17:00:00,2.51,8085,2.51,0.0,12.60

Edit: changed the title to better reflect what the question turned into.

2 answers:

Answer 0 (score: 2)

np.where works as well:

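import numpy as np
import pandas as pd

# Keep positive steps, replace negative ones (gauge emptied) with 0, take the running sum,
# and offset it by the first raw reading so the total starts at the initial gauge level
df['Total Accumulation'] = np.where(df['Raw - Previous'] > 0, df['Raw - Previous'], 0).cumsum() + df['Raw Measurement'].iloc[0]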
df

Output:

    Date Time            Raw Measurement  Site ID  Previous Raw Measurement  Raw - Previous  Total Accumulation
0   2020-05-06 14:15:00  12.56            8085     12.56                     0.00            12.56
1   2020-05-06 14:30:00  12.56            8085     12.56                     0.00            12.56
2   2020-05-06 14:45:00  12.56            8085     12.56                     0.00            12.56
3   2020-05-06 15:00:00  2.48             8085     12.56                     -10.08          12.56
4   2020-05-06 15:30:00  2.48             8085     2.47                      0.01            12.57
5   2020-05-06 15:45:00  2.48             8085     2.48                      0.00            12.57
6   2020-05-06 16:00:00  2.50             8085     2.48                      0.02            12.59
7   2020-05-06 16:15:00  2.50             8085     2.50                      0.00            12.59
8   2020-05-06 16:30:00  2.50             8085     2.50                      0.00            12.59
9   2020-05-06 16:45:00  2.51             8085     2.50                      0.01            12.60
10  2020-05-06 17:00:00  2.51             8085     2.51                      0.00            12.60
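A self-contained way to try the np.where approach on the question's sample, assuming the data arrives as the comma-separated text shown in the question (read here from an in-memory string; csv_text and positive_steps are illustrative names, not part of the original answer). With this 11-row input, the positive steps add up to 0.04, so the total should finish at 12.60:

```python
import io

import numpy as np
import pandas as pd

# Sample data copied verbatim from the question (11 rows; there is no 15:15 row)
csv_text = """Date Time, Raw Measurement, Site ID, Previous Raw Measurement, Raw - Previous
2020-05-06 14:15:00,12.56,8085,12.56,0.0
2020-05-06 14:30:00,12.56,8085,12.56,0.0
2020-05-06 14:45:00,12.56,8085,12.56,0.0
2020-05-06 15:00:00,2.48,8085,12.56,-10.08
2020-05-06 15:30:00,2.48,8085,2.47,0.01
2020-05-06 15:45:00,2.48,8085,2.48,0.0
2020-05-06 16:00:00,2.50,8085,2.48,0.02
2020-05-06 16:15:00,2.50,8085,2.50,0.0
2020-05-06 16:30:00,2.50,8085,2.50,0.0
2020-05-06 16:45:00,2.51,8085,2.50,0.01
2020-05-06 17:00:00,2.51,8085,2.51,0.0
"""

# skipinitialspace=True drops the blank after each comma, so the header yields
# 'Raw Measurement' rather than ' Raw Measurement'
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Negative steps (gauge emptied) count as 0; positive steps are accumulated
positive_steps = np.where(df['Raw - Previous'] > 0, df['Raw - Previous'], 0)
df['Total Accumulation'] = positive_steps.cumsum() + df['Raw Measurement'].iloc[0]

print(df[['Date Time', 'Raw - Previous', 'Total Accumulation']])
```

np.where returns a plain NumPy array here, so its .cumsum() is NumPy's; assigning the result back works because it has exactly one value per row of df.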

Answer 1 (score: 1)

You can use clip() to set the negative values to zero, then cumsum() to take the running sum of the differences:

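# clip(lower=0) turns negative steps (gauge emptied) into 0; cumsum() then gives the running total, offset by the first reading
df['Total'] = df['Raw - Previous'].clip(lower=0).cumsum() + df['Raw Measurement'].iloc[0]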

Output:

0     12.56
1     12.56
2     12.56
3     12.56
4     12.56
5     12.57
6     12.57
7     12.59
8     12.59
9     12.59
10    12.60
11    12.60
Name: Raw - Previous, dtype: float64
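This output has 12 rows, so it appears to have been produced on data that includes the 15:15 reading (2.47, step -0.01) from the question's desired output. A minimal check, assuming the two relevant columns are typed in by hand from that desired output, that clip() reproduces the "Total Accumulation" column and agrees with the np.where version from the other answer (clipped_total, where_total and expected are illustrative names):

```python
import numpy as np
import pandas as pd

# 'Raw Measurement' and 'Raw - Previous' from the question's desired output
# (12 rows, including the 15:15 reading of 2.47)
df = pd.DataFrame({
    'Raw Measurement': [12.56, 12.56, 12.56, 2.48, 2.47, 2.48, 2.48, 2.50, 2.50, 2.50, 2.51, 2.51],
    'Raw - Previous': [0.0, 0.0, 0.0, -10.08, -0.01, 0.01, 0.0, 0.02, 0.0, 0.0, 0.01, 0.0],
})
start = df['Raw Measurement'].iloc[0]

# clip(lower=0) replaces every negative step with 0 and leaves the others unchanged
clipped_total = df['Raw - Previous'].clip(lower=0).cumsum() + start

# The np.where formulation, for comparison
where_total = np.where(df['Raw - Previous'] > 0, df['Raw - Previous'], 0).cumsum() + start

# 'Total Accumulation' column from the question's desired output
expected = [12.56, 12.56, 12.56, 12.56, 12.56, 12.57, 12.57, 12.59, 12.59, 12.59, 12.60, 12.60]
print(np.allclose(clipped_total, expected), np.allclose(clipped_total, where_total))
```

The two formulations are interchangeable here; clip() keeps the result as a pandas Series (with its index and name, hence "Name: Raw - Previous" in the output above), while np.where produces a plain NumPy array.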