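Group a pandas DataFrame by 15-minute intervals, but across the whole day

Time: 2021-06-28 12:57:56

Tags: python python-3.x pandas dataframe

I have the following DataFrame. I want to group it into 15-minute bins and sum column Q, but I want the bins to cover the whole day, not only the times present in the data.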

 time                   Q
 2019-12-07 09:13:00   10 
 2019-12-07 09:33:00    1 
 2019-12-07 09:41:00    1 
 2019-12-07 10:03:00    6 
 2019-12-07 10:15:00    5 
 2019-12-07 10:37:00    3 
 2019-12-07 10:48:00   15 
 2019-12-07 11:05:00    3 
 2019-12-07 11:16:00    8 
 2019-12-07 11:34:00    5 
 2019-12-07 11:48:00   10 
 2019-12-07 12:01:00    6 
 2019-12-07 12:18:00    7 

So I want bins like these, for example:

time                  SUM(Q)
 2019-12-07 00:00:00               
 2019-12-07 00:15:00
 2019-12-07 00:30:00
 2019-12-07 00:45:00
 2019-12-07 01:00:00
               .
               .
               .
2019-12-07 23:00:00
2019-12-07 23:15:00
2019-12-07 23:30:00
2019-12-07 23:45:00

I tried

 df.groupby(df.time.dt.floor('15T'))["Q"].sum() 

 df.groupby(pd.Grouper(key="time", freq="15Min"))['Q'].sum()

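but both only create bins between the earliest and latest times present in the column, rather than from the start of the day (00:00:00) to the end of the day (23:45:00).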

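2 answers:

Answer 0: (score: 2)

Add a row at 00:00:00 of the earliest date and a row at 23:45:00 of the latest date, each with Q = 0, so that both ends of the day appear in the output: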

s = df['time'].agg(['min','max']).dt.normalize().copy()
s['max'] = s['max'] + pd.DateOffset(hours=23, minutes=45)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, s.to_frame().assign(Q=0)], ignore_index=True)
print (df)
                  time   Q
0  2019-12-07 09:13:00  10
1  2019-12-07 09:33:00   1
2  2019-12-07 09:41:00   1
3  2019-12-07 10:03:00   6
4  2019-12-07 10:15:00   5
5  2019-12-07 10:37:00   3
6  2019-12-07 10:48:00  15
7  2019-12-07 11:05:00   3
8  2019-12-07 11:16:00   8
9  2019-12-07 11:34:00   5
10 2019-12-07 11:48:00  10
11 2019-12-07 12:01:00   6
12 2019-12-07 12:18:00   7
13 2019-12-07 00:00:00   0
14 2019-12-07 23:45:00   0

Then use your own solution, e.g.:

df.groupby(pd.Grouper(key="time", freq="15Min"))['Q'].sum()
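For reference, here is a minimal end-to-end sketch of the padding approach on the sample data from the question. It assumes a recent pandas, so it builds the two padding rows directly and appends them with pd.concat; the names pad, padded and binned are only illustrative, and "15min" is used as the frequency alias in place of '15T'.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-07 11:16:00",
        "2019-12-07 11:34:00", "2019-12-07 11:48:00", "2019-12-07 12:01:00",
        "2019-12-07 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

# Zero-valued rows at the first and last bin of the day (illustrative names)
start = df["time"].min().normalize()
end = df["time"].max().normalize() + pd.Timedelta(hours=23, minutes=45)
pad = pd.DataFrame({"time": [start, end], "Q": [0, 0]})

# Append the padding rows, then bin exactly as in the question
padded = pd.concat([df, pad], ignore_index=True)
binned = padded.groupby(pd.Grouper(key="time", freq="15min"))["Q"].sum()
print(binned)
```

Because the padding rows carry Q = 0, they stretch the range of bins without changing any sum.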

If each date needs to be handled separately, first use your own solution and then add the missing datetimes with Series.reindex:

print (df)
                  time   Q
0  2019-12-07 09:13:00  10
1  2019-12-07 09:33:00   1
2  2019-12-07 09:41:00   1
3  2019-12-07 10:03:00   6
4  2019-12-07 10:15:00   5
5  2019-12-07 10:37:00   3
6  2019-12-07 10:48:00  15
7  2019-12-07 11:05:00   3
8  2019-12-09 11:16:00   8
9  2019-12-09 11:34:00   5
10 2019-12-09 11:48:00  10
11 2019-12-09 12:01:00   6
12 2019-12-09 12:18:00   7


dates = [y for x in df.time.dt.normalize().drop_duplicates() 
           for y in pd.date_range(x, x + pd.DateOffset(hours=23, minutes=45), freq='15T')]
print (dates[:2])
[Timestamp('2019-12-07 00:00:00', freq='15T'), Timestamp('2019-12-07 00:15:00', freq='15T')]

df = df.groupby(df.time.dt.floor('15T'))["Q"].sum().reindex(dates, fill_value=0)
print (df)
time
2019-12-07 00:00:00    0
2019-12-07 00:15:00    0
2019-12-07 00:30:00    0
2019-12-07 00:45:00    0
2019-12-07 01:00:00    0
                      ..
2019-12-09 22:45:00    0
2019-12-09 23:00:00    0
2019-12-09 23:15:00    0
2019-12-09 23:30:00    0
2019-12-09 23:45:00    0
Name: Q, Length: 192, dtype: int64

Answer 1: (score: 0)

Given that your current result is missing timestamps (e.g. when using df.resample('15T').sum()), you can add the missing timestamps as follows:

# generates an index of timestamps every 15 minutes
# (inclusive= replaces the closed= argument, which was removed in pandas 2.0)
idx = pd.date_range('2019-12-07', '2019-12-08', inclusive='left', freq='15T')
df2 = df.reindex(idx, fill_value=0)

For more details on how values are filled in where the previous index had no entry, see reindex.
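Put together, the resample-then-reindex idea might look like the sketch below, assuming the data is first indexed by the time column. The names binned, day and full_day are only illustrative, and the index is built with periods=96 (24 hours x 4 bins) so that it does not rely on the closed/inclusive argument, whose name differs between pandas versions.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-07 11:16:00",
        "2019-12-07 11:34:00", "2019-12-07 11:48:00", "2019-12-07 12:01:00",
        "2019-12-07 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

# Resample on a DatetimeIndex; this only covers the observed time range
binned = df.set_index("time")["Q"].resample("15min").sum()

# Full-day index: 96 bins of 15 minutes starting at midnight of that day
day = binned.index[0].normalize()
idx = pd.date_range(day, periods=96, freq="15min")

# Bins missing from the resampled result are filled with 0
full_day = binned.reindex(idx, fill_value=0)
print(full_day)
```

This version handles a single day; for data spanning several dates, the per-date index from the first answer can be passed to reindex instead.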
