I have the following dataframe. I want to group it into 15-minute bins and sum the Q column, but I want the bins to cover the whole day.
time Q
2019-12-07 09:13:00 10
2019-12-07 09:33:00 1
2019-12-07 09:41:00 1
2019-12-07 10:03:00 6
2019-12-07 10:15:00 5
2019-12-07 10:37:00 3
2019-12-07 10:48:00 15
2019-12-07 11:05:00 3
2019-12-07 11:16:00 8
2019-12-07 11:34:00 5
2019-12-07 11:48:00 10
2019-12-07 12:01:00 6
2019-12-07 12:18:00 7
So I would like bins like these, for example:
time SUM(Q)
2019-12-07 00:00:00
2019-12-07 00:15:00
2019-12-07 00:30:00
2019-12-07 00:45:00
2019-12-07 01:00:00
.
.
.
2019-12-07 23:00:00
2019-12-07 23:15:00
2019-12-07 23:30:00
2019-12-07 23:45:00
I tried
df.groupby(df.time.dt.floor('15T'))["Q"].sum()
and
df.groupby(pd.Grouper(key="time", freq="15Min"))['Q'].sum()
but both only group over the times available in the column, not from the start of the day (00:00:00 or 00:15:00) to the end of the day (23:45:00).
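To make the difference between the two attempts concrete, here is a minimal sketch. It assumes the sample data above is loaded into a frame named df with a datetime64 time column; the names by_floor and by_grouper are only illustrative, and "15min" is used as the spelled-out equivalent of the '15T' alias.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-07 11:16:00",
        "2019-12-07 11:34:00", "2019-12-07 11:48:00", "2019-12-07 12:01:00",
        "2019-12-07 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

# Flooring produces a group only for bins that contain at least one row.
by_floor = df.groupby(df["time"].dt.floor("15min"))["Q"].sum()

# pd.Grouper also emits the empty bins, but only between the first and
# the last observed bin, not for the rest of the day.
by_grouper = df.groupby(pd.Grouper(key="time", freq="15min"))["Q"].sum()

print(len(by_floor), len(by_grouper))
print(by_grouper.index.min(), by_grouper.index.max())
```

Neither result reaches 00:00:00 or 23:45:00, which is the gap the question is about.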
Answer 0 (score: 2)
Add a row with Q = 0 at 00:00:00 of the minimum date and another at 23:45:00 of the maximum date, so that both ends of the day appear in the output:
s = df['time'].agg(['min','max']).dt.normalize().copy()
s['max'] = s['max'] + pd.DateOffset(hours=23, minutes=45)
df = pd.concat([df, s.to_frame().assign(Q=0)], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
print (df)
time Q
0 2019-12-07 09:13:00 10
1 2019-12-07 09:33:00 1
2 2019-12-07 09:41:00 1
3 2019-12-07 10:03:00 6
4 2019-12-07 10:15:00 5
5 2019-12-07 10:37:00 3
6 2019-12-07 10:48:00 15
7 2019-12-07 11:05:00 3
8 2019-12-07 11:16:00 8
9 2019-12-07 11:34:00 5
10 2019-12-07 11:48:00 10
11 2019-12-07 12:01:00 6
12 2019-12-07 12:18:00 7
13 2019-12-07 00:00:00 0
14 2019-12-07 23:45:00 0
Then use your solution, for example:
df.groupby(pd.Grouper(key="time", freq="15Min"))['Q'].sum()
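Put together, the padding approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the sample data is rebuilt inline, the two zero-valued rows are added with pd.concat, and the names day_start, day_end, pad, padded and result are not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-07 11:16:00",
        "2019-12-07 11:34:00", "2019-12-07 11:48:00", "2019-12-07 12:01:00",
        "2019-12-07 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

# Start of the first 15-minute bin and start of the last one.
day_start = df["time"].min().normalize()
day_end = df["time"].max().normalize() + pd.Timedelta(hours=23, minutes=45)

# Two rows with Q = 0: they widen the time range without changing any sum.
pad = pd.DataFrame({"time": [day_start, day_end], "Q": [0, 0]})
padded = pd.concat([df, pad], ignore_index=True)

# The Grouper now spans from day_start to day_end and fills empty bins with 0.
result = padded.groupby(pd.Grouper(key="time", freq="15min"))["Q"].sum()
print(result.loc["2019-12-07 09:00":"2019-12-07 09:45"])
```

Note that when the data spans several dates, this variant also produces bins for any days in between that have no rows at all.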
If each date needs to be processed separately, first use your solution, then add the missing datetimes with Series.reindex:
print (df)
time Q
0 2019-12-07 09:13:00 10
1 2019-12-07 09:33:00 1
2 2019-12-07 09:41:00 1
3 2019-12-07 10:03:00 6
4 2019-12-07 10:15:00 5
5 2019-12-07 10:37:00 3
6 2019-12-07 10:48:00 15
7 2019-12-07 11:05:00 3
8 2019-12-09 11:16:00 8
9 2019-12-09 11:34:00 5
10 2019-12-09 11:48:00 10
11 2019-12-09 12:01:00 6
12 2019-12-09 12:18:00 7
dates = [y for x in df.time.dt.normalize().drop_duplicates()
         for y in pd.date_range(x, x + pd.DateOffset(hours=23, minutes=45), freq='15min')]
print (dates[:2])
[Timestamp('2019-12-07 00:00:00', freq='15T'), Timestamp('2019-12-07 00:15:00', freq='15T')]
df = df.groupby(df.time.dt.floor('15min'))["Q"].sum().reindex(dates, fill_value=0)
print (df)
time
2019-12-07 00:00:00 0
2019-12-07 00:15:00 0
2019-12-07 00:30:00 0
2019-12-07 00:45:00 0
2019-12-07 01:00:00 0
..
2019-12-09 22:45:00 0
2019-12-09 23:00:00 0
2019-12-09 23:15:00 0
2019-12-09 23:30:00 0
2019-12-09 23:45:00 0
Name: Q, Length: 192, dtype: int64
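The per-date variant can also be wrapped in a small helper. The sketch below is one possible arrangement, not the answer's own code: the function name sum_full_days and its freq parameter are hypothetical, and it assumes a time column of dtype datetime64 and a numeric Q column.

```python
import pandas as pd

def sum_full_days(df, freq="15min"):
    """Sum Q per bin, covering every bin of each date that occurs in df."""
    binned = df.groupby(df["time"].dt.floor(freq))["Q"].sum()

    stamps = []
    for day in df["time"].dt.normalize().drop_duplicates().sort_values():
        next_day = day + pd.Timedelta(days=1)
        rng = pd.date_range(day, next_day, freq=freq)
        # Keep only bin starts that belong to this date.
        stamps.extend(rng[rng < next_day])

    full_index = pd.DatetimeIndex(stamps, name="time")
    # Bins without any rows get 0; dates absent from df get no bins at all.
    return binned.reindex(full_index, fill_value=0)

two_days = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-09 11:16:00",
        "2019-12-09 11:34:00", "2019-12-09 11:48:00", "2019-12-09 12:01:00",
        "2019-12-09 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

out = sum_full_days(two_days)
print(len(out))
```

Unlike padding the overall minimum and maximum, this leaves 2019-12-08 out entirely, because no row falls on that date.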
Answer 1 (score: 0)
Given that your current end result has "missing timestamps" (for example after df.resample('15min', on='time').sum()), you can add those missing timestamps as follows:
idx = pd.date_range('2019-12-07', '2019-12-08', inclusive='left', freq='15min')  # index of timestamps every 15 minutes; pandas < 1.4 spells this closed='left'
df2 = df.reindex(idx, fill_value=0)  # df here is the resampled result, indexed by time
For more details on how to fill in values at positions that had no value in the previous index, see reindex.
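As a rough end-to-end illustration of this resample-then-reindex idea, the sketch below assumes time is an ordinary column (hence on="time") and builds the day's index with periods=96 so that it does not depend on the closed/inclusive keyword, which differs between pandas versions. The names binned, idx and full are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-12-07 09:13:00", "2019-12-07 09:33:00", "2019-12-07 09:41:00",
        "2019-12-07 10:03:00", "2019-12-07 10:15:00", "2019-12-07 10:37:00",
        "2019-12-07 10:48:00", "2019-12-07 11:05:00", "2019-12-07 11:16:00",
        "2019-12-07 11:34:00", "2019-12-07 11:48:00", "2019-12-07 12:01:00",
        "2019-12-07 12:18:00",
    ]),
    "Q": [10, 1, 1, 6, 5, 3, 15, 3, 8, 5, 10, 6, 7],
})

# Sum Q in 15-minute bins; this only covers the observed time range.
binned = df.resample("15min", on="time")["Q"].sum()

# 96 bin starts for the whole day: 00:00, 00:15, ..., 23:45.
idx = pd.date_range("2019-12-07", periods=96, freq="15min")

# Align to the full-day index; bins with no data become 0.
full = binned.reindex(idx, fill_value=0)
print(full.loc["2019-12-07 09:00":"2019-12-07 10:00"])
```

The date is hard-coded here, as in the answer, so this form suits a single known day rather than an arbitrary set of dates.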