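I have a DataFrame with time and location columns, and I want to merge rows that share the same date and location: the latest time should move into the "to" column, and the row whose time value was used should be deleted.
In addition, if the time difference exceeds 3 hours, the merge should not happen.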
date  from   location  to
01    16:25  A
02    17:15  B
02    19:11  C
02    19:19  C
02    17:48  B
03    16:20  F
05    08:30  G
05    09:09  D
05    09:11  G
Expected output:
date  from   location  to
01    16:25  A         16:25
02    17:15  B         17:48
02    19:11  C         19:19
02    19:19  C                  # this row will be deleted
02    17:48  B                  # this row will be deleted
03    16:20  F         16:20
05    08:30  G         08:30
05    09:09  D         09:09
05    09:11  G         09:11
I tried it with a double for loop, but I'm sure there is a better, more Pythonic way. Any ideas?
Answer 0 (score: -2)
import pandas as pd

# sample dataframe
df = pd.DataFrame(
    {
        "date": ["01", "01", "01", "01", "02", "02"],
        "time": ["01:02", "02:03", "04:05", "06:07", "08:09", "12:10"],
        "location": ["A", "A", "B", "B", "C", "C"],
    }
)
# convert time column to datetime
df["time"] = pd.to_datetime(df["time"], format="%H:%M")
# aggregate by date and location: earliest time -> "from", latest time -> "to"
df = df.groupby(["date", "location"])["time"].agg(["min", "max"]).reset_index()
# rename columns (min comes first, so it maps to "from")
df.columns = ["date", "location", "from", "to"]
# keep only groups spanning at most 3 hours (groups with a larger span are dropped entirely)
df = df[(df["to"] - df["from"]) <= pd.Timedelta(hours=3)].copy()
# convert to and from to datetime.time
df["to"] = df["to"].dt.time
df["from"] = df["from"].dt.time
print(df)

  date location      from        to
0   01        A  01:02:00  02:03:00
1   01        B  04:05:00  06:07:00
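The aggregation above drops any group whose span exceeds 3 hours, whereas the question asks that such rows simply stay unmerged. A minimal sketch of one way to do that on the question's sample data follows. It assumes the 3-hour rule applies to the gap between consecutive times within the same date and location, which the question does not spell out. The helper columns `t` and `block` are made up for illustration. Under this reading the two G rows on date 05 (08:30 and 09:11) are merged, unlike in the expected output listed in the question.

```python
import pandas as pd

# Sample data from the question (the "to" column is still to be computed).
df = pd.DataFrame(
    {
        "date": ["01", "02", "02", "02", "02", "03", "05", "05", "05"],
        "from": ["16:25", "17:15", "19:11", "19:19", "17:48", "16:20", "08:30", "09:09", "09:11"],
        "location": ["A", "B", "C", "C", "B", "F", "G", "D", "G"],
    }
)

# Parse the HH:MM strings so that time differences can be computed,
# then order rows so consecutive times within a group are adjacent.
df["t"] = pd.to_datetime(df["from"], format="%H:%M")
df = df.sort_values(["date", "location", "t"])

# Assumption: a new block starts whenever the gap to the previous row
# in the same (date, location) group exceeds 3 hours.
gap = df.groupby(["date", "location"])["t"].diff()
df["block"] = (gap > pd.Timedelta(hours=3)).cumsum()

# Collapse each block to its earliest ("from") and latest ("to") time.
merged = (
    df.groupby(["date", "location", "block"])["t"]
    .agg(["min", "max"])
    .rename(columns={"min": "from", "max": "to"})
    .reset_index()
    .drop(columns="block")
)

# Format back to HH:MM strings and restore the question's column order.
merged["from"] = merged["from"].dt.strftime("%H:%M")
merged["to"] = merged["to"].dt.strftime("%H:%M")
merged = merged[["date", "from", "location", "to"]]
print(merged)
```

Rows that have no partner within 3 hours end up as single-row blocks, so their "to" value equals their "from" value instead of the row disappearing.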