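Merge and delete rows based on the values of two columns

Time: 2020-01-02 15:20:11

Tags: python pandas

I have a dataframe containing times and locations, and I want to merge rows that have the same date and location, so that the latest time moves to the "to" column and the row whose time value I just used is deleted.

Also, if the time difference is more than 3 hours, the merge should not happen.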

date    from    location    to
    01  16:25       A   
    02  17:15       B   
    02  19:11       C   
    02  19:19       C
    02  17:48       B   
    03  16:20       F   
    05  08:30       G   
    05  09:09       D   
    05  09:11       G   

Expected output:

date    from    location    to
    01  16:25       A       16:25
    02  17:15       B       17:48   
    02  19:11       C       19:19
    02  19:19       C      # this row will be deleted
    02  17:48       B      # this row will be deleted
    03  16:20       F       16:20   
    05  08:30       G       08:30   
    05  09:09       D       09:09
    05  09:11       G       09:11

I tried it with a double for loop, but I'm sure there is a better, more Pythonic way. Any ideas?

1 answer:

Answer 0: (score: -2)

import pandas as pd

# sample dataframe
df = pd.DataFrame(
    {
        "date": ["01", "01", "01", "01", "02", "02"],
        "time": ["01:02", "02:03", "04:05", "06:07", "08:09", "12:10"],
        "location": ["A", "A", "B", "B", "C", "C"],
    }
)

# convert time column to datetime
df["time"] = pd.to_datetime(df["time"], format="%H:%M")

# aggregate by date and location: earliest and latest time per group
df = df.groupby(["date", "location"])["time"].agg(["min", "max"]).reset_index()

# rename columns: min is the start ("from"), max is the end ("to")
df.columns = ["date", "location", "from", "to"]

# keep only the groups whose time span is at most 3 hours
df = df[(df["to"] - df["from"]) <= pd.Timedelta(hours=3)].copy()

# convert to and from to datetime.time
df["to"] = df["to"].dt.time
df["from"] = df["from"].dt.time

print(df)

  date location      from        to
0   01        A  01:02:00  02:03:00
1   01        B  04:05:00  06:07:00
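
Note that the filter above drops groups spanning more than 3 hours entirely, whereas the question asks that such rows simply stay unmerged. A minimal sketch of that variant follows, using the question's own data. It assumes the rule is applied per (date, location) group as a whole: if the latest minus the earliest time is at most 3 hours, the group collapses to its earliest row, otherwise every row is kept with "to" equal to its own "from". The function name `merge_visits` and the `max_gap` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def merge_visits(df, max_gap=pd.Timedelta(hours=3)):
    """Collapse rows sharing date and location into one from/to row.

    Assumption: a group is merged only if its whole span (max - min)
    is at most max_gap; otherwise its rows are kept unmerged.
    """
    out = df[["date", "from", "location"]].copy().reset_index(drop=True)
    # Parse HH:MM strings so they can be compared and subtracted.
    out["t"] = pd.to_datetime(out["from"], format="%H:%M")

    times = out.groupby(["date", "location"])["t"]
    first = times.transform("min")
    last = times.transform("max")
    mergeable = (last - first) <= max_gap

    # Mark the earliest row of each group (first occurrence wins on ties).
    is_first = ~out.sort_values("t", kind="stable").duplicated(["date", "location"])
    is_first = is_first.sort_index()

    # Mergeable groups get the group's latest time; other rows keep their own.
    out["to"] = last.where(mergeable, out["t"]).dt.strftime("%H:%M")
    keep = ~mergeable | is_first
    return out.loc[keep, ["date", "from", "location", "to"]].reset_index(drop=True)


df = pd.DataFrame(
    {
        "date": ["01", "02", "02", "02", "02", "03", "05", "05", "05"],
        "from": ["16:25", "17:15", "19:11", "19:19", "17:48",
                 "16:20", "08:30", "09:09", "09:11"],
        "location": ["A", "B", "C", "C", "B", "F", "G", "D", "G"],
    }
)

result = merge_visits(df)
print(result)
```

Under this reading of the rule, the two G rows on date 05 are merged as well (08:30 to 09:11), because they are less than 3 hours apart; the expected output in the question lists them separately, so the intended rule may differ from the one assumed here.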