I have a dataframe that stores dates, car brands, colors and cities:
date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london
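I now want to create a new dataframe from this one by merging rows in which the car brand, color and city match on consecutive days. So for this example, I want to end up with the dataframe: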
date car_brand color city
["2020-01-01","2020-01-02"] porsche red paris
["2020-01-03"] porsche red london
["2020-01-04"] porsche red paris
["2020-01-05"] porsche red london
["2020-01-01"] audi blue munich
["2020-01-02"] audi red munich
["2020-01-03","2020-01-05"] audi red london
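How can I do this? I tried pd.concat and pd.merge, but nothing has worked so far. Thanks!
Answer 0 (score: 0)
If being consecutive matters, you can check for it in a list comprehension. This extends the technique of building a list from each group with a lambda function.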
import io

import pandas as pd

df = pd.read_csv(io.StringIO(""" date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # keep the first date, plus any date exactly one day after the previous date in the group
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == x.iloc[i - 1] + pd.DateOffset(1)])
    .reset_index()
)
car_brand color city date
audi blue munich [2020-01-01 00:00:00]
audi red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
audi red munich [2020-01-02 00:00:00]
porsche red london [2020-01-03 00:00:00]
porsche red paris [2020-01-01 00:00:00, 2020-01-02 00:00:00]
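
Note that the comprehension above only keeps dates that directly follow the previous date in the same group, so later non-consecutive dates (for example porsche / red / paris on 2020-01-04) do not appear in the output. If each run of consecutive days should get its own row, as in the question, one possible approach is to label the runs first and then group on that label. The following is only a sketch under some assumptions: the helper columns run_id and date_str are names made up for illustration, and each row lists every date in the run as a string.

```python
import io

import pandas as pd

raw = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

keys = ["car_brand", "color", "city"]
df = df.sort_values(keys + ["date"]).reset_index(drop=True)

# Gap to the previous date within the same (brand, color, city) group.
# The first row of each group has no previous date, so its gap is NaT.
gap = df.groupby(keys)["date"].diff()

# A new run starts wherever the gap is not exactly one day (NaT counts as a new run).
# run_id is a hypothetical helper column, unique across the whole frame.
df["run_id"] = (gap != pd.Timedelta(days=1)).cumsum()

# date_str is another hypothetical helper so the lists hold plain strings.
df["date_str"] = df["date"].dt.strftime("%Y-%m-%d")

result = (
    df.groupby(keys + ["run_id"], sort=False)["date_str"]
    .agg(list)
    .reset_index()
    .drop(columns="run_id")
    .rename(columns={"date_str": "date"})
)
print(result)
```

If only the first and last day of each run are wanted instead of every date, the final aggregation could presumably be swapped for something like .agg(["first", "last"]).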