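I am trying to merge two DataFrames on their common values. The problem is that the key values contain duplicates, and I want each key to be matched only on its first occurrence. I want to merge on Col B and Col C.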
import pandas as pd

df = pd.DataFrame({
    'A': ['10:00:05', '11:00:05', '12:00:05', '13:00:05', '14:00:05'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 1, 2],
})

df1 = pd.DataFrame({
    'A': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 2, 2],
})
If I try:
df2 = pd.merge(df, df1, on = ["B", "C"])
Output:
        A_x    B  C       A_y
0  10:00:05  ABC  1  10:00:00
1  13:00:05  ABC  1  10:00:00
2  11:00:05  DEF  1  11:00:00
3  12:00:05  XYZ  1  12:00:00
4  14:00:05  DEF  2  14:00:00
My expected output is:
          A    B  C         D
0  10:00:05  ABC  1  10:00:00
1  11:00:05  DEF  1  11:00:00
2  12:00:05  XYZ  1  12:00:00
3  13:00:05  ABC  1
4  14:00:05  DEF  2  14:00:00
Answer 0 (score: 1)
You can use merge first, and then use duplicated + loc to blank out the merged column on every repeated key:
merge_cols = ['B', 'C']
df2 = pd.merge(df, df1, on=merge_cols)
df2.loc[df2[merge_cols].duplicated(), 'A_y'] = ''
print(df2)
        A_x    B  C       A_y
0  10:00:05  ABC  1  10:00:00
1  13:00:05  ABC  1
2  11:00:05  DEF  1  11:00:00
3  12:00:05  XYZ  1  12:00:00
4  14:00:05  DEF  2  14:00:00
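
If you also need the original row order and the A / D column names from the expected output, one possible variation (not the approach above, just a sketch) is to number the occurrences of each (B, C) pair with groupby().cumcount() and do a left merge on that counter as well. The helper column name occ is made up for this example. Note that this pairs the n-th occurrence on the left with the n-th occurrence on the right, which gives the same result as "first occurrence only" for this data because df1 has no repeated (B, C) pair.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['10:00:05', '11:00:05', '12:00:05', '13:00:05', '14:00:05'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 1, 2],
})
df1 = pd.DataFrame({
    'A': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 2, 2],
})

merge_cols = ['B', 'C']

# 'occ' is a hypothetical helper column: 0 for the first time a (B, C)
# pair appears in a frame, 1 for the second time, and so on.
left = df.assign(occ=df.groupby(merge_cols).cumcount())
right = (
    df1.assign(occ=df1.groupby(merge_cols).cumcount())
       .rename(columns={'A': 'D'})
)

# A left merge keeps every row of df in its original order; rows with no
# partner in df1 get NaN in D, which is then replaced by an empty string.
out = (
    left.merge(right, on=merge_cols + ['occ'], how='left')
        .drop(columns='occ')
)
out['D'] = out['D'].fillna('')

print(out)
```

Here the second ABC / 1 row of df (13:00:05) has occ == 1, and df1 has no ABC / 1 row with occ == 1, so its D stays empty.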