Merging DataFrames on multiple common values

Time: 2018-08-08 00:41:21

Tags: python pandas merge

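I am trying to merge two DataFrames on their common values. The problem is that the key columns contain duplicate values, and I only want a match for the first occurrence of each value. I want to merge on the values in Col B and Col C.

import pandas as pd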

df = pd.DataFrame({
    'A': ['10:00:05', '11:00:05', '12:00:05', '13:00:05', '14:00:05'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 1, 2],
    })

df1 = pd.DataFrame({
    'A': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 2, 2],
    })

If I try:

df2 = pd.merge(df, df1, on=["B", "C"])

Output:

        A_x    B  C       A_y
0  10:00:05  ABC  1  10:00:00
1  13:00:05  ABC  1  10:00:00
2  11:00:05  DEF  1  11:00:00
3  12:00:05  XYZ  1  12:00:00
4  14:00:05  DEF  2  14:00:00

My expected output is:

          A    B  C         D
0  10:00:05  ABC  1  10:00:00
1  11:00:05  DEF  1  11:00:00
2  12:00:05  XYZ  1  12:00:00
3  13:00:05  ABC  1          
4  14:00:05  DEF  2  14:00:00

1 Answer:

Answer 0 (score: 1)

You can merge first, then use duplicated + loc to blank out the merged column for every repeated key pair:

merge_cols = ['B', 'C']

df2 = pd.merge(df, df1, on=merge_cols)

df2.loc[df2[merge_cols].duplicated(), 'A_y'] = ''

print(df2)

        A_x    B  C       A_y
0  10:00:05  ABC  1  10:00:00
1  13:00:05  ABC  1          
2  11:00:05  DEF  1  11:00:00
3  12:00:05  XYZ  1  12:00:00
4  14:00:05  DEF  2  14:00:00
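
The output above keeps the A_x / A_y column names and lists the rows grouped by key, whereas the expected output in the question uses columns A and D and keeps the row order of df. A minimal sketch of one way to get that layout, assuming each (B, C) pair appears at most once in df1 (true for the sample data; otherwise df1 would need de-duplicating first): rename df1's A column to D before merging, and use a left merge so that every row of df is kept in its original order. The names right and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['10:00:05', '11:00:05', '12:00:05', '13:00:05', '14:00:05'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 1, 2],
})

df1 = pd.DataFrame({
    'A': ['10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00'],
    'B': ['ABC', 'DEF', 'XYZ', 'ABC', 'DEF'],
    'C': [1, 1, 1, 2, 2],
})

merge_cols = ['B', 'C']

# Rename df1's time column up front so the merge produces no _x/_y suffixes.
# Assumption: (B, C) pairs are unique in df1, so no row of df is multiplied.
right = df1.rename(columns={'A': 'D'})

# A left merge keeps every row of df, in df's original order.
result = df.merge(right, on=merge_cols, how='left')

# Keep D only for the first occurrence of each (B, C) pair; blank the repeats.
result.loc[result.duplicated(subset=merge_cols), 'D'] = ''

print(result)
```

With the sample data, this should leave D empty only for the 13:00:05 row, the second occurrence of ('ABC', 1). Rows of df with no match in df1 would get NaN in D rather than an empty string.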