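I have one DataFrame with several columns, and another, larger DataFrame with several other columns.
The matching columns in df1 are symbol and m_date; the matching columns in df2 are symbol and date.
I want to bring the values of the other columns from df2 into df1, as follows:
If m_date equals date for a given symbol, copy the other columns.
If m_date does not equal date for a given symbol, find the date in df2 closest to m_date and add that row's values to df1.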
I tried:
merged_left = pd.merge(left=df1, right=df2, how='left', left_on=['symbol','m_date'], right_on=['symbol','date'])
This merges the frames, but rows whose date has no exact match in df2 get blank (NaN) values. Can anyone suggest how to achieve this?
Thanks.
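A minimal reproduction with made-up data (the values below are hypothetical) shows why this happens: a plain merge only pairs rows whose keys are exactly equal, and indicator=True marks the rows of df1 that found no exact date.

```python
import pandas as pd

# Hypothetical sample data: only the first m_date has an exact match in df2
df1 = pd.DataFrame({'symbol': ['A', 'A'],
                    'm_date': pd.to_datetime(['2019-05-03', '2019-05-14'])})
df2 = pd.DataFrame({'symbol': ['A', 'A'],
                    'date': pd.to_datetime(['2019-05-03', '2019-08-04']),
                    'Per1d': [3, 5]})

# A left merge keeps every df1 row, but fills df2's columns only on exact key matches
merged_left = pd.merge(left=df1, right=df2, how='left',
                       left_on=['symbol', 'm_date'],
                       right_on=['symbol', 'date'],
                       indicator=True)  # adds a _merge column: 'both' or 'left_only'
print(merged_left)
```

The second row comes back as 'left_only' with NaN in Per1d, because 2019-05-14 does not appear in df2 at all.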
Answer
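Use merge_asof. The by parameter first matches rows on symbol, and then the on parameter together with direction='nearest' looks up the nearest date. The dates need to be converted to datetime, and both frames must be sorted by them.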
import pandas as pd

# some data similar to yours but simplified
df1 = pd.DataFrame({'symbol': {0: 'A', 1: 'A', 2: 'A', 3: 'A'},
                    'var1': {0: 34, 1: 45, 2: 43, 3: 67},
                    'm_date': {0: '11/25/19', 1: '8/14/19', 2: '5/14/19', 3: '2/20/19'}})
df2 = pd.DataFrame({'symbol': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A'},
                    'date': {0: '1/2/19', 1: '5/3/19', 2: '8/4/19', 3: '1/5/20', 4: '1/8/20'},
                    'Per1d': {0: 1, 1: 3, 2: 5, 3: 8, 4: 6}})
# create a column with the dates as datetime, under the same name in both frames for `on`
df1['date_'] = pd.to_datetime(df1['m_date'], format='%m/%d/%y')
df2['date_'] = pd.to_datetime(df2['date'], format='%m/%d/%y')
# merge_asof: match rows with the same symbol, then take the nearest date_
# (both frames must be sorted by the `on` key)
df3 = pd.merge_asof(df1.sort_values('date_'),
                    df2.sort_values('date_'),
                    by='symbol', on='date_',
                    direction='nearest')
print(df3)
symbol var1 m_date date_ date Per1d
0 A 67 2/20/19 2019-02-20 1/2/19 1
1 A 43 5/14/19 2019-05-14 5/3/19 3
2 A 45 8/14/19 2019-08-14 8/4/19 5
3 A 34 11/25/19 2019-11-25 1/5/20 8
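The sample above has a single symbol, so by never comes into play, and df3 comes back sorted by date rather than in df1's original order. A sketch with hypothetical two-symbol data (values made up for illustration) shows both points. It assumes the date columns are already datetime and uses merge_asof's left_on/right_on instead of a shared date_ column; an exact date match needs no special handling, because a distance of zero is always the nearest.

```python
import pandas as pd

# Hypothetical data with two symbols (values made up for illustration)
df1 = pd.DataFrame({'symbol': ['A', 'B', 'A', 'B'],
                    'var1': [34, 45, 43, 67],
                    'm_date': pd.to_datetime(['2019-11-25', '2019-08-14',
                                              '2019-05-03', '2019-04-30'])})
df2 = pd.DataFrame({'symbol': ['A', 'A', 'B', 'B'],
                    'date': pd.to_datetime(['2019-05-03', '2020-01-05',
                                            '2019-01-02', '2019-08-04']),
                    'Per1d': [3, 8, 1, 5]})

# Keep df1's row positions in an 'index' column, because merge_asof
# returns rows in the sorted order of the left frame
left = df1.reset_index().sort_values('m_date')
right = df2.sort_values('date')

# Match on symbol first (by), then take the nearest df2 date for each m_date.
# An exact match (distance 0) is always the nearest, so it wins automatically.
df3 = (pd.merge_asof(left, right,
                     left_on='m_date', right_on='date',
                     by='symbol', direction='nearest')
         .sort_values('index')        # restore df1's original row order
         .drop(columns='index')
         .reset_index(drop=True))
print(df3)
```

Row 2 (A, 2019-05-03) picks up the exact match, and row 3 (B, 2019-04-30) gets B's 2019-08-04 even though A has a closer date (2019-05-03), because by='symbol' restricts the search to the same symbol. If dates that are too far away should not be used, merge_asof also accepts a tolerance, for example tolerance=pd.Timedelta('30D'); rows with no df2 date inside that window then get NaN.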