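How to merge two DataFrame columns that hold date/time values and update the table with values from the second column

Time: 2019-03-01 12:21:38

Tags: python pandas dataframe pandas-groupby

I have a DataFrame df1 with these values: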

        0                         1
0       2017-07-07 Sat 05:02:00 | UNKNOWN
1       2018-07-07 Sat 06:02:00 | UNKNOWN
222500  2019-07-07 Sat 07:02:00 | UNKNOWN

and another DataFrame, df2, with these values:

        0                         1
0       2018-07-07 Sat 05:02:00 | password3
1       2017-07-07 Sat 06:02:00 | password1
222500  2019-07-07 Sat 07:02:00 | password2

The output I need is:

        0                         1
0       2017-07-07 Sat 05:02:00 | password1
1       2018-07-07 Sat 06:02:00 | password3
222500  2019-07-07 Sat 07:02:00 | password2

I tried this:

d = dict(zip(df2[0], df2[1]))

pat = (r'({})'.format('|'.join(d.keys())))
df1[1] = df1[0].str.extract(pat, expand=False).map(d)
print(df1)

3 answers:

Answer 0: (score: 1)

import pandas as pd

# Sample frames: column 0 holds the timestamp strings, column 1 the values
dd1 = {0: ['2017-07-07 Sat 05:02:00', '2018-07-07 Sat 06:02:00', '2019-07-07 Sat 07:02:00'],
       1: ['UNKNOWN', 'UNKNOWN', 'UNKNOWN']}
dd2 = {0: ['2017-07-07 Sat 05:02:00', '2018-07-07 Sat 06:02:00', '2019-07-07 Sat 07:02:00'],
       1: ['password1', 'password2', 'password3']}

df1 = pd.DataFrame(data=dd1)
df2 = pd.DataFrame(data=dd2)
# Rename df2's value column so it does not collide with df1's column 1
df2 = df2.rename(columns={1: 'new'})

# Merge on column 0, then drop df1's placeholder column
df = pd.merge(left=df2, right=df1, left_on=0, right_on=0, how='left').drop(columns=[1])
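
Note that the merge above matches on the full string in column 0. In the question's sample data the 2017 and 2018 rows carry different times in df1 and df2, yet the expected output still pairs them up, so the intent may be to match on the calendar date only. That is an assumption read off the sample, not something the question states. Under that assumption, and assuming each date occurs only once in df2, a minimal sketch could look like this (the name lookup is made up for illustration):

```python
import pandas as pd

# Sample data copied from the question (column labels are the integers 0 and 1)
df1 = pd.DataFrame({
    0: ['2017-07-07 Sat 05:02:00', '2018-07-07 Sat 06:02:00', '2019-07-07 Sat 07:02:00'],
    1: ['UNKNOWN', 'UNKNOWN', 'UNKNOWN'],
})
df2 = pd.DataFrame({
    0: ['2018-07-07 Sat 05:02:00', '2017-07-07 Sat 06:02:00', '2019-07-07 Sat 07:02:00'],
    1: ['password3', 'password1', 'password2'],
})

# Assumption: the first 10 characters (YYYY-MM-DD) identify a row,
# and every date appears at most once in df2
lookup = pd.Series(df2[1].values, index=df2[0].str[:10])

# Look up each df1 date in df2; keep the old value where no date matches
df1[1] = df1[0].str[:10].map(lookup).fillna(df1[1])
print(df1)
```

If the full timestamps are supposed to match exactly, the date slicing can be dropped and the merge above is enough.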

Answer 1: (score: 0)

You can try:

df1 = pd.DataFrame({'date': pd.date_range(start = '12-02-18 19:02:02', periods=4, freq= 'h'),
                  'value': ['UNKNOWN']*4})
df1
      date               value
0   2018-12-02 19:02:02 UNKNOWN
1   2018-12-02 20:02:02 UNKNOWN
2   2018-12-02 21:02:02 UNKNOWN
3   2018-12-02 22:02:02 UNKNOWN  



df2 = pd.DataFrame({'date': pd.date_range(start = '12-02-18 20:02:02', periods=4, freq= 'h'),
                  'value': ['password1', 'password2', 'password3', 'password4']})
df2

           date          value
0   2018-12-02 20:02:02 password1
1   2018-12-02 21:02:02 password2
2   2018-12-02 22:02:02 password3
3   2018-12-02 23:02:02 password4

df1 = df1.set_index('date')
df2 = df2.set_index('date')

df1['value'] = df2['value']
df1

                     value
date    
2018-12-02 19:02:02  NaN
2018-12-02 20:02:02  password1
2018-12-02 21:02:02  password2
2018-12-02 22:02:02  password3
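
The assignment above leaves NaN for any date that is missing from df2 (19:02:02 here). If the original placeholder should be kept for those rows instead, one possible variation is DataFrame.update, which aligns on the index and only overwrites with non-NA values. This is a sketch on the same sample data, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'date': pd.date_range(start='2018-12-02 19:02:02', periods=4, freq='h'),
                    'value': ['UNKNOWN'] * 4}).set_index('date')
df2 = pd.DataFrame({'date': pd.date_range(start='2018-12-02 20:02:02', periods=4, freq='h'),
                    'value': ['password1', 'password2', 'password3', 'password4']}).set_index('date')

# update() modifies df1 in place: rows whose index also exists in df2
# take df2's value, all other rows keep what they had
df1.update(df2)
print(df1)
```

The 23:02:02 row of df2 is ignored because that timestamp is not in df1's index.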

You can also use df.join (starting again from the original df1 and df2, before the index was set):

df1 = df1.set_index('date')
df2 = df2.set_index('date')

df1.join(df2, how='inner', lsuffix='_l', rsuffix='_r').drop(columns = ['value_l'])



                     value_r
date    
2018-12-02 20:02:02 password1
2018-12-02 21:02:02 password2
2018-12-02 22:02:02 password3

Answer 2: (score: 0)

This may not be the most elegant solution, but you can do this:

df1 = pd.DataFrame({'date': pd.date_range('2017-07-07', '2019-07-07', 3), 'val': [1, 1, 1]})
df2 = pd.DataFrame({'date': pd.date_range('2017-07-07', '2019-07-07', 3), 'val': [10, 10, 10]})

df = df1.merge(df2, on='date', how='inner', suffixes=('_left', '_right'))
df.drop(columns='val_left', inplace=True)
df.rename(columns={'val_right': 'val'}, inplace=True)

This merges the two DataFrames on the date column and keeps only the val column from df2. The result is:

        date  val
0 2017-07-07   10
1 2018-07-07   10
2 2019-07-07   10
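
If the drop and rename steps feel heavy, one possible shortcut is to pass only the key column of df1 into the merge, so df2's val column never gets a suffix. This is a sketch on the same sample data and assumes no other column of df1 is needed:

```python
import pandas as pd

df1 = pd.DataFrame({'date': pd.date_range('2017-07-07', '2019-07-07', periods=3), 'val': [1, 1, 1]})
df2 = pd.DataFrame({'date': pd.date_range('2017-07-07', '2019-07-07', periods=3), 'val': [10, 10, 10]})

# Only df1's 'date' column takes part in the merge, so df2's 'val' keeps its name
df = df1[['date']].merge(df2, on='date', how='inner')
print(df)
```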