How do I merge these two datasets?

Time: 2019-10-15 09:15:43

Tags: python pandas merge

I have two datasets, and I want to merge them using the index.

First dataset:

index       A   B   C
01/01/2010  15  20  30
15/01/2010  12  15  25
17/02/2010  14  13  35
19/02/2010  11  10  22

Second dataset:

index  year  month     price
0      2010  january   70
1      2010  february  80

I would like them joined like this:

index       A   B   C   price
01/01/2010  15  20  30  70
15/01/2010  12  15  25  70
17/02/2010  14  13  35  80
19/02/2010  11  10  22  80

The problem is how to create a temporary datetime index from two columns (year and month in the second dataset).

2 answers:

Answer 0: (score: 2)

Try this: extract the year (.dt.year) and the month name (.dt.month_name()) from the dates in df1, then merge with df2 on those two values.

>>> import pandas as pd
>>> df1
        index   A   B   C
0  01/01/2010  15  20  30
1  15/01/2010  12  15  25
2  17/02/2010  14  13  35
3  19/02/2010  11  10  22
>>> df2              
   index  year     month  price
0      0  2010   january     70
1      1  2010  february     80

# Parse the day-first dates explicitly, then merge df1 and df2 by year and month.
>>> dates = pd.to_datetime(df1['index'], format='%d/%m/%Y')
>>> df1.merge(df2,
...           left_on=[dates.dt.year, dates.dt.month_name().str.lower()],
...           right_on=['year', 'month'])

Output:

      index_x   A   B   C  index_y  year     month  price
0  01/01/2010  15  20  30        0  2010   january     70
1  15/01/2010  12  15  25        0  2010   january     70
2  17/02/2010  14  13  35        1  2010  february     80
3  19/02/2010  11  10  22        1  2010  february     80
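
The merged frame above still carries index_x, index_y, year and month, while the question asks for only A, B, C and price indexed by date. One possible way to get that shape is a temporary year-month key on both sides, sketched below. The helper column name "ym" and the variable names are made up for illustration, and the sketch assumes English month names and day-first dates as in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "index": ["01/01/2010", "15/01/2010", "17/02/2010", "19/02/2010"],
    "A": [15, 12, 14, 11],
    "B": [20, 15, 13, 10],
    "C": [30, 25, 35, 22],
})
df2 = pd.DataFrame({
    "year": [2010, 2010],
    "month": ["january", "february"],
    "price": [70, 80],
})

# Key for df1: parse the day-first strings, then keep only "YYYY-MM".
left_key = pd.to_datetime(df1["index"], format="%d/%m/%Y").dt.strftime("%Y-%m")

# Key for df2: combine year and capitalised month name, parse, keep "YYYY-MM".
# Assumption: month names are English, so "%B" can parse them.
right_key = pd.to_datetime(
    df2["year"].astype(str) + " " + df2["month"].str.title(),
    format="%Y %B",
).dt.strftime("%Y-%m")

# "ym" is a hypothetical helper column, used for the join and then dropped.
result = (
    df1.assign(ym=left_key)
       .merge(df2.assign(ym=right_key)[["ym", "price"]], on="ym", how="left")
       .drop(columns="ym")
       .set_index("index")
)
print(result)
```

A left merge is used so that every row of df1 is kept even when df2 has no price for that month; such rows would get NaN in price.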

Answer 1: (score: 0)

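This is a dumb answer! I'm sure you can do better :) But it works, provided your tables are lists of dictionaries (you can easily convert an SQL table into this format). I know it's not a clean solution, but you asked for a simple one, and this is probably the easiest approach to understand :)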

months = {'january': "01",
          'february': "02",
          'march': "03",
          'april': "04",
          'may': "05",
          'june': "06",
          'july': "07",
          'august': "08",
          'september': "09",
          'october': "10",
          'november': "11",
          'december': "12"}

table1 = [{'index': '01/01/2010', 'A': 15, 'B': 20, 'C': 30},
          {'index': '15/01/2010', 'A': 12, 'B': 15, 'C': 25},
          {'index': '17/02/2010', 'A': 14, 'B': 13, 'C': 35},
          {'index': '19/02/2010', 'A': 11, 'B': 10, 'C': 22}]

table2 = [{'index': 0, 'year': 2010, 'month': 'january', 'price': 70},
          {'index': 1, 'year': 2010, 'month': 'february', 'price': 80}]

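def joiner(table1, table2):
    # Add a temporary "MM/YYYY" key to every row of both tables.
    # Note: this modifies the input rows in place.
    for row in table2:
        row['tempDate'] = "{0}/{1}".format(months[row['month']], row['year'])
    for row in table1:
        # 'index' is "DD/MM/YYYY", so dropping the first three characters leaves "MM/YYYY".
        row['tempDate'] = row['index'][3:]

    table3 = []
    for row1 in table1:
        # Copy the row (including its tempDate key) and attach the first matching price.
        row3 = row1.copy()
        for row2 in table2:
            if row2['tempDate'] == row1['tempDate']:
                row3['price'] = row2['price']
                break
        table3.append(row3)
    return table3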

table3 = joiner(table1, table2)
print(table3)
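
As a possible variation on the same idea, the price table can be indexed once by (year, month number) in a dictionary, so each row of the first table needs only one lookup and the input rows are left unmodified. This is a minimal sketch: join_prices and MONTH_NUMBER are hypothetical names, and it assumes English month names (the default for calendar.month_name) and "DD/MM/YYYY" strings as in the question.

```python
import calendar

# Lower-case month name -> month number, built from the standard library.
# Assumption: the default locale gives English month names.
MONTH_NUMBER = {name.lower(): number
                for number, name in enumerate(calendar.month_name) if name}

table1 = [{'index': '01/01/2010', 'A': 15, 'B': 20, 'C': 30},
          {'index': '15/01/2010', 'A': 12, 'B': 15, 'C': 25},
          {'index': '17/02/2010', 'A': 14, 'B': 13, 'C': 35},
          {'index': '19/02/2010', 'A': 11, 'B': 10, 'C': 22}]

table2 = [{'index': 0, 'year': 2010, 'month': 'january', 'price': 70},
          {'index': 1, 'year': 2010, 'month': 'february', 'price': 80}]

def join_prices(rows, prices):
    """Return copies of rows with the matching monthly price attached."""
    # Index the price table once by (year, month number).
    price_by_month = {(p['year'], MONTH_NUMBER[p['month']]): p['price']
                      for p in prices}
    joined = []
    for row in rows:
        # 'index' is "DD/MM/YYYY"; only month and year are needed for the key.
        _day, month, year = row['index'].split('/')
        new_row = dict(row)  # copy, so the caller's rows are not modified
        key = (int(year), int(month))
        if key in price_by_month:
            new_row['price'] = price_by_month[key]
        joined.append(new_row)
    return joined

table3 = join_prices(table1, table2)
print(table3)
```

Rows whose month has no entry in the price table are returned without a price key, which matches what joiner does above.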