Pandas: merging two DataFrames with different keys

Time: 2016-02-08 03:36:45

Tags: python pandas

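I have two pandas DataFrames: one stores values and the other stores weights. The value DataFrame is keyed by [Symbol, Date, Hour]; the weight DataFrame is keyed by [Symbol, Date].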

In [8]: value_df = pd.DataFrame({'Symbol':['S1','S1','S1','S1','S2','S2','S3'],
             'Date' : [20150101,20150101, 20150101, 20150102,20150101,20150102,20150103],
             'Hour' : [8,9,10,8,8,8,8],
             'value' : [10,10.1,10.2,11,100,101,300]})

In [9]: value_df
Out[9]: 
       Date  Hour Symbol  value
0  20150101     8     S1   10.0
1  20150101     9     S1   10.1
2  20150101    10     S1   10.2
3  20150102     8     S1   11.0
4  20150101     8     S2  100.0
5  20150102     8     S2  101.0
6  20150103     8     S3  300.0

In [10]: weight_df = pd.DataFrame({'Symbol': ['S1','S1','S1','S2','S2','S2','S3','S3','S3'], 'Date':[20150101,20150102,20150103] * 3,'Weight': [0.8,0.9,1,1,1,1,0.5,0.5,0.5]})

In [11]: weight_df
Out[11]: 
       Date Symbol  Weight
0  20150101     S1     0.8
1  20150102     S1     0.9
2  20150103     S1     1.0
3  20150101     S2     1.0
4  20150102     S2     1.0
5  20150103     S2     1.0
6  20150101     S3     0.5
7  20150102     S3     0.5
8  20150103     S3     0.5

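I want to merge these two tables and add a weight column to value_df. It should behave like a Cartesian product: the weight for each (Symbol, Date) is repeated for every Hour row with that key. For example: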


       Date  Hour Symbol  value  weight
0  20150101     8     S1   10.0     0.8
1  20150101     9     S1   10.1     0.8
2  20150101    10     S1   10.2     0.8
3  20150102     8     S1   11.0     0.9
4  20150101     8     S2  100.0     1.0
5  20150102     8     S2  101.0     1.0
6  20150103     8     S3  300.0     0.5

The challenge here is the extra 'Hour' column.

1 Answer:

Answer 0: (score: 2)

I'm not sure I understand the "challenge". A simple merge on the shared keys already gives the output you want:

>>> pandas.merge(value_df, weight_df, on=['Date', 'Symbol'])
       Date  Hour Symbol  value  Weight
0  20150101     8     S1   10.0     0.8
1  20150101     9     S1   10.1     0.8
2  20150101    10     S1   10.2     0.8
3  20150102     8     S1   11.0     0.9
4  20150101     8     S2  100.0     1.0
5  20150102     8     S2  101.0     1.0
6  20150103     8     S3  300.0     0.5
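
If it helps, here is a self-contained sketch of the same merge with two optional safeguards that are not part of the question: `how='left'` keeps every row of value_df even when no weight exists for its (Symbol, Date), and `validate='m:1'` (available in reasonably recent pandas versions) raises an error if weight_df ever contains duplicate (Symbol, Date) keys. The variable name `merged` is only illustrative.

```python
import pandas as pd

value_df = pd.DataFrame({
    'Symbol': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S3'],
    'Date': [20150101, 20150101, 20150101, 20150102, 20150101, 20150102, 20150103],
    'Hour': [8, 9, 10, 8, 8, 8, 8],
    'value': [10, 10.1, 10.2, 11, 100, 101, 300],
})
weight_df = pd.DataFrame({
    'Symbol': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3'],
    'Date': [20150101, 20150102, 20150103] * 3,
    'Weight': [0.8, 0.9, 1, 1, 1, 1, 0.5, 0.5, 0.5],
})

# Join only on the keys both frames share. 'Hour' exists only in value_df,
# so it is carried through unchanged and each weight is repeated per hour.
# how='left'     : keep all value_df rows, in their original order.
# validate='m:1' : fail loudly if (Symbol, Date) is not unique in weight_df.
merged = value_df.merge(
    weight_df,
    on=['Symbol', 'Date'],
    how='left',
    validate='m:1',
)

print(merged)
```

With a left merge, any value row that has no matching weight would show NaN in the Weight column instead of being dropped, which makes missing weights easier to spot.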