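Python pandas asof join with groupby

Time: 2015-11-10 02:40:10

Tags: python join pandas

I have two pandas DataFrames, X and Y, each containing intraday price and time data for the past month. I want to asof-join Y onto X, i.e. every time we see an update in X, we take the prevailing price of Y. I want to do this within each day only (because of overnight effects).

My current code is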

Y_asof = Y.groupby('Date').apply(lambda x: x.asof(X.index))

However, this raises the error

AttributeError: 'DataFrame' object has no attribute 'asof'

whereas it works when I run

Y_asof = Y.apply(lambda x: x.asof(X.index))

Sample data for X:

                                 Mid        Date
Time                                            
2015-09-15 13:02:03.000049  7.575392  2015-09-15
2015-09-15 13:02:06.000049  7.575521  2015-09-15
2015-09-15 13:02:08.000049  7.575392  2015-09-15
2015-09-15 13:02:14.000049  7.575521  2015-09-15
2015-09-15 13:02:15.000048  7.575649  2015-09-15

Sample data for Y:

                                 Mid        Date
Time                                            
2015-09-15 12:00:00.443000  4.650894  2015-09-15
2015-09-15 12:00:00.443000  4.650899  2015-09-15
2015-09-15 12:00:06.321000  4.650894  2015-09-15
2015-09-15 12:00:06.322000  4.650884  2015-09-15
2015-09-15 12:00:10.839000  4.650894  2015-09-15

Can anyone help? Many thanks!

3 answers:

Answer 0: (score: 2)

asof is a Series method, not a DataFrame method. It works on the Time column:

In [11]: Y.groupby('Date').apply(lambda x: x["Time"].asof(X.index))
Out[11]:
Time                                 0                           1                           2                           3                           4
Date
2015-09-15  2015-09-15 12:00:00.443000  2015-09-15 12:00:00.443000  2015-09-15 12:00:06.321000  2015-09-15 12:00:06.322000  2015-09-15 12:00:10.839000

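When you call apply on the DataFrame directly, the function is applied to each column (which is a Series), so asof is available there.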
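To make the Series behaviour concrete, here is a minimal sketch on made-up data (the prices and timestamps below are hypothetical, not the asker's). It only shows that Series.asof returns, for each requested time, the last value at or before it; the index must be sorted for this to work.

```python
import pandas as pd

# Hypothetical Y prices, indexed by a sorted, duplicate-free timestamp index.
y = pd.Series(
    [4.650894, 4.650884, 4.650899],
    index=pd.to_datetime([
        "2015-09-15 12:00:00",
        "2015-09-15 12:00:06",
        "2015-09-15 12:00:10",
    ]),
    name="Mid",
)

# Hypothetical X update times at which we want the prevailing Y price.
x_times = pd.to_datetime([
    "2015-09-15 12:00:05",
    "2015-09-15 12:00:06",
    "2015-09-15 12:00:30",
])

# For each requested time, take the last Y value at or before that time.
y_asof = y.asof(x_times)
print(y_asof)
```

An exact timestamp match (12:00:06 here) picks up the value at that timestamp, not the one before it.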

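Answer 1: (score: 0)

I believe pandas throws the error because Y.groupby('Date') creates a GroupBy object, which has no asof method. If you are only using groupby as a way to sort by date, you could use … instead.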
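For what it is worth, a quick way to see which object the lambda actually receives is to iterate over the groups. This is only a sketch on hypothetical data; it suggests that each group arrives as a DataFrame, which would match the 'DataFrame' named in the error message.

```python
import pandas as pd

# Hypothetical two-day sample with the same column names as the question.
Y = pd.DataFrame({
    "Date": ["2015-09-15", "2015-09-15", "2015-09-16"],
    "Mid": [4.650894, 4.650899, 4.650884],
})

# Iterating a GroupBy yields (key, sub-frame) pairs, one per group.
seen = []
for date, group in Y.groupby("Date"):
    seen.append((date, type(group).__name__))

print(seen)
```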

Answer 2: (score: 0)

pandas 0.19 has an asof join (pd.merge_asof). It fits here because you want the latest Y for each X.
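A minimal sketch of how that could look, assuming pandas 0.19 or later and assuming Time is an ordinary column rather than the index as in the question. The data is made up, and the Date key is built here from Time. Passing by="Date" restricts matches to the same day, so a Y price never carries over from the previous session.

```python
import pandas as pd

# Hypothetical two-day sample; values are illustrative only.
X = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-09-15 13:02:03",
        "2015-09-15 13:02:06",
        "2015-09-16 09:00:01",
    ]),
    "Mid": [7.575392, 7.575521, 7.575649],
})
Y = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-09-15 12:00:00",
        "2015-09-15 13:02:05",
        "2015-09-16 09:00:05",
    ]),
    "Mid": [4.650894, 4.650899, 4.650884],
})

# Calendar-day key (as a string) so that matches never cross midnight.
X["Date"] = X["Time"].dt.strftime("%Y-%m-%d")
Y["Date"] = Y["Time"].dt.strftime("%Y-%m-%d")

# Both frames must be sorted on the "on" key; the default direction is
# backward, i.e. the last Y row at or before each X row, within the same Date.
joined = pd.merge_asof(
    X.sort_values("Time"),
    Y.sort_values("Time"),
    on="Time",
    by="Date",
    suffixes=("_X", "_Y"),
)
print(joined)
```

In this sample the last X row has no earlier Y tick on the same day, so its Mid_Y is NaN. Without by="Date" it would pick up the previous day's last Y price.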