Question

I have a DataFrame with a MultiIndex ['timestamp', 'symbol'] that holds time series data. I am merging this data with other samples, and I use apply with asof along these lines:

df.apply(lambda x: df2.xs(x['symbol'], level='symbol').index.asof(x['timestamp']), axis=1)

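I think the xs call that filters by symbol is what makes this so slow, so I built a dict of 'symbol' -> df whose values are already filtered, which lets me call index.asof directly. Am I approaching this the wrong way?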
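The dictionary idea described above could be sketched roughly as follows. This is only an illustration of the approach, not the exact code in use: the names `quotes`, `ts_by_symbol`, and `last_quote_ts` are hypothetical, and the data mirrors the small sample used in this question.

```python
from io import StringIO

import pandas as pd

# Quotes indexed by (ts, symbol), mirroring the sample data in this question.
quotes = pd.read_csv(
    StringIO(
        "ts,symbol,bid,ask\n"
        "2014-03-03T09:30:00,A,54.00,55.00\n"
        "2014-03-03T09:30:05,B,34.00,35.00"
    ),
    parse_dates=["ts"],
    index_col=["ts", "symbol"],
).sort_index()

# Build the lookup once: symbol -> sorted DatetimeIndex of that symbol's timestamps.
# `ts_by_symbol` is a hypothetical name introduced for this sketch.
ts_by_symbol = {
    symbol: group.index.get_level_values("ts")
    for symbol, group in quotes.groupby(level="symbol")
}


def last_quote_ts(symbol, ts):
    """Return the latest quote timestamp at or before `ts` for `symbol` (NaT if none)."""
    return ts_by_symbol[symbol].asof(ts)


result = last_quote_ts("A", pd.Timestamp("2014-03-03T09:32:00"))
```

The filtering by symbol happens once up front, so each lookup only pays for the `asof` search on an already-filtered index instead of repeating `xs` per row.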

Example:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("ts,symbol,bid,ask\n2014-03-03T09:30:00,A,54.00,55.00\n2014-03-03T09:30:05,B,34.00,35.00"), parse_dates=['ts'], index_col=['ts', 'symbol'])
# ts and symbol stay as plain columns in df2 so the row-wise lambda below can read x['symbol'] and x['ts']
df2 = pd.read_csv(StringIO("ts,eventId,symbol\n2014-03-03T09:32:00,1,A\n2014-03-03T09:33:05,2,B"), parse_dates=['ts'])

# find the ts to join with; use xs to select one symbol so we can call index.asof
df2['event_ts'] = df2.apply(lambda x: df.xs(x['symbol'], level='symbol').index.asof(x['ts']), axis=1)
# merge in fields
df2 = pd.merge(df2, df, left_on=['event_ts', 'symbol'], right_index=True)
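For comparison, pandas also provides `pd.merge_asof`, which can do a per-symbol as-of join in a single call. The following is a minimal sketch on the same sample data, assuming both frames keep `ts` and `symbol` as ordinary columns and are sorted by `ts`. The names `quotes`, `events`, and `joined` are hypothetical, and whether this is faster on the real data has not been measured here.

```python
from io import StringIO

import pandas as pd

quotes = pd.read_csv(
    StringIO(
        "ts,symbol,bid,ask\n"
        "2014-03-03T09:30:00,A,54.00,55.00\n"
        "2014-03-03T09:30:05,B,34.00,35.00"
    ),
    parse_dates=["ts"],
)
events = pd.read_csv(
    StringIO(
        "ts,eventId,symbol\n"
        "2014-03-03T09:32:00,1,A\n"
        "2014-03-03T09:33:05,2,B"
    ),
    parse_dates=["ts"],
)

# Copy the quote timestamp into its own column so it is kept after the join.
quotes["event_ts"] = quotes["ts"]

# For each event, take the most recent quote at or before the event's ts,
# matching only quotes with the same symbol. Both inputs must be sorted by ts.
joined = pd.merge_asof(
    events.sort_values("ts"),
    quotes.sort_values("ts"),
    on="ts",
    by="symbol",
)
```

Here `by="symbol"` plays the role of the `xs` filter, and `on="ts"` plays the role of `index.asof`.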

Pandas xs is slow with DataFrame.apply

0 answers: