Question

我使用answer to this question尝试在我的数据帧上创建一个类似的切片。但它似乎不起作用，因为我的行索引是一个TimeSeries。我不知道如何重新切片。

我使用的df有一个TimeSeries索引，列是两级MultiIndex。我试图在任意行中返回一系列由每个主要列的“px”子列组成的系列。

第一次尝试：df.loc[0,(slice(None), 'px')]抛出TypeError，

TypeError: cannot do index indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [0] of <type 'int'>

所以我也尝试为索引提供DateTime，而不是int：

useIndex = sdf.index[0]
return df.loc[useIndex,(slice(None), 'px')]

给出了：

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

...后记

如果我只是做一个简单的，

useIndex = sdf.index [0] useIndex sdf.iloc [useIndex]

我得到了失败：

TypeError: cannot do label indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2015-10-08 00:00:00] of <class 'pandas.tslib.Timestamp'>

所以问题可能是我真的没有将有效索引传递给MultiIndex切片吗？

=====

以下是两个例子：第一个df（'df'）我能够提取出我想要的数据。第二个df，（'df2'）抛出类型错误。

import pandas as pd
import numpy as np

cols = [['col_1', 'col_2'], ['delta', 'px']]
multi_idx = pd.MultiIndex.from_product(cols, names= ["level_0", "level_1"])
df = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=range(5), columns=multi_idx)

row_number =1 

print df.loc[df.index[row_number], pd.IndexSlice[:, 'px']]

rng = pd.date_range('1/1/2011', periods=5, freq='H')
df2 = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=rng, columns=multi_idx)

#print df2.loc[df.index[row_number], pd.IndexSlice[:, 'px']]
useIndex = df2.index[0] 

print df2.loc[useIndex, pd.IndexSlice[:, 'px']]

Answer 1

使用IndexSlice应有助于获得所需的结果。请注意，列首先需要排序：

cols = [['col_1', 'col_2'], ['delta', 'px']]
multi_idx = pd.MultiIndex.from_product(cols, names= ["level_0", "level_1"])
df = pd.DataFrame(np.random.rand(20).reshape(5, 4), index=range(5), columns=multi_idx)

>>> df
level_0     col_1               col_2          
level_1     delta        px     delta        px
0        0.891758  0.071693  0.629897  0.693161
1        0.772542  0.022781  0.684584  0.892641
2        0.925957  0.794940  0.146950  0.134798
3        0.159558  0.842898  0.677927  0.028675
4        0.436268  0.989759  0.471879  0.101878

row_number = 3
>>> df.loc[df.index[row_number], pd.IndexSlice[:, 'px']]
level_0  level_1
col_1    px         0.842898
col_2    px         0.028675
Name: 3, dtype: float64

使用时间序列索引进行MultiIndex切片

1 个答案: