Fastest way to access a pandas column

Time: 2017-07-10 05:50:55

Tags: pandas

I'm confused by the performance differences between the various ways of accessing a pandas column.

In [1]: df = pd.DataFrame([[1,1,1],[2,2,2]],columns=['a','b','c'])

In [2]: %timeit df['a']
The slowest run took 75.37 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 3.12 µs per loop

In [3]: %timeit df.a
The slowest run took 5.14 times longer than the fastest. This could
mean that an intermediate result is being cached.
100000 loops, best of 3: 6.59 µs per loop

In [4]: %timeit df.loc[:,'a']
10000 loops, best of 3: 55 µs per loop

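As far as I understand, the last variant is slower because it can also be used to set values, not just to read them. But why is df.a slower than df['a']? This seems to hold regardless of any caching of intermediate results.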

1 answer:

Answer 0: (score: 1)

Here is a link that explains the difference between . (attribute) access and [] access.

See also the documentation on how these operators behave: the __getitem__ (for []) and __getattr__ (for .) methods.

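Dot access appears to reach the column through an extra function call: Python only invokes __getattr__ after the normal attribute lookup has failed, and the column is looked up after that. It therefore takes more time than [], which looks the column up directly like a dictionary key. This matches the timings in the question.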
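To make the lookup order concrete, here is a minimal sketch using a toy class. ToyFrame and its _columns attribute are hypothetical names made up for illustration; this is not pandas' actual implementation, only an imitation of the general pattern in which attribute access falls back to item access. It uses the standard-library timeit module instead of %timeit so it runs outside IPython.

```python
import timeit


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas' real code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Bracket access: a direct lookup in the column mapping.
        return self._columns[key]

    def __getattr__(self, name):
        # Python calls __getattr__ only after the normal attribute
        # lookup has failed, so dot access pays for that failed
        # lookup before it ever reaches the column mapping.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


frame = ToyFrame({"a": [1, 2], "b": [1, 2], "c": [1, 2]})

# Both spellings hand back the very same object.
same_object = frame.a is frame["a"]

# Timings depend on the machine, so no expected numbers are shown.
bracket_time = timeit.timeit(lambda: frame["a"], number=100_000)
dot_time = timeit.timeit(lambda: frame.a, number=100_000)
print(f"frame['a']: {bracket_time:.4f}s  frame.a: {dot_time:.4f}s")
```

In this toy version the dot form does strictly more work than the bracket form, since it ends by calling the bracket form itself. Whether the gap is noticeable in real code depends on the pandas version and on how often the column is accessed.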
