pandas .at versus .loc

Question

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Fast label-based scalar accessor

Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.

So I ran some samples:

Setup

# Python 2 syntax (string.letters and the print statement do not exist in Python 3)
import pandas as pd
import numpy as np
from string import letters, lowercase, uppercase

lt = list(letters)
lc = list(lowercase)
uc = list(uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
    np.random.seed(seed)
    df.iloc[:, :] = np.random.rand(*df.shape)
    return df

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)

print df.head().T.head().T

df looks like:

            a
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576

Let's use .loc and .at and make sure I get the same thing

print "using .loc", df.loc[('a', 'A'), ('c', 'C')]
print "using .at ", df.at[('a', 'A'), ('c', 'C')]

using .loc 0.37374090276
using .at  0.37374090276

Test speed using .loc

%%timeit
df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 µs per loop

Test speed using .at

%%timeit
df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop

This looks to be a huge speed increase. Even at the caching stage, 6.11 * 8 µs is much faster than 180 µs.

Question

What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc, but it doesn't behave similarly. Example:

# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)

print sdf.loc[:, :]

          A         B
a  0.444939  0.407554
b  0.460148  0.465239

whereas print sdf.at[:, :] results in TypeError: unhashable type

So the two are obviously not the same, even if the intent is for them to be similar.

That said, who can provide guidance on what can and cannot be done with the .at method?

Answer 1

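Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.

df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.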
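
To make the single-value restriction concrete, here is a minimal Python 3 sketch. The small frame and its labels are made up for illustration, and the exception raised by .at for a non-scalar key is caught broadly because its exact type seems to vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame used only for illustration.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["a", "b", "c"],
    columns=["X", "Y"],
)

# Scalar labels: both accessors return the same single value.
value_loc = df.loc["b", "Y"]
value_at = df.at["b", "Y"]

# A list of row labels is fine for .loc and yields a Series here.
subset = df.loc[["a", "c"], "Y"]

# The same key is rejected by .at. The exact exception type differs
# between pandas versions, so it is caught broadly in this sketch.
try:
    df.at[["a", "c"], "Y"]
    at_rejected_list = False
except Exception:
    at_rejected_list = True

print(value_loc, value_at, list(subset), at_rejected_list)
```

In short, anything that is not one row label plus one column label belongs with .loc.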

Note that there is also df.get_value, which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop

Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
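
For readers on Python 3 without IPython, the comparison can be roughly reproduced with the standard timeit module. This is only a sketch: the frame is a smaller, hypothetical stand-in for the one in the question, the helper name time_per_call is made up, and df.get_value is left out because it was deprecated in 0.21.0 and removed in pandas 1.0. Absolute numbers will differ by machine and pandas version.

```python
import timeit
from string import ascii_lowercase, ascii_uppercase

import numpy as np
import pandas as pd

# Small MultiIndex frame (25x25), similar in shape to the one in the question.
labels = pd.MultiIndex.from_product(
    [list(ascii_lowercase[:5]), list(ascii_uppercase[:5])]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((len(labels), len(labels))), index=labels, columns=labels)

row_key, col_key = ("a", "A"), ("c", "C")


def time_per_call(func, number=2000):
    """Return the average wall-clock seconds per call of func."""
    return timeit.timeit(func, number=number) / number


loc_seconds = time_per_call(lambda: df.loc[row_key, col_key])
at_seconds = time_per_call(lambda: df.at[row_key, col_key])

print(f".loc: {loc_seconds * 1e6:.1f} µs per call")
print(f".at:  {at_seconds * 1e6:.1f} µs per call")
```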

Answer 2

Since you asked about the limitations of .at, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation:

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]], index=[4, 5, 6], columns=['A', 'B', 'C'])
df2 = df.copy()

    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

If I now do

df.at[4, 'B'] = 100

the result looks as expected

    A    B   C
4   0  100   3
5   0    4   1
6  10   20  30

However, when I try to do

df.at[4, 'C'] = 10.05

it seems that .at tries to conserve the datatype (here: int):

    A    B   C
4   0  100  10
5   0    4   1
6  10   20  30

This seems to differ from .loc:

df2.loc[4, 'C'] = 10.05

yields the desired

    A   B      C
4   0   2  10.05
5   0   4   1.00
6  10  20  30.00

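The risk in the example above is that the conversion from float to int happens silently. Trying the same with a string raises an error:

df.at[5, 'A'] = 'a_string'

ValueError: invalid literal for int() with base 10: 'a_string'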
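
One way to avoid depending on how a given pandas version handles this is to widen the column explicitly before calling .at. The sketch below is only an illustration: set_scalar is a hypothetical helper (not a pandas function), and it assumes a NumPy-backed numeric column and a numeric scalar. The truncation described above was observed on pandas 0.22; other versions may upcast or warn instead.

```python
import numpy as np
import pandas as pd


def set_scalar(df, row, col, value):
    """Set one cell with .at, widening the column dtype first if needed.

    Hypothetical helper: assumes a NumPy-backed numeric column and a
    numeric scalar value.
    """
    current = df[col].dtype
    needed = np.promote_types(current, np.asarray(value).dtype)
    if needed != current:
        # Widen the whole column (e.g. int64 -> float64) before assigning.
        df[col] = df[col].astype(needed)
    df.at[row, col] = value


df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

set_scalar(df, 4, "B", 100)    # int into an int column: dtype unchanged
set_scalar(df, 4, "C", 10.05)  # float into an int column: column widened first

print(df)
print(df.dtypes)
```

The trade-off is an extra dtype check on every write, which gives back part of the speed advantage that motivated using .at.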

Answer 3

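Adding to the above, the pandas documentation for the at function states:

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

For setting data, loc and at are similar, for example: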

df = pd.DataFrame({'A': [1,2,3], 'B': [11,22,33]}, index=[0,0,1])

Both loc and at will produce the same result

df.at[0, 'A'] = [101,102]
df.loc[0, 'A'] = [101,102]

    A   B
0   101 11
0   102 22
1   3   33

df.at[0, 'A'] = 103
df.loc[0, 'A'] = 103

    A   B
0   103 11
0   103 22
1   3   33

Also, for accessing a single value, both are the same

df.loc[1, 'A']   # returns a single value (<class 'numpy.int64'>)
df.at[1, 'A']    # returns a single value (<class 'numpy.int64'>)

3

However, when matching multiple values, loc will return a group of rows/columns from the DataFrame, while at will return an array of values

df.loc[0, 'A']  # returns a Series (<class 'pandas.core.series.Series'>)

0    103
0    103
Name: A, dtype: int64

df.at[0, 'A']   # returns array of values (<class 'numpy.ndarray'>)

array([103, 103])

What's more, loc can be used to match a group of rows/columns and can be given only an index, while at must receive the column

df.loc[0]  # returns a DataFrame view (<class 'pandas.core.frame.DataFrame'>)

    A   B
0   103 11
0   103 22


# df.at[0]  # ERROR: must receive column
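
The last point can be checked with a frame whose index is unique. This is a small sketch using a made-up frame; the exception from calling .at with only a row label is caught broadly because its type is not something I would rely on across pandas versions.

```python
import pandas as pd

# Hypothetical frame with a unique index, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [11, 22, 33]}, index=[10, 20, 30])

# .loc accepts a row label alone and returns that row as a Series.
row = df.loc[20]

# .at works once both the row and the column label are given.
cell = df.at[20, "A"]

# With only a row label, .at fails.
try:
    df.at[20]
    at_needs_column = False
except Exception:
    at_needs_column = True

print(list(row), cell, at_needs_column)
```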

Answer 4

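.at is an optimized data access method compared to .loc.

.loc on a data frame selects all the elements located by the indexed_rows and labeled_columns given in its arguments. Instead, .at selects one particular element of the data frame, positioned at the given indexed_row and labeled_column.

Also, .at takes one row and one column as input arguments, whereas .loc may take multiple rows and columns. The output of .at is a single element, while the output of .loc may be a Series or a DataFrame.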
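
The difference in output types can be seen in a short sketch, again on a made-up frame with hypothetical labels:

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [11, 22, 33]},
    index=["r1", "r2", "r3"],
)

element = df.at["r2", "B"]                 # a single element
one_row = df.loc["r2"]                     # one row: a Series
block = df.loc[["r1", "r3"], ["A", "B"]]   # several rows and columns: a DataFrame

for name, obj in [("element", element), ("one_row", one_row), ("block", block)]:
    print(name, type(obj).__name__)
```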
