Question

I have a DataFrame:

   a    b   c    country    e
0  5    7   11   Morocco    A
1  5    9   9    Nigeria    B
2  6    2   13   Spain      C

I want to add a column e holding the letter of the alphabet that corresponds to each row's index number, as in the e column above (0 is A, 1 is B, 2 is C).

How can I do this? I tried:

df['e'] = chr(ord('a') + df.index.astype(int))

But I got:

TypeError: int() argument must be a string or a number, not 'Int64Index'

Answer 1

One way is to convert the index to a Series, then call apply and pass a lambda:

In[271]:
df['e'] = df.index.to_series().apply(lambda x: chr(ord('a') + x)).str.upper()
df

Out[271]: 
   a  b   c  country  e
0  5  7  11  Morocco  A
1  5  9   9  Nigeria  B
2  6  2  13    Spain  C

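The underlying error is that df.index is an Int64Index and chr does not know how to operate on it, since chr expects a single integer. Calling apply on a Series instead iterates over the values and converts them one at a time.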
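
To make the failure concrete, here is a minimal sketch that rebuilds the example frame (values copied from the question) and shows that chr accepts one integer at a time but not the whole index. It uses ord("A") directly rather than upper-casing afterwards, and it only checks the exception type, because the exact message differs between Python and pandas versions.

```python
import pandas as pd

# Frame rebuilt from the question, without the target column e.
df = pd.DataFrame({
    "a": [5, 5, 6],
    "b": [7, 9, 2],
    "c": [11, 9, 13],
    "country": ["Morocco", "Nigeria", "Spain"],
})

# chr() expects a single integer, so passing it a whole index fails.
try:
    chr(ord("a") + df.index)
    failed = False
except TypeError:
    failed = True

# Applying chr() to one index label at a time works.
df["e"] = df.index.to_series().apply(lambda x: chr(ord("A") + x))
```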

Performance-wise, I think a list comprehension will be faster:

In[273]:
df['e'] = [chr(ord('a') + x).upper() for x in df.index]
df

Out[273]: 
   a  b   c  country  e
0  5  7  11  Morocco  A
1  5  9   9  Nigeria  B
2  6  2  13    Spain  C

Timings

%timeit df.index.to_series().apply(lambda x: chr(ord('a') + x)).str.upper()
%timeit [chr(ord('a') + x).upper() for x in df.index]
1000 loops, best of 3: 491 µs per loop
100000 loops, best of 3: 19.2 µs per loop

Here the list comprehension is significantly faster, roughly 25 times on this three-row frame.

Answer 2

Here is another, functional solution. It assumes you have fewer countries than letters of the alphabet.

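
A minimal sketch of what such a functional solution might look like, based on the itemgetter(*df.index)(ascii_uppercase) expression credited to jpp in the timings under Answer 3. The frame is rebuilt from the question, and the sketch assumes a non-empty integer index whose labels all lie between 0 and 25.

```python
from operator import itemgetter
from string import ascii_uppercase

import pandas as pd

# Frame rebuilt from the question, without the target column e.
df = pd.DataFrame({
    "a": [5, 5, 6],
    "b": [7, 9, 2],
    "c": [11, 9, 13],
    "country": ["Morocco", "Nigeria", "Spain"],
})

# itemgetter(0, 1, 2) picks positions 0, 1 and 2 out of "ABC...Z"
# and returns them as a tuple.
letters = itemgetter(*df.index)(ascii_uppercase)

# Convert to a list so the assignment is an ordinary column of strings.
df["e"] = list(letters)
```

With a single-row frame itemgetter returns a bare string instead of a tuple, so list(letters) still yields a one-element list; with more than 26 rows the lookup raises IndexError, which is the limit the answer mentions.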

Answer 3

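You can use map on the values from df.index (on Python 3, wrap it in list(), because map returns a lazy iterator there):

df['e'] = list(map(chr, ord('A') + df.index.values))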

If you compare speeds:

# Edchum
%timeit df.index.to_series().apply(lambda x: chr(ord('A') + x))
10000 loops, best of 3: 135 µs per loop
%timeit [chr(ord('A') + x) for x in df.index]
100000 loops, best of 3: 7.38 µs per loop
# jpp
%timeit itemgetter(*df.index)(ascii_uppercase)
100000 loops, best of 3: 7.23 µs per loop
# Me
%timeit map(chr,ord('A') + df.index.values)
100000 loops, best of 3: 3.12 µs per loop

So map appears to be faster, but that may simply be due to the small size of the data sample.
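
One way to check whether the ranking depends on sample size is to repeat the comparison on a longer index. The sketch below uses the standard-library timeit module instead of the %timeit magic; the frame, the size n and the repeat count are arbitrary choices for illustration, and no timings are quoted because they depend on the machine and library versions. map is wrapped in list() so that, on Python 3, the work is actually performed rather than deferred.

```python
import timeit

import pandas as pd

n = 1000  # arbitrary size; ord("A") + i is still a valid code point for chr()
df = pd.DataFrame({"a": range(n)})


def with_apply():
    return df.index.to_series().apply(lambda x: chr(ord("A") + x)).tolist()


def with_comprehension():
    return [chr(ord("A") + x) for x in df.index]


def with_map():
    return list(map(chr, ord("A") + df.index.values))


# The three approaches should agree before their timings are compared.
results_match = with_apply() == with_comprehension() == with_map()

runs = 100
for fn in (with_apply, with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```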

pandas: set row values to the letter of the alphabet corresponding to the index number?
