Float vs. Int behavior in df.set_index()

Time: 2012-07-19 14:49:41

Tags: python pandas

Has set_index changed significantly in the latest pandas release (0.8)? I can't get it to work as expected:

My original attempt was to set the index on 'id':

ipdb> merged2['id']
16    130809
25    130687
32    130686
9      41736
22    131913
7     130691
33    129993
13    130680
28    134295
29    130708

ipdb> merged2.set_index('id')
*** KeyError: 0
ipdb> [type(i) for i in merged2['id']]
[<type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>, <type 'numpy.float64'>]
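The list comprehension above confirms that every `id` value is a `numpy.float64`; checking the column's `dtype` gives the same answer in one step. The post doesn't say why the ids are floats, but one common way an integer id column ends up as float64 is a merge that introduces missing values, because a plain int64 column cannot hold NaN. A minimal sketch with hypothetical frames `left` and `right` (not the asker's data):

```python
import pandas as pd

# Hypothetical inputs: 'id' starts out as an integer column in `left`.
left = pd.DataFrame({'key': [1, 2, 3], 'id': [130809, 130687, 130686]})
right = pd.DataFrame({'key': [1, 2, 4], 'val': [0.1, 0.2, 0.3]})

print(left['id'].dtype)  # int64

# The outer merge adds a row for key 4 that has no 'id' (NaN),
# so pandas upcasts the whole 'id' column to float64.
merged = left.merge(right, on='key', how='outer')
print(merged['id'].dtype)         # float64
print(merged['id'].isna().sum())  # 1
```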

The current index is int:

ipdb> merged2.index
Int64Index([16, 25, 32,  9, 22,  7, 33, 13, 28, 29])

ipdb> [type(i) for i in merged2.index]
[<type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>, <type 'numpy.int64'>]

As a workaround, I tried building a new index:

ndx=range(len(merged2))
[type(i) for i in ndx]
[<type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>, <type 'int'>]


ipdb> merged2.set_index(ndx)
*** KeyError: 'no item named 0'
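That second `KeyError` is consistent with `set_index` reading a plain Python list (which is what `range` returns in Python 2) as a list of column labels, so it goes looking for a column named `0`. Current pandas behaves the same way for a list. If the goal is simply a 0..n-1 index, assigning to `.index`, calling `reset_index(drop=True)`, or passing an explicit NumPy array avoids the ambiguity. A sketch, assuming a small hypothetical stand-in for `merged2`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged2: same kind of index labels, float 'id' column.
merged2 = pd.DataFrame(
    {'id': [130809.0, 130687.0, 130686.0]},
    index=[16, 25, 32],
)

# A plain list is read as column labels, so this looks for a column named 0.
try:
    merged2.set_index([0, 1, 2])
except KeyError as exc:
    print('KeyError:', exc)

# Ways to replace the index with 0..n-1 instead:
by_assignment = merged2.copy()
by_assignment.index = range(len(merged2))

by_reset = merged2.reset_index(drop=True)

# An explicit 1-D array is treated as index values, not as column labels;
# the 'id' column is kept.
by_array = merged2.set_index(np.arange(len(merged2)))
```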

Finally, mapping the id column to int works:

merged2['id'] = list(map(int, merged2['id']))  # cast the float64 ids to ints
merged2.set_index('id')
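In current pandas (Python 3, where `map` returns an iterator), the vectorised way to do the same cast is `Series.astype`. A sketch, assuming the ids are whole numbers with no missing values (`astype` raises on NaN) and using a hypothetical stand-in frame. Note that `set_index` returns a new DataFrame, so the result has to be assigned to be kept:

```python
import pandas as pd

# Hypothetical stand-in for merged2 with whole-number float ids.
merged2 = pd.DataFrame(
    {'id': [130809.0, 130687.0, 130686.0], 'val': [1.5, 2.5, 3.5]},
    index=[16, 25, 32],
)

# astype('int64') fails on NaN, so check first (assumption: ids are never missing).
assert merged2['id'].notna().all()

# Vectorised equivalent of list(map(int, ...)).
merged2['id'] = merged2['id'].astype('int64')

# set_index does not modify merged2 in place; keep the returned frame.
by_id = merged2.set_index('id')
print(by_id.index.dtype)          # int64
print(by_id.loc[130687, 'val'])   # 2.5
```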

Any ideas about what I'm doing wrong?

1 answer:

Answer 0 (score: 1)

It seems to work for me on 0.8.1dev. Could you post the stack trace and/or what merged2 looks like? Are you sure you're still using pandas 0.8?

In [50]: import numpy as np; import pandas as pd

In [51]: idx = pd.Index([16, 25, 32, 9, 22, 7, 33, 13, 28, 29])

In [52]: idx
Out[52]: Int64Index([16, 25, 32,  9, 22,  7, 33, 13, 28, 29])

In [53]: df = pd.DataFrame(np.random.randn(len(idx), 3), idx, ['id', 1, 2])

In [54]: df
Out[54]: 
          id         1         2
16  0.351188  2.082303 -0.143037
25  0.633243 -1.731306  0.749934
32 -0.337893 -0.264249 -0.549856
9  -0.728056  0.786955  1.103877
22  1.131559 -0.255439 -0.397913
7  -1.384519  0.397626 -0.421481
33  1.356455  2.863659 -2.060498
13 -0.355786 -0.051383 -0.609486
28 -0.056607  0.767800  1.433946
29 -0.288202 -0.437992  0.843746

In [55]: df.set_index('id')
Out[55]: 
                  1         2
id                           
 0.351188  2.082303 -0.143037
 0.633243 -1.731306  0.749934
-0.337893 -0.264249 -0.549856
-0.728056  0.786955  1.103877
 1.131559 -0.255439 -0.397913
-1.384519  0.397626 -0.421481
 1.356455  2.863659 -2.060498
-0.355786 -0.051383 -0.609486
-0.056607  0.767800  1.433946
-0.288202 -0.437992  0.843746

In [56]: pd.__version__
Out[56]: '0.8.1.dev-e2633d4'
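For comparison with a current release, here is a self-contained version of the check above. It uses seeded random data (so the numbers differ from the session output) and confirms that `set_index('id')` moves a float column into the index without a `KeyError`. Index class names differ across versions (pre-2.0 releases show `Float64Index`, 2.x shows `Index(..., dtype='float64')`), so the sketch inspects the dtype rather than the class:

```python
import numpy as np
import pandas as pd

# Confirm which pandas is actually being imported.
print(pd.__version__)

idx = pd.Index([16, 25, 32, 9, 22, 7, 33, 13, 28, 29])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(idx), 3)), index=idx, columns=['id', 1, 2])

# Move the float 'id' column into the index.
result = df.set_index('id')

print(result.index.dtype)    # float64
print(list(result.columns))  # [1, 2]
```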