Question

If my DataFrame looks like

import pandas

d = pandas.DataFrame( data = {'col1':[100,101,102,103] } )
#   col1
#0   100
#1   101
#2   102
#3   103

and I do

d.set_value( 0,'col1', '200')

it casts '200' to an integer:

type( d.col1[0] )
#numpy.int64

However, if I do

d.set_value( 0,'col2', '200')

I get

type( d.col2[0] )
#str

as expected.

More mysteries:

Furthermore, say I do the following:

[ type(x) for x in d.col1 ]
#[numpy.int64, numpy.int64, numpy.int64, numpy.int64]
d.set_value( [0,1,2,3], 'col1', ['101', '102', '103', 200] )
[ type(x) for x in d.col1 ]
#[str, str, str, str]

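So even though d.col1 was originally an integer column, it has now become a string column. What are the rules that decide when a whole column gets type-cast like this?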
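For the list case, one likely contributor is NumPy's own type promotion. Assuming (this is not confirmed by the answer below) that the mixed list is first turned into a single NumPy array before it is written into the column, NumPy has to pick one dtype for '101', '102', '103' and 200 together, and it promotes all of them to a Unicode string dtype, so 200 turns into '200':

```python
import numpy as np

# Hypothetical: the list passed to set_value in the question, as one NumPy array.
values = np.array(["101", "102", "103", 200])

# NumPy chooses a single dtype for the whole array; mixing str and int
# promotes everything to a Unicode string dtype (kind "U").
print(values.dtype.kind)  # U

# Every element, including the former int 200, is now a Python str.
print([type(x).__name__ for x in values.tolist()])  # ['str', 'str', 'str', 'str']
```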

More generally, I'm curious what the rules are for automatic type conversion when manipulating pandas DataFrames.

Answer 1

pandas stores data column by column, and each column has a single dtype that applies to every element in it.
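A minimal sketch of the one-dtype-per-column rule, assuming a recent pandas; the column names are made up for illustration. An object column still reports a single dtype, but because it stores references to arbitrary Python objects, its individual elements can be of different Python types:

```python
import pandas as pd

df = pd.DataFrame({
    "ints": [100, 101, 102],        # hypothetical all-integer column
    "floats": [1.5, 2.0, 3.25],     # hypothetical all-float column
    "mixed": [1, "two", 3.0],       # hypothetical column mixing Python types
})

# One dtype per column, inferred from the values at construction time.
print(df.dtypes)

# The object column keeps each element's own Python type.
print([type(x).__name__ for x in df["mixed"]])  # ['int', 'str', 'float']
```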

When you create a DataFrame with

import pandas as pd
df = pd.DataFrame({'col':[100,101,102,103]})
df.col.dtype

Out[11]:
dtype('int64')

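pandas automatically infers that all of these inputs are integers, so the column gets dtype int64. When you later set a value in column col, your input is automatically cast to the column's current dtype, int64, so each of the following produces exactly the same result: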

df.set_value(0, 'col', '200')  # cast string into int
df.set_value(0, 'col', 200)  # int input
df.set_value(0, 'col', 200.1)  # cast float64 into int64
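To see why the existing int64 column absorbs these values, here is a sketch of the same casting at the NumPy level. It assumes, as a simplification of pandas internals that the answer does not spell out, that the value is written straight into the column's fixed-dtype int64 array. An array's dtype cannot change on assignment, so NumPy converts the value instead, and raises if it cannot:

```python
import numpy as np

# Stand-in for the int64 column's underlying storage.
col = np.array([100, 101, 102, 103], dtype=np.int64)

col[0] = "200"   # the string is parsed as an integer
print(col[0], col.dtype)  # 200 int64

col[1] = 200.1   # the float is truncated toward zero
print(col[1], col.dtype)  # 200 int64

try:
    col[2] = "abc"  # not parseable as an integer, so the assignment fails
except ValueError as exc:
    print("ValueError:", exc)
```

The failed assignment leaves col[2] unchanged at 102 and the array still int64.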

But when you try df.set_value(0, 'col1', '200'), the current df has no column col1, so pandas first creates a new column named col1 and infers that new column's dtype from your input.

df.set_value(0, 'col1', '200')
df.col1.dtype  # dtype('O'), means object/string
df.set_value(0, 'col2', 200.1)
df.col2.dtype  # dtype('float64')
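DataFrame.set_value was deprecated in pandas 0.21 and removed in 1.0, so the snippets above do not run on current versions. A rough modern equivalent, assuming pandas 2.x or later, uses .at for an existing cell and .loc to create a column; col3 is a hypothetical extra column showing one more case. Recent pandas also deprecates silently upcasting an existing column when an incompatible value is written into it (PDEP-6), which is why this sketch writes only a compatible int into col:

```python
import pandas as pd

df = pd.DataFrame({"col": [100, 101, 102, 103]})

# .at replaces set_value for an existing cell; a compatible int keeps int64.
df.at[0, "col"] = 200
print(df["col"].dtype)  # int64

# Writing to a column that does not exist yet creates it. Rows that were not
# assigned are filled with NaN, and the dtype is inferred from the value.
df.loc[0, "col1"] = "200"  # non-numeric: object (or the string dtype in newer pandas)
df.loc[0, "col2"] = 200.1  # float64
df.loc[0, "col3"] = 200    # also float64, because int64 cannot hold the NaN fill
print(df.dtypes)
```

The col3 case shows that the inferred dtype must also accommodate the missing-value fill for the unassigned rows, not just the single value you wrote.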
