Column type inference for a heterogeneous vector of ints and floats [pandas]

Date: 2013-12-20 19:15:03

Tags: python pandas

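I'm calculating some article metrics for a lot of different Wikipedia pages, such as article length and references per section. Each metric is either an int or a float. I have them stored in a dictionary and am now trying to get them into pandas to create some histograms and statistics. When I populate the DataFrame, the dtype of the df column stays object rather than a numeric type, even though I call float() on every metric value. While the column is not numeric, I can't run numerical operations on it. How can I get pandas to recognise this column as numeric?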

arts = {"Q774": 
{"metrics": 
    {"fr": {"informativeness": 1.3500775193798449, "referencerate": 0.0026265931794695143, "completeness": 202.4, "numheadings": 19, "articlelength": 23224.0}, 
    "en": {"informativeness": 7.602386920360031, "referencerate": 0.003673816096835846, "completeness": 308.8, "numheadings": 36, "articlelength": 47090.0}, 
    "sw": {"informativeness": 0.0650467289719626, "referencerate": 0.0, "completeness": 18.400000000000002, "numheadings": 1, "articlelength": 232.0}} } }

import pandas as pd

df = pd.DataFrame(columns=['qid','lang','metric','val'])
for qid, attribdict in arts.iteritems():
    for attrib, langdict in attribdict.iteritems():
        if attrib == 'metrics':
            for lang, metrics in langdict.iteritems():
                for metric_name, metric_val in metrics.iteritems():
                    df = df.append({'qid': qid, 'lang': lang, 'metric': metric_name, 'val': float(metric_val)}, ignore_index=True)

In [258]: df['val']
Out [258]:
0        1.350078
1     0.002626593
2           202.4
3              19
4           23224
5        7.602387
6     0.003673816
7           308.8
8              36
9           47090
10     0.06504673
11              0
12           18.4
13              1
14            232
Name: val, dtype: object

1 answer:

Answer 0 (score: 2)

You certainly can, by using convert_objects to coerce the column to float:
>>> df = df.convert_objects(convert_numeric=True)
>>> df[:2]
     qid lang           metric           val
0   Q774   fr  informativeness      1.350078
1   Q774   fr    referencerate      0.002627
>>> df.dtypes
qid        object
lang       object
metric     object
val       float64
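
Note that convert_objects was later deprecated and is not available in recent pandas releases, and the same is true of DataFrame.append and of dict.iteritems() under Python 3. As a rough sketch of how the same result might be reached today, the snippet below collects the rows in a list, builds the DataFrame in one call, and coerces the column with pd.to_numeric. It assumes Python 3 and a current pandas. The arts dictionary is a cut-down copy of the one in the question, and the astype(object) step is there only to reproduce the object-dtype column for illustration.

```python
import pandas as pd

# Cut-down copy of the question's data (two languages, two metrics each).
arts = {"Q774": {"metrics": {
    "fr": {"informativeness": 1.3500775193798449, "numheadings": 19},
    "en": {"informativeness": 7.602386920360031, "numheadings": 36},
}}}

# Collect one dict per row, then build the DataFrame in a single call
# instead of appending row by row.
rows = [
    {"qid": qid, "lang": lang, "metric": metric_name, "val": metric_val}
    for qid, attribdict in arts.items()
    for lang, metrics in attribdict.get("metrics", {}).items()
    for metric_name, metric_val in metrics.items()
]
df = pd.DataFrame(rows, columns=["qid", "lang", "metric", "val"])

# Illustration only: force the column to object dtype, as in the question.
df["val"] = df["val"].astype(object)

# Coerce the object column back to a numeric dtype.
df["val"] = pd.to_numeric(df["val"])

print(df.dtypes)
```

If the column might contain values that cannot be parsed as numbers, pd.to_numeric(df["val"], errors="coerce") turns them into NaN instead of raising.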