Python: Classifying a list by order of magnitude

Asked: 2013-05-30 14:44:56

Tags: python

I have a nested list with values:

list = [
...
['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
...]

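I would like to classify the values in the second index/column by order of magnitude, starting with the lowest order of magnitude and ending with the largest, for example: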

99.23033109735835 = 10 <= x < 100
142.8576737907048 = 100 <= x < 1000
             9432 = 1000 <= x < 10000

The aim is to output a simple char (#) count showing how many values fall into each category, for example:

  10 <= x < 100: ###
100 <= x < 1000: #########

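I started by grabbing the max() and min() values of the column so that the largest and smallest magnitude categories can be calculated automatically, but I'm not sure how to associate each value in the column with an order of magnitude. If someone could point me in the right direction or give me some ideas, I would be most grateful.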

7 Answers:

Answer 0: (score: 15)

This function will turn your double into an integer order of magnitude:

>>> import math
>>> def magnitude(x):
...     return int(math.log10(x))
... 
>>> magnitude(99.23)
1
>>> magnitude(9432)
3

(So 10 ** magnitude(x) <= x < 10 ** (1 + magnitude(x)) for all x >= 1.)

Just use the magnitude as a key and count the occurrences per key. defaultdict may be helpful here.
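
The counting step described above could be sketched as follows. This is a minimal illustration written for Python 3, not the answerer's own code: the rows list is a hypothetical sample trimmed to the first two columns of the question's data, and the names counts and lines are made up for this example.

```python
import math
from collections import defaultdict

def magnitude(x):
    # Same definition as in the answer; only meaningful for x >= 1.
    return int(math.log10(x))

# Hypothetical sample rows shaped like the question's list (name, value).
rows = [
    ['Country1', 142.8576737907048],
    ['Country2', 109.33326343550823],
    ['Country3', 99.23033109735835],
]

# Use the magnitude as the key and count how many values share it.
counts = defaultdict(int)
for row in rows:
    counts[magnitude(row[1])] += 1

# One line per magnitude, lowest first, with one '#' per value.
lines = []
for m in sorted(counts):
    lines.append("%d <= x < %d: %s" % (10 ** m, 10 ** (m + 1), '#' * counts[m]))
print("\n".join(lines))
```

With the three sample rows this prints one '#' for the 10 to 100 bucket and two for the 100 to 1000 bucket.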


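Note that this magnitude is only correct for values of at least 1, i.e. non-negative powers of 10 (because int(double) truncates, rounding towards zero).

Use

def magnitude(x):
    return int(math.floor(math.log10(x)))

instead, if this matters for your use case. (Thanks to larsmans for pointing this out.)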
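
To make the difference concrete, here is a small comparison of the two variants, written for Python 3. The function names magnitude_trunc and magnitude_floor are hypothetical labels for this sketch; the bodies are the two definitions from the answer.

```python
import math

def magnitude_trunc(x):
    # int() truncates towards zero, so -0.3 becomes 0.
    return int(math.log10(x))

def magnitude_floor(x):
    # floor() rounds towards negative infinity, so -0.3 becomes -1.
    return int(math.floor(math.log10(x)))

# The two agree for x >= 1 and differ for values between 0 and 1
# that are not exact powers of 10.
for x in (0.05, 0.5, 5, 50):
    print(x, magnitude_trunc(x), magnitude_floor(x))
```

For 0.5 the truncating version reports 0 while the floor version reports -1, which is the bucket 0.1 <= x < 1 that the value actually belongs to.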

Answer 1: (score: 2)

To classify by order of magnitude:

from math import floor, log10
from collections import Counter
counter =  Counter(int(floor(log10(x[1]))) for x in list)

Key 1 covers 10 up to but not including 100; key 2 covers 100 up to but not including 1000.

print counter
Counter({2: 2, 1: 1})

Then just print it out:

for x in sorted(counter.keys()):
    print "%d <= x < %d: %d" % (10**x, 10**(x+1), counter[x])

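Answer 2: (score: 1)

If x is one of your numbers, what about len(str(int(x)))?

Or, if you have numbers less than 1, what about int(math.log10(x))?

(See also the documentation for log10. Also note that the int() rounding here may not be what you want: see ceil and floor, and note that you may need int(floor(...)) or int(ceil(...)) to get an integer answer.)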
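
As a rough check of how the digit-count idea relates to log10, the sketch below (Python 3) compares the two on a few values taken from the question. The helper name digits is hypothetical, and the comparison only holds under the assumption that x >= 1.

```python
import math

def digits(x):
    # Number of digits in the integer part of x; assumes x >= 1.
    return len(str(int(x)))

# For x >= 1 the digit count is one more than floor(log10(x)).
for x in (5, 99.23033109735835, 142.8576737907048, 9432):
    assert digits(x) - 1 == int(math.floor(math.log10(x)))

# Below 1 the integer part is 0, so the digit count no longer
# tracks the order of magnitude.
print(digits(0.5), int(math.floor(math.log10(0.5))))
```

Negative inputs would also count the minus sign as a character, so the string approach is best kept to values known to be at least 1.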

Answer 3: (score: 0)

import bisect
from collections import defaultdict
lis1 = [['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
]
lis2 = [0, 100, 1000, 1000]

dic = defaultdict(int)

for x in lis1:
    x = x[1]
    ind = bisect.bisect(lis2, x)
    if not (x >= lis2[-1] or x <= lis2[0]):
        sm, bi = lis2[ind-1], lis2[ind]
        dic["{} <= {} <= {}".format(sm, x, bi)] += 1
for k,v in dic.items():
    print k,'-->',v

Output:

0 <= 99.2303310974 <= 100 --> 1
100 <= 142.857673791 <= 1000 --> 1
100 <= 109.333263436 <= 1000 --> 1

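Answer 4: (score: 0)

If you want overlapping ranges, or ranges with arbitrary bounds (not tied to orders of magnitude, powers of 2, or any other predictable series):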

from collections import defaultdict
lst = [
    ['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
    ['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
    ['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
]

buckets = {
    '10<=x<100': lambda x: 10<=x<100,
    '100<=x<1000': lambda x: 100<=x<1000,
}

result = defaultdict(int)
for item in lst:
    second_column = item[1]
    for label, range_check in buckets.items():
        if range_check(second_column):
            result[label] +=1

print (result)
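
The overlapping case mentioned above is not exercised by the two buckets in that code, so here is a small variant that does, written for Python 3. It is only a sketch: it stores hypothetical (low, high) pairs instead of lambdas, and the bucket bounds and the values list are chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (low, high) bounds; the two ranges overlap on purpose.
buckets = {
    '50<=x<150': (50, 150),
    '100<=x<1000': (100, 1000),
}

# Second-column values from the question's sample rows.
values = [142.8576737907048, 109.33326343550823, 99.23033109735835]

result = defaultdict(int)
for v in values:
    # A value is counted once for every bucket whose range contains it.
    for label, (low, high) in buckets.items():
        if low <= v < high:
            result[label] += 1

print(dict(result))
```

Because each value is tested against every bucket, the counts can add up to more than the number of values, which is what distinguishes this from a strict partition by magnitude.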

Answer 5: (score: 0)

Another option, using bisect:

import bisect
from collections import Counter
list0 = [
['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135]
]

magnitudes = [10**x for x in xrange(5)]
c = Counter(bisect.bisect(magnitudes, x[1]) for x in list0)
for x in c:
  print x, '#'*c[x]
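
The question mentions deriving the categories from the min() and max() of the column. One way that could be combined with bisect is sketched below for Python 3. It assumes every value is at least 1, the names values, bounds and report are hypothetical, and 9432 is added to the sample values only because the question uses it as an example.

```python
import bisect
import math
from collections import Counter

# Second-column values from the question, plus its 9432 example.
values = [142.8576737907048, 109.33326343550823, 99.23033109735835, 9432]

# Lowest and highest order of magnitude present (assumes all values >= 1).
lo = int(math.floor(math.log10(min(values))))
hi = int(math.floor(math.log10(max(values))))

# Bucket boundaries 10**lo, 10**(lo + 1), ..., 10**(hi + 1).
bounds = [10 ** e for e in range(lo, hi + 2)]

# bisect_right returns i such that bounds[i - 1] <= v < bounds[i].
counts = Counter(bisect.bisect_right(bounds, v) for v in values)

report = []
for i in range(1, len(bounds)):
    report.append("%d <= x < %d: %s" % (bounds[i - 1], bounds[i], '#' * counts[i]))
print("\n".join(report))
```

Iterating over the boundaries rather than over the counter keeps the rows in ascending order and also prints empty buckets that lie between the smallest and largest magnitude.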

Answer 6: (score: 0)

To extend Useless' answer to all real numbers, you can use:

import math

def magnitude (value):
    if (value == 0): return 0
    return int(math.floor(math.log10(abs(value))))

Test cases:

In [123]: magnitude(0)
Out[123]: 0

In [124]: magnitude(0.1)
Out[124]: -1

In [125]: magnitude(0.02)
Out[125]: -2

In [126]: magnitude(150)
Out[126]: 2

In [127]: magnitude(-5280)
Out[127]: 3