How do I categorize the values in one list based on thresholds in another list?

Asked: 2019-04-05 15:37:56

Tags: python list sorting list-comprehension

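I want to categorize the values in one list according to the thresholds in another list. In other words, I want to compare each list item against the thresholds, one by one, to get an output list of categories.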

Input:
Values = [9999, 3000, 400, 9999, 1000] - its length varies with the input data
Threshold = [10000, 5000, 1500, 800, 0] - also subject to change, so it must be variable, but it is always sorted in descending order down to 0

Expected output (category numbers should be based on the threshold index):
cat = [0,1,3,0,2]

I believe this can be done with an advanced list comprehension, which I am not familiar with. So I tried:

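val_cat = []
thres_len = len(Threshold)
for item in Values:
    # Category vis covers the interval (Threshold[vis + 1], Threshold[vis]]
    for vis in range(thres_len - 1):
        if Threshold[vis + 1] < int(item) <= Threshold[vis]:
            val_cat.append(vis)
            break  # the intervals are disjoint, so at most one can match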

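This approach is neither Pythonic nor functional, but it is the best I could come up with, since I only learned the basics of VB a few years ago.

Thanks for your help! I believe this is a piece of cake for this community :-)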

2 answers:

Answer 0 (score: 1)

If you really mean to make it a list comprehension, here you go:

cat = [next(i-1 for i,t in enumerate(Threshold) if t < v) for v in Values]

But be aware of two caveats:

  • The last threshold, Threshold[-1], must be strictly less than everything in Values (i.e., an absolute lower bound); otherwise next() finds no match and raises StopIteration.
  • This is not fast: O(mn), where m is the size of Values and n is the size of Threshold. A more efficient approach is a binary search on Threshold for each value, which brings the cost down to O(m log n).

Therefore, you might want to write your own function to replace the next() call above and address both points.
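One possible shape for such a replacement is sketched below. It assumes Threshold is sorted in descending order, as in the question. The names `make_categorizer`, `categorize`, and `default` are hypothetical, introduced only for illustration. The list is reversed once up front so that the standard `bisect` module can be used, and `default` is returned (instead of raising) for values that fall outside every interval.

```python
import bisect

def make_categorizer(thresholds_desc, default=None):
    """Build a lookup function for descending thresholds (hypothetical helper).

    The returned function maps value -> i such that
    thresholds_desc[i + 1] < value <= thresholds_desc[i],
    or `default` when no such interval exists.
    """
    ascending = thresholds_desc[::-1]  # bisect needs ascending order; reverse once
    top = len(ascending) - 1

    def categorize(value):
        # Fewer than two thresholds, or value outside (lowest, highest]: no category
        if top < 1 or not (ascending[0] < value <= ascending[-1]):
            return default
        # Binary search, then map the ascending position back to the original index
        return top - bisect.bisect_left(ascending, value)

    return categorize

Values = [9999, 3000, 400, 9999, 1000]
Threshold = [10000, 5000, 1500, 800, 0]

categorize = make_categorizer(Threshold)
cat = [categorize(v) for v in Values]
print(cat)  # [0, 1, 3, 0, 2]
```

Returning a default rather than raising is a design choice; if an out-of-range value should count as an error in your data, raising ValueError inside `categorize` may be the better fit.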

Answer 1 (score: 0)

The bisect module can be used to find the index:

import bisect

Values = [9999, 3000, 400, 9999, 1000]
Threshold = [10000, 5000, 1500, 800, 0]

reversed_Threshold = list(reversed(Threshold))
len_Threshold = len(Threshold)

cat = [len_Threshold - bisect.bisect_left(reversed_Threshold, value) - 1 for value in Values]
print(cat)

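This requires the thresholds to be sorted (the descending list is reversed into the ascending order that bisect expects), but the complexity is only O(len(Values) * log(len(Threshold))).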
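To see how the bisect expression treats values that sit exactly on a threshold, the sketch below compares it with the interval test from the question's loop. The helper names `cat_loop` and `cat_bisect` and the list `boundary_values` are made up for this illustration.

```python
import bisect

Threshold = [10000, 5000, 1500, 800, 0]
ascending = Threshold[::-1]  # bisect requires ascending order

def cat_loop(value):
    # Interval test from the question: Threshold[i + 1] < value <= Threshold[i]
    for i in range(len(Threshold) - 1):
        if Threshold[i + 1] < value <= Threshold[i]:
            return i
    return None  # value lies outside every interval

def cat_bisect(value):
    # Same expression as in the answer above
    return len(Threshold) - bisect.bisect_left(ascending, value) - 1

boundary_values = [10000, 5000, 1500, 800, 1]
print([cat_loop(v) for v in boundary_values])    # [0, 1, 2, 3, 3]
print([cat_bisect(v) for v in boundary_values])  # [0, 1, 2, 3, 3]
```

Inside the range the two agree, including on exact threshold values. They differ only outside it: for 0 the bisect expression yields 4, and for anything above 10000 it yields -1, whereas the loop assigns no category. If such values can occur, an explicit range check before the lookup may be worth adding.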