Identifying high-frequency intervals

Date: 2012-03-10 17:21:43

Tags: python

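I'm trying to extend a function I wrote that finds the first 3 values of a dictionary that are "close enough" to each other and are also all at or below a threshold (here N = 70):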

d = {
    1: {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66},
    2: {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24, 214: 22, 206: 9, 225: 241, 412: 83.0, 364: 68.0, 682: 64.0, 172: 58.0},
}  # nested dictionary

def ff(d):
   G = []
   for k,v in sorted(d.iteritems()):
      G.append((k,v))
       #print G
   for i in range(len(G)-2):
      if (G[i+2][0] - G[i][0] < 20) & (G[i][1] <= 70) & (G[i+1][1] <=70) & (G[i+2][1]<=70):
          return i, G[i], G[i+1], G[i+2]

for idnum, ds in sorted(d.iteritems()):
    print ff(ds)

Output:

[(0, 222), (2, 44), (18, 44), (20, 22), (21, 72), (105, 22), (107, 9), (115, 66)]
(1, (2, 44), (18, 44), (20, 22))
[(0, 61.0), (105, 22), (107, 9), (115, 24), (172, 58.0), (206, 9), (214, 22), (225, 241), (364, 68.0), (412, 83.0), (682, 64.0), (993, 65.0), (1069, 48.0), (1133, 84.0)]
(1, (105, 22), (107, 9), (115, 24)) #first interval fitting criteria

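What I'd actually like to do is find all windows of length 20 and keep track of how many values <= 70 each one contains. Any ideas on how to get started would be great. I can't figure out how to move from a condition that uses "i":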

if (G[i+2][0] - G[i][0] < 20) & (G[i][1] <= 70) & (G[i+1][1] <=70) & (G[i+2][1]<=70):

to something based on a length of 20 rather than on the index.

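Ultimately, instead of the "first three", I'd like to keep track of all the higher-frequency intervals, where the minimum requirement is "at least 3 values <= 70, consecutively ordered, within an interval of length 20".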

Desired output:

If we have

   d[3] = {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105:22, 107:9, 115: 24, 117:22, 200:100, 225: 241,412: 83.0, 420: 68.0, 423: 64.0, 430: 58.0}

it would produce the output:

[(105, 22), (107, 9), (115, 24), (117, 22)], [(420, 68.0), (423, 64.0), (430, 58.0)]
# These can be of any length as long as the overall interval of the list is <=20.

2 answers:

Answer 0 (score: 1)

This might help you get started. It's loop-based and doesn't even use zip (let alone itertools.takewhile!), but hopefully it makes sense:

def find_windows(d, min_elements=3,upper_length=20,max_value=70):
    G = sorted(d.items())
    for start_index in range(len(G)):
        for width in range(min_elements, len(G)-start_index+1):
            window = G[start_index:start_index+width]
            if not all(v <= max_value for k,v in window):
                break
            if not window[-1][0] - window[0][0] < upper_length:
                break
            yield window

I use "break" because as soon as a window contains any value > max_value, or its key span is >= upper_length, no wider window starting at start_index can qualify either.

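If you haven't seen yield before, it turns the function into a generator function. It works like a return in that the function sends back (yields) a value, but the function can then continue instead of stopping. (See the answers to this question for details.)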
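
As a minimal illustration of that behaviour, here is a tiny generator unrelated to the window problem. The name count_up_to is made up for this sketch:

```python
def count_up_to(limit):
    """Yield 0, 1, ..., limit - 1, pausing after each value."""
    n = 0
    while n < limit:
        yield n   # hand n back to the caller; execution resumes here on the next request
        n += 1

gen = count_up_to(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes the same generator and collects what is left
```

Here first is 0 and rest is [1, 2], because the generator remembers where it stopped between requests.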

>>> Ds = {
...     1: {0: 222, 2:44, 18: 44, 20: 22, 21:72, 105:22, 107:9, 115: 66},
...     2: {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105:22, 107:9, 115: 24, 214:22, 206:9, 225: 241,412: 83.0, 364: 68.0, 682: 64.0, 172: 58.0} 
...     }
>>> 
>>> for idnum, d in sorted(Ds.items()):
...     print idnum, list(find_windows(d))
... 
1 [[(2, 44), (18, 44), (20, 22)], [(105, 22), (107, 9), (115, 66)]]
2 [[(105, 22), (107, 9), (115, 24)]]
>>> mydict = dict([(0,55),(1,55),(2,55),(3,55)])
>>> 
>>> for window in find_windows(mydict):
...     print window
... 
[(0, 55), (1, 55), (2, 55)]
[(0, 55), (1, 55), (2, 55), (3, 55)]
[(1, 55), (2, 55), (3, 55)]
>>> list(find_windows(mydict))
[[(0, 55), (1, 55), (2, 55)], [(0, 55), (1, 55), (2, 55), (3, 55)], [(1, 55), (2, 55), (3, 55)]]

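I'm still not entirely clear on what you want to do with overlapping windows. At the moment the function finds all of them, and you can decide how to handle them either inside the function or in post-processing.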
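
One possible post-processing step, sketched here as an assumption about what might be wanted rather than something the question specifies, is to drop every window that lies entirely inside another one. The helper name keep_maximal is hypothetical, and it assumes all windows are contiguous slices of the same sorted (key, value) list:

```python
def keep_maximal(windows):
    """Drop every window whose key range lies strictly inside another window's range."""
    windows = list(windows)
    maximal = []
    for w in windows:
        lo, hi = w[0][0], w[-1][0]
        contained = any(
            o[0][0] <= lo and hi <= o[-1][0] and (o[0][0], o[-1][0]) != (lo, hi)
            for o in windows
        )
        if not contained:
            maximal.append(w)
    return maximal

# The three windows found for {0: 55, 1: 55, 2: 55, 3: 55}
windows = [
    [(0, 55), (1, 55), (2, 55)],
    [(0, 55), (1, 55), (2, 55), (3, 55)],
    [(1, 55), (2, 55), (3, 55)],
]
maximal = keep_maximal(windows)
```

With these inputs only the four-item window survives. Windows that overlap without one containing the other are both kept, so this handles nesting but not partial overlap.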

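Modifying the code so that it counts the values that are <= max_value, instead of testing whether all of them are, should be trivial, so I'll leave that alone.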
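
For reference, a counting variant could look roughly like the sketch below. It is one reading of "all windows of length 20": one window per key, covering the keys in [k, k + upper_length). The function name count_low_values and the shape of the returned tuples are assumptions made for this example:

```python
def count_low_values(d, upper_length=20, max_value=70):
    """For each key k, count the values <= max_value among items with keys in [k, k + upper_length)."""
    items = sorted(d.items())
    results = []
    end = 0
    for start, (k, _) in enumerate(items):
        # Keys are sorted, so the right edge of the window only ever moves forward.
        while end < len(items) and items[end][0] - k < upper_length:
            end += 1
        window = items[start:end]
        low = sum(1 for _, v in window if v <= max_value)
        results.append((k, low, window))
    return results

d1 = {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66}
counts = [(k, low) for k, low, _ in count_low_values(d1)]
```

For d1 the windows starting at keys 2 and 105 each contain three values <= 70. Unlike find_windows, this version does not require the low values to be consecutive, so a window may include values above the threshold.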

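Answer 1 (score: 1)

I split the problem into two parts. The first generator splits your ds dictionary into ordered lists of (key, value) pairs such that no list contains a value > 70. At the same time, it discards chunks with fewer than 3 items.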

def split_iter(d, limit=70):
    G = list(sorted(d.iteritems()))
    start = 0
    for i, (k, v) in enumerate(G):
        if v > limit:
            if i - start >= 3:
                yield G[start:i]
            start = i + 1
    G_tail = G[start:]
    if len(G_tail) >= 3:
        yield G_tail

Now I'll use bisect_right from the bisect module to quickly find the largest possible window starting at each item:

from bisect import bisect_right

def ff(d):
    for chunk in split_iter(d):
        last_end_i = 0
        for i, (k, v) in enumerate(chunk):
            end_i = bisect_right(chunk, (k + 20, 0))
            if end_i - i < 3:
                continue
            if last_end_i != end_i:
                yield chunk[i:end_i]
                last_end_i = end_i
            if end_i == len(chunk):
                break
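
To see what the bisect_right call does on a list of tuples, here is a small standalone check. The sample chunk is made up for illustration:

```python
from bisect import bisect_right

# A sorted chunk of (key, value) pairs, as split_iter would yield.
chunk = [(105, 22), (107, 9), (115, 24), (117, 22), (130, 5)]
k = 105

# Tuples compare element by element, so (k + 20, 0) sorts after every key < k + 20.
end_i = bisect_right(chunk, (k + 20, 0))
window = chunk[:end_i]

# A key exactly 20 away is excluded when its value is positive...
excluded = bisect_right([(0, 5), (20, 7)], (20, 0))
# ...but would be included if its value were 0 or negative.
included = bisect_right([(0, 5), (20, 0)], (20, 0))
```

Here end_i is 4, so the window holds the four items with keys 105 to 117. The last two lines show the edge case: the probe (k + 20, 0) behaves like a strict "key span < 20" test only as long as all values are positive.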

As you can see, I only yield the largest possible windows. Now let's put it all together:

for idnum, ds in sorted(d.iteritems()):
    for r in ff(ds):
        print idnum, repr(r)

Hopefully I got it right. The output looks like this:

1 [(2, 44), (18, 44), (20, 22)]
1 [(105, 22), (107, 9), (115, 66)]
2 [(105, 22), (107, 9), (115, 24)]
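
As a cross-check against the d[3] example from the question, here is a sketch of the same two-step approach with d.items() in place of d.iteritems() so that it also runs on Python 3. The names split_chunks and largest_windows and the extra keyword parameters are introduced only for this sketch:

```python
from bisect import bisect_right

def split_chunks(d, limit=70, min_items=3):
    """Yield sorted runs of (key, value) pairs in which no value exceeds limit."""
    items = sorted(d.items())
    start = 0
    for i, (_, v) in enumerate(items):
        if v > limit:
            if i - start >= min_items:
                yield items[start:i]
            start = i + 1
    tail = items[start:]
    if len(tail) >= min_items:
        yield tail

def largest_windows(d, span=20, limit=70, min_items=3):
    """Yield the largest window of key span < span starting at each item."""
    for chunk in split_chunks(d, limit, min_items):
        last_end = 0
        for i, (k, _) in enumerate(chunk):
            end = bisect_right(chunk, (k + span, 0))
            if end - i < min_items:
                continue
            if end != last_end:   # skip windows that end where the previous one did
                yield chunk[i:end]
                last_end = end
            if end == len(chunk):
                break

d3 = {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24,
      117: 22, 200: 100, 225: 241, 412: 83.0, 420: 68.0, 423: 64.0, 430: 58.0}
result = list(largest_windows(d3))
```

For d3 this gives two windows, one covering keys 105 to 117 and one covering keys 420 to 430, which agrees with the desired output in the question.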