Find the closest range of numbers to a given position

Time: 2015-08-10 14:49:25

Tags: python dictionary

I have a dictionary like this:

exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)], etc...}

and another file with positions like this:

isoform    pos    
NM_015665    449

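What I want to do is print the range of numbers that is closest to the position in the file, and then print the number within that range that is closest to the value. For this case, I would want to print (356, 441) and then 441. I have managed to print the closest number within a range, but the code below only considers 10 values on either side of the listed numbers. Is there a way to account for the fact that the gaps between ranges vary in size?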

Here is my code so far:

import csv

with open('splicing_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        pos = row['pos']
        name = row['isoform']
        ppos1 = int(pos)
        if name in exons:
            y = exons[name]
            for i, (low,high) in enumerate(exons[name]):
                if low -5 <= ppos1 <= high + 5:
                    values = (low,high)
                    closest = min((low,high), key = lambda x:abs(x-ppos1))

2 Answers:

Answer 0: (score: 1)

I would rewrite this as a minimum-distance search:

if name in exons:
    y = exons[name]
    minDist = float('inf')  # larger than any real distance
    minIdx = None
    minNum = None
    for i, (low,high) in enumerate(y):
        dlow = abs(low - ppos1)
        dhigh = abs(high - ppos1)
        dist = min(dlow, dhigh)
        if dist < minDist:
            minDist = dist
            minIdx = i
            minNum = 0 if dlow < dhigh else 1
    print(y[minIdx])
    print(y[minIdx][minNum])

This drops the fixed search window and simply looks for the pair with the smallest distance.
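
The same search can also be packaged as a standalone function. This is only a minimal sketch, not the answerer's code: the name `closest_range` is hypothetical, it uses `min` with a key function instead of an explicit loop, and it assumes the list of ranges is non-empty.

```python
def closest_range(ranges, pos):
    """Return (range, endpoint) for the range whose endpoint lies nearest to pos.

    Hypothetical helper; assumes `ranges` is a non-empty list of (low, high) tuples.
    """
    # Distance from pos to the nearer endpoint of a range.
    def endpoint_distance(rng):
        return min(abs(rng[0] - pos), abs(rng[1] - pos))

    best = min(ranges, key=endpoint_distance)
    # Within the winning range, pick the endpoint closest to pos.
    endpoint = min(best, key=lambda x: abs(x - pos))
    return best, endpoint


exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)]}
print(closest_range(exons['NM_015665'], 449))  # ((356, 441), 441)
```

On a tie, `min` returns the first candidate it sees, so the earlier range (or the lower endpoint) wins.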

Answer 1: (score: 1)

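A functional alternative :). This might be faster. It is clearly very light on RAM, and thanks to the nature of functional programming it can be parallelized easily. I hope you find it interesting enough to study.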

from itertools import imap, izip, ifilter, repeat


def closest_point(position, interval):
    """:rtype: tuple[int, int]"""  # closest interval point, distance to it
    position_in_interval = interval[0] <= position <= interval[1]
    closest = min([(border, abs(position - border)) for border in interval], key=lambda x: x[1])
    return closest if not position_in_interval else (closest[0], 0)  # distance is 0 if position is inside an interval


def closest_interval(exons, pos):
    """:rtype: tuple[tuple[int, int], tuple[int, int]]"""
    return min(ifilter(lambda x: x[1][1], izip(exons, imap(closest_point, repeat(pos, len(exons)), exons))), 
               key=lambda x: x[1][1])


print(closest_interval(exons['NM_015665'], 449))

This prints:

((356, 441), (441, 8))

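The first tuple is the range. In the second tuple, the first integer is the closest point in the interval and the second integer is the distance to it.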
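
The `itertools` names `imap`, `izip`, and `ifilter` exist only in Python 2. As a rough sketch of how the same idea might look on Python 3, assuming the built-in lazy `filter` and a generator expression are acceptable substitutes, and keeping the Python 2 code's behavior of skipping any interval that contains the position (distance 0):

```python
def closest_point(position, interval):
    """Return (closest endpoint, distance); distance is 0 inside the interval."""
    low, high = interval
    border = min(interval, key=lambda b: abs(position - b))
    if low <= position <= high:
        return border, 0
    return border, abs(position - border)


def closest_interval(intervals, pos):
    """Return (interval, (closest endpoint, distance)) for the nearest interval.

    As in the Python 2 code, intervals containing pos (distance 0) are skipped.
    """
    # Lazily pair each interval with its closest endpoint and distance.
    candidates = ((iv, closest_point(pos, iv)) for iv in intervals)
    # Drop pairs whose distance is 0, i.e. intervals that contain pos.
    outside = filter(lambda pair: pair[1][1], candidates)
    return min(outside, key=lambda pair: pair[1][1])


exons = {'NM_015665': [(0, 225), (356, 441), (563, 645), (793, 861)]}
print(closest_interval(exons['NM_015665'], 449))  # ((356, 441), (441, 8))
```

Note that `min` raises `ValueError` if no interval is left after filtering, for example when the list is empty.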