Finding an algorithm: reconstruct a sequence from the smallest number of disjoint subsequences chosen from a list of subsequences

Time: 2017-04-27 09:11:10

Tags: algorithm sequence

I'm not sure whether this is the right place to ask this question; apologies if it is not.

I have a sequence ALPHA, for example:

A B D Z A B X

I am given a list of subsequences of ALPHA, for example:

A B D
B D
A B
D Z
A
B
D
Z
X

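I am looking for an algorithm that finds the smallest number of disjoint subsequences from the list that, concatenated in order, reconstruct ALPHA. In our example: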

{A B D} {Z} {A B} {X}

Any ideas? My guess is that such an algorithm already exists.

1 answer:

Answer 0: (score: 1)

You can transform this problem into finding a shortest path in a graph.

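The nodes correspond to the prefixes of the string, including the empty prefix. There is an edge from node A to node B if there is an allowed subsequence that, when appended to prefix A, yields prefix B.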
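To make the graph concrete, here is a minimal sketch that lists its edges for the example in the question. The helper name `build_edges` is hypothetical and not part of the original answer. Each node is represented by the prefix length, so an edge `(start, end)` means that `seq[start:end]` is one of the allowed subsequences.

```python
def build_edges(seq, subseqs):
    """Return every (start, end) pair where seq[start:end] is an allowed piece.

    Node k stands for the prefix seq[0:k]; node 0 is the empty prefix.
    build_edges is an illustrative helper, not part of the original answer.
    """
    allowed = set(subseqs)
    n = len(seq)
    edges = []
    for start in range(n):
        for end in range(start + 1, n + 1):
            if seq[start:end] in allowed:
                edges.append((start, end))
    return edges


edges = build_edges("ABDZABX", ["ABD", "BD", "AB", "DZ", "A", "B", "D", "Z", "X"])
for start, end in edges:
    print(start, "->", end, "via", "ABDZABX"[start:end])
```

Every edge has the same cost (one piece), so a path from node 0 to node 7 with the fewest edges corresponds to a reconstruction that uses the fewest pieces.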

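The problem now becomes finding a shortest path in this graph that starts at the node for the empty string and ends at the node for the entire input string.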

You can now apply, for example, BFS (since all edges have the same cost) or Dijkstra's algorithm to find this path.

The following Python code is an implementation based on the principles above:

def reconstruct(seq, subseqs):
    n = len(seq)

    # set of allowed pieces, for constant-time membership tests
    d = set(subseqs)

    # in this solution, the node with value v will correspond
    # to the substring seq[0: v]. Thus node 0 corresponds to the empty string
    # and node n corresponds to the entire string

    # this will keep track of the predecessor for each node
    predecessors = [-1] * (n + 1)
    reached = [False] * (n + 1)
    reached[0] = True

    # initialize the queue and add the first node
    # (the node corresponding to the empty string)
    q = []
    qstart = 0
    q.append(0)

    while True:
        # test if we already found a solution
        if reached[n]:
            break

        # test if the queue is empty (every queued node has been processed)
        if qstart >= len(q):
            break

        # poll the first value from the queue
        v = q[qstart]
        qstart += 1

        # try appending a subsequence to the current node
        for n2 in range (1, n - v + 1):
            # the destination node was already added into the queue
            if reached[v + n2]:
                continue

            if seq[v: (v + n2)] in d:
                q.append(v + n2)
                predecessors[v + n2] = v
                reached[v + n2] = True

    if not reached[n]:
        return []

    # reconstruct the path, starting from the last node
    pos = n
    solution = []
    while pos > 0:
        solution.append(seq[predecessors[pos]: pos])
        pos = predecessors[pos]
    solution.reverse()

    return solution


print(reconstruct("ABDZABX", ["ABD", "BD", "AB", "DZ", "A", "B", "D", "Z", "X"]))

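I don't have much experience with Python, which is the main reason I preferred to stick to the basics (for example, implementing the queue as a list plus a start index).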
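For readers who would rather use the standard library, the same BFS can be sketched with `collections.deque` in place of the list plus start index. This is only an illustrative variant under the same assumptions as the answer (`seq` is a string and the pieces are strings). The name `reconstruct_deque` is made up here to avoid clashing with the function above.

```python
from collections import deque


def reconstruct_deque(seq, subseqs):
    """BFS over prefix lengths; returns a shortest list of pieces, or [] if none exists."""
    allowed = set(subseqs)
    n = len(seq)

    predecessors = [-1] * (n + 1)   # predecessor node on the BFS path
    reached = [False] * (n + 1)
    reached[0] = True               # node 0 is the empty prefix

    queue = deque([0])
    while queue and not reached[n]:
        v = queue.popleft()
        # try every allowed piece that starts at position v
        for end in range(v + 1, n + 1):
            if not reached[end] and seq[v:end] in allowed:
                reached[end] = True
                predecessors[end] = v
                queue.append(end)

    if not reached[n]:
        return []

    # walk back from the full string to the empty prefix
    pieces = []
    pos = n
    while pos > 0:
        pieces.append(seq[predecessors[pos]:pos])
        pos = predecessors[pos]
    pieces.reverse()
    return pieces


result = reconstruct_deque("ABDZABX", ["ABD", "BD", "AB", "DZ", "A", "B", "D", "Z", "X"])
print(result)  # ['AB', 'DZ', 'AB', 'X']
```

Note that the result has four pieces, the same count as {A B D} {Z} {A B} {X} from the question, but it is a different split. BFS guarantees a minimum number of pieces, not a particular one among equally short answers.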