Question

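I have 5 arrays containing some inserted elements (numbers):

1,**4**,8,10
1,2,3,**4**,11,15
2,**4**,20,21
**2**,30

I need to find the most common elements in those arrays, and every element should go all the way to the end (see the example below). In this example that would be the bold combination (or the same one with "30" at the end, which counts as "the same"), because it contains the smallest number of different elements (only two: 4 and 2/30).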

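The combination below is not good, because once I pick, for example, "4", it must "go on" until it ends (the next array must not contain "4" at all). So a combination must run all the way to its end.

1,**4**,8,10
1,**2**,3,4,11,15
**2**,4,20,21
**2**,30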

EDIT2: OR

1,**4**,8,10
1,2,3,**4**,11,15
**2**,4,20,21
**2**,30

or anything else is NOT good.

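Is there an algorithm to speed this up (if I have thousands of arrays with hundreds of elements in each one)?

To be clear: the solution must contain the lowest number of different elements, and the groups (of the same number) must be ordered from the first, largest group to the last, smallest group. So in the example above, 4,4,4,2 is better than 4,2,2,2, because in the first one the group of 4s is larger than the group of 2s.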

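EDIT: To be more specific: the solution must contain the smallest number of different elements, and those elements must be grouped from first to last. So if I have three arrays like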

1,2,3
1,4,5
4,5,6

the solution is 1,1,4 or 1,1,5 or 1,1,6, NOT 2,5,5, because the 1s form a larger group (two of them) than the 2s (only one).
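The ranking rule described so far can be spelled out as a brute-force sketch. This is only an illustration of the stated criteria, not a fast algorithm: it tries every pick of one element per array, so it is exponential in the number of arrays. The function names are made up for this sketch, and it assumes that "fewest different elements" means "fewest groups" and that a number may not come back once its group has ended.

```python
from itertools import groupby, product

def run_lengths(choice):
    """Lengths of the consecutive groups, e.g. (1, 1, 4) -> (2, 1)."""
    return tuple(len(list(group)) for _, group in groupby(choice))

def is_valid(choice):
    """Assumption: a number may not reappear once its group has ended."""
    group_values = [value for value, _ in groupby(choice)]
    return len(group_values) == len(set(group_values))

def best_choices(arrays):
    """Try every pick of one element per array and keep the top-ranked ones."""
    def rank(choice):
        lengths = run_lengths(choice)
        # Fewest groups first, then the largest group-length tuple.
        return (-len(lengths), lengths)

    valid = [c for c in product(*arrays) if is_valid(c)]
    top = max(rank(c) for c in valid)
    return [c for c in valid if rank(c) == top]

print(best_choices([[1, 2, 3], [1, 4, 5], [4, 5, 6]]))
# prints [(1, 1, 4), (1, 1, 5), (1, 1, 6)]
```

Comparing the group-length tuples directly is what makes (2, 1) beat (1, 2), which is the "larger group first" preference.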

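Thanks.

EDIT3: I can't be more specific :(

EDIT4: @spintheblack 1,1,1,2,4 is the correct solution, because a number used for the first time (say at position 1) cannot be used later (unless it is in the SAME group of 1s). I would say that grouping has "priority"? Also, I did not mention it (sorry about that), but the numbers in the arrays are NOT sorted in any way. I typed them that way in this post because it was easier for me to follow.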

Answer 1

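Here is the approach you want to take, if arrays is an array that contains each individual array.

1. Start at i = 0.
2. current = arrays[i]
3. Loop i from i+1 to len(arrays)-1.
4. new = current & arrays[i] (set intersection, finds the common elements)
5. If new contains any elements, go to step 6; otherwise skip to step 7.
6. current = new, return to step 3 (continue the loop).
7. Print or yield an element from current, set current = arrays[i], and return to step 3 (continue the loop).

Here is a Python implementation: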

def mce(arrays):
  count = 1
  current = set(arrays[0])
  for i in range(1, len(arrays)):
    new = current & set(arrays[i])
    if new:
      count += 1
      current = new
    else:
      print(" ".join([str(current.pop())] * count), end=" ")
      count = 1
      current = set(arrays[i])
  print(" ".join([str(current.pop())] * count))

>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2

Answer 2

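If all are number lists, ~~and all are sorted,~~ then

1. Convert them to an array of bitmaps.
2. Keep ANDing the bitmaps until you hit zero. The position of the 1 in the previous value indicates the first element.
3. Restart step 2 from the next element.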
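A minimal sketch of these steps, assuming the elements are non-negative integers so that element n can be stored as bit n of a Python int. The function names are hypothetical, and choosing the lowest set bit when several elements survive is an arbitrary choice made for this sketch.

```python
def to_bitmap(array):
    """Encode non-negative integers as bits: element n sets bit n."""
    bitmap = 0
    for n in array:
        bitmap |= 1 << n
    return bitmap

def lowest_element(bitmap):
    """Smallest element stored in a non-empty bitmap."""
    return (bitmap & -bitmap).bit_length() - 1

def bitmap_runs(arrays):
    """AND the bitmaps until the result is zero, then restart from that array."""
    result = []
    current, count = to_bitmap(arrays[0]), 1
    for array in arrays[1:]:
        new = current & to_bitmap(array)
        if new:
            current, count = new, count + 1
        else:
            # The previous (non-zero) value names the element of the finished run.
            result.extend([lowest_element(current)] * count)
            current, count = to_bitmap(array), 1
    result.extend([lowest_element(current)] * count)
    return result

print(bitmap_runs([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]))
# prints [4, 4, 4, 2]
```

Python ints are arbitrary precision, so large element values work, at the cost of proportionally larger bitmaps.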

Answer 3

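This has now turned into a graphing problem with a twist.

The problem is a directed acyclic graph of connections between stops, and the goal is to minimize the number of line switches when riding a train/tram.

That is, this list of sets: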

1,4,8,10           <-- stop A
1,2,3,4,11,15      <-- stop B
2,4,20,21          <-- stop C
2,30               <-- stop D, destination

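He needs to pick lines that are available at both his exit stop and his arrival stop. For instance, he can't pick 10 from stop A, because 10 does not go to stop B.

So, this is the set of available lines and the stops they stop at: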

             A     B     C     D
line 1  -----X-----X-----------------
line 2  -----------X-----X-----X-----
line 3  -----------X-----------------
line 4  -----X-----X-----X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----

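If we require that a line under consideration must run between at least 2 consecutive stops, let me highlight the possible choices of lines with equal signs: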

             A     B     C     D
line 1  -----X=====X-----------------
line 2  -----------X=====X=====X-----
line 3  -----------X-----------------
line 4  -----X=====X=====X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----

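He then needs to pick a route that takes him from A to D with the minimal number of line switches.

Since he explained that he wants the longest rides first, the following sequence seems to be the best solution:

Take line 4 from stop A to stop C, then switch to line 2 from C to D.

Code example: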

stops = [
    [1, 4, 8, 10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30],
]

def calculate_possible_exit_lines(stops):
    """
    only return lines that are available at both exit
    and arrival stops, discard the rest.
    """

    result = []
    for index in range(0, len(stops) - 1):
        lines = []
        for value in stops[index]:
            if value in stops[index + 1]:
                lines.append(value)
        result.append(lines)
    return result

def all_combinations(lines):
    """
    produce all combinations which travel from one end
    of the journey to the other, across available lines.
    """

    if not lines:
        yield []
    else:
        for line in lines[0]:
            for rest_combination in all_combinations(lines[1:]):
                yield [line] + rest_combination

def reduce(combination):
    """
    reduce a combination by returning the number of
    times each value appear consecutively, ie.
    [1,1,4,4,3] would return [2,2,1] since
    the 1's appear twice, the 4's appear twice, and
    the 3 only appear once.
    """

    result = []
    while combination:
        count = 1
        value = combination[0]
        combination = combination[1:]
        while combination and combination[0] == value:
            combination = combination[1:]
            count += 1
        result.append(count)
    return tuple(result)

def calculate_best_choice(lines):
    """
    find the best choice by reducing each available
    combination down to the number of stops you can
    sit on a single line before having to switch,
    and then picking the one that has the most stops
    first, and then so on.
    """

    available = []
    for combination in all_combinations(lines):
        count_stops = reduce(combination)
        available.append((count_stops, combination))
    available = [k for k in reversed(sorted(available))]
    return available[0][1]

possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))

This code prints:

possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]

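Since, as I said, I list lines between stops, the above solution can be read either as the line you leave each stop on or as the line you arrive on at the next stop.

So the route is:

Hop onto line 4 at stop A and ride it to stop B, then on to stop C.

Hop onto line 2 at stop C and ride it to stop D.

There are probably edge cases here that the above code doesn't handle.

However, I'm not bothering any further with this question. The OP has demonstrated a complete inability to communicate his question in a clear and concise manner, and I fear that any corrections to the above text and/or code to accommodate the latest comments will only provoke more comments, which leads to yet another version of the question, and so on ad infinitum. The OP has gone to extraordinary lengths to avoid answering direct questions or explaining the problem.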

Answer 4

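I'm going to take a crack at this based on the comments; please feel free to comment further to clarify.

We have N arrays, and we are trying to find the "most common" value over all arrays when one value is picked from each array. There are two constraints: 1) we want the smallest number of distinct values; 2) the most common is the maximal grouping of similar letters (changing from numbers above for clarity). Thus, 4 t's and 1 p beats 3 x's and 2 y's.

I don't think either problem can be solved greedily. Here is a counterexample: [[1,4],[1,2],[1,2],[2],[3,4]]. A greedy algorithm would pick [1,1,1,2,4] (3 distinct numbers) instead of [4,2,2,2,4] (two distinct numbers).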
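The counterexample can be checked with a short sketch. The greedy function here is one plausible reading of "greedy" (extend the current group while any common element remains), and the exhaustive search is exponential, so it is only suitable for tiny inputs. Both function names are made up for illustration. Note that [4,2,2,2,4] reuses 4 in two separate groups, which the question's EDIT4 rules out.

```python
from itertools import product

def greedy(arrays):
    """Extend the current group while some common element remains."""
    result, current, count = [], set(arrays[0]), 1
    for array in arrays[1:]:
        common = current & set(array)
        if common:
            current, count = common, count + 1
        else:
            # Several elements may survive; max() is an arbitrary tie-break.
            result += [max(current)] * count
            current, count = set(array), 1
    return result + [max(current)] * count

def fewest_distinct(arrays):
    """Exhaustive search for a pick with the fewest distinct values."""
    return min(product(*arrays), key=lambda pick: len(set(pick)))

example = [[1, 4], [1, 2], [1, 2], [2], [3, 4]]
print(greedy(example))           # prints [1, 1, 1, 2, 4]
print(fewest_distinct(example))  # prints (4, 2, 2, 2, 4)
```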

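This looks like a bipartite matching problem, but I'm still coming up with the formulation...

EDIT: ignore this; it is a different problem, but if anyone can figure it out, I'd be really interested.

EDIT 2: For anyone who is interested, the problem that I misinterpreted can be formulated as an instance of the Hitting Set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover. Basically, the left-hand side of the bipartite graph is the arrays and the right-hand side is the numbers, with an edge between each array and every number it contains. Unfortunately, this is NP-complete, but the greedy solutions described above are essentially the best approximation.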
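For the hitting-set reading, the usual greedy heuristic can be sketched as follows. It assumes every array is non-empty; the function name and the tie-breaking rule (smaller number wins) are choices made for this sketch only.

```python
def greedy_hitting_set(arrays):
    """Repeatedly pick the number contained in the most not-yet-hit arrays."""
    remaining = [set(array) for array in arrays]
    chosen = []
    while remaining:
        candidates = set().union(*remaining)
        # Ties go to the smaller number, purely to keep the result deterministic.
        best = max(candidates,
                   key=lambda n: (sum(n in s for s in remaining), -n))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

example = [[1, 4], [1, 2], [1, 2], [2], [3, 4]]
print(greedy_hitting_set(example))  # prints [1, 2, 3]
```

On this input the heuristic returns three numbers, while {2, 4} already hits every array, which shows that greedy gives an approximation rather than the optimum.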

Answer 5

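I am assuming that "distinct elements" do not actually have to be distinct; they can repeat in the final solution. That is, if presented with [1], [2], [1], the obvious answer [1, 2, 1] is allowed, but we would count it as having 3 distinct elements.

If so, then here is a Python solution: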

def find_best_run (first_array, *argv):
    # initialize data structures.
    this_array_best_run = {}
    for x in first_array:
        this_array_best_run[x] = (1, (1,), (x,))

    for this_array in argv:
        # find the best runs ending at each value in this_array
        last_array_best_run = this_array_best_run
        this_array_best_run = {}

        for x in this_array:
            for (y, pattern) in last_array_best_run.items():
                (distinct_count, lengths, elements) = pattern
                if x == y:
                    lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
                else :
                    distinct_count += 1
                    lengths = tuple(lengths + (1,))
                    elements = tuple(elements + (x,))

                if x not in this_array_best_run:
                    this_array_best_run[x] = (distinct_count, lengths, elements)
                else:
                    (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
                    # fewer distinct elements wins; lengths only break ties
                    if distinct_count < prev_count or (distinct_count == prev_count and prev_lengths < lengths):
                        this_array_best_run[x] = (distinct_count, lengths, elements)

    # find the best overall run
    best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
    for (distinct_count, lengths, elements) in this_array_best_run.values():
        if distinct_count < best_count:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
        elif distinct_count == best_count and best_lengths < lengths:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements

    # convert it into a more normal representation.                
    answer = []
    for (length, element) in zip(best_lengths, best_elements):
        answer.extend([element] * length)

    return answer

# example
print(find_best_run(
    [1,4,8,10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30])) # prints [4, 4, 4, 2]

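Here is an explanation. The ..._best_run dictionaries have keys that are elements of the current array and values that are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count and then maximize lengths (lengths is a tuple, so this prefers the run with the largest value in the first spot), and we keep track of elements for the end. At each step I construct all possible runs that combine a run up to the previous array with the current element next in sequence, and find which ones are best for the current array. When I get to the end, I pick the best overall run, convert it into a conventional representation, and return it.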
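The selection rule in that explanation relies on Python comparing tuples element by element. A small sketch with made-up candidate patterns (the values and the helper name pick_best are illustrative only):

```python
def pick_best(patterns):
    """Fewest distinct elements first, then the largest lengths tuple."""
    fewest = min(count for count, _, _ in patterns)
    tied = [p for p in patterns if p[0] == fewest]
    # Tuples compare lexicographically, so (3, 1) beats (2, 2).
    return max(tied, key=lambda p: p[1])

# (distinct_count, lengths, elements), hypothetical values for illustration
candidates = [
    (2, (2, 2), (1, 2)),
    (2, (3, 1), (4, 2)),
    (3, (2, 1, 1), (1, 2, 30)),
]
print(pick_best(candidates))  # prints (2, (3, 1), (4, 2))
```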

If you have N arrays of length M, this should take O(N*M*M) time to run.

Algorithm to find "most common elements" in different arrays
