Question

I have been reviewing practice algorithms, and I am currently looking at a permutation algorithm that I really like:

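#include <iostream>
using namespace std;

// Helper assumed by the calls below: swaps the two characters a and b point to.
void swap(char* a, char* b) { char t = *a; *a = *b; *b = t; }

void permute(char* set, int begin, int end) {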
    int range = end - begin;

    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}

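I actually want to apply this to input that will contain many repeated characters, so I need to be able to modify it so that it does not print duplicate permutations.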

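How can I detect whether I am generating a duplicate? I know I could store the results in a hash or something similar, but that is not an optimal solution; I would prefer one that needs no extra storage. Can anyone give me a suggestion?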

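PS: I don't want to use the STL permutation machinery, and I don't want a pointer to some other "unique permutation algorithm". I want to understand the mechanism used to prevent duplicates, so that I can build it into my learning where possible.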

Answer 1

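There is no general way to stop an arbitrary function from generating duplicates. You can of course always filter the duplicates out afterwards, but you don't want to do that, and for good reason. So you need a special way of generating only non-duplicates.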

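One way is to generate the permutations in increasing lexicographic order. Then you can check whether a "new" permutation is the same as the last one, and skip it if so. It gets even better: the algorithm for generating permutations in increasing lexicographic order given at http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order does not generate duplicates at all!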
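Below is a minimal Python sketch of the lexicographic-order method described in the linked Wikipedia section. The function names are illustrative, and this is the standard next-permutation step, not your swap-based recursion. Starting from the sorted string, each step finds the rightmost position k with a[k] < a[k+1]. It swaps a[k] with the rightmost element larger than it, then reverses the tail. Because the comparisons are strict, equal characters are never exchanged with each other, so no string is emitted twice.

```python
def next_permutation(a):
    """Rearrange list a into the next permutation in lexicographic order.

    Returns False (leaving a unchanged) if a is already the last permutation.
    """
    # 1. Find the largest k with a[k] < a[k + 1].
    k = len(a) - 2
    while k >= 0 and a[k] >= a[k + 1]:
        k -= 1
    if k < 0:
        return False
    # 2. Find the largest l > k with a[k] < a[l].
    l = len(a) - 1
    while a[k] >= a[l]:
        l -= 1
    # 3. Swap them, then reverse the tail after position k.
    a[k], a[l] = a[l], a[k]
    a[k + 1:] = reversed(a[k + 1:])
    return True


def unique_permutations_lex(s):
    a = sorted(s)  # start from the smallest permutation
    result = ["".join(a)]
    while next_permutation(a):
        result.append("".join(a))
    return result


print(unique_permutations_lex("aab"))  # ['aab', 'aba', 'baa']
```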

However, this is not an answer to your question, because it is a different algorithm (although it is also based on swapping).

So let's look at your algorithm. One key observation is:

Once a character has been swapped into position begin, it stays there for all nested calls of permute.

We combine this with the following general observation about permutations:

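If you permute a string s, but only among positions that hold the same character, s stays the same. In fact, duplicate permutations differ only in the order of the occurrences of some character c, while c occupies the same positions.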
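The small Python check below makes the observation concrete. The names are illustrative, and it uses itertools.permutations purely for the demonstration, not as a solution. It labels the two a's in "aab" as (a, 0) and (a, 1) and generates all 3! = 6 orderings. Each distinct string then arises 2! = 2 times, and exactly one of the two copies keeps (a, 0) before (a, 1).

```python
from collections import Counter
from itertools import permutations

s = "aab"
# Tag each character with its occurrence number: (a,0), (a,1), (b,0).
tagged = [(c, s[:i].count(c)) for i, c in enumerate(s)]


def in_original_order(perm):
    """True if, for every character, its occurrences keep their original order."""
    seen = {}
    for c, k in perm:
        if k != seen.get(c, 0):
            return False
        seen[c] = k + 1
    return True


all_perms = list(permutations(tagged))
strings = Counter("".join(c for c, _ in p) for p in all_perms)
kept = ["".join(c for c, _ in p) for p in all_perms if in_original_order(p)]

print(strings)  # each distinct string appears 2! = 2 times
print(kept)     # ['aab', 'aba', 'baa']
```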

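OK, so all we have to do is make sure that the occurrences of each character always stay in the same order as at the start. Code follows, but... I don't really speak C++, so I'll use Python and hope I can get away with calling it pseudocode.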

We start with your original algorithm, rewritten in 'pseudocode':

def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]

and a helper function to call it:

def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable

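Now we extend the permute function with some bookkeeping. We track how many times each character has been swapped into position begin (i.e., has already been fixed). We also remember, for each position, which occurrence of its character it held in the original string (char_number). Every character we try to swap into position begin must be the next one in the original order. In other words, the number of times a character has been fixed determines which of its original occurrences may be fixed next; I call this next_fixable.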

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]: 
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]

                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]

                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

Again, we use a helper function:

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    permute2(list(s), next_fixable, char_number, 0, len(s))

That's it!

Almost. You can stop here and rewrite it in C++ if you like, but if you are interested in some test data, read on.

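I used slightly different code for testing, because I didn't want to print all the permutations. In Python, replacing print with yield turns the function into a generator function. Its results can be iterated with a for loop, and each permutation is only computed when it is needed. Here is the actual code and test I used: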

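def permute(s, begin, end):
    # generator version of the original algorithm (still yields duplicates)
    if end == begin + 1:
        yield "".join(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            for p in permute(s, begin + 1, end):
                yield p
            s[begin], s[i] = s[i], s[begin]

def permutations_w_duplicates(s):
    for p in permute(list(s), 0, len(s)):
        yield p

def permute2(s, next_fixable, char_number, begin, end):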
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p


s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))

print("The first 10 permutations       :", A[:10])
print("The first 10 unique permutations:", B[:10])

Results:

FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations       : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']

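Note that the permutations come out in the same order as with the original algorithm, just without the duplicates. Also note that 37800 * 2! * 2! * 4! = 3628800, just as you would expect, since F and U each occur twice and O occurs four times.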
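The count 37800 is the multinomial coefficient 10! / (2! · 4! · 1! · 2! · 1!) for the letters F, O, Q, U, X. The small Python helper below (a hypothetical name, not part of the answer above) computes this number for any string. It can serve as a quick check on any duplicate-free generator.

```python
from collections import Counter
from math import factorial


def count_distinct_permutations(s):
    """n! divided by m! for every character that occurs m times."""
    total = factorial(len(s))
    for m in Counter(s).values():
        total //= factorial(m)
    return total


print(count_distinct_permutations("FOOQUUXFOO"))  # 37800
```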

Answer 2

You can add an if statement that prevents the swap code from running when it would swap two identical characters. The for loop then becomes:

for(int i = 0; i < range; ++i) {
    if(i==0 || set[begin] != set[begin+i]) {
      swap(&set[begin], &set[begin+i]);
      permute(set, begin+1, end);
      swap(&set[begin], &set[begin+i]);
    }
}

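The reason for allowing the case i==0 is to make sure the recursive call happens exactly once, even if all the characters in the set are the same. Note that this check only compares each candidate with the character currently at position begin. Two equal characters that both sit after begin (such as the two b's in "abb") are therefore still each swapped into position begin, and some duplicates remain.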
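A direct Python port of the modified loop makes it easy to check the behavior on small inputs. The names permute_answer2 and run are illustrative, and results are collected into a list instead of printed.

```python
def permute_answer2(s, begin, end, out):
    """Port of the loop above; appends each result to out instead of printing."""
    rng = end - begin
    if rng == 1:
        out.append("".join(s))
        return
    for i in range(rng):
        # Skip the swap only when the candidate equals the character at begin.
        if i == 0 or s[begin] != s[begin + i]:
            s[begin], s[begin + i] = s[begin + i], s[begin]
            permute_answer2(s, begin + 1, end, out)
            s[begin], s[begin + i] = s[begin + i], s[begin]


def run(word):
    out = []
    permute_answer2(list(word), 0, len(word), out)
    return out


print(run("aab"))  # ['aab', 'aba', 'baa']
print(run("abb"))  # ['abb', 'bab', 'bba', 'bba', 'bab']
```

For "aab" the port yields the 3 distinct permutations. For "abb" it yields 5 strings, with "bab" and "bba" each appearing twice.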

Answer 3

Option 1

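One option is to keep 256 bits of storage on the stack as a bitmask of the characters you have already tried in the for loop, and to recurse only for characters that have not been tried yet.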
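Below is a rough Python sketch of Option 1; the function name permute_bitmask is illustrative. A Python int stands in for the 256-bit stack array, assuming 8-bit characters as in the question. Bit ord(c) is set once character c has been tried at position begin.

```python
def permute_bitmask(s, begin, out):
    """Each distinct character is tried at position begin only once."""
    if begin == len(s) - 1:
        out.append("".join(s))
        return
    tried = 0  # one bit per possible 8-bit character code
    for i in range(begin, len(s)):
        bit = 1 << ord(s[i])
        if tried & bit:
            continue  # this character was already tried at position begin
        tried |= bit
        s[begin], s[i] = s[i], s[begin]
        permute_bitmask(s, begin + 1, out)
        s[begin], s[i] = s[i], s[begin]


out = []
permute_bitmask(list("abb"), 0, out)
print(out)  # ['abb', 'bab', 'bba']
```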

Option 2

A second option is to use the approach suggested in the comments (http://n1b-algo.blogspot.com/2009/01/string-permutations.html) and change the for loop to:

else {
    char last=0;
    for(int i = 0; i < range; ++i) {
        if (last==set[begin+i])
            continue;
        last = set[begin+i];
        swap(&set[begin], &set[begin+i]);
        permute(set, begin+1, end);
        swap(&set[begin], &set[begin+i]);
    }
}

However, to use this approach you also have to sort the characters set[begin], set[begin+1], ..., set[end-1] on entry to the function.

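Note that the sort has to be done every time the function is called. (The blog post does not seem to mention this, but without it you will generate too many results for the input string "aabbc". The problem is that the string does not stay sorted once swaps have been applied.)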

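This is still not very efficient. For example, for a string containing one 'a' and N 'b's, this approach ends up sorting N times, for an overall complexity of O(N^2 log N).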
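Here is a Python sketch of Option 2. It rests on one stated assumption: each call sorts its own copy of the list, so the sort done by a nested call cannot disturb the caller's swap-back. This costs one copy per call. The function name permute_sorted is illustrative. Because the tail is sorted, equal characters are adjacent, and comparing with the previous candidate (last) is enough to skip repeats.

```python
def permute_sorted(s, begin, out):
    """Sort the tail on entry, then skip a candidate equal to the previous one."""
    s = s[:begin] + sorted(s[begin:])  # fresh list: the caller's list is untouched
    if begin >= len(s) - 1:
        out.append("".join(s))
        return
    last = None
    for i in range(begin, len(s)):
        if s[i] == last:
            continue  # same character as the previous candidate
        last = s[i]
        s[begin], s[i] = s[i], s[begin]
        permute_sorted(s, begin + 1, out)
        s[begin], s[i] = s[i], s[begin]  # restores the sorted tail


out = []
permute_sorted(list("aabbc"), 0, out)
print(len(out))  # 30
```

Since the candidates at each position are tried in sorted order, this sketch also emits the permutations in lexicographic order.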

Option 3

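For long strings with many repeats, a more efficient approach is to maintain the string "set" together with a dictionary recording how many characters of each kind you still have to use. The for loop then becomes a loop over the dictionary keys, since those are the only characters allowed at that position.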

The complexity is then proportional to the number of output strings, and the only extra storage needed is the small dictionary.
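Below is a Python sketch of Option 3, using collections.Counter as the dictionary; the function names are illustrative. Here the output is built up in prefix rather than by swapping inside the string. That is one possible reading of "maintain the string alongside the counts".

```python
from collections import Counter


def permute_counts(remaining, prefix, n, out):
    """remaining maps each character to how many copies are still unused;
    only its keys are candidates for the next position."""
    if len(prefix) == n:
        out.append("".join(prefix))
        return
    for c in sorted(remaining):  # each distinct character once per position
        if remaining[c] == 0:
            continue
        remaining[c] -= 1
        prefix.append(c)
        permute_counts(remaining, prefix, n, out)
        prefix.pop()
        remaining[c] += 1


def unique_permutations_counts(s):
    out = []
    permute_counts(Counter(s), [], len(s), out)
    return out


print(unique_permutations_counts("abb"))  # ['abb', 'bab', 'bba']
```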

Answer 4

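A simple solution is to change each duplicate character into some character that is not already present. Then, after permuting, change those characters back. Only accept a permutation if the substituted characters appear in order.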

For example, if you have "a, b, b"

you would get the following:

a b b
a b b
b a b
b a b
b b a
b b a

But if we start with a, b, b and note the duplicate b, we can change the second b into c.

Now we have a b c:

a b c - accept because b is before c. change c back to b to get a b b
a c b - reject because c is before b
b a c - accept as b a b
b c a - accept as b b a
c b a - reject as c comes before b.
c a b - reject as c comes before b.
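Below is a Python sketch of this idea; the function names are hypothetical. Rather than picking real unused characters, it pairs each character with its occurrence number, so "abb" becomes (a,0), (b,0), (b,1), and the pair (b,1) plays the role of the stand-in c. It runs the question's swap algorithm over these pairs. It then keeps only the orderings in which each character's copies appear in their original order. It still generates all n! orderings, so it avoids duplicate output but not the extra work.

```python
def permute_all(items, begin, out):
    """The question's swap-based algorithm; emits every ordering."""
    if begin >= len(items) - 1:
        out.append(list(items))
        return
    for i in range(begin, len(items)):
        items[begin], items[i] = items[i], items[begin]
        permute_all(items, begin + 1, out)
        items[begin], items[i] = items[i], items[begin]


def unique_by_renaming(s):
    # Make repeated characters distinct: "abb" -> (a,0), (b,0), (b,1).
    tagged = [(c, s[:i].count(c)) for i, c in enumerate(s)]
    all_orders = []
    permute_all(tagged, 0, all_orders)
    result = []
    for order in all_orders:
        # Accept only if each character's copies keep their original order,
        # i.e. (b,0) before (b,1), like "b before c" in the example above.
        next_copy = {}
        ok = True
        for c, k in order:
            if k != next_copy.get(c, 0):
                ok = False
                break
            next_copy[c] = k + 1
        if ok:
            result.append("".join(c for c, _ in order))  # change stand-ins back
    return result


print(unique_by_renaming("abb"))  # ['abb', 'bab', 'bba']
```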

Answer 5

Just insert each permutation into a set; the set removes duplicates automatically. Declare the set as a global variable.

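#include <iostream>
#include <set>
#include <string>
#include <utility>
using namespace std;

set<string> s;  // global set: inserting an existing permutation has no effect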
void permute(string a, int l, int r) {
    int i;
    if (l == r)
        s.insert(a);
    else
    {
        for (i = l; i <= r; i++)
        {
            swap((a[l]), (a[i]));
            permute(a, l+1, r);
            swap((a[l]), (a[i])); //backtrack
        }
    }
}

Finally, print them with this function:

void printr()
{
    set <string> ::iterator itr;
    for (itr = s.begin(); itr != s.end(); ++itr)
    {
        cout << '\t' << *itr;
    }
    cout << endl;  // itr == s.end() here, so it must not be dereferenced
}
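The same idea in Python, for comparison: permute_into_set is an illustrative name, and the set is passed in rather than kept global. Every ordering is still generated, and the set merely discards repeats. The set ends up holding every distinct permutation, which is the extra storage the question hoped to avoid.

```python
def permute_into_set(a, l, r, seen):
    """Generate every ordering of a[l..r]; the set drops repeats."""
    if l == r:
        seen.add("".join(a))
        return
    for i in range(l, r + 1):
        a[l], a[i] = a[i], a[l]
        permute_into_set(a, l + 1, r, seen)
        a[l], a[i] = a[i], a[l]  # backtrack


seen = set()
word = "abb"
permute_into_set(list(word), 0, len(word) - 1, seen)
print(sorted(seen))  # ['abb', 'bab', 'bba'], ordered like iterating a std::set
```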

Answer 6

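The key is never to swap the same character into position begin twice. So you can use an unordered_set to remember which characters have already been swapped in.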

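#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
using namespace std;

void permute(string& input, int begin, vector<string>& output) {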
    if (begin == input.size()){
        output.push_back(input);
    }
    else {    
        unordered_set<char> swapped;
        for(int i = begin; i < input.size(); i++) {
            // Do not swap a character that has been swapped
            if(swapped.find(input[i]) == swapped.end()){
                swapped.insert(input[i]);
                swap(input[begin], input[i]);
                permute(input, begin+1, output);
                swap(input[begin], input[i]);
            }
        }
    }
}
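A Python port of this approach, for experimenting; permute_unique is an illustrative name. A fresh set per recursion level plays the role of the unordered_set swapped. Because each swap is undone after its recursive call, each distinct character is placed at position begin exactly once.

```python
def permute_unique(chars, begin, output):
    """A per-level set remembers which characters were already swapped
    into position begin."""
    if begin == len(chars):
        output.append("".join(chars))
        return
    swapped = set()
    for i in range(begin, len(chars)):
        if chars[i] in swapped:
            continue  # this character was already tried at position begin
        swapped.add(chars[i])
        chars[begin], chars[i] = chars[i], chars[begin]
        permute_unique(chars, begin + 1, output)
        chars[begin], chars[i] = chars[i], chars[begin]


output = []
permute_unique(list("BAA"), 0, output)
print(output)  # ['BAA', 'ABA', 'AAB']
```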

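If you step through the original code by hand, you will see that duplicates arise exactly when a character is swapped into position begin after an identical character has already been swapped there.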

For example, with input = "BAA" (index is the begin position):

index = 0, i = 0, input = "BAA"
    ----> index = 1, i = 1, input = "BAA"
    ----> index = 1, i = 2, input = "BAA" (duplicate)
index = 0, i = 1, input = "ABA"
    ----> index = 1, i = 1, input = "ABA"
    ----> index = 1, i = 2, input = "AAB"
index = 0, i = 2, input = "AAB"
    ----> index = 1, i = 1, input = "AAB" (duplicate)
    ----> index = 1, i = 2, input = "ABA" (duplicate)

Strategies for modifying a permutation algorithm to prevent duplicate printouts
