Question

First, define two integers N and K, where N >= K, both known at compile time. For example: N = 8 and K = 3.

Next, define a set of integers [0, N) (or [1, N] if that makes the answer simpler) and call it S. For example: {0, 1, 2, 3, 4, 5, 6, 7}

The number of K-element subsets of S is given by the formula C(N, K). For example, C(8, 3) = 56.

My problem is this: create a perfect minimal hash for those subsets. The example hash table would have size C(8, 3), or 56.
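Since N and K are known at compile time, the table size itself can be computed at compile time. The following is only a minimal sketch, assuming C++14 or later; `binomial` is a hypothetical helper name, not something from the question, and it does not guard against overflow for large N.

```cpp
#include <cstdint>

// Hypothetical helper: C(n, k) evaluated at compile time.
// After step i the running value equals C(n - k + i, i),
// so every division by i is exact.
constexpr std::uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // symmetry C(n, k) == C(n, n - k) shortens the loop
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}

// Size of the hash table for the example N = 8, K = 3.
static_assert(binomial(8, 3) == 56, "C(8, 3) should be 56");
```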

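I don't care about ordering, only that the hash table has exactly 56 entries and that I can determine the hash quickly from a set of K integers. I also don't care about reversibility.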

Example hash: hash({5, 2, 3}) = 42. (The number 42 isn't important, at least not here.)

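Is there a generic algorithm for this that works for any values of N and K? I wasn't able to find one by searching Google or through my own naive efforts.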

Answer 1

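There is an algorithm that encodes a combination as its number in the lexicographical order of all combinations with a given fixed K, and decodes it back. The algorithm is linear in N for both encoding and decoding. What language are you interested in?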

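EDIT: here is example code in C++ (it finds the lexicographical number of a combination in the sequence of all combinations of n elements, rather than only those with k elements, but it is a really good starting point):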

#include <algorithm>
#include <vector>
using namespace std;

typedef long long ll;

// Returns the number in the lexicographical order of all combinations of n numbers
// of the provided combination. 
ll code(vector<int> a,int n)
{
    sort(a.begin(),a.end());
    int cur = 0;
    int m = a.size();

    ll res =0;
    for(int i=0;i<m;i++)
    {
        if(a[i] == cur+1)
        {
            res++;
            cur = a[i];
            continue;
        }
        else
        {
            res++;
            int number_of_greater_nums = n - a[i];
            for(int j = a[i]-1,increment=1;j>cur;j--,increment++)
                res += 1LL << (number_of_greater_nums+increment);
            cur = a[i];
        }
    }
    return res;
}
// Takes the lexicographical code of a combination of n numbers and returns the 
// combination
vector<int> decode(ll kod, int n)
{
    vector<int> res;
    int cur = 0;

    int left = n; // Out of how many numbers are we left to choose.
    while(kod)
    {
        ll all = 1LL << left;// how many are the total combinations
        for(int i=n;i>=0;i--)
        {
            if(all - (1LL << (n-i+1)) +1 <= kod)
            {
                res.push_back(i);
                left = n-i;
                kod -= all - (1LL << (n-i+1)) +1;
                break;
            }
        }
    }
    return res;
}

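Sorry, this is not exactly the algorithm for the problem you are asking about, but I believe trying to understand what I did above is a good exercise. In fact, this is one of the algorithms I teach in the course "Design and Analysis of Algorithms", which is why I had it pre-written.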
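For readers who want to see the fixed-K case, one possible adaptation is sketched below. It is not the answer author's code: `pascal` and `lex_rank` are hypothetical names, the elements are assumed to be distinct values in [0, n), and the binomial table is rebuilt on every call purely to keep the example short.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Pascal's triangle up to row n: t[i][j] == C(i, j), and 0 when j > i.
std::vector<std::vector<std::uint64_t>> pascal(int n) {
    std::vector<std::vector<std::uint64_t>> t(
        n + 1, std::vector<std::uint64_t>(n + 1, 0));
    for (int i = 0; i <= n; ++i) {
        t[i][0] = 1;
        for (int j = 1; j <= i; ++j)
            t[i][j] = t[i - 1][j - 1] + t[i - 1][j];
    }
    return t;
}

// Zero-based lexicographic rank of a K-subset of {0, ..., n-1},
// where K is the size of `a`. Assumes distinct elements in [0, n).
std::uint64_t lex_rank(std::vector<int> a, int n) {
    std::sort(a.begin(), a.end());
    const int k = static_cast<int>(a.size());
    const auto c = pascal(n);  // rebuilt per call for brevity only
    std::uint64_t rank = 0;
    int next = 0;  // smallest value still allowed at the current position
    for (int i = 0; i < k; ++i) {
        // Each smaller choice v at position i can be completed by picking
        // the remaining k-1-i elements from the n-1-v values above v.
        for (int v = next; v < a[i]; ++v)
            rank += c[n - 1 - v][k - 1 - i];
        next = a[i] + 1;
    }
    return rank;
}
```

With n = 8, the subset {0, 1, 2} gets rank 0, {5, 6, 7} gets rank 55, and {5, 2, 3} gets rank 37. The inner loop visits each value below the largest element at most once, so with a precomputed table the work is linear in n.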

Answer 2

This is what you (and I) need:

hash() maps k-tuples from [1..n] onto the set 1..C(n,k) ⊂ N. The effort is k subtractions (and O(k) is a lower bound anyway, see Strandjev's comment above).
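One standard way to get a perfect minimal hash of this kind is the combinatorial number system: sort the subset as c_1 < c_2 < ... < c_k and sum C(c_i, i). The sketch below illustrates that technique and is not necessarily the code this answer originally contained. `CombinationHash` is a hypothetical name, the elements are assumed to be distinct values in [0, N) as in the question (rather than [1..n]), and the result is zero-based.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: perfect minimal hash for K-subsets of [0, N).
template <std::size_t N, std::size_t K>
struct CombinationHash {
    // binom[i][j] == C(i, j) for i < N and j <= K (0 when j > i).
    std::array<std::array<std::uint64_t, K + 1>, N> binom{};

    CombinationHash() {
        for (std::size_t i = 0; i < N; ++i) {
            binom[i][0] = 1;
            for (std::size_t j = 1; j <= K; ++j)
                binom[i][j] = (i == 0) ? 0
                                       : binom[i - 1][j - 1] + binom[i - 1][j];
        }
    }

    // Assumes K distinct elements in [0, N). Returns a value in [0, C(N, K)).
    std::uint64_t operator()(std::array<int, K> subset) const {
        std::sort(subset.begin(), subset.end());
        std::uint64_t h = 0;
        for (std::size_t i = 0; i < K; ++i)
            h += binom[static_cast<std::size_t>(subset[i])][i + 1];
        return h;
    }
};
```

For N = 8 and K = 3, this maps {0, 1, 2} to 0, {5, 2, 3} to 15, and {5, 6, 7} to 55. If the input is already sorted, the sort can be dropped and the hash costs k table lookups and additions.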

Perfect minimal hash for mathematical combinations
