Data structure with unique elements and fast add and remove

Time: 2011-08-31 18:36:14

Tags: c# algorithm data-structures

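I need a data structure with the following properties:

  • Every element of the structure must be unique.
  • Add: adds an element to the data structure unless the element already exists.
  • Pop: removes an element from the data structure and returns the removed element. It does not matter which element is removed.

No other operations are required for this structure. A naive implementation with a list would need almost O(1) time for Pop and O(N) time for Add (since the entire list must be checked to ensure uniqueness). I am currently using a red-black tree to meet these needs, but I am wondering whether something less complicated could achieve almost the same performance.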
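For reference, the list-based approach described above might look like the following minimal sketch. The name `ListBackedSet` is hypothetical and only illustrates where the O(N) and O(1) costs come from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; illustrates the naive list-backed approach only.
public class ListBackedSet<T>
{
    private readonly List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    // O(N): List<T>.Contains scans the list to check uniqueness.
    public bool Add(T item)
    {
        if (items.Contains(item))
            return false;
        items.Add(item);
        return true;
    }

    // O(1): removing the last element never shifts the others.
    public T Pop()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("The set is empty.");
        int last = items.Count - 1;
        T item = items[last];
        items.RemoveAt(last);
        return item;
    }
}
```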

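I would prefer answers in C#, but Java, Javascript and C++ are also acceptable.

My question is similar to this question, but I have no need to look up or remove the maximum or minimum value (or indeed any particular kind of value), so I was hoping there would be improvements in that respect. If any of the structures in that question would be appropriate here, however, let me know.

So, what data structure allows only unique elements, supports fast add and remove, and is less complicated than a red-black tree?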

3 answers:

Answer 0: (score: 12)

How about the built-in HashSet<T>?

It contains only unique elements. Remove (pop) is O(1), and Add is O(1) unless the internal array has to be resized.

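Answer 1: (score: 5)

As Meta-Knight said, a HashSet is the fastest data structure for exactly this. Lookups and removals take O(1) time on average (except in the rare cases where your hash function is poor, and then you need multiple rehashes or a bucketed hash set). The only drawback is that it needs more memory, because the hash is used as an index into an array (or other allocated block of memory). So unless you are really strict on memory, go with HashSet. I am only explaining why you should take this approach; you should accept Meta-Knight's answer, since his was first.

Using hashes is fine because you usually override the GetHashCode() and Equals() functions. Internally, HashSet generates the hash and, if it matches the hash of an existing entry, checks for equality (in case of a hash collision). If the two are not equal, it has to do something called rehashing, which produces a new hash, usually at an odd prime offset from the original one (I am not sure whether .NET does this, but other languages do), and repeats the process as necessary.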
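As an illustration of overriding GetHashCode() and Equals(), here is a minimal sketch. The `Point` type and the `PointDemo` helper are hypothetical, and the multiplier 397 is just a commonly used odd prime, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example type; equality is defined by X and Y only.
public sealed class Point : IEquatable<Point>
{
    public int X { get; private set; }
    public int Y { get; private set; }

    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Point);
    }

    // Equal objects must return equal hash codes; unequal objects may collide.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class PointDemo
{
    public static int CountDistinct()
    {
        var set = new HashSet<Point>();
        set.Add(new Point(1, 2));
        set.Add(new Point(1, 2)); // rejected: equal to the first point
        set.Add(new Point(2, 1)); // accepted: different coordinates
        return set.Count;         // 2
    }
}
```

Without the two overrides, the set would compare `Point` instances by reference and keep both copies of (1, 2).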

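Answer 2: (score: 3)

Removing an arbitrary element from a hash set or a dictionary is very easy. Everything is O(1) on average, which in the real world means O(1). For example: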

public class MyNode
{
    ...
}

public class MyDataStructure
{
    private HashSet<MyNode> nodes = new HashSet<MyNode>();

    /// <summary>
    /// Inserts an element to this data structure. 
    /// If the element already exists, returns false.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Add(MyNode node)
    {
        return node != null && this.nodes.Add(node);
    }

    /// <summary>
    /// Removes an arbitrary element from the data structure.
    /// Returns the element if an element was found.
    /// Returns null if the data structure is empty.
    /// Complexity is averaged O(1).
    /// </summary>
    public MyNode Pop()
    {
        // This loop can execute 1 or 0 times.
        foreach (MyNode node in nodes)
        {
            this.nodes.Remove(node);
            return node;
        }
        return null;
    }
}
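How quickly the enumerator in Pop reaches its first element depends on the internals of HashSet<T>, so a long run of Pop calls may not stay O(1) each. If that matters, one alternative is to pair the set with a list and always pop the last list element. This is a sketch under that assumption, and `UniqueBag` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; pairs a HashSet<T> (uniqueness) with a List<T> (cheap Pop).
public class UniqueBag<T>
{
    private readonly HashSet<T> members = new HashSet<T>();
    private readonly List<T> order = new List<T>();

    public int Count { get { return order.Count; } }

    // Average O(1): one hash lookup plus an append to the list.
    public bool Add(T item)
    {
        if (!members.Add(item))
            return false;
        order.Add(item);
        return true;
    }

    // Average O(1): takes the most recently added element.
    public bool TryPop(out T item)
    {
        if (order.Count == 0)
        {
            item = default(T);
            return false;
        }
        int last = order.Count - 1;
        item = order[last];
        order.RemoveAt(last);
        members.Remove(item);
        return true;
    }
}
```

The price is that every element is stored twice, once in the set and once in the list.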

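In my experience, almost everything that can be compared can also be hashed :). I would like to know whether anyone knows of something that cannot be hashed.

In my experience this also applies to some floating-point comparisons that use special techniques for tolerance.

The hash function of a hash table does not need to be perfect; it just needs to be good enough. Also, even if your data is very complicated, a hash function is usually less complicated than a red-black tree or an AVL tree. Those trees are useful because they keep things ordered, but you do not need that.

To show how to build a simple hash set, I will consider a simple dictionary with integer keys. This implementation is very fast and works very well for sparse arrays, for example. I did not write the code to grow the bucket table, because it is annoying and usually a source of big bugs, but since this is a proof of concept it should be enough. I did not write an iterator either.

I wrote it from scratch, so there may be bugs.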

using System;

public class FixedIntDictionary<T>
{
    // Our internal node structure.
    // We use structs instead of objects to not add pressure to the garbage collector.
    // We maintain our own way to manage garbage through the use of a free list.
    private struct Entry
    {
        // The key of the node
        internal int Key;

        // Next index in pEntries array.
        // This field is both used in the free list, if node was removed
        // or in the table, if node was inserted.
        // -1 means null.
        internal int Next;

        // The value of the node.
        internal T Value;
    }

    // The actual hash table. Contains indices to pEntries array.
    // The hash table can be seen as an array of singly linked lists.
    // We store indices to pEntries array instead of objects for performance
    // and to avoid pressure to the garbage collector.
    // An index -1 means null.
    private int[] pBuckets;

    // This array contains the memory for the nodes of the dictionary.
    private Entry[] pEntries;

    // This is the first node of a singly linked list of free nodes.
    // This data structure is called the FreeList and we use it to
    // reuse removed nodes instead of allocating new ones.
    private int pFirstFreeEntry;

    // Contains simply the number of items in this dictionary.
    private int pCount;

    // Contains the number of used entries (both in the dictionary or in the free list) in pEntries array.
    // This field is going only to grow with insertions.
    private int pEntriesCount;

    ///<summary>
    /// Creates a new FixedIntDictionary. 
    /// tableBucketsCount should be a prime number
    /// greater than the number of items that this
    /// dictionary should store.
    /// The performance of this hash table will be very bad
    /// if you don't follow this rule!
    /// </summary>
    public FixedIntDictionary(int tableBucketsCount)
    {
        // Our free list is initially empty.
        this.pFirstFreeEntry = -1;

        // Initializes the entries array with a minimal amount of items.
        this.pEntries = new Entry[8];

        // Allocate buckets and initialize every linked list as empty.
        int[] buckets = new int[tableBucketsCount];
        for (int i = 0; i < buckets.Length; ++i)
            buckets[i] = -1;

        this.pBuckets = buckets;
    }

    ///<summary>Gets the number of items in this dictionary. Complexity is O(1).</summary>
    public int Count
    {
        get { return this.pCount; }
    }

    ///<summary>
    /// Adds a key value pair to the dictionary.
    /// Complexity is averaged O(1).
    /// Returns false if the key already exists.
    /// </summary>
    public bool Add(int key, T value)
    {
        // The hash table can be seen as an array of linked lists.
        // We find the right linked list using hash codes.
        // Since the hash code of an integer is the integer itself, we have a perfect hash.

        // After we get the hash code we need to remove its sign.
        // To do that quickly we AND it with 0x7FFFFFFF, which clears the sign bit.
        // Then we take the hash code modulo the size of our buckets array.

        // For this reason the size of our bucket array should be a prime number:
        // a prime modulus spreads keys that share a common factor or stride more
        // evenly across the buckets, which reduces collisions.

        // This implementation will not grow the buckets table when needed; this is the major
        // problem with this implementation.
        // Growing requires a little more code that I don't want to write now
        // (we need a function that finds prime numbers, and it should be fast, and we
        // need to rehash everything using the new buckets array).

        int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
        int bucket = this.pBuckets[bucketIndex];

        // Now we iterate over the linked list of nodes.
        // Since this is a hash table we hope these lists are very small.
        // If the number of buckets is good and the hash function is good this will usually
        // translate into an O(1) operation.

        // 'bucket' already holds the index of the first entry in the chain (or -1).
        Entry[] entries = this.pEntries;
        for (int current = bucket; current != -1; current = entries[current].Next)
        {
            if (entries[current].Key == key)
            {
                // Entry already exists.
                return false;
            }
        }

        // Ok, key not found, we can add the new key and value pair.

        int entry = this.pFirstFreeEntry;
        if (entry != -1)
        {
            // We found a deleted node in the free list.
            // We can use that node without "allocating" another one.
            this.pFirstFreeEntry = entries[entry].Next;
        }
        else
        {
            // Mhhh ok, the free list is empty, we need to allocate a new node.
            // First we try to use an unused node from the array.
            entry = this.pEntriesCount++;
            if (entry >= this.pEntries.Length)
            {
                // Mhhh ok, the entries array is full, we need to make it bigger.
                // The code for growing the bucket table should also go here, but I'm not writing it.
                Array.Resize(ref this.pEntries, this.pEntriesCount * 2);
                entries = this.pEntries;
            }
        }

        // Ok now we can add our item.
        // We just overwrite key and value in the struct stored in entries array.

        entries[entry].Key = key;
        entries[entry].Value = value;

        // Now we add the entry in the right linked list of the table.

        entries[entry].Next = this.pBuckets[bucketIndex];
        this.pBuckets[bucketIndex] = entry;

        // Increments total number of items.
        ++this.pCount;

        return true;
    }

    /// <summary>
    /// Gets a value that indicates whether the specified key exists in this table.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Contains(int key)
    {
        // This translates into a simple linear search in the linked list for the right bucket.
        // The operation, if array size is well balanced and hash function is good, will be almost O(1).

        // 'bucket' is the index of the first entry in the chain (or -1 if the chain is empty).
        int bucket = this.pBuckets[(key & 0x7FFFFFFF) % this.pBuckets.Length];
        Entry[] entries = this.pEntries;
        for (int current = bucket; current != -1; current = entries[current].Next)
        {
            if (entries[current].Key == key)
            {
                return true;
            }
        }
        return false;
    }

    /// <summary>
    /// Removes the specified item from the dictionary.
    /// Returns true if the item was found and removed, false if the item doesn't exist.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Remove(int key)
    {
        // Removal translates into a simple lookup and removal from a singly linked list.
        // Quite simple.

        int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
        int bucket = this.pBuckets[bucketIndex];
        Entry[] entries = this.pEntries;
        int next;
        int prev = -1;
        int current = bucket; // index of the first entry in the chain, or -1

        while (current != -1)
        {
            next = entries[current].Next;

            if (entries[current].Key == key)
            {
                // Found! Remove from linked list.
                if (prev != -1)
                    entries[prev].Next = next;
                else
                    this.pBuckets[bucketIndex] = next;

                // We now add the removed node to the free list,
                // so we can use it later if we add new elements.
                entries[current].Next = this.pFirstFreeEntry;
                this.pFirstFreeEntry = current;

                // Decrements total number of items.
                --this.pCount;

                return true;
            }

            prev = current;
            current = next;
        }
        return false;
    }

}
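The class above leaves out growing the bucket table. A rough sketch of the two pieces its comments mention (finding a prime size and rehashing into a new table) could look like this. The `BucketTableGrowth` helper and its parallel `keys`/`next`/`live` arrays are assumptions made to keep the example self-contained; the dictionary above stores these fields in its Entry struct and has no `live` flag, so it would need another way to skip entries that sit on the free list.

```csharp
using System;

// Hypothetical helper; not part of the dictionary above.
public static class BucketTableGrowth
{
    // Trial division; adequate for the occasional call made when resizing.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }

    // Smallest prime that is greater than or equal to min.
    public static int NextPrime(int min)
    {
        int candidate = Math.Max(min, 2);
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    // Builds a new bucket table of size newSize and relinks every live entry.
    // keys[i] is the key of entry i, next[i] is overwritten with the new chain link,
    // live[i] is false for entries that are on the free list (they are left untouched).
    public static int[] Rehash(int[] keys, int[] next, bool[] live, int newSize)
    {
        int[] buckets = new int[newSize];
        for (int i = 0; i < buckets.Length; ++i)
            buckets[i] = -1;

        for (int i = 0; i < keys.Length; ++i)
        {
            if (!live[i]) continue;
            int b = (keys[i] & 0x7FFFFFFF) % newSize;
            next[i] = buckets[b]; // push entry i onto the front of its new chain
            buckets[b] = i;
        }
        return buckets;
    }
}
```

A caller would typically pick something like `NextPrime(oldSize * 2)` as the new size once the number of items approaches the number of buckets.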

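If you are wondering whether this implementation is any good: what the .NET Framework does for its Dictionary class is very similar :)

To turn it into a hash set, just remove T and you have a hash set of integers. If you need hash codes for generic objects, just use x.GetHashCode() or supply your own hash function.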
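As a sketch of that step, here is what an integer hash set without the value field might look like, including a simple way to enumerate the keys. The name `FixedIntHashSet` and its members are hypothetical, and like the dictionary it never grows its bucket table. For generic objects, `BucketOf` would start from `x.GetHashCode()` instead of the key itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the integer dictionary with the value field dropped.
public class FixedIntHashSet
{
    private struct Entry
    {
        internal int Key;
        internal int Next; // next entry in the same chain or free list; -1 means none
    }

    private readonly int[] buckets;
    private Entry[] entries = new Entry[8];
    private int firstFree = -1;
    private int entriesCount;

    public int Count { get; private set; }

    // bucketCount should be a prime larger than the expected number of items.
    public FixedIntHashSet(int bucketCount)
    {
        buckets = new int[bucketCount];
        for (int i = 0; i < buckets.Length; ++i)
            buckets[i] = -1;
    }

    private int BucketOf(int key)
    {
        return (key & 0x7FFFFFFF) % buckets.Length;
    }

    public bool Contains(int key)
    {
        for (int i = buckets[BucketOf(key)]; i != -1; i = entries[i].Next)
            if (entries[i].Key == key) return true;
        return false;
    }

    public bool Add(int key)
    {
        if (Contains(key)) return false;

        int index = firstFree;
        if (index != -1)
        {
            firstFree = entries[index].Next; // reuse a removed entry
        }
        else
        {
            if (entriesCount == entries.Length)
                Array.Resize(ref entries, entries.Length * 2);
            index = entriesCount++;
        }

        int b = BucketOf(key);
        entries[index].Key = key;
        entries[index].Next = buckets[b]; // link in front of the chain
        buckets[b] = index;
        Count++;
        return true;
    }

    public bool Remove(int key)
    {
        int b = BucketOf(key);
        int prev = -1;
        for (int i = buckets[b]; i != -1; prev = i, i = entries[i].Next)
        {
            if (entries[i].Key != key) continue;
            if (prev != -1) entries[prev].Next = entries[i].Next;
            else buckets[b] = entries[i].Next;
            entries[i].Next = firstFree; // push onto the free list
            firstFree = i;
            Count--;
            return true;
        }
        return false;
    }

    // Walks every bucket chain; entries on the free list are never reached this way.
    public IEnumerable<int> Keys()
    {
        foreach (int head in buckets)
            for (int i = head; i != -1; i = entries[i].Next)
                yield return entries[i].Key;
    }
}
```

Enumerating by walking the chains costs O(buckets + items) and, like most collection enumerators, should not be interleaved with Add or Remove calls.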

To write an iterator you would need to modify a few things, but I don't want to add too much more to this post :)