Question

我有一个带有 Id 属性的 Foo 类。我的目标是同时没有 Foo 的两个实例具有相同的 Id 。

所以我创建了一个工厂方法 CreateFoo ，它使用缓存来为相同的 Id 返回相同的实例。

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

缓存实现为Dictionary＆lt; TKey，WeakReference＆gt;，基于 @JaredPar 的Building a WeakReference Hashtable：

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

问题是WeakReferences在其目标被垃圾收集后仍保留在字典中。这意味着需要一些策略如何手动“垃圾收集”死WeakReferences，正如What happens to a WeakReference after GC of WeakReference.Target中 @Pascal Cuoq 所解释的那样。

我的问题是：压缩WeakReference词典的最佳策略是什么？

我看到的选项是：

不要从字典中删除WeakReferences。 IMO这很糟糕，因为缓存是在我的应用程序的整个生命周期中使用的，并且很多的死WeakReferences会随着时间累积。
在每个 Put 和 TryGetValue 上遍历整个字典，并删除死WeakReferences。这有点失败了字典的目的，因为这两个操作都变成 O（n）。
在后台线程中定期浏览整个字典。鉴于我不知道 CreateFoo 的使用模式，会有什么好的间隔？
将每个插入的KeyValuePair附加到双端链表。每次调用 Put 和 TryGetValue 都会检查列表的头部。如果WeakReference处于活动状态，请将该对移动到列表的末尾。如果它已死，请从列表中删除该对，并从字典中删除WeakReference。
实现一个自定义哈希表，其差别在于，当存储桶已满时，首先从存储桶中删除死WeakReferences，然后照常继续。

还有其他策略吗？

最佳策略可能是具有摊销时间复杂度的算法。这样的策略是否存在？

Answer 1

如果您可以将托管对象切换为字典的键，则可以使用.Net 4.0的ConditionalWeakTable（名称空间System.Runtime.CompilerServices）。

根据Richter先生的说法，垃圾收集器会通知ConditionalWeakTable，而不是使用轮询线程。

    static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();

    void Window_Loaded(object sender, RoutedEventArgs e)
    {
        ...
        dataGrid.SelectionChanged += (_sender, _e) =>
        {
            var cs = dataGrid.SelectedItem as ClientSession;

            this.tabControl.Items.Clear();

            foreach (var tid in cs.GetThreadIDs())
            {
                tid.tabItem = new TabItem() { Header = ... };
                tid.tabItem.AddHandler(UIElement.MouseDownEvent,
                    new MouseButtonEventHandler((__sender, __e) =>
                    {
                        tabControl_SelectionChanged(tid.tabItem);
                    }), true);
                tidByTab.Add(tid.tabItem, tid);
                this.tabControl.Items.Add(tid.tabItem);
            }
        };
    }

    void tabControl_SelectionChanged(TabItem tabItem)
    {
        this.tabControl.SelectedItem = tabItem;
        if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
        {
            tidExec.EnsureBlocksLoaded();
            ShowStmt(tidExec.CurrentStmt);
        }
        else
            throw new Exception("huh?");
    }

这里重要的是引用TabItem对象的唯一方法是tabControls.Items集合和ConditionalWeakTable的键。 ConditionalWeakTable的键不计算在内。因此，当我们清除tabControl中的所有项目时，那些TabItems可以被垃圾收集（因为没有任何时间再引用它们，同样ConditionalWeakTable的键也不计算）。收集garabage时，将通知ConditionalWeakTable，并删除具有该键值的条目。所以我的庞大的TIDExec对象也在那时被垃圾收集（没有任何东西引用它们，除了ConditionalWeakTable的值）。

Answer 2

您的选项3（线程）的缺点是必须在所有Put / TryGetvalue操作上进行同步。如果您使用此方法，则您的间隔不是以毫秒为单位，而是每个N TryGet操作。

选项2，扫描字典，会产生严重的开销。您可以通过仅扫描1000个操作中的1个和/或通过观察GC运行的频率来改进。

但我会认真考虑选项1：什么都不做。您可能有“很多”死信条件，但另一方面它们非常小（并且可以回收）。可能不是服务器应用程序的选项，但对于客户端应用程序，我会尝试测量我们所讨论的每小时的条目数（kByte）。

经过一番讨论后：

这样的[n摊销]策略存在？

我猜不会。您的问题是GC的缩小版。您将不得不偶尔扫描整个事物。因此，只有选项2）和3）提供了真正的解决方案。它们既昂贵又可以通过一些启发式方法进行（大量）优化。选项2）仍然会给你偶尔的最坏情况。

Answer 3

方法＃5很有意思，但缺点是很难知道哈希表利用率的实际水平是什么，因此何时应该扩展哈希表。如果每当“看起来”像哈希表应该扩展时，可以克服这个困难，首先进行全表扫描以删除死表项。如果表中超过一半的条目已经死亡，请不要费心扩展它。这样的方法应该产生摊销的O（1）行为，因为一个人不会进行全表扫描，直到有人将已删除的条目添加回去。

一种更简单的方法，它也会产生O（1）摊销时间和每个最近生活元素的O（1）空间，用于保持在最后一次清除表之后有多少项存活，并且从那以后添加了多少元素。只要后者计数超过第一个，就进行全表扫描和清除。扫描和清除所需的时间将与清除之间添加的元素数量成比例，从而保留摊销的O（1）时间，并且集合中的总元素数量不会超过最近观察到的元素数量的两倍为了活着，所以死元素的数量不能超过最近生活元素的数量的两倍。

Answer 4

我遇到了同样的问题，并解决了这个问题（WeakDictionary是我试图清理的类）：

internal class CleanerRef
{
    ~CleanerRef()
    {
        if (handle.IsAllocated)
            handle.Free();
    }

    public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
    {
        handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
        Dictionary = dictionary;
    }

    public bool IsAlive
    {
        get {return handle.IsAllocated && handle.Target != null;}
    }

    public object Target
    {
        get {return IsAlive ? handle.Target : null;}
    }

    GCHandle handle;
    public WeakDictionary Dictionary;
}


internal class WeakDictionaryCleaner
{
    public WeakDictionaryCleaner(WeakDictionary dict)
    {
        refs.Add(new CleanerRef(this, dict));
    }

    ~WeakDictionaryCleaner()
    {
        foreach(var cleanerRef in refs)
        {
            if (cleanerRef.Target == this)
            {
                cleanerRef.Dictionary.ClearGcedEntries();
                refs.Remove(cleanerRef);
                break;
            }
        }
    }
    private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}

这两个类试图实现的是“挂钩”GC。您可以通过在构建弱集合期间创建WeakDictionaryCleaner实例来激活此机制：

new WeakDictionaryCleaner(weakDictionary);

请注意，我没有创建对新实例的任何引用，因此GC将在下一个周期中处理它。在ClearGcedEntries（）方法中，我再次创建一个新实例，以便每个GC循环都有一个清理器来完成，然后执行收集压缩。你可以使CleanerRef.Dictionary也是一个弱引用，这样它就不会把字典保存在内存中。

希望这有帮助

Answer 5

我想这是一个正确的地方，即使它可能看起来像死灵法术。以防有人像我一样偶然发现这个问题。在.net中缺少专用的身份地图有点令人惊讶，我觉得最自然的工作方式如上一个选项所述：当桌子已满并且容量增加一倍时，它会检查是否有足够的死亡条目可以回收再利用，因此不需要增长。

static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));

该类只公开一个带有Get的公共方法key和可选的value参数，如果实体不在缓存中，则会懒惰地加载和缓存该实体。

using System;
class IdentityMap<TKey, TValue>
    where TKey : IEquatable<TKey>
    where TValue : class
{
    Func<TValue, TKey> key_selector;
    WeakReference<TValue>[] references;
    int[] buckets;
    int[] bucket_indexes;
    int tail_index;
    int entries_count;
    int capacity;

    public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
        this.key_selector = key_selector;
        Init(capacity);
    }
    void Init(int capacity) {
        this.bucket_indexes = new int[capacity];
        this.buckets = new int[capacity];
        this.references = new WeakReference<TValue>[capacity];
        for (int i = 0; i < capacity; i++) {
            bucket_indexes[i] = -1;
            buckets[i] = i - 1;
        }
        this.tail_index = capacity - 1;
        this.entries_count = 0;
        this.capacity = capacity;
    }

    public TValue Get(TKey key, Func<TValue> value = null) {
        int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
        var ret = WalkBucket(bucket_index, true, key);
        if (ret == null && value != null) Add(bucket_index, ret = value());
        return ret;
    }

    void Add(int bucket_index, TValue value) {
        if (this.entries_count == this.capacity) {
            for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
            if (this.entries_count * 2 > this.capacity) {
                var old_references = references;
                Init(this.capacity * 2);
                foreach (var old_reference in old_references) {
                    TValue old_value;
                    if (old_reference.TryGetTarget(out old_value)) {
                        int hash = key_selector(value).GetHashCode();
                        Add(Math.Abs(hash % this.capacity), old_value);
                    }
                }
            }
        }
        int new_index = this.tail_index;
        this.tail_index = buckets[this.tail_index];
        this.entries_count += 1;
        buckets[new_index] = bucket_indexes[bucket_index];
        if (references[new_index] != null) references[new_index].SetTarget(value);
        else references[new_index] = new WeakReference<TValue>(value);
        bucket_indexes[bucket_index] = new_index;
    }

    TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
        int curr_index = bucket_indexes[bucket_index];
        int prev_index = -1;
        while (curr_index != -1) {
            TValue value;
            int next_index = buckets[curr_index];
            if (references[curr_index].TryGetTarget(out value)) {
                if (is_searching && key_selector(value).Equals(key)) return value;
                prev_index = curr_index;
            } else {
                if (prev_index != -1) buckets[prev_index] = next_index;
                else bucket_indexes[bucket_index] = next_index;

                buckets[curr_index] = this.tail_index;
                this.tail_index = curr_index;
                this.entries_count -= 1;
            }
            curr_index = next_index;
        }
        return null;
    }
}

Answer 6

您可以删除WeakReference中的“无效”TryGetValue：

[编辑] 我的错误，这些解决方案实际上没有做任何建议，因为Put方法无论如何都会将旧对象与新对象交换。只是忽略它。

public bool TryGetValue(TKey key, out TValue value) {
    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef)) {
        value = null;
        return false;
    } else {
        value = (TValue)weakRef.Target;
        if (value == null)
            this.items.Remove(key);
        return (value != null);
    }
}

或者，您可以在需要时立即在字典中创建新实例：

public TValue GetOrCreate(TKey key, Func<Tkey, TValue> ctor) {

    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef) {
        Tvalue result = ctor(key);
        this.Put(key, result);
        return result;
    } 

    value = (TValue)weakRef.Target;
    if (value == null)
    {
        Tvalue result = ctor(key);
        this.Put(key, result);
        return result;
    }

    return value;
}

然后你会像这样使用它：

static Foo CreateFoo(int id)
{
    return cache.GetOrCreate(id, id => new Foo(id));
}

[编辑]

根据windbg，WeakReference实例仅占用16个字节。对于100,000个收集的对象，这不会是一个如此严重的负担，因此您可以轻松地让它们生存。

如果这是一个服务器应用程序并且你相信你可以从收集中获益，我会考虑去一个后台线程，但是当你收集一个相对较小的时候，我会考虑一个简单的算法来增加等待时间对象数量。

Answer 7

一些专业知识：当目标类知道弱字典引用及其TKey值时，您可以从finalyzer调用中删除其条目。

public class Entry<TKey>
{
    TKey key;
    Dictionary<TKey, WeakReference> weakDictionary;

    public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
    {
        this.key = key;
        this.weakDictionary = weakDictionary;
    }

    ~Entry()
    {
        weakDictionary.Remove(key);
    }
}

当缓存的对象是Entry<TKey>的子类时，没有空的WeakReference泄漏因为finalyzer是在实例被垃圾回收后调用的。

压缩WeakReference词典

7 个答案:

经过一番讨论后：