Has anyone created a "default map" data structure, or does anyone have any ideas?

Posted: 2009-01-06 15:47:04

Tags: c# .net algorithm

I have some configuration data that I'd like to model in code like this:

Key1,  Key2,  Key3,  Value
null,  null,  null,  1
1,     null,  null,  2
9,     null,  null,  21
1,     null,  3,     3
null,  2,     3,     4
1,     2,     3,     5

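Using this configuration set, I then need to do lookups on bazillions (give or take) of {Key1, Key2, Key3} tuples to get the "effective" value. The effective value is chosen by a key/priority sum, which in this example is: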

Key1 - Priority 10
Key2 - Priority 7
Key3 - Priority 5

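So a query that matches a config entry with Key1 = null, Key2 = match and Key3 = match beats one with Key1 = match, Key2 = null and Key3 = null, because the Key2 + Key3 priority (7 + 5 = 12) is greater than the Key1 priority (10)... Does that make sense?!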
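Written as a formula (my reading of the rule above, assuming a row r only applies when every non-null key in it equals the corresponding query key), the score of row r = (r_1, r_2, r_3) for a query q is as follows, and the effective value is the one from the highest-scoring row:

```latex
\operatorname{score}(r, q) =
\begin{cases}
  \displaystyle\sum_{i \,:\, r_i \neq \text{null}} p_i & \text{if } r_i = q_i \text{ for every } i \text{ with } r_i \neq \text{null},\\[4pt]
  -\infty & \text{otherwise,}
\end{cases}
\qquad p = (10, 7, 5).
```

For q = {3, 2, 3}: row {null, 2, 3} scores 7 + 5 = 12, row {null, null, null} scores 0, and every row with a non-null Key1 scores -∞ (none of them has Key1 = 3), so the effective value is 4.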

given a key of {1, 2, 3} the value should be 5.
given a key of {3, 2, 3} the value should be 4.
given a key of {8, 10, 11} the value should be 1.
given a key of {1, 10, 11} the value should be 2.
given a key of {9, 2, 3} the value should be 4.
given a key of {8, 2, 3} the value should be 4.
given a key of {9, 3, 3} the value should be 21.

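Is there an easy way to model this data structure and lookup algorithm generically, given that the number and types of keys are variable and the "truth table" (the ordering of the lookups) can be defined dynamically? Having the key types be generics instead of ints would be perfect (floats, doubles, ushorts, etc.), and making it easy to extend to n keys is important too!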

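Estimated "config" table size: 1,000 rows. Estimated "data" to be looked up: 1e14.

That gives an idea of the kind of performance expected.

I'm looking for ideas in C#, or something that can easily be translated into C#.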

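4 answers:

Answer 0 (score: 3)

EDIT: This code is apparently not what's needed, but I'm leaving it here because it's interesting. It basically gives Key1 precedence, then Key2, then Key3 and so on. I don't really understand what the intended priority system is yet, but once I do I'll add an answer for it.

I'd suggest a three-level dictionary, where each level has: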

Dictionary<int, NextLevel> matches;
NextLevel nonMatch;

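So at the first level you'd look up Key1: if it matches, that gives you the next level to look in. Otherwise, use the next level corresponding to "non-match".

Does that make sense? Here's some sample code (including the example you gave). I'm not entirely happy with the actual implementation, but I think the idea behind the data structure is sound: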

using System;
using System.Collections;
using System.Collections.Generic;

public class Test
{
    static void Main()
    {
        Config config = new Config
        {
            { null,  null,  null,  1 },
            { 1,     null,  null,  2 },
            { 1,     null,  3,     3 },
            { null,  2,     3,     4 },
            { 1,     2,     3,     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[9, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
    }
}

// Only implement IEnumerable to allow the collection initializer
// Not really implemented yet - consider how you might want to implement :)
public class Config : IEnumerable
{
    // Aargh - death by generics :)
    private readonly DefaultingMap<int, 
                         DefaultingMap<int, DefaultingMap<int, int>>> map
        = new DefaultingMap<int, DefaultingMap<int, DefaultingMap<int, int>>>();

    public int this[int key1, int key2, int key3]
    {
        get
        {
            return map[key1][key2][key3];
        }
    }

    public void Add(int? key1, int? key2, int? key3, int value)
    {
        map.GetOrAddNew(key1).GetOrAddNew(key2)[key3] = value;
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

internal class DefaultingMap<TKey, TValue>
    where TKey : struct 
    where TValue : new()
{
    private readonly Dictionary<TKey, TValue> mapped = new Dictionary<TKey, TValue>();
    private TValue unmapped = new TValue();

    public TValue GetOrAddNew(TKey? key)
    {
        if (key == null)
        {
            return unmapped;
        }
        TValue ret;
        if (mapped.TryGetValue(key.Value, out ret))
        {
            return ret;
        }
        ret = new TValue();
        mapped[key.Value] = ret;
        return ret;
    }

    public TValue this[TKey key]
    {
        get
        {
            TValue ret;
            if (mapped.TryGetValue(key, out ret))
            {
                return ret;
            }
            return unmapped;
        }
    }

    public TValue this[TKey? key]
    {
        set
        {
            if (key != null)
            {
                mapped[key.Value] = value;
            }
            else
            {
                unmapped = value;
            }
        }
    }
}
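The descent above takes the first matching branch at each level and never backtracks, so with the {9, null, null} row added, {9, 2, 3} returns 21 instead of 4, and {1, 2, 4} lands on an unset default (0) instead of 2. A minimal sketch of one way to keep the nested-dictionary shape while honouring the priority sums is to explore both the matching child and the "non-match" child and keep the best-scoring value. The class and member names (BacktrackingTrie, Add, Lookup) are hypothetical, not part of the answer:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: each level has a dictionary of concrete key values plus one
// "non-match" (null) child. Lookup searches both branches and keeps the highest
// priority sum, instead of committing to the first match.
public sealed class BacktrackingTrie
{
    private sealed class Node
    {
        public readonly Dictionary<int, Node> Matches = new Dictionary<int, Node>();
        public Node NonMatch;   // child for a null (wildcard) key
        public int? Value;      // set only on nodes at full depth
    }

    private readonly int[] priorities;
    private readonly Node root = new Node();

    public BacktrackingTrie(params int[] priorities)
    {
        this.priorities = (int[])priorities.Clone();
    }

    public void Add(int?[] keys, int value)
    {
        if (keys.Length != priorities.Length)
            throw new ArgumentException("Wrong number of keys");
        Node node = root;
        foreach (int? key in keys)
        {
            Node next;
            if (key == null)
            {
                next = node.NonMatch ?? (node.NonMatch = new Node());
            }
            else if (!node.Matches.TryGetValue(key.Value, out next))
            {
                next = new Node();
                node.Matches[key.Value] = next;
            }
            node = next;
        }
        node.Value = value;
    }

    public int Lookup(params int[] keys)
    {
        if (keys.Length != priorities.Length)
            throw new ArgumentException("Wrong number of keys");
        int bestScore = -1, bestValue = 0;
        Search(root, keys, 0, 0, ref bestScore, ref bestValue);
        if (bestScore < 0)
            throw new KeyNotFoundException("No rule matches these keys");
        return bestValue;
    }

    private void Search(Node node, int[] keys, int depth, int score,
                        ref int bestScore, ref int bestValue)
    {
        if (depth == keys.Length)
        {
            // full-depth node: a complete rule that matched every non-null key
            if (node.Value.HasValue && score > bestScore)
            {
                bestScore = score;
                bestValue = node.Value.Value;
            }
            return;
        }
        Node child;
        // exact match at this level earns this level's priority
        if (node.Matches.TryGetValue(keys[depth], out child))
            Search(child, keys, depth + 1, score + priorities[depth], ref bestScore, ref bestValue);
        // the null branch always applies but adds nothing to the score
        if (node.NonMatch != null)
            Search(node.NonMatch, keys, depth + 1, score, ref bestScore, ref bestValue);
    }
}
```

Each query visits at most 2^n paths (n = number of keys), so this trades the single walk for correct results under the priority-sum rule.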

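Answer 1 (score: 3)

To answer the question about being generic in the number and type of keys: you can't make the number and type of keys dynamic and still use generics, since generics are all about providing compile-time information. Of course, you could ignore static typing and make it dynamic; let me know if you'd like me to implement that.

How many entries will there be, and how often do you need to look them up? You may well be best off just keeping all the entries as a list and iterating over them, giving each match a "score" (and keeping track of the best match and its score as you go). Here's an implementation, including your test data, but using keys with priorities (and then summing the matches), as per the earlier comment...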

using System;
using System.Collections;
using System.Collections.Generic;

public class Test
{
    static void Main()
    {
        Config config = new Config(10, 7, 5)
        {
            { new int?[]{null,  null,  null},  1},
            { new int?[]{1,     null,  null},  2},
            { new int?[]{9,     null,  null},  21},
            { new int?[]{1,     null,  3},     3 },
            { new int?[]{null,  2,     3},     4 },
            { new int?[]{1,     2,     3},     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[8, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
        Console.WriteLine(config[9, 2, 3]);
        Console.WriteLine(config[9, 3, 3]);
    }
}

public class Config : IEnumerable
{
    private readonly int[] priorities;
    private readonly List<KeyValuePair<int?[],int>> entries = 
        new List<KeyValuePair<int?[], int>>();

    public Config(params int[] priorities)
    {
        // In production code, copy the array to prevent tampering
        this.priorities = priorities;
    }

    public int this[params int[] keys]
    {
        get
        {
            if (keys.Length != priorities.Length)
            {
                throw new ArgumentException("Invalid entry - wrong number of keys");
            }
            int bestValue = 0;
            int bestScore = -1;
            foreach (KeyValuePair<int?[], int> pair in entries)
            {
                int?[] key = pair.Key;
                int score = 0;
                for (int i=0; i < priorities.Length; i++)
                {
                    if (key[i]==null)
                    {
                        continue;
                    }
                    if (key[i].Value == keys[i])
                    {
                        score += priorities[i];
                    }
                    else
                    {
                        score = -1;
                        break;
                    }
                }
                if (score > bestScore)
                {
                    bestScore = score;
                    bestValue = pair.Value;
                }
            }
            return bestValue;
        }
    }

    public void Add(int?[] keys, int value)
    {
        if (keys.Length != priorities.Length)
        {
            throw new ArgumentException("Invalid entry - wrong number of keys");
        }
        // Again, copy the array in production code
        entries.Add(new KeyValuePair<int?[],int>(keys, value));
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

The above allows a variable number of keys, but only ints (or null). To be honest, the API would be easier to use if you fixed the number of keys...
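A minimal generic sketch of the same linear scan, assuming all key components share one struct type TKey (int, float, double, ushort, ...) and null still means "any value". ScoredTable, Add and Lookup are hypothetical names; mixing component types per position would need object keys instead, as mentioned at the top of this answer:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic version of the scan: one TKey type for every component,
// null pattern entries act as wildcards that match anything and score 0.
public sealed class ScoredTable<TKey, TValue> where TKey : struct
{
    private static readonly EqualityComparer<TKey> Comparer = EqualityComparer<TKey>.Default;
    private readonly int[] priorities;
    private readonly List<KeyValuePair<TKey?[], TValue>> entries =
        new List<KeyValuePair<TKey?[], TValue>>();

    public ScoredTable(params int[] priorities)
    {
        this.priorities = (int[])priorities.Clone();
    }

    public void Add(TKey?[] keys, TValue value)
    {
        if (keys.Length != priorities.Length)
            throw new ArgumentException("Wrong number of keys");
        // copy the array so callers can't change it afterwards
        entries.Add(new KeyValuePair<TKey?[], TValue>((TKey?[])keys.Clone(), value));
    }

    public TValue Lookup(params TKey[] keys)
    {
        if (keys.Length != priorities.Length)
            throw new ArgumentException("Wrong number of keys");
        int bestScore = -1;
        TValue bestValue = default(TValue);
        foreach (KeyValuePair<TKey?[], TValue> entry in entries)
        {
            int score = 0;
            // stop as soon as one non-null component mismatches (score becomes -1)
            for (int i = 0; i < keys.Length && score >= 0; i++)
            {
                TKey? k = entry.Key[i];
                if (k == null)
                    continue;
                score = Comparer.Equals(k.Value, keys[i]) ? score + priorities[i] : -1;
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestValue = entry.Value;
            }
        }
        if (bestScore < 0)
            throw new KeyNotFoundException("No entry matches these keys");
        return bestValue;
    }
}
```

For example, `new ScoredTable<double, int>(10, 7, 5)` works the same way with double keys; the cost per lookup is still one pass over all rows.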

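Answer 2 (score: 1)

Yet another solution: think of each entry as a bit pattern of null/non-null. Have one dictionary per bit pattern (i.e. {1, null, null} and {9, null, null} go in the same dictionary, but {1, 2, 3} goes in a different one). Each dictionary effectively has a score as well: the sum of the priorities of the non-null parts of the key. You'll end up with 2^n dictionaries, where n is the number of elements in the key.

You sort the dictionaries in descending score order, then just look up the given key in each one in turn. Each dictionary needs to ignore the values in the key that aren't in its bit pattern, which is easily done with a custom IEqualityComparer<int[]>.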
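To make the ordering concrete, here is a small hypothetical helper (PatternOrder.Ordered is my name, not part of the answer's implementation) that lists the 2^n bit patterns for given priorities, bit i meaning "key i must match", sorted by descending score:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pairs of (bit pattern, score), highest score first.
// Bit i set means key component i must match; the score is the sum of the
// priorities of the set bits.
public static class PatternOrder
{
    public static List<KeyValuePair<int, int>> Ordered(params int[] priorities)
    {
        int n = priorities.Length;
        return Enumerable.Range(0, 1 << n)
            .Select(mask => new KeyValuePair<int, int>(
                mask,
                Enumerable.Range(0, n)
                          .Where(i => (mask & (1 << i)) != 0)
                          .Sum(i => priorities[i])))
            .OrderByDescending(p => p.Value)
            .ToList();
    }
}
```

With priorities (10, 7, 5) the order is 7, 3, 5, 6, 1, 2, 4, 0 (scores 22, 17, 15, 12, 10, 7, 5, 0). A query of {9, 2, 3} therefore misses in patterns 7, 3 and 5 and first hits in pattern 6 (Key2 and Key3), which holds {null, 2, 3}, giving 4.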

OK, here's the implementation:

------------ Test.cs -----------------
using System;

sealed class Test
{
    static void Main()
    {
        Config config = new Config(10, 7, 5)
        {
            {null,  null,  null,  1},
            {1,     null,  null,  2},
            {9,     null,  null,  21},
            {1,     null,  3,     3 },
            {null,  2,     3,     4 },
            {1,     2,     3,     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[8, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
        Console.WriteLine(config[9, 2, 3]);
        Console.WriteLine(config[9, 3, 3]);
    }
}

--------------- Config.cs -------------------
using System;
using System.Collections;

sealed class Config : IEnumerable
{
    private readonly PartialMatchDictionary<int, int> dictionary;

    public Config(int priority1, int priority2, int priority3)
    {
        dictionary = new PartialMatchDictionary<int, int>(priority1, priority2, priority3);
    }

    public void Add(int? key1, int? key2, int? key3, int value)
    {
        dictionary[new[] { key1, key2, key3 }] = value;
    }

    public int this[int key1, int key2, int key3]
    {
        get
        {
            return dictionary[new[] { key1, key2, key3 }];
        }
    }

    // Just a fake implementation to allow the collection initializer
    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

-------------- PartialMatchDictionary.cs -------------------
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PartialMatchDictionary<TKey, TValue> where TKey : struct
{
    private readonly List<Dictionary<TKey[], TValue>> dictionaries;
    private readonly int keyComponentCount;

    public PartialMatchDictionary(params int[] priorities)
    {
        keyComponentCount = priorities.Length;
        dictionaries = new List<Dictionary<TKey[], TValue>>(1 << keyComponentCount);
        for (int i = 0; i < 1 << keyComponentCount; i++)
        {
            PartialComparer comparer = new PartialComparer(keyComponentCount, i);
            dictionaries.Add(new Dictionary<TKey[], TValue>(comparer));
        }
        dictionaries = dictionaries.OrderByDescending(dict => ((PartialComparer)dict.Comparer).Score(priorities))
                                   .ToList();
    }

    public TValue this[TKey[] key]
    {
        get
        {
            if (key.Length != keyComponentCount)
            {
                throw new ArgumentException("Invalid key component count");
            }
            foreach (Dictionary<TKey[], TValue> dictionary in dictionaries)
            {
                TValue value;
                if (dictionary.TryGetValue(key, out value))
                {
                    return value;
                }
            }
            throw new KeyNotFoundException("No match for this key");
        }
    }

    public TValue this[TKey?[] key]
    {
        set
        {
            if (key.Length != keyComponentCount)
            {
                throw new ArgumentException("Invalid key component count");
            }
            // This could be optimised (a dictionary of dictionaries), but there
            // won't be many additions to the dictionary compared with accesses
            foreach (Dictionary<TKey[], TValue> dictionary in dictionaries)
            {
                PartialComparer comparer = (PartialComparer)dictionary.Comparer;
                if (comparer.IsValidForPartialKey(key))
                {
                    TKey[] maskedKey = key.Select(x => x ?? default(TKey)).ToArray();
                    dictionary[maskedKey] = value;
                    return;
                }
            }
            throw new InvalidOperationException("We should never get here");
        }
    }

    private sealed class PartialComparer : IEqualityComparer<TKey[]>
    {
        private readonly int keyComponentCount;
        private readonly bool[] usedKeyComponents;
        private static readonly EqualityComparer<TKey> Comparer = EqualityComparer<TKey>.Default;

        internal PartialComparer(int keyComponentCount, int usedComponentBits)
        {
            this.keyComponentCount = keyComponentCount;
            usedKeyComponents = new bool[keyComponentCount];
            for (int i = 0; i < keyComponentCount; i++)
            {
                usedKeyComponents[i] = ((usedComponentBits & (1 << i)) != 0);
            }
        }

        internal int Score(int[] priorities)
        {
            return priorities.Where((value, index) => usedKeyComponents[index]).Sum();
        }

        internal bool IsValidForPartialKey(TKey?[] key)
        {
            for (int i = 0; i < keyComponentCount; i++)
            {
                if ((key[i] != null) != usedKeyComponents[i])
                {
                    return false;
                }
            }
            return true;
        }

        public bool Equals(TKey[] x, TKey[] y)
        {
            for (int i = 0; i < keyComponentCount; i++)
            {
                if (!usedKeyComponents[i])
                {
                    continue;
                }
                if (!Comparer.Equals(x[i], y[i]))
                {
                    return false;
                }
            }
            return true;
        }

        public int GetHashCode(TKey[] obj)
        {
            int hash = 23;
            for (int i = 0; i < keyComponentCount; i++)
            {
                if (!usedKeyComponents[i])
                {
                    continue;
                }
                hash = hash * 37 + Comparer.GetHashCode(obj[i]);
            }
            return hash;
        }
    }
}

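It gives the right results for the samples you gave. I don't know what the performance is like; it should be O(1), but it could probably be optimised further.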
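A rough back-of-the-envelope comparison for the sizes in the question (about 1,000 rows, about 10^14 queries, n = 3 keys), counting worst-case work per query for a linear scan versus one probe per bit pattern:

```latex
\begin{aligned}
\text{linear scan:} &\quad 10^{14}\ \text{queries} \times 10^{3}\ \text{rows} = 10^{17}\ \text{row checks},\\
\text{bit-pattern dictionaries:} &\quad 10^{14}\ \text{queries} \times 2^{3}\ \text{probes} = 8 \times 10^{14}\ \text{hash lookups (at most)}.
\end{aligned}
```

The number of probes depends only on n, not on the number of rows, which is the sense in which the lookup is O(1); it does double with every extra key.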

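Answer 3 (score: 1)

I'm assuming there are few rules and a large number of items you'll check against them. In that case it may be worth spending memory and up-front time to precompute a structure that helps you find the matching rule faster.

The basic idea of this structure is a tree such that at depth i you follow the i-th element of the rule, or the null branch if the value isn't found in the dictionary.

I'd build the tree recursively. Start with a root node whose pool contains all the rules. Then: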

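  • Define the current score of each rule in the pool as the rule's score given the path taken to reach the node, or -infinity if the rule can't match that path. For example, if the current node is the "1" branch of the root, then rule {null, null, null, 1} has a current score of 0, and rule {1, null, null, 2} has a current score of 10.
  • Define the maximal score of each rule in the pool as its current score plus the priorities of its remaining non-null keys. For example, if the current node is the "1" branch of the root, then rule {null, 1, 2, 1} has a maximal score of 12 (0 + 7 + 5), and rule {1, null, null, 2} has a maximal score of 10 (10 + 0 + 0).
  • Remove from the pool every rule whose maximal score is lower than the highest current score among the rules that have no non-null keys left. Those rules already match, so their score can no longer change; a rule that still has non-null keys left might fail on them.
  • If only one rule is left and it has no non-null keys left, make a leaf with that rule.
  • If there are still several rules in the pool and there are no more keys, then ??? (This isn't clear from the problem description; I'm assuming the highest-scoring one is picked.)
  • For each unique value of the (i+1)-th key among the rules in the current pool, plus null, construct a new subtree under the current node using the current pool.

As a final optimization, I'd check whether all the children of a node are leaves, and if they all contain the same rule, make that node a leaf with that value.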

Given the following rules:

null,  null,  null = 1
1,     null,  null = 2
9,     null,  null = 21
1,     null,  3    = 3
null,  2,     3    = 4
1,     2,     3    = 5

Example tree:

       key1   key2   key3
root:
 |----- 1
 |      |----- 2
 |      |      |----- 3   = 5
 |      |      |-----null = 2
 |      |-----null
 |             |----- 3   = 3
 |             |-----null = 2
 |----- 9
 |      |----- 2
 |      |      |----- 3   = 4
 |      |      |-----null = 21
 |      |-----null        = 21
 |-----null
        |----- 2
        |      |----- 3   = 4
        |      |-----null = 1
        |-----null        = 1

If you build the tree this way, starting from the highest-priority key, you can prune a lot of checks against later keys.

Edited to add code:

using System;
using System.Collections;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        Config config = new Config(10, 7, 5)
        {
            { new int?[]{null,  null,  null},  1},
            { new int?[]{1,     null,  null},  2},
            { new int?[]{9,     null,  null},  21},
            { new int?[]{1,     null,  3},     3 },
            { new int?[]{null,  2,     3},     4 },
            { new int?[]{1,     2,     3},     5 }
        };

        Console.WriteLine("5 == {0}", config[1, 2, 3]);
        Console.WriteLine("4 == {0}", config[3, 2, 3]);
        Console.WriteLine("1 == {0}", config[8, 10, 11]);
        Console.WriteLine("2 == {0}", config[1, 10, 11]);
        Console.WriteLine("4 == {0}", config[9, 2, 3]);
        Console.WriteLine("21 == {0}", config[9, 3, 3]);            
        Console.ReadKey();
    }
}


public class Config : IEnumerable
{
    private readonly int[] priorities;
    private readonly List<KeyValuePair<int?[], int>> rules =
        new List<KeyValuePair<int?[], int>>();
    private DefaultMapNode rootNode = null;

    public Config(params int[] priorities)
    {
        // In production code, copy the array to prevent tampering
        this.priorities = priorities;
    }

    public int this[params int[] keys]
    {
        get
        {
            if (keys.Length != priorities.Length)
            {
                throw new ArgumentException("Invalid entry - wrong number of keys");
            }

            if (rootNode == null)
            {
                rootNode = BuildTree();
                //rootNode.PrintTree(0);
            }

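            DefaultMapNode curNode = rootNode;
            for (int i = 0; i < keys.Length; i++)
            {
                // if we're at a leaf, then we're done
                if (curNode.value != null)
                    return (int)curNode.value;

                // follow the exact-match branch if there is one, otherwise the null branch
                DefaultMapNode next;
                if (!curNode.children.TryGetValue(keys[i], out next))
                    next = curNode.defaultChild;

                // no rule can match these keys at all
                if (next == null)
                    throw new KeyNotFoundException("No rule matches these keys");
                curNode = next;
            }

            if (curNode.value == null)
                throw new KeyNotFoundException("No rule matches these keys");
            return (int)curNode.value;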
        }
    }

    private DefaultMapNode BuildTree()
    {
        return new DefaultMapNode(new int?[]{}, rules, priorities);
    }

    public void Add(int?[] keys, int value)
    {
        if (keys.Length != priorities.Length)
        {
            throw new ArgumentException("Invalid entry - wrong number of keys");
        }
        // Again, copy the array in production code
        rules.Add(new KeyValuePair<int?[], int>(keys, value));

        // reset the tree to know to regenerate it.
        rootNode = null;
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }

}


public class DefaultMapNode
{
    public Dictionary<int, DefaultMapNode> children = new Dictionary<int,DefaultMapNode>();
    public DefaultMapNode defaultChild = null; // done this way to workaround dict not handling null
    public int? value = null;

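    public DefaultMapNode(IList<int?> usedValues, IEnumerable<KeyValuePair<int?[], int>> pool, int[] priorities)
    {
        // Best score among rules that are already certain to match: they matched the
        // path so far and have no non-null keys left. A rule with non-null keys left
        // might still fail on them, so it can't be used to prune the others.
        int bestGuaranteed = Int32.MinValue;
        foreach (KeyValuePair<int?[], int> rule in pool)
        {
            int currentScore = GetCurrentScore(usedValues, priorities, rule);
            if (currentScore != Int32.MinValue && HasNoKeysLeft(usedValues.Count, rule))
                bestGuaranteed = Math.Max(bestGuaranteed, currentScore);
        }

        // get pruned pool: drop rules that can't match this path, and rules that
        // can't beat the best guaranteed match even if all their remaining keys match
        List<KeyValuePair<int?[], int>> prunedPool = new List<KeyValuePair<int?[], int>>();
        foreach (KeyValuePair<int?[], int> rule in pool)
        {
            int maxScore = GetCurrentScore(usedValues, priorities, rule);
            if (maxScore == Int32.MinValue)
                continue;

            for (int i = usedValues.Count; i < rule.Key.Length; i++)
                if (rule.Key[i] != null)
                    maxScore += priorities[i];

            if (maxScore >= bestGuaranteed)
                prunedPool.Add(rule);
        }

        // no rule can match along this path: leave an empty node (value stays null)
        if (prunedPool.Count == 0)
            return;

        // base case: all keys consumed, or a single rule left that can no longer fail
        if ((usedValues.Count == prunedPool[0].Key.Length) ||
            (prunedPool.Count == 1 && HasNoKeysLeft(usedValues.Count, prunedPool[0])))
        {
            value = prunedPool[0].Value;
            return;
        }

        // add null base case, then one child per distinct value of the next key
        AddChild(usedValues, priorities, prunedPool, null);
        foreach (KeyValuePair<int?[], int> rule in prunedPool)
        {
            int? branch = rule.Key[usedValues.Count];
            if (branch != null && !children.ContainsKey((int)branch))
            {
                AddChild(usedValues, priorities, prunedPool, branch);
            }
        }

        // if all children are the same, then make a leaf. This is only safe when some
        // rule is guaranteed here, because then no descendant can be an empty node.
        if (bestGuaranteed == Int32.MinValue)
            return;
        int? maybeOnlyValue = null;
        foreach (int v in GetAllValues())
        {
            if (maybeOnlyValue != null && v != maybeOnlyValue)
                return;
            maybeOnlyValue = v;
        }
        if (maybeOnlyValue != null)
            value = maybeOnlyValue;
    }

    // true if the rule has no non-null keys at positions >= usedCount
    private static bool HasNoKeysLeft(int usedCount, KeyValuePair<int?[], int> rule)
    {
        for (int i = usedCount; i < rule.Key.Length; i++)
            if (rule.Key[i] != null)
                return false;
        return true;
    }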

    private static int GetCurrentScore(IList<int?> usedValues, int[] priorities, KeyValuePair<int?[], int> rule)
    {
        int currentScore = 0;
        for (int i = 0; i < usedValues.Count; i++)
        {
            if (rule.Key[i] != null)
            {
                if (rule.Key[i] == usedValues[i])
                    currentScore += priorities[i];
                else
                    return Int32.MinValue;
            }
        }
        return currentScore;
    }

    private void AddChild(IList<int?> usedValues, int[] priorities, List<KeyValuePair<int?[], int>> prunedPool, Nullable<int> nextValue)
    {
        List<int?> chainedValues = new List<int?>();
        chainedValues.AddRange(usedValues);
        chainedValues.Add(nextValue);            
        DefaultMapNode node = new DefaultMapNode(chainedValues, prunedPool, priorities);
        if (nextValue == null)
            defaultChild = node;
        else
            children[(int)nextValue] = node;
    }

    public IEnumerable<int> GetAllValues()
    {
        foreach (DefaultMapNode child in children.Values)
            foreach (int v in child.GetAllValues())
                yield return v;
        if (defaultChild != null)
            foreach (int v in defaultChild.GetAllValues())
                yield return v;
        if (value != null)
            yield return (int)value;
    }

    public void PrintTree(int depth)
    {
        if (value == null)
            Console.WriteLine();
        else
        {
            Console.WriteLine(" = {0}", (int)value);
            return;
        }

        foreach (KeyValuePair<int, DefaultMapNode> child in children)
        {
            for (int i=0; i<depth; i++)
                Console.Write("    ");
            Console.Write(" {0}  ", child.Key);                
            child.Value.PrintTree(depth + 1);
        }
        // empty (no-match) nodes have no null branch to print
        if (defaultChild == null)
            return;
        for (int i = 0; i < depth; i++)
            Console.Write("    ");
        Console.Write("null");
        defaultChild.PrintTree(depth + 1);
    }
}