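How can I speed up a recursive search function?

Date: 2014-06-13 12:04:32

Tags: c# linq search recursion tree

I am having trouble with the speed of a search function I wrote. Its steps are as follows:

  1. The function takes two table names as parameters: a starting point and a target.
  2. It then iterates over a list of table-column combinations (50,000 entries long) and retrieves all the combinations associated with the starting table.
  3. It then loops through each retrieved combination and, for each one, iterates over the table-column list again, this time looking for tables that have a matching column.
  4. Finally, it loops through each combination retrieved in the previous step and checks whether its table is the target table. If so, it saves the match; if not, it calls itself, passing in the table name from that combination.

The aim of the function is to trace links between tables, whether the link is direct or has several degrees of separation. The recursion depth is a fixed integer value.

My problem is that whenever I try to run this function with a search depth of two levels (I don't dare go deeper at this stage), the job either runs out of memory or I lose patience. On one run I waited 17 minutes before the job ran out of memory.

The average number of columns per table is 28, with a standard deviation of 34.

Here is a diagram showing an example of the various links that can be made between tables:

[Diagram: each column can have a match in multiple tables. Each matching table can then be searched column by column for tables with matching columns, and so on.]

Here is my code: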

    private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            IEnumerable<string> tableColumns = sourceList.Where(x => x.Table.Equals(parentNode.Table)).Select(x => x.Column);
    
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
    
                IEnumerable<TableSearchNode> tables = sourceList.Where(
                    x => x.Column.Substring(1).Equals(shortName) && !x.Table.Equals(parentNode.Table) && !parentNode.Ancenstory.Contains(x.Table)).Select(
                        x => new TableSearchNode { Table = x.Table, Column = x.Column, Level = parentNode.Level + 1 });
                foreach (TableSearchNode table in tables)
                {
                    parentNode.AddChildNode(sourceColumn, table);
                    if (!table.Table.Equals(targetTable))
                    {
                        FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
                    }
                    else
                    {
                        table.NotifySeachResult(true);
                    }
                }
            }
        }
    }
    

Edit: I have separated out the TableSearchNode logic and added its properties and methods for completeness.

    //TableSearchNode
    public Dictionary<string, List<TableSearchNode>> Children { get; private set; }
    
    //TableSearchNode
    public List<string> Ancenstory
    {
        get
        {
            Stack<string> ancestory = new Stack<string>();
            TableSearchNode ancestor = ParentNode;
            while (ancestor != null)
            {
                ancestory.Push(ancestor.tbl);
                ancestor = ancestor.ParentNode;
            }
            return ancestory.ToList();
        }
    }
    
    //TableSearchNode
    public void AddChildNode(string referenceColumn, TableSearchNode childNode)
    {
        childNode.ParentNode = this;
        List<TableSearchNode> relatedTables = null;
        Children.TryGetValue(referenceColumn, out relatedTables);
        if (relatedTables == null)
        {
            relatedTables = new List<TableSearchNode>();
            Children.Add(referenceColumn, relatedTables);
        }
        relatedTables.Add(childNode);
    }
    

Thanks in advance for your help!

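4 Answers:

Answer 0 (score: 4)

You are wasting a lot of memory. Here is what immediately comes to mind:

  1. First, replace the incoming List<TableColumns> sourceList with an ILookup<string, TableColumns>. You should do this once, before calling FindLinkingTables: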

    ILookup<string, TableColumns> sourceLookup = sourceList.ToLookup(s => s.Table);
    FindLinkingTables(sourceLookup, parentNode, targetTable, maxSearchDepth);
    
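  2. Do not call .ToList() unless you really need it. For example, if you only want to enumerate all the children of the resulting list, you do not need it. Your main function would then look like this: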

    private void FindLinkingTables(ILookup<string, TableColumns> sourceLookup, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
    
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
    
                var tables = sourceLookup
                    .Where(
                        group => !group.Key.Equals(parentNode.Table)
                                 && !parentNode.Ancenstory.Contains(group.Key))
                    .SelectMany(group => group)
                    .Where(tableColumn => tableColumn.Column.Substring(1).Equals(shortName))
                    .Select(
                        x => new TableSearchNode
                        {
                            Table = x.Table,
                            Column = x.Column,
                            Level = parentNode.Level + 1
                        });
    
                foreach (TableSearchNode table in tables)
                {
                    parentNode.AddChildNode(sourceColumn, table);
                    if (!table.Table.Equals(targetTable))
                    {
                        FindLinkingTables(sourceLookup, table, targetTable, maxSearchDepth);
                    }
                    else
                    {
                        table.NotifySeachResult(true);
                    }
                }
            }
        }
    }
    

[Edit]

  3. Also, to speed up the remaining complex LINQ query, you can prepare another ILookup:

    ILookup<string, TableColumns> sourceColumnLookup = sourceList
            .ToLookup(t => t.Column.Substring(1));
    
    //...
    
    private void FindLinkingTables(
        ILookup<string, TableColumns> sourceLookup, 
        ILookup<string, TableColumns> sourceColumnLookup,
        TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level >= maxSearchDepth) return;
    
        var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
    
        foreach (string sourceColumn in tableColumns)
        {
            string shortName = sourceColumn.Substring(1);
    
            var tables = sourceColumnLookup[shortName]
                .Where(tableColumn => !tableColumn.Table.Equals(parentNode.Table)
                                      && !parentNode.AncenstoryReversed.Contains(tableColumn.Table))
                .Select(
                    x => new TableSearchNode
                        {
                            Table = x.Table,
                            Column = x.Column,
                            Level = parentNode.Level + 1
                        });
    
    
            foreach (TableSearchNode table in tables)
            {
                parentNode.AddChildNode(sourceColumn, table);
                if (!table.Table.Equals(targetTable))
                {
                    FindLinkingTables(sourceLookup, sourceColumnLookup, table, targetTable, maxSearchDepth);
                }
                else
                {
                    table.NotifySeachResult(true);
                }
            }
        }
    }
    
  4. I have also looked at your Ancenstory property. If an IEnumerable<string> is enough for your needs, consider this implementation:

    public IEnumerable<string> AncenstoryEnum
    {
        get { return AncenstoryReversed.Reverse(); }
    }
    
    public IEnumerable<string> AncenstoryReversed
    {
        get
        {
            TableSearchNode ancestor = ParentNode;
            while (ancestor != null)
            {
                yield return ancestor.tbl;
                ancestor = ancestor.ParentNode;
            }
        }
    }
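
As a rough, self-contained illustration of the two-lookup idea above, the following sketch builds both indexes once and then finds the tables that share a column with a given table. The `LookupSketch` class, its `NeighbourTables` and `SampleRows` methods, and the sample table and column names are all made up for this example. Only `TableColumns` with `Table` and `Column` comes from the question, and its shape here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's TableColumns type (not shown in the original post).
public class TableColumns
{
    public string Table { get; set; }
    public string Column { get; set; }
}

// Hypothetical helper, for illustration only.
public static class LookupSketch
{
    // Returns the distinct tables that share a column with `table`,
    // comparing column names without their first character, as the question does.
    public static List<string> NeighbourTables(
        ILookup<string, TableColumns> byTable,
        ILookup<string, TableColumns> byShortColumn,
        string table)
    {
        return byTable[table]                                        // columns of this table only
            .SelectMany(c => byShortColumn[c.Column.Substring(1)])   // rows with a matching column
            .Select(c => c.Table)
            .Where(t => t != table)                                  // skip the table itself
            .Distinct()
            .OrderBy(t => t, StringComparer.Ordinal)
            .ToList();
    }

    // Invented sample data: the first character of each column is a per-table prefix.
    public static List<TableColumns> SampleRows()
    {
        return new List<TableColumns>
        {
            new TableColumns { Table = "Orders",    Column = "oCustomerId" },
            new TableColumns { Table = "Orders",    Column = "oProductId" },
            new TableColumns { Table = "Customers", Column = "cCustomerId" },
            new TableColumns { Table = "Customers", Column = "cRegionId" },
            new TableColumns { Table = "Products",  Column = "pProductId" },
            new TableColumns { Table = "Regions",   Column = "rRegionId" },
        };
    }
}
```

The lookups would be built once with `rows.ToLookup(r => r.Table)` and `rows.ToLookup(r => r.Column.Substring(1))`. Indexing an `ILookup` with a key that is not present yields an empty sequence rather than throwing, so no existence check is needed before the query.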
    

Answer 1 (score: 2)

I managed to refactor your FindLinkingTables code down to this:

    private void FindLinkingTables(
        List<TableColumns> sourceList, TableSearchNode parentNode,
        string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            var sames = sourceList.Where(w => w.Table == parentNode.Table);

            var query =
                from x in sames
                join y in sames
                    on x.Column.Substring(1) equals y.Column.Substring(1)
                where !parentNode.Ancenstory.Contains(y.Table)
                select new TableSearchNode
                {
                    Table = x.Table,
                    Column = x.Column,
                    Level = parentNode.Level + 1
                };

            foreach (TableSearchNode z in query)
            {
                parentNode.AddChildNode(z.Column, z);
                if (z.Table != targetTable)
                {
                    FindLinkingTables(sourceList, z, targetTable, maxSearchDepth);
                }
                else
                {
                    z.NotifySeachResult(true);
                }
            }
        }
    }

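It appears to me that the logic in the where !parentNode.Ancenstory.Contains(y.Table) part of the query is flawed. I think you need to rethink your search operation here and see what you come up with.

Answer 2 (score: 2)

A few things stand out to me when I look at this source method:

  1. In your Where clause you call parentNode.Ancenstory. That property has logarithmic running time on its own, and you then call .Contains on the List<string> it returns, which is another logarithmic cost (Contains is linear, but the list has a logarithmic number of elements). What you are doing here is checking for cycles in your graph. These costs can be made constant by adding a field to TableColumns.Table that records how the algorithm has processed that Table (alternatively, you could use a Dictionary<Table, int> to avoid adding a field to the object). Typically, in a DFS algorithm, this field is White, Grey, or Black: White means unprocessed (you have not seen that Table before), Grey means the Table is an ancestor of the Table currently being processed, and Black means you have finished processing that Table and all of its children. Updated to do this, your code would look like: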

    foreach (string sourceColumn in tableColumns)
    {
        string shortName = sourceColumn.Substring(1);
    
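        // Compare the prefix-stripped column names, as the question's code does;
        // Column[0] is a single char and can never equal the shortName string.
        IEnumerable<TableSearchNode> tables =
            sourceList.Where(x => x.Column.Substring(1).Equals(shortName) &&
                                  x.Color == White)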
                      .Select(x => new TableSearchNode
                                       {
                                            Table = x.Table,
                                            Column = x.Column,
                                            Level = parentNode.Level + 1
                                        });
        foreach (TableSearchNode table in tables)
        {
            parentNode.AddChildNode(sourceColumn, table);
    
            table.Color = Grey;
    
            if (!table.Table.Equals(targetTable))
            {
                FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
            }
            else
            {
                table.NotifySeachResult(true);
            }
    
            table.Color = Black;
        }
    }
    
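  2. As mentioned above, you are running out of memory. The simplest fix is to remove the recursive call (which acts as an implicit stack) and replace it with an explicit Stack data structure. This also turns the recursion into a loop, which C# is better at optimizing.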

    private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode root, string targetTable, int maxSearchDepth)
    {
        Stack<TableSearchNode> stack = new Stack<TableSearchNode>();
        TableSearchNode current;
    
        stack.Push(root);
    
        while (stack.Count > 0)
        {
            current = stack.Pop();

            // The stack size is not the search depth, so limit depth by the node's own level.
            if (current.Level >= maxSearchDepth) continue;
    
            var tableColumns = sourceList.Where(x => x.Table.Equals(current.Table))
                                         .Select(x => x.Column);
    
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
    
                // Compare the prefix-stripped column names; Column[0] is a char, not a string.
                IEnumerable<TableSearchNode> tables =
                    sourceList.Where(x => x.Column.Substring(1).Equals(shortName) &&
                                          x.Color == White)
                              .Select(x => new TableSearchNode
                                               {
                                                    Table = x.Table,
                                                    Column = x.Column,
                                                    Level = current.Level + 1
                                                });
                foreach (TableSearchNode table in tables)
                {
                    current.AddChildNode(sourceColumn, table);
    
                    if (!table.Table.Equals(targetTable))
                    {
                        table.Color = Grey;
                        stack.Push(table);
                    }
                    else
                    {
                        // you could go ahead and construct the ancestry list here using the stack
                        table.NotifySeachResult(true);
                        return;
                    }
                }
            }
    
            current.Color = Black;
    
        }
    }
    
  3. Finally, we do not know how costly Table.Equals is, but if the comparison is deep it could be adding significant running time to your inner loop.
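
To make the constant-time cycle check from point 1 concrete, here is a minimal sketch that keeps the tables on the current path (the "Grey" ones) in a HashSet instead of walking the parent chain. It is not the code from the question: `PathSearchSketch`, `FindPaths`, and the pre-built `neighbours` dictionary (table name to linked table names) are assumptions made for this example. Unlike the full colouring scheme, it un-marks a table on backtracking so that every cycle-free path within the depth limit is reported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PathSearchSketch
{
    // Collects every cycle-free path from start to target that uses at most maxDepth links.
    public static List<string> FindPaths(
        Dictionary<string, List<string>> neighbours,
        string start, string target, int maxDepth)
    {
        var results = new List<string>();
        var onPath = new HashSet<string> { start };   // tables on the current path ("Grey")
        var path = new List<string> { start };
        Visit(neighbours, start, target, maxDepth, onPath, path, results);
        return results;
    }

    private static void Visit(
        Dictionary<string, List<string>> neighbours,
        string current, string target, int maxDepth,
        HashSet<string> onPath, List<string> path, List<string> results)
    {
        // path.Count - 1 is the number of links used so far.
        if (path.Count - 1 >= maxDepth) return;
        if (!neighbours.TryGetValue(current, out var next)) return;

        foreach (string table in next)
        {
            // Add returns false if the table is already on the path: O(1) cycle check.
            if (!onPath.Add(table)) continue;
            path.Add(table);

            if (table == target)
                results.Add(string.Join(" -> ", path));
            else
                Visit(neighbours, table, target, maxDepth, onPath, path, results);

            // Backtrack so the table can appear on other paths.
            path.RemoveAt(path.Count - 1);
            onPath.Remove(table);
        }
    }
}
```

Because only one path and one set are kept alive at a time, memory grows with the search depth rather than with the number of nodes visited. The number of paths explored can still grow quickly with depth, so this addresses the cost of each step rather than the size of the search.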

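Answer 3 (score: 2)

OK, here is an answer that basically throws out all of the code you posted.

First, you should take your List<TableColumns> and hash it into something that can be indexed, so you do not have to iterate over the entire list.

To that end, I have written a class called TableColumnIndexer: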

    class TableColumnIndexer
    {
        Dictionary<string, HashSet<string>> tables = new Dictionary<string, HashSet<string>>();

        public void Add(string tableName, string columnName)
        {
            this.Add(new TableColumns { Table = tableName, Column = columnName });
        }

        public void Add(TableColumns tableColumns)
        {
            if(! tables.ContainsKey(tableColumns.Table))
            {
                tables.Add(tableColumns.Table, new HashSet<string>());
            }

            tables[tableColumns.Table].Add(tableColumns.Column);
        }

    // .... More code to follow

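Now, once you have loaded all of your table/column values into this indexing class, you can call a recursive method to retrieve the shortest ancestry link between two tables. The implementation here is somewhat sloppy, but it is written that way for the sake of clarity: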

        // .... continuation of TableColumnIndexer class
        public List<string> GetShortestAncestry(string parentName, string targetName, int maxDepth)
        {
            return GetSortestAncestryR(parentName, targetName, maxDepth - 1, 0, new Dictionary<string,int>());
        }

        private List<string> GetSortestAncestryR(string currentName, string targetName, int maxDepth, int currentDepth, Dictionary<string, int> vistedTables)
        {
            // Check if we have visited this table before
            if (!vistedTables.ContainsKey(currentName))
                vistedTables.Add(currentName, currentDepth);

            // Make sure we have not visited this table at a shallower depth before
            if (vistedTables[currentName] < currentDepth)
                return null;
            else
                vistedTables[currentName] = currentDepth;


            if (currentDepth <= maxDepth)
            {
                List<string> result = new List<string>();

                // First check if the current table contains a reference to the target table
                if (tables[currentName].Contains(targetName))
                {
                    result.Add(currentName);
                    result.Add(targetName);
                    return result;
                }
                // If not try to see if any of the children tables have the target table
                else
                {
                    List<string> bestResult = null;
                    int bestDepth = int.MaxValue;

                    foreach (string childTable in tables[currentName])
                    {
                        var tempResult = GetSortestAncestryR(childTable, targetName, maxDepth, currentDepth + 1, vistedTables);

                        // Keep only the shortest path found to the target table
                        if (tempResult != null && tempResult.Count < bestDepth)
                        {
                            bestDepth = tempResult.Count;
                            bestResult = tempResult;
                        }
                    }

                    // Take the best link we found and add it to the result list
                    if (bestDepth < int.MaxValue && bestResult != null)
                    {
                        result.Add(currentName);
                        result.AddRange(bestResult);
                        return result;
                    }
                    // If we did not find any result, return nothing
                    else
                    {
                        return null;
                    }
                }
            }
            else
            {
                return null;
            }
        }
    }

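All of this code is just a (somewhat verbose) implementation of a shortest-path algorithm that allows for circular paths and for multiple paths between the source table and the target table. Note that if two paths of the same depth exist between two tables, the algorithm will select only one of them (and not necessarily predictably).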
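
For comparison, a breadth-first search is another common way to find a shortest path in an unweighted graph. It is a different technique from the recursive method above, shown here only as a sketch. `ShortestLinkSketch`, `ShortestPath`, and the pre-built `neighbours` dictionary (table name to directly linked table names) are assumptions for this example and do not come from the posted code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ShortestLinkSketch
{
    // Returns the tables on a shortest path from start to target (both included),
    // or null if the target cannot be reached within maxDepth links.
    public static List<string> ShortestPath(
        Dictionary<string, List<string>> neighbours,
        string start, string target, int maxDepth)
    {
        // parent doubles as the visited set: a table is recorded the first time it is reached.
        var parent = new Dictionary<string, string> { [start] = null };
        var depth = new Dictionary<string, int> { [start] = 0 };
        var queue = new Queue<string>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            if (current == target) return BuildPath(parent, target);
            if (depth[current] >= maxDepth) continue;              // do not expand past the limit
            if (!neighbours.TryGetValue(current, out var next)) continue;

            foreach (string table in next)
            {
                if (parent.ContainsKey(table)) continue;           // already reached by a path at least as short
                parent[table] = current;
                depth[table] = depth[current] + 1;
                queue.Enqueue(table);
            }
        }
        return null;
    }

    // Walks the parent links back from the target and reverses them into start-to-target order.
    private static List<string> BuildPath(Dictionary<string, string> parent, string target)
    {
        var path = new List<string>();
        for (string t = target; t != null; t = parent[t]) path.Add(t);
        path.Reverse();
        return path;
    }
}
```

Each table is enqueued at most once, so the work is bounded by the number of tables and links reachable within the depth limit. As with the recursive version, when several shortest paths exist only one is returned, and which one depends on the order of the neighbour lists.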