C# Distinct in a LINQ query

Asked: 2015-05-21 09:02:51

Tags: c# linq dictionary distinct

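After changing some code I ran into a problem. The idea is this: I am counting the words in a set of documents, but each word should be counted only once per document. For example:

Document 1 = Smith Smith Smith Smith => Smith x1

Document 2 = Smith Alan Alan => Smith x1, Alan x1

Document 3 = John John => John x1

The totals across all documents should then be:

Smith x2 (in 2 of 3 documents), Alan x1 (in 1 of 3 documents), John x1 (in 1 of 3 documents)

I think the earlier version, a separate method, handled this correctly. The new version is meant to count once per document when distinct = true (and to count every occurrence when distinct = false), but it now yields a count of only 1 for every word.

The previous code: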

    private Dictionary<string, int> tempDict = new Dictionary<string, int>();

    private void Splitter(string[] file)
    {
        tempDict = file
            .SelectMany(i => File.ReadAllLines(i)
                .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .AsParallel()
                .Select(word => word.ToLower())
                .Distinct())
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

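It was supposed to be changed so that it returns a dictionary, but while building the application I ended up changing it to this code: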

    private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
    {
        var query = file
            .SelectMany(i => File.ReadLines(i)
                .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .AsParallel()
                .Select(word => word.ToLower())
                .Where(word => !word.All(char.IsDigit)));
        if (distinct)
        {
            query = query.Distinct();
        }
        if (pairs)
        {
            var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));

            return query
                .Concat(pairWise)
                .GroupBy(word => word)
                .ToDictionary(g => g.Key, g => g.Count());
        }
        return query
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

Also note that query = file.Distinct(); only returns the names of the documents, so the fix has to be something else.

@edit This is how I call the method:

    private void EnterDocument(object sender, RoutedEventArgs e)
    {
        List<string> myFile = new List<string>();
        OpenFileDialog openFileDialog = new OpenFileDialog();
        openFileDialog.Multiselect = true;
        openFileDialog.Filter = "All files (*.*)|*.*|Text files (*.txt)|*.txt";
        if (openFileDialog.ShowDialog() == true)
        {
            foreach (string filename in openFileDialog.FileNames)
            {
                myFile.Add(filename);
            }
        }
        string[] myFiles = myFile.ToArray();
        myDatabase = Splitter(myFiles, true, false);
    }

1 answer:

Answer 0 (score: 1)

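Distinct() removes duplicates from an IEnumerable, so calling it before the following...

    return query
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());

...produces a list of all the unique words, but each with a count of 1. At that point query already holds the words of every file merged together, so each word survives exactly once and every group has a single element.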
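A minimal sketch of that effect, using in-memory word arrays in place of files. The sample data and the names `DistinctDemo` and `CountAfterGlobalDistinct` are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical helper: merges all documents first, then applies Distinct() once.
    public static Dictionary<string, int> CountAfterGlobalDistinct(IEnumerable<string[]> documents)
    {
        return documents
            .SelectMany(words => words)   // merge every document into one sequence
            .Distinct()                   // each word now appears exactly once overall
            .GroupBy(word => word)        // so every group holds a single element
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var documents = new[]
        {
            new[] { "smith", "smith", "smith", "smith" },
            new[] { "smith", "alan", "alan" },
            new[] { "john", "john" }
        };

        // Every word is reported with a count of 1, including "smith",
        // even though it occurs in two of the three documents.
        foreach (var pair in CountAfterGlobalDistinct(documents))
        {
            Console.WriteLine("{0} x{1}", pair.Key, pair.Value);
        }
    }
}
```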

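Edit

To fix the problem of all the lines being merged before Distinct() runs, apply Distinct() to each file separately and then combine the results, for example: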

    List<string> allFilesWords = new List<string>();
    foreach (var filename in file)
    {
        // Build the word sequence for this one file.
        var fileQuery = File.ReadLines(filename)
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())
            .Where(word => !word.All(char.IsDigit));
        // Deduplicate within the file, so a word counts at most once per document.
        allFilesWords.AddRange(fileQuery.Distinct());
    }
    // Each word's count is now the number of files it appears in.
    return allFilesWords
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
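The per-file idea can probably also be kept as a single query by moving Distinct() inside the SelectMany lambda, which is where the earlier version of the question's code had it. The sketch below is one way to do that while keeping a `distinct` flag. It rests on some assumptions: documents are passed as in-memory text rather than file paths, the AsParallel() and Pairwise steps are left out, and `PerDocumentDemo` and `CountWords` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PerDocumentDemo
{
    // Hypothetical helper: each string in documentTexts stands in for one file's content.
    public static Dictionary<string, int> CountWords(IEnumerable<string> documentTexts, bool distinct)
    {
        return documentTexts
            .SelectMany(text =>
            {
                IEnumerable<string> words = text
                    .Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(word => word.ToLower())
                    .Where(word => !word.All(char.IsDigit));
                // Deduplicate within this document only, before the documents are merged.
                return distinct ? words.Distinct() : words;
            })
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var documents = new[] { "Smith Smith Smith Smith", "Smith Alan Alan", "John John" };

        var perDocument = CountWords(documents, true);
        Console.WriteLine("smith x{0}", perDocument["smith"]); // prints "smith x2"

        var allOccurrences = CountWords(documents, false);
        Console.WriteLine("smith x{0}", allOccurrences["smith"]); // prints "smith x5"
    }
}
```

With distinct set to true the count for a word is the number of documents containing it; with false it is the total number of occurrences.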