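I ran into a problem after changing some code. The idea is this: I am counting the words in a set of documents, but each document should contribute only one copy of any given word. For example:
Document 1 = Smith Smith Smith Smith => Smith x1
Document 2 = Smith Alan Alan => Smith x1, Alan x1
Document 3 = John John => John x1
The totals should then be:
Smith x2 (2 of 3 documents), Alan x1 (1 of 3 documents), John x1 (1 of 3 documents)
I believe a separate method used to handle this (and with distinct = false it should also count every occurrence of a word), but now every word comes out with a count of 1.
The previous code: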
private Dictionary<string, int> tempDict = new Dictionary<string, int>();

private void Splitter(string[] file)
{
    tempDict = file
        .SelectMany(i => File.ReadAllLines(i)
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())
            .Distinct())
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
}
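It was supposed to be changed so that it returns the dictionary, but while building the application I ended up changing it to this code: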
private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
{
    var query = file
        .SelectMany(i => File.ReadLines(i)
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())
            .Where(word => !word.All(char.IsDigit)));
    if (distinct)
    {
        query = query.Distinct();
    }
    if (pairs)
    {
        var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));
        return query
            .Concat(pairWise)
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
    return query
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
}
Also note that query = file.Distinct(); only returns the document names, so the fix has to be something different.
Edit: this is how I call the method:
private void EnterDocument(object sender, RoutedEventArgs e)
{
    List<string> myFile = new List<string>();
    OpenFileDialog openFileDialog = new OpenFileDialog();
    openFileDialog.Multiselect = true;
    openFileDialog.Filter = "All files (*.*)|*.*|Text files (*.txt)|*.txt";
    if (openFileDialog.ShowDialog() == true)
    {
        foreach (string filename in openFileDialog.FileNames)
        {
            myFile.Add(filename);
        }
    }
    string[] myFiles = myFile.ToArray();
    myDatabase = Splitter(myFiles, true, false);
}
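Answer 0 (score: 1)
Distinct() removes duplicates from an IEnumerable. In your new code it runs on the combined words of all files, so calling it before the following...
    return query
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
...produces a list of all unique words, each with a count of 1.
Edit
To fix the problem of the words from all files being merged before Distinct() runs, you can do the following: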
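To make the effect concrete, here is a minimal sketch that applies the same Distinct-then-GroupBy chain to an in-memory word list instead of files. The class name DistinctDemo, the method name CountAfterGlobalDistinct, and the sample words are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical helper: mirrors the distinct == true path of Splitter,
    // but takes words directly instead of reading them from files.
    public static Dictionary<string, int> CountAfterGlobalDistinct(IEnumerable<string> words)
    {
        return words
            .Select(word => word.ToLower())
            .Distinct()                     // duplicates across ALL documents are removed here
            .GroupBy(word => word)          // so every group holds exactly one element
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        // The words of the three example documents, flattened into one sequence.
        var words = new[]
        {
            "Smith", "Smith", "Smith", "Smith",
            "Smith", "Alan", "Alan",
            "John", "John"
        };

        foreach (var pair in CountAfterGlobalDistinct(words))
        {
            Console.WriteLine("{0} x{1}", pair.Key, pair.Value);
        }
    }
}
```

Every entry comes out with a count of 1, because by the time GroupBy runs there is no information left about which document a word came from.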
    List<string> allFilesWords = new List<string>();
    foreach (var filename in file)
    {
        // Build the word sequence for this one file only.
        var fileQuery = File.ReadLines(filename)
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())
            .Where(word => !word.All(char.IsDigit));
        // De-duplicate per file, so each file contributes a word at most once.
        allFilesWords.AddRange(fileQuery.Distinct());
    }
    // A word's count is now the number of files that contain it.
    return allFilesWords
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
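As a quick check of the per-file idea, here is a self-contained sketch that uses in-memory strings in place of files, with the three documents from the question as input. The names DocumentFrequencyDemo and CountDocumentsPerWord are hypothetical, and the digit filter and AsParallel() are left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DocumentFrequencyDemo
{
    // Hypothetical helper: for each word, counts how many documents
    // contain it at least once. Each string stands in for one file's text.
    public static Dictionary<string, int> CountDocumentsPerWord(IEnumerable<string> documents)
    {
        return documents
            .SelectMany(text => text
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => word.ToLower())
                .Distinct())                // de-duplicate inside one document only
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var documents = new[]
        {
            "Smith Smith Smith Smith",
            "Smith Alan Alan",
            "John John"
        };

        var counts = CountDocumentsPerWord(documents);
        Console.WriteLine("smith x{0}", counts["smith"]); // prints "smith x2"
        Console.WriteLine("alan x{0}", counts["alan"]);   // prints "alan x1"
        Console.WriteLine("john x{0}", counts["john"]);   // prints "john x1"
    }
}
```

The only difference from the failing version is where Distinct() sits: inside the per-document lambda rather than on the merged sequence. This is also where it was placed in the original Splitter from the question.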