How do I count the occurrences of each word in a string?

Time: 2014-04-22 18:44:00

Tags: c# asp.net

I am using the following code to extract words from an input string. How can I get the number of occurrences of each word?

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Where(g => g.Count() > 10)
                        .Select(g => g.Key);

4 answers:

Answer 0: (score: 4)

You can use string.Split instead of Regex.Split and project each group to get a count for every word, like this:

string str = "Some string with Some string repeated";
var result  = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });

If you only want to keep words that occur at least 10 times, add the condition Where(grp => grp.Count() >= 10) before the Select.

To print the results:

foreach (var item in result)
{
    Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}

Output:

Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1

For case-insensitive grouping, replace the current GroupBy with:

.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)

So your query would be:

var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
                .Where(grp => grp.Count() >= 10)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });
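
For reference, the pieces above can be put together into a small self-contained sketch. The `WordCounter` class, the `CountWords` method, and the `minCount` parameter are names made up for this illustration; they are not part of the answer. It assumes words are separated by single spaces, as in the example string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper (not from the original answer): counts words
    // case-insensitively and keeps those seen at least minCount times.
    public static Dictionary<string, int> CountWords(string text, int minCount = 1)
    {
        return text.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                   // Group "Some" and "some" together.
                   .GroupBy(w => w, StringComparer.InvariantCultureIgnoreCase)
                   // Compute each count once, then filter on it.
                   .Select(grp => new { Word = grp.Key, Count = grp.Count() })
                   .Where(x => x.Count >= minCount)
                   // Use the same comparer so lookups are case-insensitive too.
                   .ToDictionary(x => x.Word, x => x.Count,
                                 StringComparer.InvariantCultureIgnoreCase);
    }
}
```

Calling `WordCounter.CountWords("Some string with some STRING repeated")` should group "Some"/"some" and "string"/"STRING" together, and passing `minCount: 10` applies the same threshold as the Where clause above.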

Answer 1: (score: 2)

Try this:

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Select(g => new {key = g.Key, count = g.Count()});

Answer 2: (score: 0)

Remove the Select statement to keep the IGrouping, which you can use to see both the key and the count:

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Where(g => g.Count() > 10);

foreach (var wordGrouping in words)
{
    var word = wordGrouping.Key;
    var count = wordGrouping.Count();
}

Answer 3: (score: 0)

You can build a dictionary like this:

var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Where(g => g.Count() > 10)
                 .ToDictionary(g => g.Key, g => g.Count());

Or, if you want to avoid computing the count twice, do this:

var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Select(g => new { g.Key, Count = g.Count() })
                 .Where(g => g.Count > 10)
                 .ToDictionary(g => g.Key, g => g.Count);

Now you can get the count for a word like this (assuming the word "foo" occurs more than 10 times in input):

var fooCount = words["foo"];
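
One caveat: the dictionary indexer throws a KeyNotFoundException when the word is not present, for example when it fell below the threshold. Here is a minimal sketch of a safer lookup using TryGetValue. The `WordLookup` class, its method names, and the `threshold` parameter are invented for this example, and the extra filter on empty strings is my own addition, since Regex.Split can return empty entries when the input starts or ends with non-word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordLookup
{
    // Hypothetical helper: same query as above, with the threshold
    // passed in and empty tokens from Regex.Split dropped.
    public static Dictionary<string, int> BuildCounts(string input, int threshold)
    {
        return Regex.Split(input, @"\W+")
                    .Where(w => w.Length > 0)
                    .GroupBy(w => w)
                    .Select(g => new { g.Key, Count = g.Count() })
                    .Where(g => g.Count > threshold)
                    .ToDictionary(g => g.Key, g => g.Count);
    }

    // Returns 0 instead of throwing when the word is not in the dictionary.
    public static int CountOf(Dictionary<string, int> words, string word)
    {
        int count;
        return words.TryGetValue(word, out count) ? count : 0;
    }
}
```

With this, `WordLookup.CountOf(words, "foo")` gives the stored count when "foo" passed the filter and 0 otherwise.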