Find phrases that are used multiple times in a string

Date: 2014-01-12 13:05:34

Tags: c#

Update

Sorry, my English is not very good.

I want to count the phrases in a string.

My string is below:

  

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla venenatis, lorem ipsum augue vel pellentesque sit amet lorem ipsum dolor egestas lacus, et ipsum dolor nulla.

I want the result below:

  
      
  • 3x Lorem ipsum
  • 2x sit amet

I tried the function from this link: Find which phrases have been used multiple times in a string.

But my result is below:

  • Repeat = 10080x (counting spaces?)
  • Repeat = 99x photoshop
  • Repeat = 52x dersleri
  • Repeat = 44x photoshop dersleri
  • Repeat = 36x photoshop ile

But I want the result below:

  • Repeat = 44x photoshop dersleri
  • Repeat = 36x photoshop ile
  • Repeat = and the others...

I used this function:

var splitBySpace = text2.Split(' ');

var doubleWords = splitBySpace
        .Select((x, i) => new { Value = x, Index = i })
        .Where(x => x.Index != splitBySpace.Length - 1)
        .Select(x => x.Value + " " + splitBySpace.ElementAt(x.Index + 1));

var duplicates = doubleWords
    .GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .Select(x => new { x.Key, Count = x.Count() })
    .OrderByDescending(w => w.Count);

foreach (var word in duplicates)
    ensikkelimeler.Add(string.Format("{0}x {1}", word.Count, word.Key));

2 answers:

Answer 0: (score: 1)

I tweaked your code slightly (it seems to be taken from this answer) and described the changes in the comments:

// all separators from sample text, add additional if necessary
var splitBySpace = text2.Split(new[] {' ', '.', ','}, StringSplitOptions.RemoveEmptyEntries);

var doubleWords = splitBySpace
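    // make the search case insensitive
    .Select((x, i) => new {Value = x.ToLowerInvariant(), Index = i})
    .Where(x => x.Index != splitBySpace.Length - 1)
    // lowercase the second word too, otherwise "dolor Nulla" and "dolor nulla" are counted separately
    .Select(x => x.Value + " " + splitBySpace[x.Index + 1].ToLowerInvariant());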

var ensikkelimeler = doubleWords
    .GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .Select(x => new {x.Key, Count = x.Count()})
    .OrderByDescending(w => w.Count)
    // do the formatting inside the LINQ expression
    .Select(word => string.Format("{0}x {1}", word.Count, word.Key))
    .ToList();

Here is the result for the sample text:

3x lorem ipsum 
3x ipsum dolor 
2x sit amet 
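
The `10080x` entry in the question is probably a side effect of `Split(' ')`: consecutive spaces produce empty strings, and two empty neighbours are joined into a phrase that consists of a single space. The sketch below only illustrates that behaviour; `SplitDemo` and `CountEmpty` are hypothetical names, and the input string is made up.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: counts the empty entries produced by a split.
    public static int CountEmpty(string[] parts)
    {
        return parts.Count(p => p.Length == 0);
    }

    public static void Main()
    {
        // Made-up input with three spaces between the two words.
        var text = "photoshop   dersleri";

        var naive = text.Split(' ');
        var cleaned = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        Console.WriteLine(naive.Length);      // 4: "photoshop", "", "", "dersleri"
        Console.WriteLine(CountEmpty(naive)); // 2
        Console.WriteLine(cleaned.Length);    // 2
    }
}
```

With `StringSplitOptions.RemoveEmptyEntries` the empty entries never reach the grouping step, so the blank "phrase" should disappear from the result.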

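I also tried the accepted answer from the question you linked. After I added a call to ToLowerInvariant(), it returned the same results for the two-word phrases, but it also included a three-word phrase: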

2x lorem ipsum dolor 
3x lorem ipsum 
3x ipsum dolor 
2x sit amet 
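
If phrases of a chosen length are wanted, the pair-building step can be generalized. This is a minimal sketch and not the code of the linked accepted answer; `PhraseCounter` and `RepeatedPhrases` are hypothetical names, and the separator list is an assumption based on the sample text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseCounter
{
    // Hypothetical helper: counts every phrase of exactly `size` consecutive
    // words and returns those that occur more than once, most frequent first.
    public static List<KeyValuePair<string, int>> RepeatedPhrases(string text, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException("size");

        // Assumed separators: whitespace plus the punctuation in the sample text.
        var words = text
            .Split(new[] { ' ', '.', ',', '\r', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToLowerInvariant())
            .ToArray();

        return Enumerable.Range(0, Math.Max(0, words.Length - size + 1))
            // join `size` words starting at index i into one phrase
            .Select(i => string.Join(" ", words, i, size))
            .GroupBy(p => p)
            .Where(g => g.Count() > 1)
            .OrderByDescending(g => g.Count())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                 + "Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, "
                 + "lorem ipsum dolor egestas lacus, et ipsum dolor nulla.";

        foreach (var size in new[] { 3, 2 })
            foreach (var pair in RepeatedPhrases(text, size))
                Console.WriteLine("{0}x {1}", pair.Value, pair.Key);
    }
}
```

For the sample text, calling it with sizes 3 and 2 should list the same four phrases as shown above.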

Answer 1: (score: 0)

var text = @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
             Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, 
             lorem ipsum dolor egestas lacus, et ipsum dolor nulla.";

var phrases = new string[] { "sit amet", "lorem ipsum" };

var q = phrases.Select(p => new { phrase = p, Count =  CountPhraseInText(text, p) })
               .OrderBy(x => x.Count);

The CountPhraseInText function:

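// requires: using System.Text.RegularExpressions;
int CountPhraseInText(string input, string phrase)
{
     // escape the phrase so characters such as '.' or '+' are matched literally
     return Regex.Matches(input, Regex.Escape(phrase), RegexOptions.IgnoreCase).Count;
}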
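
For reference, the pieces of this answer could be put together roughly as follows. This is only a sketch: the class name `PhraseLookup` is hypothetical, and passing the phrase through `Regex.Escape` assumes that phrases are literal text and not regular expressions. Note that this approach only counts phrases you already know; it does not discover them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseLookup
{
    // Counts case-insensitive occurrences of a literal phrase in the input.
    public static int CountPhraseInText(string input, string phrase)
    {
        return Regex.Matches(input, Regex.Escape(phrase), RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                 + "Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, "
                 + "lorem ipsum dolor egestas lacus, et ipsum dolor nulla.";

        var phrases = new[] { "sit amet", "lorem ipsum" };

        var q = phrases
            .Select(p => new { Phrase = p, Count = CountPhraseInText(text, p) })
            .OrderBy(x => x.Count);

        foreach (var item in q)
            Console.WriteLine("{0}x {1}", item.Count, item.Phrase);
        // prints:
        // 2x sit amet
        // 3x lorem ipsum
    }
}
```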