Counting word occurrences, allowing special characters and newlines

Time: 2017-07-29 00:24:38

Tags: javascript node.js regex count find-occurrences

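I'm trying to build a function that counts how many times a word occurs in a phrase.

The function should also handle cases where the word in the phrase is next to other non-alphabetic characters and/or end-of-line characters.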

function countWordInText(word,phrase){
    var c=0;
    phrase = phrase.concat(" ");
    regex = (word,/\W/g);
    var fChar = phrase.indexOf(word);
    var subPhrase = phrase.slice(fChar);

    while (regex.test(subPhrase)){
        c += 1;
        subPhrase = subPhrase.slice((fChar+word.length));
        fChar = subPhrase.indexOf(word);
    }
    return c;
}

The problem is that for simple values such as

phrase = "hi hi hi all hi. hi";
word = "hi"
// OR
word = "hi all";

it returns the wrong value.

1 Answer:

Answer 0: (score: 1)

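The algorithm you wrote shows that you spent some time trying to get this to work. However, there are still quite a few places where it doesn't. For example, (word,/\W/g) doesn't actually create the regex you probably think it does: the comma operator evaluates to its last operand, so the result is just /\W/g and word is ignored.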
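To illustrate the point about (word,/\W/g), here is a minimal sketch. The variable names are only for illustration and are not part of the original code. It also shows a second pitfall that affects any loop calling test on a regex with the g flag: the regex remembers where its last match ended.

```javascript
// The comma operator evaluates both operands and returns the last one,
// so `word` is discarded and `regex` is simply /\W/g.
var word = 'hi';
var regex = (word, /\W/g);

var regexSource = regex.source;            // '\\W', no trace of `word`
var matchesWord = regex.test('hi');        // false: 'hi' has no non-word character
regex.lastIndex = 0;                       // reset before reusing the regex
var matchesSpace = regex.test('all done'); // true: the space is a non-word character

// A regex with the `g` flag stores the end of its last match in `lastIndex`,
// and the next `test` call resumes searching from that index.
var globalRegex = /\W/g;
var first = globalRegex.test('a b');   // true, lastIndex is now 2
var second = globalRegex.test('a b');  // false, the search resumed at index 2
```

So the loop in the question is really asking "does the remaining text contain any non-word character?", which has little to do with the word being counted.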

There is also a simpler way:

function countWordInText (word, phrase) {
  // Escape any characters in `word` that may have a special meaning
  // in regular expressions.
  // Taken from https://stackoverflow.com/a/6969486/4220785
  word = word.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&')

  // Replace any whitespace in `word` with `\s`, which matches any
  // whitespace character, including line breaks.
  word = word.replace(/\s+/g, '\\s')

  // Create a regex with our `word` that will match it only when it
  // is surrounded by word boundaries (`\b`). A word boundary is the
  // zero-width position between a word character (letter, digit or
  // underscore) and a non-word character such as whitespace or
  // punctuation, or the start or end of the string.
  var regex = new RegExp('\\b' + word + '\\b', 'g')

  // Get all of the matches for `phrase` using our new regex.
  var matches = phrase.match(regex)

  // If some matches were found, return how many. Otherwise, return 0.
  return matches ? matches.length : 0
}

countWordInText('hi', 'hi hi hi all hi. hi') // 5

countWordInText('hi all', 'hi hi hi all hi. hi') // 1

countWordInText('hi all', 'hi hi hi\nall hi. hi') // 1

countWordInText('hi all', 'hi hi hi\nalligator hi. hi') // 0

countWordInText('hi', 'hi himalayas') // 1
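
To make the steps above easier to inspect, the following sketch pulls the regex construction out into a separate function. buildWordRegex and countMatches are hypothetical helper names introduced only for this illustration; they are not part of the answer. The sketch also shows one caveat, assuming the input may end in punctuation: \b only exists next to a word character, so a search term that starts or ends with a non-word character may not be counted.

```javascript
// Hypothetical helper (not part of the answer): builds the same regex
// that countWordInText constructs internally, so it can be inspected.
function buildWordRegex (word) {
  var escaped = word.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&')
  var flexible = escaped.replace(/\s+/g, '\\s')
  return new RegExp('\\b' + flexible + '\\b', 'g')
}

// Hypothetical helper: counts how many times `regex` matches in `text`.
function countMatches (regex, text) {
  var matches = text.match(regex)
  return matches ? matches.length : 0
}

// The pattern built for 'hi all' is \bhi\sall\b
var hiAllSource = buildWordRegex('hi all').source

// `\b` rejects the "hi" inside "himalayas" because it is followed by
// another word character.
var inHimalayas = countMatches(buildWordRegex('hi'), 'hi himalayas') // 1

// Caveat: there is no word boundary between "+" and " ", since neither
// is a word character, so this term is not found.
var cPlusPlus = countMatches(buildWordRegex('c++'), 'c++ is fun') // 0
```

If search terms like the last one matter, the \b anchors would need to be replaced with something else, for example a check for whitespace or the string edges; that is a design choice beyond what the answer covers.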

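I've added comments throughout the example. Hopefully this helps you get started!

Here are some good places to learn about regular expressions in JavaScript:

You can also use Regexr to test your regular expressions.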