How to find the most common word in a .txt file and change it

Posted: 2016-09-14 09:50:36

Tags: javascript word-frequency

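I am trying to work out how to find the single most common word in a text file and change that word so it is wrapped in something else, for example freewordchoice ("free" + the common word + "choice"), everywhere the word appears in the text. I have been looking for something like this but couldn't find it. I'm quite new to JavaScript, which is what I want to use. Uploading and displaying the text already works fine. What I don't understand is how to locate the most common word and change it throughout the text before the text is actually displayed in the browser. As I see it, I need some kind of variable to find the word, somewhere to store that word, and another variable holding what I want to add to or change about the target word.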

Example text: the Project Gutenberg text of Aladdin and the Wonderful Lamp

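Info/update: the code below finds the most common word in the full example text above. For this text the word is Aladdin. The problem is that I can't replace the word Aladdin correctly. I do get fooAladdinbar printed out the way I want, but instead of simply changing Aladdin to fooAladdinbar, it inserts fooAladdinbar between every letter of the example text.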

Solved: it was a problem with a variable.

1 answer:

Answer (score: 0)

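It's not perfect, but it works. Here is a demo:

(This demo only finds the most common word.)

  • It splits the text with a regular expression,
  • counts the words,
  • and returns the most common word.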



var data = document.getElementById("data").value;

// Splitting on word boundaries returns the words AND the separators between them
var allWords = data.split(/\b/);
var wordCountList = {};

allWords.forEach(function (word) {
  // Count only real words; skip separators such as " ", ", ", ". " and line breaks
  if (/^\w+$/.test(word)) {
    if (!Object.prototype.hasOwnProperty.call(wordCountList, word)) {
      wordCountList[word] = {word: word, count: 0};
    }
    wordCountList[word].count++;
  }
});

// Keep the entry with the highest count (counting is case-sensitive: "The" and "the" differ)
var maxCountWord = {count: 0};
for (var propName in wordCountList) {
  var currentWord = wordCountList[propName];
  if (maxCountWord.count < currentWord.count) {
    maxCountWord = currentWord;
  }
}
console.info(maxCountWord);
document.getElementById("result").textContent = maxCountWord.word + " (" + maxCountWord.count + ")";

textarea{
  width:100%;
  height:100px;
}

<textarea id="data" >
<!-- start slipsum code -->

The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee.

<!-- end slipsum code -->
</textarea>

<div id="result"></div>
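The counting step can also be written as a plain function with no DOM access, so it works the same on text read from an uploaded .txt file. What follows is a minimal sketch and not the answerer's code. The name mostCommonWord and the optional stopWords parameter are hypothetical. Unlike the demo above, it counts case-insensitively, so "The" and "the" count as one word. It relies only on the standard Map, Set and String.prototype.match, so it should run in a browser or in Node.

```javascript
// Return { word, count } for the most frequent word in `text`, or null if none.
// Counting is case-insensitive. \w only matches ASCII letters, digits and "_",
// so apostrophes split words ("brother's" -> "brother", "s").
function mostCommonWord(text, stopWords) {
  // Hypothetical optional parameter: words to ignore, e.g. ["the", "and"]
  const skip = new Set((stopWords || []).map((w) => w.toLowerCase()));
  const counts = new Map();
  const words = text.toLowerCase().match(/\w+/g) || [];

  for (const word of words) {
    if (!skip.has(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // A Map iterates in insertion order, so ties go to the word seen first
  let best = null;
  let bestCount = 0;
  for (const [word, count] of counts) {
    if (count > bestCount) {
      best = word;
      bestCount = count;
    }
  }
  return best === null ? null : { word: best, count: bestCount };
}

const sample = "The cat sat. The dog sat on the mat.";
console.log(mostCommonWord(sample));          // { word: 'the', count: 3 }
console.log(mostCommonWord(sample, ["the"])); // { word: 'sat', count: 2 }
```

The second call shows why a stop-word list is useful. Without one, short function words such as "the" usually win on ordinary prose.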

To replace the word, you can also use a regular expression:
(This demo only replaces one common word, which is passed in by hand.)

function freewordchoice(free, word, choice) {
  // An empty word would match at every word boundary, so do nothing
  if (!word) {
    return;
  }
  var data = document.getElementById("data").innerHTML;
  // Escape regex metacharacters so the word is matched literally
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var replaceExpression = new RegExp("\\b" + escaped + "\\b", "gi");
  console.info(replaceExpression);
  // "$&" reinserts the matched text, so "The" keeps its capital letter
  data = data.replace(replaceExpression, free + "$&" + choice);
  document.getElementById("result").innerHTML = data;
}

freewordchoice("<b>", "the", "</b>");

<b>Before:</b>
<div id="data" >
<!-- start slipsum code -->

The path of the righteous man is beset on all sides by the iniquities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and good will, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who would attempt to poison and destroy My brothers. And you will know My name is the Lord when I lay My vengeance upon thee.

<!-- end slipsum code -->
</div>
<br/><br/>
<b>After:</b>
<div id="result"></div>
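Here is a sketch of the same replacement as a pure string function. It rests on a few assumptions. The names wrapWord and escapeRegExp are hypothetical. The input is plain text, such as the contents of the uploaded file, rather than innerHTML, because a pattern run over HTML could also match tag or attribute names. The word may contain regex metacharacters, so it is escaped first. Passing a function to replace keeps each match's original capitalization and inserts the wrapper strings literally, even if they contain "$".

```javascript
// Escape characters that are special in regular expressions,
// so the word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every whole-word, case-insensitive occurrence of `word` in `text`
// with `before` and `after`, keeping the original capitalization.
function wrapWord(text, word, before, after) {
  // An empty word would give the pattern \b\b, which matches at every
  // word boundary, so return the text unchanged instead.
  if (!word) {
    return text;
  }
  const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  // The replacer function's return value is used literally ("$" is not special here)
  return text.replace(pattern, (match) => before + match + after);
}

console.log(wrapWord("Aladdin rubbed the lamp. Aladdin smiled.", "aladdin", "foo", "bar"));
// fooAladdinbar rubbed the lamp. fooAladdinbar smiled.
```

The \b anchors stop a search for "the" from also changing "theme" or "thee".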

Update

The problem is in this line (split here into concatenated strings for readability):

common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,' +
  'if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,' +
  'he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,' +
  'would,where,whom,s,on,from,for,by,but,all,said,my,';

The fix is to delete the last comma at the end of the string (the one in ,said,my,';). Then it should work:

common = 'the,a,do,in,with,this,so,that,of,and,not,did,when,what,were,went,was,as,' +
  'if,who,had,at,can,you,which,while,will,to,till,then,them,their,she,' +
  'he,once,out,no,must,many,me,is,it,his,him,her,about,have,i,has,your,' +
  'would,where,whom,s,on,from,for,by,but,all,said,my';

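Because of that trailing comma, the last "word" in the list is an empty string: splitting 'a,b,' on commas gives ['a', 'b', ''].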
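The asker's replacement code isn't included here, so the following only illustrates the likely mechanism. It assumes the list was split with split(",") and that the empty last entry ended up being used as a regex pattern. An empty RegExp matches at every position in a string, which would explain the wrapper appearing between every letter.

```javascript
// A trailing comma leaves an empty string at the end of the split list.
const withTrailingComma = "said,my,".split(",");
console.log(withTrailingComma); // [ 'said', 'my', '' ]

// An empty pattern matches at every position, including before the first
// and after the last character, so the replacement lands between every letter.
console.log("Aladdin".replace(new RegExp("", "g"), "-")); // -A-l-a-d-d-i-n-

// Besides removing the trailing comma, empty entries can be filtered out defensively.
const cleaned = "said,my,".split(",").filter((w) => w !== "");
console.log(cleaned); // [ 'said', 'my' ]
```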