Word frequency counting is very inefficient

Time: 2013-03-20 09:33:49

Tags: ruby-on-rails time word-cloud

This is my code for counting word frequencies:

  word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay",  "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]

  arr_stop_kwd = ["a", "and"]

  frequencies = Hash.new(0)
  word_arr.each { |word|
    if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
      frequencies["#{word.downcase}"] += 1
    end
  }

With 100k words this takes 9.03 seconds. Is there another way to compute this that takes less time?

Thanks in advance.

1 Answer:

Answer 0: (score: 2)

Take a look at the Facets gem. You can use its frequency method to do something like this:
require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency

Note that the stop words can be subtracted from word_arr. See the Array documentation.
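If adding a gem is not an option, here is a minimal sketch of a similar count in plain Ruby. It assumes the slowdown comes mainly from scanning a long stop-word array for every word, so it puts the stop words in a Set and downcases each word only once. The helper name word_frequencies and the sample data are made up for illustration and are not from the question. Unlike the array subtraction above, which compares strings exactly, this keeps the question's case-insensitive stop-word check and its '&&' filter.

```ruby
require 'set'

# Hypothetical helper (not from the original post): counts downcased words,
# skipping stop words and any entry that contains '&&'.
def word_frequencies(words, stop_words)
  # Set membership is a hash lookup rather than a linear array scan.
  stop_set = stop_words.map(&:downcase).to_set
  frequencies = Hash.new(0)

  words.each do |word|
    next if word.include?('&&')       # same effect as word.match('&&')
    key = word.downcase               # downcase once and reuse the result
    next if stop_set.include?(key)
    frequencies[key] += 1
  end

  frequencies
end

# Small made-up sample in the shape of word_arr from the question.
sample = ["I", "a", "good", "Good", "and", "China........;) &&"]
counts = word_frequencies(sample, ["a", "and"])
```

For a real run, pass word_arr and arr_stop_kwd instead of the sample, and measure with Benchmark from the standard library to check whether this actually helps on your data.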