Question

I read this post: Scala beginners - simplest way to count words in file

The code is a single line and very elegant. However, I can't work out what it does:

scala.io.Source.fromFile("file.txt")
  .getLines
  .flatMap(_.split("\\W+"))
  .foldLeft(Map.empty[String, Int]){
     (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }

I can't understand the foldLeft part, which directly limits my ability to modify this code. foldLeft is defined as: def foldLeft[B](z: B)(f: (B, A) => B): B

What does this foldLeft do? How is it able to supply count and word to the function passed in the second curried parameter list?

The words I want to match are represented by a Map:

  val dictionary = Map(
    """will""" -> 1,
    """going to""" -> 2,
    """future""" -> 3
  )

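How can I combine this Map with that code? Or should I try something else entirely?

I came up with this bad idea: I could wrap a for loop around the whole thing, but it would look very ugly.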

Answer 1

Assuming you only want to count the words that appear as keys in dictionary, you need to add a filter:

scala.io.Source.fromFile("file.txt")
  .getLines
  .flatMap(_.split("\\W+"))
  .filter(dictionary.contains(_))
  .foldLeft(Map.empty[String, Int]){
     (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }

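Now, a general explanation of how foldLeft works...

Plugging our types into the foldLeft definition, we have: foldLeft[Map[String, Int]](z: Map[String, Int])(f: (Map[String, Int], String) => Map[String, Int]): Map[String, Int]

Put simply, foldLeft takes two curried parameters: the initial value of the accumulator (in our case an empty Map), and a function that itself takes two arguments: the accumulator and the current item of the structure being traversed (the current word).

So, on each function call, count is the current Map of counts and word is the current word.

For each word, we return a new Map in which the count for the current word (0 if it is not present yet) is incremented by 1. The final result of foldLeft is the complete Map of counts.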
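To make the threading of the accumulator concrete, here is a minimal sketch that replaces the file with a small in-memory list and writes out by hand the calls that foldLeft performs. The list contents and the names step and s0 to s3 are made up for illustration.

```scala
object FoldLeftTrace {
  // Hypothetical in-memory input standing in for the words read from file.txt.
  val words = List("the", "future", "the")

  // The same step function as in the one-liner, given a name for readability.
  def step(count: Map[String, Int], word: String): Map[String, Int] =
    count + (word -> (count.getOrElse(word, 0) + 1))

  // foldLeft threads the accumulator through the list from left to right.
  val folded: Map[String, Int] = words.foldLeft(Map.empty[String, Int])(step)

  // The same computation written out by hand, one call per word.
  val s0 = Map.empty[String, Int] // z, the initial accumulator
  val s1 = step(s0, "the")        // the -> 1
  val s2 = step(s1, "future")     // the -> 1, future -> 1
  val s3 = step(s2, "the")        // the -> 2, future -> 1
}
```

Here s3 and folded hold the same Map: each call receives the Map returned by the previous call as count, and the next list element as word.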

Answer 2

Let file.txt contain, for example,

He will be going to the future, because the future is going to become the present.

Then

val file = scala.io.Source.fromFile("file.txt").mkString

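loads the file contents into a string (the maximum string size is the limiting factor of this approach; otherwise a StringBuffer might be considered).

Then, for a given dictionary, such as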

val dictionary = Map( """will""" -> 1,
                      """going to""" -> 2,
                      """future""" -> 3 )

we have that

dictionary.map { case(k,v) => k -> k.r.findAllIn(file).size }
res: Map[String,Int] = Map( will -> 1, going to -> 2, future -> 2 )

To wrap up this code, consider

implicit class RichWordCount(val filename: String) extends AnyVal {
  def dictioCount(dictionary: Map[String,Int]): Map[String,Int] = {
    val file = scala.io.Source.fromFile(filename).mkString
    dictionary.map { case(k,v) => k -> k.r.findAllIn(file).size }
  }
}

Then we can call it with

"file.txt".dictioCount(dictionary)
res: Map[String,Int] = Map(will -> 1, going to -> 2, future -> 2)
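One thing to keep in mind with this approach: each key is used directly as a regular expression, so will would also match inside willing, and a key containing regex metacharacters would be interpreted as a pattern. If whole-word, literal matching is what you want, a possible variant (a sketch only; the object and method names here are hypothetical) quotes the key and anchors it at word boundaries:

```scala
import scala.util.matching.Regex

object QuotedDictionaryCount {
  // Counts whole-word, literal occurrences of each dictionary key in `text`.
  def count(text: String, dictionary: Map[String, Int]): Map[String, Int] =
    dictionary.map { case (key, _) =>
      // Regex.quote makes the key literal; \b anchors it at word boundaries.
      val pattern = ("\\b" + Regex.quote(key) + "\\b").r
      key -> pattern.findAllIn(text).size
    }
}
```

Multi-word keys such as going to still work here, because the match runs over the whole text rather than over individual tokens.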

Answer 3

This doesn't directly answer your question (I'm not sure I understood it correctly), but I think this code counts words more simply:

  val words = List("the", "the", "water")
  val groupedWords = words.groupBy(word => word)
  println(groupedWords)
  val wordsWithCount = groupedWords.mapValues(_.size)
  println(wordsWithCount)

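groupBy simply sorts the data into subsets according to some key. In this case the key is just the word itself (see the output below). Incidentally, groupBy is essentially a foldLeft specialized for grouping things.

The output is: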

Map(water -> List(water), the -> List(the, the))
Map(water -> 1, the -> 2)
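To illustrate the relationship between groupBy and foldLeft, here is a simplified, hypothetical groupByFold for lists, written with foldLeft. It is a sketch of the idea, not how the standard library implements groupBy.

```scala
object GroupByAsFold {
  // A simplified groupBy for lists, expressed as a foldLeft (illustration only).
  def groupByFold[A, K](items: List[A])(key: A => K): Map[K, List[A]] =
    items.foldLeft(Map.empty[K, List[A]]) { (groups, item) =>
      val k = key(item)
      // Append the item to its group, creating the group the first time the key is seen.
      groups + (k -> (groups.getOrElse(k, Nil) :+ item))
    }

  val words = List("the", "the", "water")
  val grouped = groupByFold(words)(word => word)
  // Turn each group into its size to get the word counts.
  val counts = grouped.map { case (word, group) => word -> group.size }
}
```

For this input, grouped contains the same entries as words.groupBy(word => word), and counts contains the same entries as wordsWithCount above.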

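Edit

I think I now understand that you don't want to count all words, only certain ones:

val validWords = dictionary.keys.toSet
val filteredWords = words.filter(word => validWords.contains(word))

(For better performance, it may be better to do the grouping first and only filter for valid words at the end. But that depends on the dictionary size, the number of words to process, and how often words repeat.)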
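The two orderings can be sketched side by side. The sample words list below is made up for illustration, and both variants should produce the same counts; which one is faster depends on the factors mentioned above.

```scala
object GroupThenFilter {
  val dictionary = Map("will" -> 1, "going to" -> 2, "future" -> 3)
  // Hypothetical sample input, already split into single words.
  val words = List("the", "future", "will", "the", "future")

  val validWords = dictionary.keys.toSet

  // Variant 1: filter first, then group and count.
  val filterFirst: Map[String, Int] =
    words.filter(validWords.contains).groupBy(identity).map { case (w, ws) => w -> ws.size }

  // Variant 2: group and count everything, then keep only dictionary words.
  val groupFirst: Map[String, Int] =
    words.groupBy(identity).collect { case (w, ws) if validWords(w) => w -> ws.size }

  // Note: a two-word key such as "going to" can never equal a single token,
  // so it will not be counted by either variant.
}
```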

Answer 4

Given an array of strings, this function computes the frequency of each string and returns an array of tuples.
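A function matching this description could be sketched in Scala as follows. The name frequencies and the exact signature are assumptions based only on the description above, and the order of the returned tuples is unspecified.

```scala
object Frequencies {
  // Returns each distinct string paired with the number of times it occurs.
  def frequencies(items: Array[String]): Array[(String, Int)] =
    items.groupBy(identity).map { case (s, group) => (s, group.length) }.toArray
}
```

For example, Frequencies.frequencies(Array("a", "b", "a")) yields the pairs ("a", 2) and ("b", 1) in some order.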


Scala: matching certain words and counting their frequency
