Question

I'm trying to read input from a file and count the words using a map. I want to ignore whitespace while reading from the file.

import scala.io.Source

val lines = Source.fromFile("file path","utf-8").getLines()

val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
lines.flatMap(line => line.split(" ")).foreach(word => counts(word) += 1)
for ((key, value) <- counts) println (key + "-->" + value)

When I run this code on the following input:

hello hello
    world goodbye hello
  world

the output is:

world-->2
goodbye-->1
hello-->3
-->2

It counts 2 empty strings as words. How can I fix this?

Answer 1

lines.flatMap(_.trim.split("\\s+"))
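
For completeness, here is a minimal sketch of how that line could slot into the counting code from the question. The file is replaced by an in-memory iterator holding the question's sample lines, and the extra `filter` is my own addition (an assumption about what you want): `"".split("\\s+")` returns a single empty string, so a completely blank line would otherwise still be counted.

```scala
import scala.collection.mutable

// Sample lines from the question, standing in for Source.fromFile(...).getLines()
val lines = Iterator("hello hello", "    world goodbye hello", "  world")

val counts = mutable.HashMap.empty[String, Int].withDefaultValue(0)

lines
  .flatMap(_.trim.split("\\s+"))  // trim removes leading blanks; \\s+ treats any run of whitespace as one separator
  .filter(_.nonEmpty)             // assumption: blank lines should not contribute an empty "word"
  .foreach(word => counts(word) += 1)

for ((key, value) <- counts) println(key + "-->" + value)
```

With this input the empty key no longer appears in `counts`.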

Answer 2

One possible approach is to use a filter:

lines
  .flatMap(line => line.split(" "))
  .filter(_.nonEmpty) // split(" ") yields empty strings "" for repeated spaces, not " "
  .foreach(word => counts(word) += 1)

That said, I'd say there is a better way: you can force the iterator to evaluate with toList, and then use groupBy together with collect:

Iterator("some  word", "some    other")
  .flatMap(_.split(" "))
  .toList
  .groupBy(identity)
  .collect { case (a,b) if !a.isEmpty => (a, b.length)}

Output:

Map(some -> 2, word -> 1, other -> 1)

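Note also that this approach may be less efficient than the one you are using, because it creates several intermediate collections. I haven't benchmarked it, and for large files it may not be the best choice.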
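
If those intermediate collections turn out to matter, one option is to fold over the iterator directly, so that neither the full word list nor the per-word groups are materialized. This is only a sketch on the same sample data, and I haven't benchmarked it either:

```scala
val counts = Iterator("some  word", "some    other")
  .flatMap(_.split(" "))
  .filter(_.nonEmpty)  // drop the empty strings produced by consecutive spaces
  .foldLeft(Map.empty[String, Int]) { (acc, word) =>
    acc.updated(word, acc.getOrElse(word, 0) + 1)  // bump the running count for this word
  }
```

It produces the same word counts as the groupBy version above. Swapping the immutable Map for the mutable HashMap from the question would work just as well here.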

Answer 3

This approach uses "\\W+" to extract the words from each line, regardless of how many spaces separate them:

Source.fromFile("filepath")
  .getLines()
  .flatMap(_.trim.split("\\W+"))
  .toArray.groupBy(identity)
  .map ( kv => kv._1 -> kv._2.size )

which gives

res: Map(world -> 2, goodbye -> 1, hello -> 3)
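
One thing to keep in mind: "\\W+" splits on any run of non-word characters, not just whitespace, so punctuation is stripped as well and words containing an apostrophe are broken apart. A small illustration (the sample string is made up for this example):

```scala
val line = "  hello, world! don't  "

// Splitting on whitespace only keeps punctuation attached to the words
val byWhitespace = line.trim.split("\\s+").toList  // List("hello,", "world!", "don't")

// Splitting on non-word characters drops punctuation, but also cuts at the apostrophe
val byNonWord = line.trim.split("\\W+").toList     // List("hello", "world", "don", "t")
```

Which of the two is preferable depends on whether the input contains punctuation that should be ignored.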

Ignoring whitespace when reading from a file with getLines in Scala

3 answers: