在Scala中如何从列表中删除重复项?

时间:2011-08-21 00:41:21

标签: scala

假设我有

val dirty = List("a", "b", "a", "c")

是否有返回“a”,“b”,“c”

的列表操作

8 个答案:

答案 0 :(得分:157)

查看Seq的ScalaDoc,

scala> dirty.distinct
res0: List[java.lang.String] = List(a, b, c)

更新。其他人建议使用Set而不是List。没关系,但请注意,默认情况下,Set接口不会保留元素顺序。您可能希望使用明确 保留顺序的Set实现,例如collection.mutable.LinkedHashSet

答案 1 :(得分:15)

scala.collection.immutable.List现在有.distinct方法。

现在可以调用dirty.distinct,而无需转换为SetSeq

答案 2 :(得分:14)

在使用Kitpon的解决方案之前,考虑使用Set而不是List,它确保每个元素都是唯一的。

由于大多数列表操作(foreachmapfilter,...)对于集合和列表都是相同的,因此在代码中更改集合非常容易。

答案 3 :(得分:5)

首先使用Set是正确的方法,但是:

scala> List("a", "b", "a", "c").toSet.toList
res1: List[java.lang.String] = List(a, b, c)

作品。或者只是toSet因为它支持 Seq Traversable接口。

答案 4 :(得分:0)

对于已排序列表

如果您碰巧想要一个列表,该列表的各个项目已经已经排序(如我经常需要的那样),则以下内容的速度大约是.distinct的两倍:

  def distinctOnSorted[V](seq: List[V]): List[V] =
    seq.foldLeft(List[V]())((result, v) =>
      if (result.isEmpty || v != result.head) v :: result else result)
    .reverse

性能结果来自0-99的100,000,000个随机整数:

distinct        : 0.6655373s
distinctOnSorted: 0.2848134s

MutableList或ListBuffer的性能

实践似乎表明,更具可变性/非功能性的编程方法似乎比添加不可变列表更快。不变的实现始终表现更好。我的猜测是,scala将其编译器优化的重点放在不可变的集合上,并且做得很好。 (我欢迎其他人提交更好的实施方案。)

List size 1e7, random 0 to 1e6
------------------------------
distinct            : 4562.2277ms
distinctOnSorted    : 201.9462ms
distinctOnSortedMut1: 4399.7055ms
distinctOnSortedMut2: 246.099ms
distinctOnSortedMut3: 344.0758ms
distinctOnSortedMut4: 247.0685ms

List size 1e7, random 0 to 100
------------------------------
distinct            : 88.9158ms
distinctOnSorted    : 41.0373ms
distinctOnSortedMut1: 3283.8945ms
distinctOnSortedMut2: 54.4496ms
distinctOnSortedMut3: 58.6073ms
distinctOnSortedMut4: 51.4153ms

实施:

object ListUtil {
  def distinctOnSorted[V](seq: List[V]): List[V] =
    seq.foldLeft(List[V]())((result, v) =>
      if (result.isEmpty || v != result.head) v :: result else result)
    .reverse

  def distinctOnSortedMut1[V](seq: List[V]): Seq[V] = {
    if (seq.isEmpty) Nil
    else {
      val result = mutable.MutableList[V](seq.head)
      seq.zip(seq.tail).foreach { case (prev, next) =>
        if (prev != next) result += next
      }
      result //.toList
    }
  }

  def distinctOnSortedMut2[V](seq: List[V]): Seq[V] = {
    val result = mutable.MutableList[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) result += v
      prev = v
    }
    result //.toList
  }

  def distinctOnSortedMut3[V](seq: List[V]): List[V] = {
    val result = mutable.MutableList[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) v +=: result
      prev = v
    }
    result.reverse.toList
  }

  def distinctOnSortedMut4[V](seq: List[V]): Seq[V] = {
    val result = ListBuffer[V]()
    if (seq.isEmpty) return Nil
    result += seq.head
    var prev = seq.head
    for (v <- seq.tail) {
      if (v != prev) result += v
      prev = v
    }
    result //.toList
  }
}

测试:

import scala.util.Random

class ListUtilTest extends UnitSpec {
  "distinctOnSorted" should "return only the distinct elements in a sorted list" in {
    val bigList = List.fill(1e7.toInt)(Random.nextInt(100)).sorted

    val t1 = System.nanoTime()
    val expected = bigList.distinct
    val t2 = System.nanoTime()
    val actual = ListUtil.distinctOnSorted[Int](bigList)
    val t3 = System.nanoTime()
    val actual2 = ListUtil.distinctOnSortedMut1(bigList)
    val t4 = System.nanoTime()
    val actual3 = ListUtil.distinctOnSortedMut2(bigList)
    val t5 = System.nanoTime()
    val actual4 = ListUtil.distinctOnSortedMut3(bigList)
    val t6 = System.nanoTime()
    val actual5 = ListUtil.distinctOnSortedMut4(bigList)
    val t7 = System.nanoTime()

    actual should be (expected)
    actual2 should be (expected)
    actual3 should be (expected)
    actual4 should be (expected)
    actual5 should be (expected)

    val distinctDur = t2 - t1
    val ourDur = t3 - t2

    ourDur should be < (distinctDur)

    print(s"distinct            : ${distinctDur / 1e6}ms\n")
    print(s"distinctOnSorted    : ${ourDur / 1e6}ms\n")
    print(s"distinctOnSortedMut1: ${(t4 - t3) / 1e6}ms\n")
    print(s"distinctOnSortedMut2: ${(t5 - t4) / 1e6}ms\n")
    print(s"distinctOnSortedMut3: ${(t6 - t5) / 1e6}ms\n")
    print(s"distinctOnSortedMut4: ${(t7 - t6) / 1e6}ms\n")
  }
}

答案 5 :(得分:0)

您还可以使用递归和模式匹配:

def removeDuplicates[T](xs: List[T]): List[T] = xs match {
  case Nil => xs
  case head :: tail => head :: removeDuplicates(for (x <- tail if x != head) yield x)
}

答案 6 :(得分:-1)

inArr.distinct foreach println _

答案 7 :(得分:-3)

算法方式......

def dedupe(str: String): String = {
  val words = { str split " " }.toList

  val unique = words.foldLeft[List[String]] (Nil) {
    (l, s) => {
      val test = l find { _.toLowerCase == s.toLowerCase } 
      if (test == None) s :: l else l
    }
  }.reverse

  unique mkString " "
}