Scala: functional aggregation of the elements of a Seq[T] => Seq[Seq[T]] (preserving order)

Asked: 2011-11-10 20:55:07

Tags: scala collections functional-programming

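I would like to aggregate compatible elements in a sequence, i.e. transform a Seq[T] into a Seq[Seq[T]] in which the elements of each subsequence are compatible with each other, while preserving the original order of the seq. For example, from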

case class X(i: Int, n: Int) {
  def canJoin(that: X): Boolean = this.n == that.n
  override val toString = i + "." + n
}
val xs = Seq(X(1, 1), X(2, 3), X(3, 3), X(4, 3), X(5, 1), X(6, 2), X(7, 2), X(8, 1))
/* xs = List(1.1, 2.3, 3.3, 4.3, 5.1, 6.2, 7.2, 8.1) */

I want to get

val js = join(xs)
/* js = List(List(1.1), List(2.3, 3.3, 4.3), List(5.1), List(6.2, 7.2), List(8.1)) */

I tried to do this in a functional way, but got stuck halfway:

Using a while loop

def split(seq: Seq[X]): (Seq[X], Seq[X]) = seq.span(_ canJoin seq.head)
def join(seq: Seq[X]): Seq[Seq[X]] = {
  var pp = Seq[Seq[X]]()
  var s = seq
  while (!s.isEmpty) {
    val (p, r) = split(s)
    pp :+= p
    s = r
  }
  pp
}

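I'm happy with split, but join seems a bit too long.

To me this feels like a standard task. That leads to my questions:

  1. Are there functions in the collections library that would reduce the code size?
  2. Or is there perhaps a different approach to this task? In particular, an approach other than the one in "Rewriting a sequence by partitioning and collapsing"?
  3. Replacing the while loop with tail recursion: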

    def join(xs: Seq[X]): Seq[Seq[X]] = {
      @annotation.tailrec
      def jointr(pp: Seq[Seq[X]], rem: Seq[X]): Seq[Seq[X]] = {
        val (p, r) = split(rem)
        val pp2 = pp :+ p
        if (r.isEmpty) pp2 else jointr(pp2, r)
      }
      jointr(Seq(), xs)
    }
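One edge case in the tail-recursive version above: for an empty input it returns Seq(Seq()), a single empty group, rather than Seq(), because split runs before the emptiness check. A minimal sketch of a guarded, generic variant follows. The names TailRecJoin, joinBy and loop are made up for illustration, and the predicate is passed in as a parameter instead of calling X.canJoin. Like split, it compares every element with the first element of its group.

```scala
import scala.annotation.tailrec

object TailRecJoin {
  // Hypothetical helper: groups consecutive elements that can join the
  // first element of the current group, preserving the original order.
  def joinBy[A](xs: Seq[A])(canJoin: (A, A) => Boolean): Seq[Seq[A]] = {
    @tailrec
    def loop(acc: Vector[Seq[A]], rem: Seq[A]): Seq[Seq[A]] =
      if (rem.isEmpty) acc // the guard: empty input gives an empty result
      else {
        val head = rem.head
        // Take the head unconditionally so every step consumes at least
        // one element, even if canJoin(head, head) were false.
        val (same, rest) = rem.tail.span(x => canJoin(x, head))
        loop(acc :+ (head +: same), rest)
      }
    loop(Vector.empty, xs)
  }
}
```

With the X class from the question this would be called as TailRecJoin.joinBy(xs)(_ canJoin _).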
    

3 Answers:

Answer 0 (score: 8)

def join(seq: Seq[X]): Seq[Seq[X]] = {
  if (seq.isEmpty) return Seq()
  val (p,r) = split(seq)
  Seq(p) ++ join(r)
}

Answer 1 (score: 4)

Here is a foldLeft version:

def join(seq: Seq[X]) = seq.reverse.foldLeft(Nil: List[List[X]]) {
    case ((top :: group) :: rest, x) if x canJoin top => 
        (x :: top :: group) :: rest
    case (list, x) => (x :: Nil) :: list
} 

And a foldRight version (in this case you don't need to reverse the list):

def join(seq: Seq[X]) = seq.foldRight(Nil: List[List[X]]) {
    case (x, (top :: group) :: rest) if x canJoin top => 
        (x :: top :: group) :: rest
    case (x, list) => (x :: Nil) :: list
} 
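One difference from the span-based split is easy to miss: both fold versions compare each element with its direct neighbour (the top of the adjacent group), while split compares every element with the first element of its group. For an equivalence relation such as the question's canJoin the results are the same. For a predicate that is not transitive they can differ. The sketch below illustrates this with a made-up predicate near on plain Ints; FoldVsSpan, near, bySpan and byFold are hypothetical names, not part of either answer.

```scala
object FoldVsSpan {
  // Illustrative, non-transitive predicate: values at most 1 apart.
  def near(a: Int, b: Int): Boolean = math.abs(a - b) <= 1

  // Span-based grouping: each element is compared with the group's head.
  def bySpan(xs: List[Int]): List[List[Int]] = xs match {
    case Nil => Nil
    case h :: t =>
      val (same, rest) = t.span(near(_, h))
      (h :: same) :: bySpan(rest)
  }

  // foldRight-based grouping: each element is compared with its right neighbour.
  def byFold(xs: List[Int]): List[List[Int]] =
    xs.foldRight(List.empty[List[Int]]) {
      case (x, (top :: group) :: rest) if near(x, top) => (x :: top :: group) :: rest
      case (x, groups)                                 => (x :: Nil) :: groups
    }
}
```

For List(1, 2, 3), bySpan gives List(List(1, 2), List(3)) because 3 is not near the group head 1, while byFold gives List(List(1, 2, 3)) because every neighbouring pair is near.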

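Answer 2 (score: 3)

Benchmark

Because I have too much time ;-), I asked myself what the running times of the different approaches are, to get a feeling for whether heavy constructs are lurking behind the lightweight syntax.

So I created a micro benchmark that measures the running time for three sequences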

(1, 3, 3, 3, 1, 2, 2, 1)
(1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 7, 6, 5, 4, 3, 3, 3, 2, 1, 2, 3)
(2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 6, 5, 4, 4, 4, 4, 3, 3, 3, 2, 1)

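and got the following results:

Summary

EDIT: new results (start):

While incorporating the results into my real project I ran into inconsistencies with the benchmark. So I repeated the benchmark with more warm-up rounds (now 1000), so that the JIT compiler can make the most of the code. This shuffled the results and gave me a new favourite: X7 (pimp my lib) = happy, no regrets. The List version X8 (reverse.foldLeft) is now very fast as well.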

Nr (Approach)                      Running time (ns)  Contributor
X2 (poor.reference.impl)            in    15.202 ns
X1 (original while loop)            in     8.166 ns
X3 (tail recursion)                 in     7.473 ns
X4 (recursion with ++)              in     6.671 ns   Peter Schmitz
X5 (simplified recursion with ++)   in     6.161 ns   Peter Schmitz
X6 (foldRight)                      in     4.083 ns   tenshi
X7 (pimp my lib)                    in     1.677 ns   Rex Kerr
X8 (reverse.foldLeft)               in     1.349 ns   tenshi

EDIT: new results (end)

Old results:

Nr (Approach)                      Running time (ns)  Contributor
X2 (poor.reference.impl)            in 2.972.015 ns
X7 (pimp my lib)                    in 1.185.599 ns   Rex Kerr
X3 (tail recursion)                 in 1.027.008 ns
X8 (reverse.foldLeft)               in   643.840 ns   tenshi
X6 (foldRight)                      in   608.112 ns   ""
X1 (original while loop)            in   564.726 ns
X4 (recursion with ++)              in   468.478 ns   Peter Schmitz
X5 (simplified recursion with ++)   in   447.699 ns   ""
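The answer does not show its benchmark harness. The following is only a rough sketch of how warm-up rounds can be separated from the measured rounds, which is the effect described in the edit above. MicroBench and avgNanos are hypothetical names, and this is not the code that produced the numbers in the tables; for serious measurements a dedicated tool such as JMH is the more usual choice.

```scala
object MicroBench {
  // Runs `body` warmUp times unmeasured (giving the JIT a chance to compile
  // the hot path), then `runs` times measured, and returns the average
  // wall-clock time per measured call in nanoseconds.
  def avgNanos(warmUp: Int, runs: Int)(body: => Any): Double = {
    require(runs > 0, "runs must be positive")
    var sink: Any = null // keep the result so the call is not trivially dead
    var i = 0
    while (i < warmUp) { sink = body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < runs) { sink = body; i += 1 }
    (System.nanoTime() - start).toDouble / runs
  }
}
```

A call might look like MicroBench.avgNanos(warmUp = 1000, runs = 1000) { join(xs) }, where join and xs are the definitions from the question.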

Details

X2(poor.reference.impl)

// in    15.202 ns
import collection.mutable.ArrayBuffer
def join2(seq: Seq[X]): Seq[Seq[X]] = {
  var pp = Seq[ArrayBuffer[X]](ArrayBuffer(seq(0)))
  for (i <- 1 until seq.size) {
    if (seq(i) canJoin seq(i - 1)) {
      pp.last += seq(i)
    } else {
      pp :+= ArrayBuffer(seq(i))
    }
  }
  pp
}

X1(while loop)

// in     8.166 ns
def join(xs: Seq[X]): Seq[Seq[X]] = {
  var xss = Seq.empty[Seq[X]]
  var s = xs
  while (!s.isEmpty) {
    val (p, r) = split(s)
    xss :+= p
    s = r
  }
  xss
}

This is the original imperative approach from the beginning of the question.

X3(tail recursion)

// in     7.473 ns
def join(xs: Seq[X]): Seq[Seq[X]] = {
  @annotation.tailrec
  def jointr(xss: Seq[Seq[X]], rxs: Seq[X]): Seq[Seq[X]] = {
    val (g, r) = split(rxs)
    val xsn = xss :+ g
    if (r.isEmpty) xsn else jointr(xsn, r)
  }
  jointr(Seq(), xs)
}

X4(recursion with ++)

// in     6.671 ns
def join(seq: Seq[X]): Seq[Seq[X]] = {
  if (seq.isEmpty) return Seq()
  val (p, r) = split(seq)
  Seq(p) ++ join(r)
}

X5(simplified recursion with ++)

// in     6.161 ns
def join(xs: Seq[X]): Seq[Seq[X]] = if (xs.isEmpty) Seq() else {
  val (p, r) = split(xs)
  Seq(p) ++ join(r)
}

The simplified version is almost the same code, but it is still a little faster.

X6(foldRight)

// in     4.083 ns
def join(xs: Seq[X]) = xs.foldRight(Nil: List[List[X]]) {
  case (x, (top :: group) :: rest) if x canJoin top => (x :: top :: group) :: rest
  case (x, list)                                    => (x :: Nil) :: list
}

This tries to avoid the reverse, but foldRight seems to be somewhat worse than reverse.foldLeft for Lists.

X7(pimp my lib)

// in     1.677 ns
import collection.generic.CanBuildFrom
class GroupingCollection[A, C, D[C]](ca: C)(
    implicit c2i: C => Iterable[A],
    cbf: CanBuildFrom[C, C, D[C]],
    cbfi: CanBuildFrom[C, A, C]) {
  def groupedWhile(p: (A, A) => Boolean): D[C] = {
    val it = c2i(ca).iterator
    val cca = cbf()
    if (!it.hasNext) cca.result
    else {
      val as = cbfi()
      var olda = it.next
      as += olda
      while (it.hasNext) {
        val a = it.next
        if (p(olda, a)) as += a
        else { cca += as.result; as.clear; as += a }
        olda = a
      }
      cca += as.result
    }
    cca.result
  }
}
implicit def collections_have_grouping[A, C[A]](ca: C[A])(
  implicit c2i: C[A] => Iterable[A],
  cbf: CanBuildFrom[C[A], C[A], C[C[A]]],
  cbfi: CanBuildFrom[C[A], A, C[A]]) = {
  new GroupingCollection[A, C[A], C](ca)(c2i, cbf, cbfi)
}
// xs.groupedWhile(_ canJoin _)
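The code above depends on CanBuildFrom, which was removed from the standard library in Scala 2.13. As a hedged sketch only, the same call-site syntax can be had for plain Seq with an implicit class, at the price of always returning Seq[Seq[A]] instead of preserving the original collection type. GroupedWhileSyntax and GroupedWhileOps are made-up names; the loop follows the same idea as X7 (compare each element with the previous one) but is not Rex Kerr's implementation.

```scala
object GroupedWhileSyntax {
  // Hypothetical extension for Seq only; unlike X7 it does not preserve
  // the concrete collection type.
  implicit class GroupedWhileOps[A](xs: Seq[A]) {
    // Starts a new group whenever p(previous, current) is false.
    def groupedWhile(p: (A, A) => Boolean): Seq[Seq[A]] = {
      val groups = Vector.newBuilder[Seq[A]]
      val it = xs.iterator
      if (it.hasNext) {
        var current = Vector.newBuilder[A]
        var prev = it.next()
        current += prev
        while (it.hasNext) {
          val a = it.next()
          if (!p(prev, a)) {
            groups += current.result()      // close the finished group
            current = Vector.newBuilder[A]  // and start a fresh one
          }
          current += a
          prev = a
        }
        groups += current.result()
      }
      groups.result()
    }
  }
}
```

After import GroupedWhileSyntax._ the usage is the same as above: xs.groupedWhile(_ canJoin _).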

X8(reverse.foldLeft)

// in     1.349 ns
def join(xs: Seq[X]) = xs.reverse.foldLeft(Nil: List[List[X]]) {
  case ((top :: group) :: rest, x) if x canJoin top => (x :: top :: group) :: rest
  case (list, x)                                    => (x :: Nil) :: list
}

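Conclusion

The different approaches (X1, X3, X4, X5, X6) all play in the same league.

Because X7 (pimp my lib) allows the very concise usage xs.groupedWhile(_ canJoin _), and because the necessary code can be hidden away in one's own util lib, I decided to use it in my real project.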