Kryo / Chill-Scala serializer - serializing a class that contains other classes

Time: 2015-10-21 07:10:49

Tags: scala serialization apache-spark kryo scalding

I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1.

Using Kryo and Twitter's Chill for Scala, I can serialize and deserialize a "simple" case class that contains only "primitive" members, such as Boolean and Map:

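// In Scalding
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import com.twitter.scalding.WritableSequenceFile
import org.apache.hadoop.io.{BytesWritable, NullWritable}

// Case classes are already Serializable, and their parameters are already vals.
case class MyClass(foo: Boolean)

val data = ... //TypedPipe[MyClass]

def serialize[A](data: A): Array[Byte] = {
  val kryo = (new ScalaKryoInstantiator).newKryo()
  // Set on the Kryo instance itself: Chill's KryoInstantiator setters return
  // a new instantiator, so calling one and discarding the result has no effect.
  kryo.setRegistrationRequired(false)
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close() // flushes the Kryo buffer into bao
  bao.toByteArray
}

// One (NullWritable, BytesWritable) record per element
data.map(t => (NullWritable.get, new BytesWritable(serialize(t))))
  .write(WritableSequenceFile(outPath))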

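// In Spark:
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.chill.ScalaKryoInstantiator
import org.apache.hadoop.io.{BytesWritable, NullWritable}

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val kryo = (new ScalaKryoInstantiator).newKryo()
  // Set on the Kryo instance itself: Chill's KryoInstantiator setters return
  // a new instantiator, so calling one and discarding the result has no effect.
  kryo.setRegistrationRequired(false)
  val input = new Input(new ByteArrayInputStream(ser))
  try kryo.readObject(input, clazz)
  finally input.close()
}

// 'sc' is the SparkContext. getBytes returns the backing buffer, which may be
// longer than the payload, so copy only the first getLength bytes.
sc.sequenceFile(inPath, classOf[NullWritable], classOf[BytesWritable]).map(_._2)
  .map(t => deserialize(java.util.Arrays.copyOf(t.getBytes, t.getLength), classOf[MyClass]))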

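I can also serialize and deserialize a "complex" class whose members are themselves custom classes, either my own (MyClass) or third-party (for example org.joda.time.LocalDate). I register the classes in the same order during serialization and deserialization, as the Kryo documentation requires, and use Kryo's default serializer: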

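// In Scalding
// Joda-Time classes used below
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

class MyClass2(val bar: MyClass, val someDate: LocalDate) extends Serializable

def serialize[A](data: A): Array[Byte] = {
  val kryo = (new ScalaKryoInstantiator).newKryo()
  // Set on the Kryo instance itself: Chill's KryoInstantiator setters return
  // a new instantiator, so calling one and discarding the result has no effect.
  kryo.setRegistrationRequired(false)
  // Registration order must be the same on the reading side.
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close() // flushes the Kryo buffer into bao
  bao.toByteArray
}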

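// In Spark
// Joda-Time classes used below
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val kryo = (new ScalaKryoInstantiator).newKryo()
  // Set on the Kryo instance itself: Chill's KryoInstantiator setters return
  // a new instantiator, so calling one and discarding the result has no effect.
  kryo.setRegistrationRequired(false)
  // Same classes, same order as on the writing side.
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val input = new Input(new ByteArrayInputStream(ser))
  try kryo.readObject(input, clazz)
  finally input.close()
}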

a) As shown above, this works, but it seems overly verbose. Am I missing a simpler way?
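Regarding a), one way to cut the repetition is to keep the registration list in a single place and build the Kryo instance from it on both sides. The following is only a minimal sketch under that assumption: `KryoCodec` is a hypothetical helper name, not something Kryo or Chill provides, and it creates a fresh Kryo per call exactly as the functions in the question do.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.ScalaKryoInstantiator

// Hypothetical helper (not part of Kryo or Chill): it owns the registration
// list so the Scalding job and the Spark job cannot drift apart.
class KryoCodec(registered: Seq[Class[_]]) extends Serializable {

  // Kryo instances are not thread-safe, so a new one is built per call.
  private def newKryo(): Kryo = {
    val kryo = (new ScalaKryoInstantiator).newKryo()
    kryo.setRegistrationRequired(false)
    // Classes are registered in list order on both sides.
    registered.foreach(c => kryo.register(c))
    kryo
  }

  def serialize[A](data: A): Array[Byte] = {
    val bao = new ByteArrayOutputStream
    val output = new Output(bao)
    newKryo().writeObject(output, data)
    output.close() // flushes the Kryo buffer into bao
    bao.toByteArray
  }

  def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
    val input = new Input(new ByteArrayInputStream(ser))
    try newKryo().readObject(input, clazz)
    finally input.close()
  }
}
```

With the classes from the question, both jobs would construct the codec from the same list, for example `new KryoCodec(Seq(classOf[MyClass2], classOf[MyClass], classOf[LocalDate], classOf[ISOChronology], classOf[GregorianChronology]))`, and call `serialize` and `deserialize` on it. Building a Kryo instance per record is costly, so in a real job it is probably worth reusing one instance per partition or task.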

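b) When I register only LocalDate, Spark complains that it does not "know" ISOChronology. When I register ISOChronology, it complains that it does not know GregorianChronology. Once I registered GregorianChronology as well, Spark stopped complaining and everything worked. Is there a way to register LocalDate "and everything in it"?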
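Regarding b), I am not aware of a Kryo call that registers a class together with everything reachable from it. One possible workaround is a custom serializer for LocalDate that writes only year, month and day, so that Kryo's default field-by-field serializer never descends into the chronology classes. The sketch below is an illustration, not a tested fix for the setup above: `LocalDateSerializer` and `LocalDateRegistration` are hypothetical names, the `read` signature matches Kryo 2.x to 4.x (the versions Chill has shipped with), and it assumes every date uses the default ISO chronology, since any other chronology would be lost.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.joda.time.LocalDate

// Hypothetical serializer: stores a LocalDate as three ints.
// Assumption: the date uses the ISO chronology, which is what
// new LocalDate(year, month, day) reconstructs.
class LocalDateSerializer extends Serializer[LocalDate] {

  override def write(kryo: Kryo, output: Output, date: LocalDate): Unit = {
    output.writeInt(date.getYear)
    output.writeInt(date.getMonthOfYear)
    output.writeInt(date.getDayOfMonth)
  }

  override def read(kryo: Kryo, input: Input, clazz: Class[LocalDate]): LocalDate = {
    // Read back in exactly the order used by write.
    val year = input.readInt()
    val month = input.readInt()
    val day = input.readInt()
    new LocalDate(year, month, day)
  }
}

object LocalDateRegistration {
  // Call this on both the writing and the reading side, in the same
  // position relative to the other register calls.
  def register(kryo: Kryo): Unit = {
    kryo.register(classOf[LocalDate], new LocalDateSerializer)
    ()
  }
}
```

With this in place, the three calls that register LocalDate, ISOChronology and GregorianChronology would be replaced by a single `LocalDateRegistration.register(kryo)` in both `serialize` and `deserialize`.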

0 answers:

No answers