使用Spark Cassandra Connector将Enum存储为Cassandra作为Integer

时间:2015-12-15 18:17:27

标签: apache-spark cassandra spark-cassandra-connector

我正在尝试使用其Int表示在Cassandra中存储scala Enumeration,但我总是得到com.datastax.spark.connector.types.TypeConversionException。我想知道Enumeration类是特殊情况,还是我做错了。

编辑(2015-12-16)。 让我尝试用代码片段来扩展我的问题,所以我可以更好地传达这个想法。

import org.apache.spark.{SparkConf, SparkContext}

import com.datastax.spark.connector._

object WeekDay {
  sealed abstract class WeekDay(val id: Int)

  case object MON extends WeekDay(0)
  case object TUE extends WeekDay(1)
  case object WED extends WeekDay(2)
  case object THU extends WeekDay(3)
  case object FRI extends WeekDay(4)
  case object SAT extends WeekDay(5)
  case object SUN extends WeekDay(6)

  val values = Map(0 -> MON, 1 -> TUE, 2 -> WED, 3 -> THU, 4 -> FRI, 5 -> SAT, 6 -> SUN)
}
import WeekDay._

object Example {

  case class MyCassandraRow(id: String, weight: Int, day: WeekDay)

  def main (args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("cassandra-connector-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    val data = sc.parallelize(
      Seq(
        MyCassandraRow("identifier1", 10, MON),
        MyCassandraRow("identifier2", 20, FRI),
        MyCassandraRow("identifier3", 1, SUN)
      )
    )

    data.saveToCassandra("db", "custom_data")
  }
}

如果我使用TEXT为“day”字段创建custom_data表,则此代码可以正常工作,但如果我使用以下stacktrace设置为INT则会失败:

com.datastax.spark.connector.types.TypeConversionException: Cannot convert object FRI of type class WeekDay$FRI$ to java.lang.Integer.
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:42)
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:40)
at scala.PartialFunction$AndThen.applyOrElse(PartialFunction.scala:185)

所以,我尝试按照https://github.com/datastax/spark-cassandra-connector/blob/master/doc/6_advanced_mapper.md中的描述实现TypeConverter 如下:

implicit object IntToWeekDayConverter extends TypeConverter[WeekDay] {
  def targetTypeTag = typeTag[WeekDay]
  def convertPF = {
    case i: Int => values.getOrElse(i, MON)
  }
}

implicit object WeekDayToIntConverter extends TypeConverter[Int] {
  def targetTypeTag = typeTag[Int]
  def convertPF = {
    case d: WeekDay => d.id
  }
}

但我仍然得到同样的错误。

我在这里发布了整个scala文件:https://gist.github.com/davideanastasia/b0bef569b4b7dec66c3f#file-cassandraenum-scala

1 个答案:

答案 0 :(得分:1)

Enum没有自动转换器 - > Spark Cassandra连接器中的整数。我只想用.id映射该列以获得整数表示。

object WeekDay extends Enumeration {
  type WeekDay = Value
  val Mon, Tue, Wed, Thu, Fri, Sat, Sun = Value
}
import WeekDay._
val meetingDays = Seq(WeekDay.Mon, WeekDay.Wed)
//meetingDays: Seq[WeekDay.Value] = List(Mon, Wed)
meetingDays.map(_.id)
//Seq[Int] = List(0, 2)