I'm new to Spark and Scala and am trying to achieve the following. My messages look like this: (key, id, version, dataObject)
val transformedRDD = processedMessages.flatMap(message => {
  message.isProcessed match {
    // groupByKey needs (key, value) pairs, so the remaining fields go into the value
    case true => Some((message.key, (message.id, message.version, message)))
    case false => None
  }
}).groupByKey
I want to group the messages by ID and keep only the latest version of each message, then group by key, and then call a predefined method which looks like this:
Ingest(key,RDD[dataObject])
Answer 0 (score: 0)
In most cases, you should avoid groupByKey, as it may result in a re-shuffle which can be very expensive. In your use case, you do not require a groupByKey and can use reduceByKey instead.
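A minimal sketch of picking the latest version per ID with reduceByKey might look like the following. The `Message` case class, its field types, the `newer` and `latestPerId` helpers, and the local master setting are all assumptions made for illustration; only the field names `key`, `id`, `version`, and `isProcessed` come from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical message shape: field names follow the question, types are assumed.
case class Message(key: String, id: String, version: Int, isProcessed: Boolean, data: String)

object LatestVersionSketch {
  // Keep whichever of two messages has the higher version.
  def newer(a: Message, b: Message): Message = if (a.version >= b.version) a else b

  // For every (key, id) pair, keep only the latest processed message.
  def latestPerId(messages: RDD[Message]): RDD[(String, Message)] =
    messages
      .filter(_.isProcessed)
      .map(m => ((m.key, m.id), m))            // composite key: ids are compared within a key
      .reduceByKey(newer)                      // merges values per key, combining map-side first
      .map { case ((key, _), m) => (key, m) }  // back to (key, message)

  def main(args: Array[String]): Unit = {
    // Local master is assumed here only so the sketch can run standalone.
    val sc = new SparkContext(new SparkConf().setAppName("latest-version").setMaster("local[*]"))
    val messages = sc.parallelize(Seq(
      Message("k1", "a", 1, isProcessed = true, "old"),
      Message("k1", "a", 2, isProcessed = true, "new"),
      Message("k1", "b", 1, isProcessed = true, "only"),
      Message("k2", "c", 3, isProcessed = false, "skipped")
    ))
    latestPerId(messages).collect().foreach(println)
    sc.stop()
  }
}
```

Because reduceByKey merges values on each partition before shuffling, only one message per (key, id) crosses the network. Note that RDDs cannot be nested, so if Ingest really needs one RDD per key, one option is to collect the distinct keys on the driver and filter the result once per key; whether that is practical depends on how many keys there are.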