Question

I want to collect one specific Row from a Spark 1.6 DataFrame that comes from a partitioned Hive table (the table is partitioned by a String column named date and stored as Parquet).

Each record is uniquely identified by date, section, and sample.

In addition, I have the following constraints

So far I have been using this query, but it takes quite a long time to run (about 25 seconds with 10 executors):

sqlContext.table("mytable")
  .where($"date" === date)
  .where($"section" === section)
  .where($"sample" === sample)
  .collect()(0)

I also tried replacing collect()(0) with take(1)(0), but it was not any faster.