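I need to read files from an HDFS folder, and I'm using the code below to do it. It picks up files that were created within the last minute, but it does not read the existing files that are older than one minute.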
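import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat // new-API input format, as fileStream requires

// Accept every file, logging each path that fileStream considers
val filterF: Path => Boolean = { path =>
  println("Checking whether " + path + " should be considered")
  true
}

def processStream(inputPath: String): Unit = { // e.g. processStream("/user/cust/sample")
  // newFilesOnly = false: also process files already present in the directory
  val messages = streamingContext
    .fileStream[LongWritable, Text, TextInputFormat](inputPath, filterF, newFilesOnly = false)
    .map { case (_, line) => line.toString } // keep only the line text
  val words = messages.flatMap(_.split(" "))
  val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
  wordCount.print()
}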
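A likely explanation for the one-minute cutoff: even with `newFilesOnly = false`, Spark Streaming's file stream only selects files whose modification time falls inside a sliding "remember" window, which defaults to 60 seconds. The sketch below is a simplified, hypothetical model of that rule, loosely based on how `FileInputDStream` behaves; the object and method names are illustrative and are not Spark's API.

```scala
// Hypothetical, simplified model of fileStream's file-selection rule
// (loosely modelled on FileInputDStream; this is NOT Spark's code).
object FileSelectionModel {
  // Remember window: minRememberDuration rounded up to a whole number of batches.
  def rememberWindowMs(batchMs: Long, minRememberMs: Long = 60000L): Long = {
    val batches = math.ceil(minRememberMs.toDouble / batchMs).toLong
    batchMs * batches
  }

  // A file is selected in the batch at batchTimeMs only if its modification
  // time is not older than the ignore threshold and not later than the batch time.
  def isSelected(
      modTimeMs: Long,
      batchTimeMs: Long,
      startTimeMs: Long,
      batchMs: Long,
      newFilesOnly: Boolean,
      minRememberMs: Long = 60000L): Boolean = {
    // newFilesOnly = false lowers the initial threshold to 0 ...
    val initialThreshold = if (newFilesOnly) startTimeMs else 0L
    // ... but the sliding remember window still applies on every batch.
    val threshold =
      math.max(initialThreshold, batchTimeMs - rememberWindowMs(batchMs, minRememberMs))
    modTimeMs >= threshold && modTimeMs <= batchTimeMs
  }

  def main(args: Array[String]): Unit = {
    val start = 10000000L
    val batch = 10000L
    val old = start - 300000L // file written 5 minutes before the job started
    // Default 60 s window: the old file is skipped in the first batch
    println(isSelected(old, start + batch, start, batch, newFilesOnly = false))
    // 10-minute window: the same file is picked up
    println(isSelected(old, start + batch, start, batch, newFilesOnly = false, minRememberMs = 600000L))
  }
}
```

In this model, `newFilesOnly = false` only removes the "must be newer than the start time" condition; the remember window still excludes anything modified more than about a minute before the batch, which matches the behaviour described above.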
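If the remember window is indeed the cause, one possible workaround is to widen it when building the `StreamingContext`. The sketch below assumes a recent Spark version, where `FileInputDStream` reads the internal, largely undocumented setting `spark.streaming.fileStream.minRememberDuration` (older 1.x releases used `spark.streaming.minRememberDuration`). The app name, the one-hour value, and the 10-second batch interval are placeholders, not values from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("hdfs-word-count") // hypothetical app name
  // Assumption: widen the window in which fileStream still considers existing
  // files (default 60s) so files modified up to an hour ago are picked up.
  .set("spark.streaming.fileStream.minRememberDuration", "3600s")

// Batch interval assumed; use whatever the job already uses
val streamingContext = new StreamingContext(conf, Seconds(10))
```

A larger window makes Spark track more file metadata per batch. If the backlog only needs to be processed once, reading it separately with `streamingContext.sparkContext.textFile(...)` and letting the stream handle only new files may be the simpler choice.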
Can anyone help?
Thanks