Question

I am trying to read a file on a Hadoop server (not a local file) using Apache Beam. How do I do that? I have read some of the Beam documentation on Hadoop I/O formats:

https://beam.apache.org/documentation/io/built-in/hadoop/

I don't quite understand this part:

Configuration myHadoopConfiguration = new Configuration(false);
THIS --> // Set Hadoop InputFormat, key and value class in configuration <-- THIS
myHadoopConfiguration.setClass("mapreduce.job.inputformat.class",
    InputFormatClass,
    InputFormat.class);
myHadoopConfiguration.setClass("key.class", InputFormatKeyClass, Object.class);
myHadoopConfiguration.setClass("value.class", InputFormatValueClass, Object.class);

How do I set this input format? Do I need to create these classes myself? If I use the code as written, it does not work. Thanks.

Answer 1

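The standard default input format is TextInputFormat, which extends FileInputFormat<LongWritable, Text>.

The key is a LongWritable (import org.apache.hadoop.io.LongWritable) holding the byte offset of the line within the file.

The value is a Text (import org.apache.hadoop.io.Text) holding a single line.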

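The code does not compile because InputFormatClass, InputFormatKeyClass, and InputFormatValueClass are not actual variables. They are placeholders that you replace with concrete class literals such as TextInputFormat.class, LongWritable.class, and Text.class.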
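As a rough sketch of how the configuration might look with concrete classes filled in, the example below uses TextInputFormat with LongWritable keys and Text values. Several things here are assumptions rather than part of the question: the class name ReadHdfsText, the HDFS URI (hdfs://namenode:8020/path/to/input.txt is purely hypothetical), and the use of HadoopFormatIO from the beam-sdks-java-io-hadoop-format module (older Beam releases expose a similar HadoopInputFormatIO instead). It also assumes the Hadoop client libraries are on the classpath and the cluster is reachable from wherever the pipeline runs.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ReadHdfsText {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Concrete class literals in place of the documentation's placeholders.
    conf.setClass("mapreduce.job.inputformat.class", TextInputFormat.class, InputFormat.class);
    conf.setClass("key.class", LongWritable.class, Object.class);
    conf.setClass("value.class", Text.class, Object.class);

    // Hypothetical input location; FileInputFormat reads its path from this property.
    conf.set("mapreduce.input.fileinputformat.inputdir",
        "hdfs://namenode:8020/path/to/input.txt");

    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Each element pairs a byte offset (key) with one line of the file (value).
    PCollection<KV<LongWritable, Text>> lines =
        pipeline.apply(HadoopFormatIO.<LongWritable, Text>read().withConfiguration(conf));

    pipeline.run().waitUntilFinish();
  }
}
```

The key and value classes must match the type parameters of the chosen InputFormat, which is why LongWritable and Text are paired with TextInputFormat here.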
