Writing a Hadoop sequence file

Posted: 2014-12-22 07:21:16

Tags: hadoop mahout

I have a text file containing data written in the following (key, value) format:

1,34
5,67
8,88

The file resides on the local file system.

I want to convert it into a Hadoop sequence file, again on the local file system, so that I can use it in Mahout. The sequence file should contain all the records: for the first record, 1 is the key and 34 is the value, and likewise for the other records.
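Conceptually, each line just needs to be split on the comma into a numeric key and a text value. A minimal sketch of that parsing step in plain Java (no Hadoop required; the `KeyValueLine` class and `parseLine` helper are illustrative names, not part of any library):

```java
// Illustrative helper: split one "key,value" line on the first comma;
// the key becomes a long, the remainder stays as text.
public class KeyValueLine {
    final long key;
    final String value;

    KeyValueLine(long key, String value) {
        this.key = key;
        this.value = value;
    }

    static KeyValueLine parseLine(String line) {
        // Limit of 2 keeps any further commas inside the value.
        String[] parts = line.split(",", 2);
        return new KeyValueLine(Long.parseLong(parts[0]), parts[1]);
    }
}
```

The same split-and-parse logic is what the sequence-file writer below performs per line, wrapping the results in `LongWritable` and `Text`.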

I am new to Java. Any help would be appreciated.

Thanks.

1 answer:

Answer 0 (score: 0)

I did find a way. Here is the code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class CreateSequenceFile {
    public static void main(String[] args) throws IOException {
        String myfile = "/home/ashokharnal/keyvalue.txt";
        String outputseqfile = "/home/ashokharnal/part-0000";
        Path path = new Path(outputseqfile);
        String fieldDelimiter = ",";

        // Open the input text file.
        BufferedReader br = new BufferedReader(new FileReader(myfile));

        // Create the sequence-file writer. With a default Configuration,
        // FileSystem.get returns the local file system.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path,
                LongWritable.class, Text.class);
        try {
            String line;
            while ((line = br.readLine()) != null) {
                // Split "key,value" and wrap the parts in Writable types.
                String[] parts = line.split(fieldDelimiter);
                LongWritable key = new LongWritable(Long.parseLong(parts[0]));
                Text value = new Text(parts[1]);
                writer.append(key, value);
                System.out.println("Appended to sequence file key " + key + " and value " + value);
            }
        } finally {
            writer.close();
            br.close();
        }
    }
}
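To check the result, the file can be read back with `SequenceFile.Reader` from the same Hadoop API. A sketch, assuming the same output path and key/value types as above (the class name `ReadSequenceFile` is my own choice):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/home/ashokharnal/part-0000");

        // Open the sequence file and iterate over its (key, value) pairs.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            LongWritable key = new LongWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

For the sample input, this should print each key and value on its own line, which confirms the file is readable by Mahout jobs that expect `LongWritable`/`Text` pairs.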