Losing the first record when converting from .txt to .arff with Weka's CSVLoader

Posted: 2013-10-12 23:00:39

Tags: java weka arff

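I have a training data set in .txt format, and I converted the .txt file to an .arff file. When I do this, I lose the first record, because the loader treats the first line of the .txt file as the attribute names for the whole file. (The .txt file I am using is tab-delimited.)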
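To make the symptom concrete, here is a simplified model of a loader that always reads line 1 as the header. It is a hypothetical sketch, not Weka's actual CSVLoader code; the class and method names are made up, and it uses only the first four columns of the sample rows below, with tabs as the separator (an assumption based on the description above). With two input lines, the first becomes attribute names and only one record is left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of a header-first delimited loader
 * (NOT Weka's real CSVLoader): line 1 supplies the attribute names,
 * and only lines 2..N become data records.
 */
public class HeaderFirstModel {

    /** Attribute names: the fields of the first line. */
    static List<String> attributeNames(List<String> lines, String separator) {
        if (lines.isEmpty()) return new ArrayList<>();
        // Pattern.quote treats the separator literally; -1 keeps empty trailing fields
        return Arrays.asList(lines.get(0).split(Pattern.quote(separator), -1));
    }

    /** Data records: every line except the first one. */
    static List<List<String>> records(List<String> lines, String separator) {
        List<List<String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        for (String line : lines.subList(1, lines.size())) {
            rows.add(Arrays.asList(line.split(Pattern.quote(separator), -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "39\tState-gov\t77516\tBachelors",
            "50\tSelf-emp-not-inc\t83311\tBachelors");
        // prints: attributes: [39, State-gov, 77516, Bachelors]
        System.out.println("attributes: " + attributeNames(lines, "\t"));
        // prints: records:    1
        System.out.println("records:    " + records(lines, "\t").size());
    }
}
```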

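This is the code I use to convert the file to .arff, adapted from the example below. Is there a way to keep the first record as data and still have attributes for the whole file?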

http://weka.wikispaces.com/Converting+CSV+to+ARFF

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

import java.io.File;

public class CSV2Arff {
  /**
   * takes 2 arguments:
   * - CSV input file
   * - ARFF output file
   */
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
      System.exit(1);
    }

    // load CSV
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(args[0]));
    Instances data = loader.getDataSet();

    // save ARFF
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(args[1]));
    saver.setDestination(new File(args[1]));
    saver.writeBatch();
  }
}
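One possible workaround, sketched under assumptions: give the loader a header line to consume by prepending a synthetic one, so every real record stays in the data. The class `AddHeaderRow`, its methods and the `att1`..`attN` names are hypothetical, not part of Weka. The separator must match what the file really uses; the leading spaces in values such as `' State-gov'` in the output below hint that the fields may be separated by a comma plus a space rather than by tabs, so that is worth checking. Alternatively, later Weka releases (3.7 onward, as far as I know) give `CSVLoader` a no-header-row option (`setNoHeaderRowPresent(true)`, `-H` on the command line), which would avoid this preprocessing step; check the Javadoc for your version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Weka): copies a headerless delimited file
 * and prepends a synthetic header row "att1<sep>att2<sep>...", so a loader
 * that always treats line 1 as attribute names keeps every data record.
 */
public class AddHeaderRow {

    /** Builds a header line with one generic name per column of sampleLine. */
    static String buildHeader(String sampleLine, String separator) {
        // Pattern.quote treats the separator literally; -1 keeps empty trailing fields
        int columns = sampleLine.split(Pattern.quote(separator), -1).length;
        StringBuilder header = new StringBuilder();
        for (int i = 1; i <= columns; i++) {
            if (i > 1) header.append(separator);
            header.append("att").append(i);
        }
        return header.toString();
    }

    /** Returns the input lines with the synthetic header line in front. */
    static List<String> addHeader(List<String> lines, String separator) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) return out;
        out.add(buildHeader(lines.get(0), separator));
        out.addAll(lines);
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2 || args.length > 3) {
            System.out.println("Usage: AddHeaderRow <input.txt> <output.txt> [separator, default tab]");
            System.exit(1);
        }
        // Assumption: a literal "\t" argument means a tab character
        String separator = "\t";
        if (args.length == 3) separator = args[2].equals("\\t") ? "\t" : args[2];

        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), addHeader(lines, separator), StandardCharsets.UTF_8);
    }
}
```

Usage would be `java AddHeaderRow training.txt training_with_header.txt`, then running `CSV2Arff` on `training_with_header.txt`. The attributes would then be named `att1`, `att2`, and so on, which can be renamed afterwards if meaningful names are needed.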

These are the first two records of a very large training data set.

39   State-gov  77516    Bachelors  13   Never-married   Adm-clerical    Not-in-family   White   Male   2174    0   40   United-States   <=50K
50   Self-emp-not-inc   83311    Bachelors  13   Married-civ-spouse  Exec-managerial     Husband     White   Male   0   0   13   United-States   <=50K

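After running the code to generate the .arff file, the first record is treated as the attribute names, so the data set ends up one record short. I want to keep that record as data as well. The generated training data currently looks like this: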

@relation training.txt

@attribute 39 numeric
@attribute ' State-gov' {' Self-emp-not-inc',' Private',' State-gov',' Federal-gov',' Local-gov',' ?',' Self-emp-inc',' Without-pay',' Never-worked'}
@attribute 77516 numeric
@attribute ' Bachelors' {' Bachelors',' HS-grad',' 11th',' Masters',' 9th',' Some-college',' Assoc-acdm',' Assoc-voc',' 7th-8th',' Doctorate',' Prof-school',' 5th-6th',' 10th',' 1st-4th',' Preschool',' 12th'}
@attribute 13 numeric
@attribute ' Never-married' {' Married-civ-spouse',' Divorced',' Married-spouse-absent',' Never-married',' Separated',' Married-AF-spouse',' Widowed'}
@attribute ' Adm-clerical' {' Exec-managerial',' Handlers-cleaners',' Prof-specialty',' Other-service',' Adm-clerical',' Sales',' Craft-repair',' Transport-moving',' Farming-fishing',' Machine-op-inspct',' Tech-support',' ?',' Protective-serv',' Armed-Forces',' Priv-house-serv'}
@attribute ' Not-in-family' {' Husband',' Not-in-family',' Wife',' Own-child',' Unmarried',' Other-relative'}
@attribute ' White' {' White',' Black',' Asian-Pac-Islander',' Amer-Indian-Eskimo',' Other'}
@attribute ' Male' {' Male',' Female'}
@attribute 2174 numeric
@attribute 0 numeric
@attribute 40 numeric
@attribute ' United-States' {' United-States',' Cuba',' Jamaica',' India',' ?',' Mexico',' South',' Puerto-Rico',' Honduras',' England',' Canada',' Germany',' Iran',' Philippines',' Italy',' Poland',' Columbia',' Cambodia',' Thailand',' Ecuador',' Laos',' Taiwan',' Haiti',' Portugal',' Dominican-Republic',' El-Salvador',' France',' Guatemala',' China',' Japan',' Yugoslavia',' Peru',' Outlying-US(Guam-USVI-etc)',' Scotland',' Trinadad&Tobago',' Greece',' Nicaragua',' Vietnam',' Hong',' Ireland',' Hungary',' Holand-Netherlands'}
@attribute ' <=50K' {' <=50K',' >50K'}

@data
50,' Self-emp-not-inc',83311,' Bachelors',13,' Married-civ-spouse',' Exec-managerial',' Husband',' White',' Male',0,0,13,' United-States',' <=50K'
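To confirm how many records go missing on the full data set (rather than eyeballing the two-line sample), a small stdlib-only check can compare the record count of the input file with the number of instances after `@data` in the generated ARFF. This is a hypothetical sketch; the class `CountRecords` and its methods are made up for illustration. It counts every non-blank input line as one record and skips blank lines and `%` comment lines in the ARFF data section.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical sanity check (not part of Weka): compares the number of
 * records in the delimited input with the number of instances after @data
 * in the generated ARFF file.
 */
public class CountRecords {

    /** Counts non-blank lines of the delimited input file. */
    static int countInputRecords(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) n++;
        }
        return n;
    }

    /** Counts non-blank, non-comment lines after the @data declaration. */
    static int countArffInstances(List<String> lines) {
        boolean inData = false;
        int n = 0;
        for (String line : lines) {
            String t = line.trim();
            if (!inData) {
                // ARFF keywords are case-insensitive
                if (t.equalsIgnoreCase("@data")) inData = true;
            } else if (!t.isEmpty() && !t.startsWith("%")) {
                n++; // lines starting with '%' are ARFF comments
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: CountRecords <input.txt> <output.arff>");
            System.exit(1);
        }
        int in = countInputRecords(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        int out = countArffInstances(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8));
        System.out.println("input records: " + in + ", ARFF instances: " + out);
        if (in != out) {
            System.out.println("Mismatch: " + (in - out) + " record(s) missing from the ARFF file.");
        }
    }
}
```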

Thanks for your help.
