How to create training and testing files for a dataset

Time: 2018-03-25 19:20:01

Tags: python machine-learning

My colleagues and I have created a dataset of *.csv files. Each *.csv file represents a gesture frame. I have several folders, and each of them contains the *.csv files for one gesture. So if I have 10 folders, there are 10 gestures.

Now that I have the dataset (spread across many folders), how do I start training and testing different classifiers in Python? I know that for:

clf.fit(features, labels)

I need to have two files: the features and the labels that go with those features.

Could you please let me know how to get started on this?

1 answer:

Answer 0 (score: 1)

You'd have to find some way to encode each sample numerically in order to pass it to a scikit-learn fit() function.

If the contents of your CSVs are numeric, you'd probably flatten each file into a 1D list of numbers. You may then have to zero-pad some of them so that every sample is the same size. If the contents are alphabetic instead, you'd do roughly the same thing, but you'd first have to define a mapping from the contents of your files to some numeric encoding.
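The flatten-and-pad step could be sketched as follows. This is only an illustration under stated assumptions: the files are taken to be purely numeric with no header row, and the helper names `flatten_csv` and `zero_pad` are hypothetical, not part of any library.

```python
import csv


def flatten_csv(path):
    """Read one CSV file and return its values as a flat list of floats.

    Assumes every non-empty cell is numeric and there is no header row.
    """
    values = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            values.extend(float(cell) for cell in row if cell.strip())
    return values


def zero_pad(samples):
    """Right-pad each sample with 0.0 so all samples match the longest one."""
    width = max(len(sample) for sample in samples)
    return [sample + [0.0] * (width - len(sample)) for sample in samples]
```

Calling `zero_pad` on the list of flattened files gives equal-length rows, which is the rectangular shape fit() expects for its features.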

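Your label variable would then hold one class label per sample, indicating which gesture it belongs to. For scikit-learn classifiers a plain 1D list of integers (or strings) is enough; a one-hot encoding of the classes is typically only needed by libraries that expect one output column per class.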
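Since each folder holds one gesture, the folder a file sits in can serve as its class. A rough sketch, assuming a root directory with one sub-folder per gesture (the function name `collect_labeled_files` is made up for illustration): it gives each folder an integer index, which scikit-learn classifiers accept directly as labels and which can be one-hot encoded later if some other library needs that.

```python
from pathlib import Path


def collect_labeled_files(root):
    """Return (csv_paths, labels, class_names) for a one-folder-per-gesture layout.

    labels[i] is the integer index into class_names for csv_paths[i].
    """
    root = Path(root)
    # Sort folder names so the class indices are stable between runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    csv_paths, labels = [], []
    for index, name in enumerate(class_names):
        for csv_path in sorted((root / name).glob("*.csv")):
            csv_paths.append(csv_path)
            labels.append(index)
    return csv_paths, labels, class_names
```

Keeping `class_names` around makes it possible to turn a predicted index back into the gesture's folder name.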

Make sure to set aside a validation set (or use cross-validation) for comparing classifiers and a separate test set for the final evaluation. To fit, pass the list of all training samples and the list of their labels to fit(), in that order.
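One way the split, validation, and fit could look is sketched below. The choice of `KNeighborsClassifier`, the 80/20 split, and the 4-fold cross-validation are assumptions made for illustration only; any scikit-learn classifier with fit() can be swapped in, and `train_and_evaluate` is a hypothetical helper name.

```python
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_evaluate(features, labels, seed=0):
    """Hold out a test set, cross-validate on the rest, then fit and score."""
    # Keep 20% of the samples untouched for the final evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    clf = KNeighborsClassifier(n_neighbors=3)
    # Cross-validation on the training part plays the role of a validation set.
    cv_accuracy = cross_val_score(clf, X_train, y_train, cv=4).mean()
    # Final fit on all training samples, scored once on the held-out test set.
    clf.fit(X_train, y_train)
    test_accuracy = clf.score(X_test, y_test)
    return clf, cv_accuracy, test_accuracy
```

Here `features` would be the list of equal-length sample vectors and `labels` the matching list of class labels. The cross-validation score is the one to use when comparing classifiers, leaving the test score for the final report.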