How to use OneHotEncoder

Time: 2015-11-22 02:36:07

Tags: python numpy tree encoder

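I am new to Python; until now I had only written code in VBA. I recently started using Python for data mining, but I have run into trouble.

I cannot get OneHotEncoder to convert my categorical features correctly. Here is my code: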

from __future__ import print_function
import os
import subprocess
import csv

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz

datapoint = []    
with open('raw2.csv',  'rb') as csvfile:    
    spamreader = csv.reader(csvfile, delimiter=',')    
    for row in spamreader: # Reading each row    
        data_point = []    
        for column in row: # Reading each column of the row    
            data_point.append((column))    
        datapoint.append(data_point)    
datapoint = np.array(datapoint)   

print(datapoint)
enc = preprocessing.OneHotEncoder()
enc.fit(datapoint)
enc.transform(datapoint).toarray()

features = list(df.columns[1:8])
print("* features:", features, sep="\n")
# fit the decision tree
y = df.iloc[:, 0]
X = df[features]
dt = DecisionTreeClassifier(min_samples_split=5, random_state=51)
dt.fit(X, y)

# produce graphic visualization
def visualize_tree(tree, feature_names):
    """Create tree png using graphviz.

    Args
    ----
    tree -- scikit-learn DecisionTree.
    feature_names -- list of feature names.
    """
    with open("dt.dot", 'w') as f:
        export_graphviz(tree, out_file=f,
                        feature_names=feature_names)

    command = ["dot", "-Tpng", "dt.dot", "-o", "dt.png"]
    try:
        subprocess.check_call(command)
    except:
        exit("Could not run dot, ie graphviz, to "
             "produce visualization")

visualize_tree(dt, features)        

Here is a sample row from my first dataset:

['Tobermory' 'Car' '2-3hr' 'Fall' '<$100' '3 days' 'Male' '18 - 23'] 

This is the error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-13-0bb2597d0276> in <module>()
     25 enc = preprocessing.OneHotEncoder()
---> 26 enc.fit(datapoint)
     27 enc.transform(datapoint).toarray()

ValueError: invalid literal for int() with base 10: 'Tobermory'    

1 Answer:

Answer 0 (score: 1)

I believe you are looking for sklearn.preprocessing.LabelBinarizer. OneHotEncoder takes integers and creates dummy variables from them, which is why it fails on a string such as 'Tobermory'.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html
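
As a rough illustration of that suggestion, here is a minimal sketch that applies LabelBinarizer to each string column separately and stacks the results. The sample rows and the helper name one_hot_columns are made up for this example and are not from the question; only LabelBinarizer, fit_transform and np.hstack are real library calls.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical rows, shaped like the sample in the question (assumed data).
rows = np.array([
    ['Tobermory', 'Car', 'Fall', 'Male'],
    ['Toronto', 'Bus', 'Summer', 'Female'],
    ['Ottawa', 'Car', 'Fall', 'Female'],
])


def one_hot_columns(data):
    """Binarize each string column on its own, then join the blocks side by side."""
    blocks = []
    for j in range(data.shape[1]):
        lb = LabelBinarizer()
        # fit_transform accepts string labels directly; no int() conversion is needed.
        blocks.append(lb.fit_transform(data[:, j]))
    return np.hstack(blocks)


encoded = one_hot_columns(rows)
print(encoded.shape)
print(encoded)
```

Note that LabelBinarizer emits a single 0/1 column for a feature with only two distinct values, so the width of the result depends on how many categories each column has. The encoded array could then be passed to DecisionTreeClassifier.fit in place of the raw strings.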