"Bad input shape" error when applying fit_transform in LDA

Asked: 2019-03-26 04:23:36

Tags: machine-learning scikit-learn lda

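I am trying to apply LDA after encoding my dataset with get_dummies() and splitting it into training and test sets. When I call fit_transform() on the training data, I get the following error:

ValueError: bad input shape (26905, 8)

What am I doing wrong? I'm not sure whether the problem is caused by get_dummies() or by something else I'm missing.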

# Sample Code


import pandas as pd

df = pd.read_csv('/Users/rushirajparmar/Downloads/Problem 16 (1)/Problem 16/Problem 16/train_file.csv')


df.drop(['UsageClass','CheckoutType','CheckoutYear','CheckoutMonth'],axis = 1,inplace = True)


Y=pd.get_dummies(df,columns = ['MaterialType'])
X=pd.get_dummies(df,columns = ['Title','Creator','Subjects','Publisher','PublicationYear'])


X.drop(['MaterialType'],axis = 1,inplace = True)


Y.drop(['ID','Checkouts','Title','Creator','Subjects','Publisher','PublicationYear'],axis = 1,inplace = True)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.15)


from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

Dataset:

Here is train_file.csv for reference.

1 Answer:

Answer 0 (score: 1)

You don't need to apply get_dummies to the target variable. You can pass the multi-class labels to LDA directly.
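As a minimal sketch of that point, the snippet below fits LDA on string class labels without any encoding. The data is synthetic and the label values ("BOOK", "SOUNDDISC", "VIDEODISC") are hypothetical stand-ins chosen for illustration, not values taken from train_file.csv.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Synthetic stand-in for the real data: 90 samples, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))

# Hypothetical string class labels, one per sample (a 1-D array).
y = np.repeat(["BOOK", "SOUNDDISC", "VIDEODISC"], 30)

# Shift two of the classes so the groups are separable.
X[y == "SOUNDDISC"] += 2.0
X[y == "VIDEODISC"] -= 2.0

lda = LDA(n_components=1)
X_reduced = lda.fit_transform(X, y)  # y is passed as-is, no get_dummies

print(lda.classes_)     # the distinct labels LDA found in y
print(X_reduced.shape)  # one row per sample, one discriminant component
```

LDA stores the distinct labels in classes_, so the original label values stay available after fitting.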

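From the documentation:

fit_transform(X, y=None, **fit_params)

    Fit to data, then transform it.

    Fits transformer to X and y with optional parameters fit_params and
    returns a transformed version of X.

    Parameters:
        X : numpy array of shape [n_samples, n_features]
            Training set.
        y : numpy array of shape [n_samples]
            Target values.

    Returns:
        X_new : numpy array of shape [n_samples, n_features_new]
            Transformed array.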

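So your y must be one-dimensional. Applying get_dummies to MaterialType turns it into one column per class, which is why y had the shape (26905, 8) that the error reports.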

X_train, X_test, y_train, y_test = train_test_split(X, df['MaterialType'], test_size = 0.15)

lda = LDA(n_components = 1)
X_train = lda.fit_transform(X_train, y_train)
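If only the one-hot encoded target is at hand, the 1-D labels can be recovered from it. The following is a rough sketch on synthetic data with hypothetical labels "A", "B", "C"; it first shows that a 2-D target is rejected, then rebuilds a 1-D target with idxmax. With get_dummies(df, columns=['MaterialType']) the recovered values would carry the MaterialType_ prefix, which is harmless for LDA.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Synthetic data: 60 samples, 3 features, 3 hypothetical classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
labels = pd.Series(np.repeat(["A", "B", "C"], 20), name="MaterialType")

# One-hot encoding the target gives a 2-D frame of shape (60, 3).
Y_onehot = pd.get_dummies(labels)

# A 2-D target is rejected with a ValueError (exact message varies by version).
try:
    LDA(n_components=1).fit_transform(X, Y_onehot)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Recover one label per row: the name of the column that holds the 1.
y_1d = Y_onehot.astype(int).idxmax(axis=1)

X_new = LDA(n_components=1).fit_transform(X, y_1d)
print(X_new.shape)
```

Passing df['MaterialType'] directly, as in the answer, is simpler; the idxmax route is only useful when the original column is no longer available.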