ML code raises a ValueError when transforming data

Asked: 2020-10-28 17:29:09

Tags: python scikit-learn numpy encoding transformer

The data source can be found here.

I've hit a stumbling block in some code I'm writing: the fit_transform method keeps failing. It raises this error:

Traceback (most recent call last):

  File "/home/user/Datasets/CSVs/Working/Playstore/untitled0.py", line 18, in <module>
    data = data[oh_cols].apply(oh.fit_transform)

  File "/usr/lib/python3.8/site-packages/pandas/core/frame.py", line 7547, in apply
    return op.get_result()

  File "/usr/lib/python3.8/site-packages/pandas/core/apply.py", line 180, in get_result
    return self.apply_standard()

  File "/usr/lib/python3.8/site-packages/pandas/core/apply.py", line 255, in apply_standard
    results, res_index = self.apply_series_generator()

  File "/usr/lib/python3.8/site-packages/pandas/core/apply.py", line 284, in apply_series_generator
    results[i] = self.f(v)

  File "/usr/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 410, in fit_transform
    return super().fit_transform(X, y)

  File "/usr/lib/python3.8/site-packages/sklearn/base.py", line 690, in fit_transform
    return self.fit(X, **fit_params).transform(X)

  File "/usr/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 385, in fit
    self._fit(X, handle_unknown=self.handle_unknown)

  File "/usr/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 74, in _fit
    X_list, n_samples, n_features = self._check_X(X)

  File "/usr/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 43, in _check_X
    X_temp = check_array(X, dtype=None)

  File "/usr/lib/python3.8/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)

  File "/usr/lib/python3.8/site-packages/sklearn/utils/validation.py", line 620, in check_array
    raise ValueError(

ValueError: Expected 2D array, got 1D array instead:
array=['Everyone' 'Everyone' 'Everyone' ... 'Everyone' 'Mature 17+' 'Everyone'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I've searched online and found a few potential solutions, but none of them seem to work.

Here is my code:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from category_encoders import CatBoostEncoder,CountEncoder,TargetEncoder

data = pd.read_csv("/home/user/Datasets/CSVs/Working/Playstore/data.csv")


oh = OneHotEncoder()
cb = CatBoostEncoder()
ce = CountEncoder()
te = TargetEncoder()

obj = [i for i in data if data[i].dtypes=="object"]
unique = dict(zip(list(obj),[len(data[i].unique()) for i in obj]))
oh_cols = [i for i in unique if unique[i] < 100]
te_cols = [i for i in unique if unique[i] > 100]

data = data[oh_cols].apply(oh.fit_transform)

This raises the error above. One solution I came across suggested using .values when transforming the data, so I tried the following:

data = data[oh_cols].values.apply(oh.fit_transform)

data = data[oh_cols].apply(oh.fit_transform).values

encoding = np.array(data[oh_cols])
encoding.apply(oh.fit_transform)

The first and third attempts both throw the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'apply'
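
For context, this AttributeError arises because .apply is a pandas method: once .values or np.array has turned the selection into a NumPy array, there is no apply to call. A minimal sketch of that point, assuming a small hypothetical frame with a single "Type" column rather than the real Play Store data:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample column; not taken from the real dataset.
frame = pd.DataFrame({"Type": ["Free", "Paid", "Free"]})

values = frame[["Type"]].values        # 2D ndarray of shape (3, 1)
has_apply = hasattr(values, "apply")   # False: apply belongs to pandas, not NumPy

# The encoder accepts the 2D array directly, so no apply call is needed.
encoded = OneHotEncoder().fit_transform(values).toarray()

# A single column pulled out as a 1D array must be reshaped
# to (n_samples, 1) before the encoder will accept it.
one_dim = frame["Type"].values         # shape (3,)
encoded_1d = OneHotEncoder().fit_transform(one_dim.reshape(-1, 1)).toarray()
```

Both calls should yield the same encoding here, since the only difference is how the 2D shape was obtained.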

The second attempt raises the same error as before:

ValueError: Expected 2D array, got 1D array instead:
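
A likely reading of this ValueError is that DataFrame.apply hands each column to the function as a 1D Series, while OneHotEncoder expects a 2D input of shape (n_samples, n_features). The sketch below passes the selected columns to fit_transform as a single 2D DataFrame instead. The sample data and column names are hypothetical stand-ins, and building the output column names from categories_ is just one possible choice:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the Play Store data; column names are assumptions.
data = pd.DataFrame({
    "Content Rating": ["Everyone", "Teen", "Everyone", "Mature 17+"],
    "Type": ["Free", "Paid", "Free", "Free"],
})
oh_cols = ["Content Rating", "Type"]

oh = OneHotEncoder()

# Pass the whole 2D selection at once instead of using apply,
# which would feed the encoder one 1D Series at a time.
encoded = oh.fit_transform(data[oh_cols])  # sparse matrix by default

# Build readable column names from the categories learned per input column.
new_cols = [
    f"{col}_{cat}"
    for col, cats in zip(oh_cols, oh.categories_)
    for cat in cats
]

encoded_df = pd.DataFrame(encoded.toarray(), columns=new_cols, index=data.index)
```

Each row of encoded_df then has exactly one 1 per original column, so with two input columns every row sums to 2.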

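Honestly, I'm confused and not sure where to go from here. The Kaggle exercises I learned this from went smoothly, but for some reason things fall apart when I try it on my own.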

0 answers:

No answers yet