Question

TypeError: unhashable type: 'slice' on load_boston data

I tried boston.iloc and boston.loc and got an AttributeError: iloc

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
boston = load_boston()
print(boston.data.shape)
print("Data shape: {}".format(boston.data.shape))
print('The first few lines of data: {}'.format(boston.data[0:5,:]))
m = len(boston)
X = boston[:,0]
y = boston[:,1]

print("Number of examples: {}".format(m))
print("Shape of data     : {}".format(X.shape))
print("Shape of labels   : {}".format(y.shape))

Answer 1

If you run print(boston.keys()), you get the output:

>>> print(boston.keys())
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
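
The keys above are a hint at the cause of the error: load_boston() returns a dict-like Bunch rather than an array, so boston[:, 0] is treated as a dictionary key lookup. A minimal sketch of that behavior, assuming a plain dict is a fair stand-in for the Bunch (the name fake_boston, the helper try_slice and the values are made up for illustration):

```python
# A plain dict standing in for the Bunch returned by load_boston().
# The name and the numbers are illustrative only.
fake_boston = {
    "data": [[0.00632, 18.0], [0.02731, 0.0]],
    "target": [24.0, 21.6],
}


def try_slice(container):
    """Attempt array-style indexing on a dict-like object and report the error type."""
    try:
        container[:, 0]
    except (TypeError, KeyError) as exc:
        # Older Pythons raise TypeError (unhashable type: 'slice').
        # Python 3.12+ can hash slices, so the lookup fails with KeyError instead.
        return type(exc).__name__
    return None


error_name = try_slice(fake_boston)
print(error_name)

# A dict has no DataFrame accessors, hence the AttributeError on .iloc
has_iloc = hasattr(fake_boston, "iloc")
print(has_iloc)

# Looking up the key first and then indexing works as expected
first_column = [row[0] for row in fake_boston["data"]]
print(first_column)
```

In the question's code, the equivalent fix would be to index boston.data (a NumPy array) rather than boston itself.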

You should first convert the data to a DataFrame with bos = pd.DataFrame(boston.data), as follows:

>>> bos = pd.DataFrame(boston.data)
>>> print(bos.head())
        0     1     2    3      4      5     6       7    8      9     10      11    12
0  0.00632  18.0  2.31  0.0  0.538  6.575  65.2  4.0900  1.0  296.0  15.3  396.90  4.98
1  0.02731   0.0  7.07  0.0  0.469  6.421  78.9  4.9671  2.0  242.0  17.8  396.90  9.14
2  0.02729   0.0  7.07  0.0  0.469  7.185  61.1  4.9671  2.0  242.0  17.8  392.83  4.03
3  0.03237   0.0  2.18  0.0  0.458  6.998  45.8  6.0622  3.0  222.0  18.7  394.63  2.94
4  0.06905   0.0  2.18  0.0  0.458  7.147  54.2  6.0622  3.0  222.0  18.7  396.90  5.33

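You might then wonder why the columns show only their indices and not their names. It turns out the column names are not embedded in the data array, but recall that we have a list of them in boston.feature_names. So let's replace the indices with the column names: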

bos.columns = boston.feature_names
print(bos.head())

which gives the output:

>>> print(bos.head())
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
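
As a shorter alternative, the column names can be passed when the DataFrame is built. The sketch below uses a small hand-made array in place of boston.data (the first two rows and three columns of the output above), so data, feature_names and bos_small are stand-ins rather than the real dataset:

```python
import numpy as np
import pandas as pd

# Stand-in for boston.data and boston.feature_names (illustrative subset)
data = np.array([[0.00632, 18.0, 2.31],
                 [0.02731, 0.0, 7.07]])
feature_names = ["CRIM", "ZN", "INDUS"]

# Build the DataFrame and name the columns in one step
bos_small = pd.DataFrame(data, columns=feature_names)
print(bos_small.head())

# On a DataFrame, positional slicing works through .iloc
first_col = bos_small.iloc[:, 0]
print(first_col.tolist())
```

This is also where the .iloc attempt from the question starts to work: the accessor exists on the DataFrame, not on the object returned by load_boston().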

Then I believe you want PRICE as your y:

bos['PRICE'] = boston.target
print(bos.head())

with the output:

>>> print(bos.head())
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT  PRICE
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98   24.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14   21.6
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03   34.7
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94   33.4
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33   36.2

Finally, split your dataset into X and y:

X = bos.drop('PRICE', axis = 1)
y = bos['PRICE']

The next part is the regression, but first split the data further into training and test sets:

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size = 0.3, random_state = 5)

You can print the shapes as follows:

print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

Finally, fit the linear model:

from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(X_train, Y_train)

Y_pred = lm.predict(X_test)
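
Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the snippets above need an older release. The same workflow can be sketched on synthetic data. Everything below (the feature names F1 to F3, the coefficients, the noise level) is made up for illustration, and the mean squared error at the end is an addition that is not part of the steps above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the Boston table: 100 rows, 3 hypothetical features
features = pd.DataFrame(rng.normal(size=(100, 3)), columns=["F1", "F2", "F3"])
# Target is a known linear combination of the features plus small noise
features["PRICE"] = (
    2.0 * features["F1"] - 1.0 * features["F2"] + 0.5 * features["F3"]
    + rng.normal(scale=0.1, size=100)
)

# Same X / y split as above
X = features.drop("PRICE", axis=1)
y = features["PRICE"]

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=5
)

lm = LinearRegression()
lm.fit(X_train, Y_train)
Y_pred = lm.predict(X_test)

# Compare predictions with the held-out targets
mse = mean_squared_error(Y_test, Y_pred)
print(X_train.shape, X_test.shape)
print("Test MSE: {:.4f}".format(mse))
```

Because the target here is linear in the features by construction, the fitted coefficients in lm.coef_ should land close to 2.0, -1.0 and 0.5. On real data there is no such guarantee.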
