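How do I run logistic regression on training and test data?

Time: 2020-06-21 04:27:37

Tags: python pandas numpy matplotlib logistic-regression

I ran this code, but there seems to be an error on the lr.fit line. Does anyone know what to do?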

from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('2019.csv')
df1 = pd.DataFrame(df,columns=['GDP per capita', 'Social support'])

lr = LogisticRegression()
columns = ['GDP per capita', 'Social support']

X = df[columns]
y = df["Score"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20,random_state=0)

lr.fit(X_train,y_train)
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-afa10dbaa367> in <module>
     19 X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.30,random_state=0)
     20 
---> 21 lr.fit(X_train,y_train)
     22 predictions = lr.predict(X_test)
     23 accuracy = accuracy_score(y_test, predictions)

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
   1526         X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
   1527                          accept_large_sparse=solver != 'liblinear')
-> 1528         check_classification_targets(y)
   1529         self.classes_ = np.unique(y)
   1530         n_samples, n_features = X.shape

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168                       'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)
    170 
    171 

ValueError: Unknown label type: 'continuous'

The full traceback is above. The code only works when I append .astype(int) to X and y; without it, I get the error shown.
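To see why the cast changes the outcome, it can help to ask scikit-learn how it classifies the target. The sketch below uses type_of_target from sklearn.utils.multiclass on a few made-up numbers standing in for the Score column (they are not taken from 2019.csv). Floats with fractional parts are reported as a continuous target, which a classifier rejects, while the truncated integers are read as class labels.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up values standing in for the "Score" column of 2019.csv
score = np.array([7.769, 7.600, 5.432, 4.107, 3.203])

kind_raw = type_of_target(score)              # floats with fractional parts
kind_int = type_of_target(score.astype(int))  # truncated to 7, 7, 5, 4, 3

print(kind_raw)  # prints: continuous
print(kind_int)  # prints: multiclass
```

Only y triggers the error, so casting X is not needed to avoid it; casting X to int also throws away the fractional part of the features.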

1 Answer:

Answer 0: (score: 1)

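I went to Kaggle, searched, and found 2019.csv with the two columns you use. The data is about the happiness of people in countries around the world, and about GDP per capita versus a "happiness score". Nice, it works for me.

Anyway, I edited 2019.csv and kept only the two data columns and the score. I have one column named Score, and it must contain only ones and zeros (this is very important): LogisticRegression is a classifier and needs discrete class labels, and a continuous Score is what raises "ValueError: Unknown label type: 'continuous'". I renamed the other two columns to GDP and SS and removed all other columns.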

Score, GDP, SS: the columns in the edited 2019.csv
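The same 0/1 target can also be derived in code instead of editing the CSV by hand. This is only a minimal sketch: the data is synthetic (a stand-in for 2019.csv), the median threshold is my assumption rather than something the answer specifies, and HighScore is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for 2019.csv; all values are made up for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GDP": rng.uniform(0.0, 1.7, size=100),
    "SS": rng.uniform(0.0, 1.6, size=100),
})
df["Score"] = 3.0 + 2.0 * df["GDP"] + df["SS"] + rng.normal(0.0, 0.3, size=100)

# Assumed rule: label a row 1 if its Score is above the median, else 0
threshold = df["Score"].median()
df["HighScore"] = (df["Score"] > threshold).astype(int)

print(df["HighScore"].value_counts())
```

Any other cutoff would work the same way; the point is that the target handed to LogisticRegression ends up with a small set of discrete values.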

Run in PyCharm on a MacBook Pro, this code produces the following output:

The number is the accuracy

0.46875

Process finished with exit code 0

So it is not that good at first (almost 47% accuracy), but it can easily be improved...

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# Edited 2019.csv with three columns: Score (0 or 1), GDP, SS
df = pd.read_csv('2019.csv')

# Features and binary target
columns = ['GDP', 'SS']
X = df[columns]
y = df["Score"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)

# Accuracy on the held-out rows
predictions = lr.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)

""" This is the output

0.46875

Process finished with exit code 0 """
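A single train/test split can give a noisy accuracy figure, especially when the test set is small. As a rough sketch, cross_val_score (which the question already imports) averages accuracy over several splits. Everything specific here is an assumption of the example and not part of the answer: the synthetic data, the median threshold used to build the 0/1 Score, the stratified split, and the choice of 5 folds.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the edited 2019.csv (made-up values)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GDP": rng.uniform(0.0, 1.7, size=200),
    "SS": rng.uniform(0.0, 1.6, size=200),
})
raw_score = 3.0 + 2.0 * df["GDP"] + df["SS"] + rng.normal(0.0, 0.3, size=200)
# Binary target: 1 if above the median, else 0 (the threshold is an assumption)
df["Score"] = (raw_score > raw_score.median()).astype(int)

X = df[["GDP", "SS"]]
y = df["Score"]

# Single hold-out split as in the answer; stratify keeps the 0/1 ratio equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)
lr = LogisticRegression()
lr.fit(X_train, y_train)
accuracy = accuracy_score(y_test, lr.predict(X_test))

# 5-fold cross-validated accuracy on all rows
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")

print(accuracy)
print(cv_scores.mean())
```

On the real file, X and y would come from pd.read_csv('2019.csv') as in the code above; the numbers printed here say nothing about that dataset.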

Hope this helps.