Question

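I'm having trouble with the simplest possible linear regression example. In the output, the coefficients are all zero. What am I doing wrong? Thanks for your help.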

Code:

import sklearn.linear_model as lm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x = [25,50,75,100]
y = [10.5,17,23.25,29]
pred = [27,41,22,33]
df = pd.DataFrame({'x':x, 'y':y, 'pred':pred})
x = df['x'].values.reshape(1,-1)
y = df['y'].values.reshape(1,-1)
pred = df['pred'].values.reshape(1,-1)
plt.scatter(x,y,color='black')
clf = lm.LinearRegression(fit_intercept =True)
clf.fit(x,y)


m=clf.coef_[0]
b=clf.intercept_
print("slope=",m, "intercept=",b)

Answer 1

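Think it through carefully. Getting several coefficients back means the model believes you have several features. Since this is a simple regression with a single predictor, the problem lies in the shape of the input data. Your original reshape made the LinearRegression class think you have 4 variables with only one observation each.

Try something like this: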

import sklearn.linear_model as lm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x = np.array([25,99,75,100, 3, 4, 6, 80])[..., np.newaxis]
y = np.array([10.5,17,23.25,29, 1, 2, 33, 4])[..., np.newaxis]

clf = lm.LinearRegression()
clf.fit(x,y)
clf.coef_

Output:

array([[ 0.09399429]])
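
To see the diagnosis above on the question's own numbers, here is a minimal sketch. It assumes a reasonably recent scikit-learn; the names x_wide, y_wide and wide_model are made up for illustration. It reproduces the (1, 4) shape and shows why every coefficient comes back as zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([25, 50, 75, 100])
y = np.array([10.5, 17, 23.25, 29])

# The question's reshape: one row (sample) and four columns.
x_wide = x.reshape(1, -1)
y_wide = y.reshape(1, -1)
print(x_wide.shape)  # (1, 4)

# scikit-learn now sees 1 sample, 4 features and 4 targets.
wide_model = LinearRegression().fit(x_wide, y_wide)

# A single sample has no variation to learn a slope from, so all
# coefficients are zero and the intercept simply reproduces y.
print(wide_model.coef_.shape)
print(wide_model.coef_)
print(wide_model.intercept_)
```

The intercept ends up holding the four y values, which is why the fit looks "perfect" even though nothing was learned.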

Answer 2

As @jrjames83 has already explained in his answer, after the reshape (.reshape(1,-1)) you are feeding the model a dataset with one sample (row) and four features (columns):

In [103]: x.shape
Out[103]: (1, 4)

Most likely you wanted to reshape it this way:

In [104]: x = df['x'].values.reshape(-1, 1)

In [105]: x.shape
Out[105]: (4, 1)

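so that you have four samples and one feature.

Alternatively, you can pass the DataFrame columns straight to your model, as follows (no need to pollute your memory with additional variables):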
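
As a quick check, the sketch below fits the question's data after a column reshape. It also shows that reshape(-1, 1) and the [..., np.newaxis] indexing used in the other answer produce the same (4, 1) array. The names x_col, x_new and model are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([25, 50, 75, 100])
y = np.array([10.5, 17, 23.25, 29])

# Two equivalent ways to turn a flat array into a single-feature column.
x_col = x.reshape(-1, 1)
x_new = x[..., np.newaxis]
print(x_col.shape, np.array_equal(x_col, x_new))

# y can stay one-dimensional: one target value per sample.
model = LinearRegression().fit(x_col, y)
slope = model.coef_[0]
intercept = model.intercept_
print("slope=", slope, "intercept=", intercept)
```

With a 1-D y, coef_ has one entry per feature, so coef_[0] is the slope itself rather than a row of coefficients.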

In [98]: clf = lm.LinearRegression(fit_intercept =True)

In [99]: clf.fit(df[['x']],df['y'])
Out[99]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [100]: clf.coef_
Out[100]: array([0.247])

In [101]: clf.intercept_
Out[101]: 4.5
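
The fitted model can then be used for prediction. The sketch below assumes that the question's pred list holds new x values to predict for, which the question never states; new_x and predicted are names chosen for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'x': [25, 50, 75, 100], 'y': [10.5, 17, 23.25, 29]})
model = LinearRegression().fit(df[['x']], df['y'])

# Assumption: these are the question's `pred` values, treated as new inputs.
# Keeping the column name 'x' matches the feature name seen during fit.
new_x = pd.DataFrame({'x': [27, 41, 22, 33]})
predicted = model.predict(new_x)
print(predicted)
```

Each prediction is just intercept + slope * x, so for x = 27 it is 4.5 + 0.247 * 27 = 11.169.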

SKlearn linear regression coefficients equal 0
