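I have a time series, and I want to use x_t to predict x_{t+1}. I am using sklearn's support vector regression, but I can't understand what is going wrong in my predictions. Here are my code and the results (in the figure).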
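# Imports assumed here; the original post does not show them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error as mse, r2_score
# timeseries_to_supervised and split_data are the poster's own helpers (not shown).
bts_sup = timeseries_to_supervised(bts, 1)
bts_sup = bts_sup.iloc[1:, :]  # drop the first row because x0 has no antecedent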
train, test = split_data(bts_sup)
# scaling data
scaler_in = MinMaxScaler() # for inputs
scaler_out = MinMaxScaler() # for outputs
X_train = scaler_in.fit_transform(train[:,0].reshape(-1,1))
y_train = scaler_out.fit_transform(train[:,1].reshape(-1,1))
X_test = scaler_in.transform(test[:,0].reshape(-1,1))
y_test = scaler_out.transform(test[:,1].reshape(-1,1))
param_grid = {"C": np.linspace(10**(-2), 10**3, 100),
              "gamma": np.linspace(0.0001, 1, 20)}
mod = SVR(epsilon=0.1, kernel="rbf")
model = GridSearchCV(estimator=mod, param_grid=param_grid,
                     scoring="neg_mean_squared_error", verbose=0)
best_model = model.fit(X_train, y_train.ravel())
#prediction
predicted_tr = model.predict(X_train)
predicted_te = model.predict(X_test)
# inverse_transform because prediction is done on scaled inputs
predicted_tr = scaler_out.inverse_transform(predicted_tr.reshape(-1,1))
predicted_te = scaler_out.inverse_transform(predicted_te.reshape(-1,1))
#plot
forcast = np.concatenate((predicted_tr,predicted_te))
real = np.concatenate((train[:,1],test[:,1]))
plt.plot(real, color = 'blue', label = 'Real Erlangs')
plt.plot(forcast,"--", linewidth=2,color = 'red', label = 'Predicted Erlangs')
plt.title('Erlangs Prediction--'+data_set.columns[choice])
plt.xlabel('Time')
plt.ylabel('Erlangs')
plt.legend()
plt.show()
#error
print("MSE: ", mse(real,forcast), " R2: ", r2_score(real,forcast))
print(best_model.best_params_)
[[9.26 11.01] [11.01 22.72] [22.72 20.75] [20.75 11.54] [11.54 11.85] [11.85 18.17] [18.17 16.05] [16.05 17.98] [17.98 14.85] [14.85 12.62] [12.62 16.95] [16.95 16.81] [16.81 16.23] [16.23 21.81] [21.81 22.47] [22.47 20.37] [20.37 16.68] [16.68 17.07] [17.07 20.48] [20.48 21.99] [21.99 25.54] [25.54 21.1] [21.1 16.91] [16.91 24.23] [24.23 27.37] [27.37 30.55] [30.55 28.47] [28.47 26.74] [26.74 40.37] [40.37 36.55] [36.55 39.65] [39.65 45.58] [45.58 48.91] [48.91 37.82] [37.82 39.7] [39.7 36.09] [36.09 25.33] [25.33 23.64] [23.64 18.33] [18.33 21.59] [21.59 22.4] [22.4 15.89] [15.89 18.94] [18.94 21.78] [21.78 19.38] [19.38 17.81] [17.81 21.33] [21.33 22.61] [22.61 27.11] [27.11 26.48] [26.48 19.87] [19.87 18.57] [18.57 14.03] [14.03 18.82] [18.82 22.46] [22.46 22.33] [22.33 21.58] [21.58 22.66] [22.66 19.51] [19.51 21.54] [21.54 20.58] [20.58 20.48]]
[[20.48 25.78] [25.78 21.89] [21.89 19.61] [19.61 22.95] [22.95 21.67] [21.67 26.03] [26.03 21.96] [21.96 21.81] [21.81 21.91] [21.91 21.82] [21.82 19.6] [19.6 24.61] [24.61 30.97] [30.97 18.29] [18.29 19.84] [19.84 20.81] [20.81 29.17] [29.17 24.01] [24.01 21.3] [21.3 25.08] [25.08 27.18] [27.18 26.59] [26.59 25.99] [25.99 28.74] [28.74 25.32] [25.32 27.56] [27.56 28.69]]
Answer 0 (score: 0)
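From what I can see, the model predicts a value close to that of the previous time step, which is the value it was given as input. There is a small deviation: when x_t is low, the model predicts x_{t+1} slightly higher, and when x_t is high, it predicts slightly lower.
With only one lag feature, this appears to be the model's best guess.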
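One way to check this observation is to compare the SVR error against a naive persistence forecast that simply predicts x_{t+1} = x_t. The sketch below is only an illustration: `persistence_forecast` and `mse` are hypothetical helper names, not part of the question's code, and the array holds just the first four (x_t, x_{t+1}) pairs of the test set printed in the question.

```python
import numpy as np

def persistence_forecast(x_t):
    """Naive baseline (hypothetical helper): predict x_{t+1} = x_t."""
    return np.asarray(x_t, dtype=float).copy()

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# First four (x_t, x_{t+1}) pairs of the test set printed in the question.
test = np.array([[20.48, 25.78],
                 [25.78, 21.89],
                 [21.89, 19.61],
                 [19.61, 22.95]])

# Error of the "repeat the last value" forecast on these pairs.
baseline_mse = mse(test[:, 1], persistence_forecast(test[:, 0]))
print("persistence MSE:", baseline_mse)
```

If the SVR's test MSE is close to this baseline on the full test set, the model has learned little beyond repeating its input.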
To improve it, you can add more lag features (say, 5 to 10 lags) and let the model learn the patterns in the series.
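A minimal sketch of building such lag features, assuming the series is available as a plain 1-D sequence. `make_lag_matrix` is a hypothetical helper written for this example (it is not the question's `timeseries_to_supervised`), and the values are the first eight points of the training series printed in the question.

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Hypothetical helper: return (X, y) where row i of X holds
    series[i : i + n_lags] (oldest value first) and y[i] is the
    value that immediately follows that window."""
    series = np.asarray(series, dtype=float)
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    n_rows = len(series) - n_lags
    # Column k is the series shifted by k steps, truncated to n_rows values.
    X = np.column_stack([series[k:k + n_rows] for k in range(n_lags)])
    y = series[n_lags:]
    return X, y

# First eight values of the training series printed in the question.
series = [9.26, 11.01, 22.72, 20.75, 11.54, 11.85, 18.17, 16.05]
X, y = make_lag_matrix(series, n_lags=3)
print(X.shape, y.shape)
```

The resulting `X` and `y` can then be scaled and passed to `GridSearchCV` in the same way as the single-lag inputs in the question; the scaler for the inputs would be fitted on all lag columns of the training part only.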
If SVM still does not work, you could try a more complex model, such as an RNN, for the forecast.