Predicting a time series with SVR

Date: 2018-12-21 19:58:41

Tags: python machine-learning scikit-learn regression svm

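I have a time series and I want to use x_t to predict x_t+1. I am using sklearn's Support Vector Regression, but I cannot understand why my predictions show this kind of error. Here are my code and the results (in the figure).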

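import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error as mse, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# defined elsewhere (not shown): bts (raw series), timeseries_to_supervised,
# split_data, data_set, choice
bts_sup = timeseries_to_supervised(bts,1)
bts_sup = bts_sup.iloc[1:,:]   # drop the first row because x0 has no previous value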
train, test = split_data(bts_sup)

# scaling data
scaler_in = MinMaxScaler()  #  for inputs
scaler_out = MinMaxScaler()  # for outputs

X_train = scaler_in.fit_transform(train[:,0].reshape(-1,1))
y_train = scaler_out.fit_transform(train[:,1].reshape(-1,1))

X_test = scaler_in.transform(test[:,0].reshape(-1,1))
y_test = scaler_out.transform(test[:,1].reshape(-1,1))


param_grid = {"C": np.linspace(10**(-2),10**3,100),
             'gamma': np.linspace(0.0001,1,20)}

mod = SVR(epsilon = 0.1,kernel='rbf')
model = GridSearchCV(estimator = mod, param_grid = param_grid,
                     scoring = "neg_mean_squared_error",verbose = 0)

best_model = model.fit(X_train, y_train.ravel())

#prediction
predicted_tr = model.predict(X_train)
predicted_te = model.predict(X_test)

# inverse_transform because prediction is done on scaled inputs
predicted_tr = scaler_out.inverse_transform(predicted_tr.reshape(-1,1))
predicted_te = scaler_out.inverse_transform(predicted_te.reshape(-1,1))

#plot
forcast = np.concatenate((predicted_tr,predicted_te))
real = np.concatenate((train[:,1],test[:,1]))
plt.plot(real, color = 'blue', label = 'Real Erlangs')
plt.plot(forcast,"--", linewidth=2,color = 'red', label = 'Predicted Erlangs')
plt.title('Erlangs Prediction--'+data_set.columns[choice])
plt.xlabel('Time')
plt.ylabel('Erlangs')
plt.legend()
plt.show()


#error
print("MSE: ", mse(real,forcast), " R2: ", r2_score(real,forcast))
print(best_model.best_params_)
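The code calls `timeseries_to_supervised` and `split_data` without showing them. Below is a minimal sketch of what they presumably do, assuming a pandas `shift`-based lag table and a chronological 70/30 split; both are hypothetical re-implementations, not the asker's own helpers. With 89 supervised rows, a 70/30 split yields 62 training and 27 test rows, which matches the sizes of the arrays printed below.

```python
import numpy as np
import pandas as pd


def timeseries_to_supervised(data, lag=1):
    """Hypothetical helper: build a table with columns x(t-lag), ..., x(t-1), x(t).

    The first `lag` rows contain NaN because those time steps have no earlier
    values, which is why the question drops the first row with .iloc[1:, :].
    """
    df = pd.DataFrame(np.asarray(data, dtype=float).ravel())
    # df.shift(i) moves every value down by i rows, so row t holds x(t-i)
    columns = [df.shift(i) for i in range(lag, 0, -1)]
    columns.append(df)
    out = pd.concat(columns, axis=1)
    out.columns = [f"x(t-{i})" for i in range(lag, 0, -1)] + ["x(t)"]
    return out


def split_data(supervised, train_ratio=0.7):
    """Hypothetical helper: chronological split (no shuffling) into NumPy arrays."""
    values = np.asarray(supervised, dtype=float)
    n_train = int(len(values) * train_ratio)
    return values[:n_train], values[n_train:]
```

Because the split keeps time order, column 0 of each array is the input x_t and column 1 is the target x_t+1, as the question's `train[:,0]` / `train[:,1]` indexing expects.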

Train

[[9.26 11.01] [11.01 22.72] [22.72 20.75] [20.75 11.54] [11.54 11.85] [11.85 18.17] [18.17 16.05] [16.05 17.98] [17.98 14.85] [14.85 12.62] [12.62 16.95] [16.95 16.81] [16.81 16.23] [16.23 21.81] [21.81 22.47] [22.47 20.37] [20.37 16.68] [16.68 17.07] [17.07 20.48] [20.48 21.99] [21.99 25.54] [25.54 21.1] [21.1 16.91] [16.91 24.23] [24.23 27.37] [27.37 30.55] [30.55 28.47] [28.47 26.74] [26.74 40.37] [40.37 36.55] [36.55 39.65] [39.65 45.58] [45.58 48.91] [48.91 37.82] [37.82 39.7] [39.7 36.09] [36.09 25.33] [25.33 23.64] [23.64 18.33] [18.33 21.59] [21.59 22.4] [22.4 15.89] [15.89 18.94] [18.94 21.78] [21.78 19.38] [19.38 17.81] [17.81 21.33] [21.33 22.61] [22.61 27.11] [27.11 26.48] [26.48 19.87] [19.87 18.57] [18.57 14.03] [14.03 18.82] [18.82 22.46] [22.46 22.33] [22.33 21.58] [21.58 22.66] [22.66 19.51] [19.51 21.54] [21.54 20.58] [20.58 20.48]]

Test

[[20.48 25.78] [25.78 21.89] [21.89 19.61] [19.61 22.95] [22.95 21.67] [21.67 26.03] [26.03 21.96] [21.96 21.81] [21.81 21.91] [21.91 21.82] [21.82 19.6] [19.6 24.61] [24.61 30.97] [30.97 18.29] [18.29 19.84] [19.84 20.81] [20.81 29.17] [29.17 24.01] [24.01 21.3] [21.3 25.08] [25.08 27.18] [27.18 26.59] [26.59 25.99] [25.99 28.74] [28.74 25.32] [25.32 27.56] [27.56 28.69]]

Result graph

1 answer:

Answer 0 (score: 0)

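From what I can see, the model predicts values close to the previous time step, which is exactly what it was given as input. There is also a small systematic deviation: when x_t is low, the model predicts x_t+1 slightly higher, and when x_t is high it predicts slightly lower, i.e. the predictions are pulled toward the mean of the series.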
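The pull toward the mean is what a model fitted on a single lag tends to do. As an illustration, the sketch below swaps the SVR for a plain least-squares line (`np.polyfit`) fitted to the training pairs printed in the question; it is not the asker's model, only an easier one to inspect. The pairs overlap, so they are rebuilt here as one 63-value series. A slope between 0 and 1 means inputs below the line's fixed point x* = a / (1 - b) get predictions above themselves, and inputs above it get predictions below themselves.

```python
import numpy as np

# Training series rebuilt from the (x_t, x_t+1) pairs printed in the question:
# each pair's second value is the next pair's first value.
train_series = np.array([
    9.26, 11.01, 22.72, 20.75, 11.54, 11.85, 18.17, 16.05, 17.98, 14.85,
    12.62, 16.95, 16.81, 16.23, 21.81, 22.47, 20.37, 16.68, 17.07, 20.48,
    21.99, 25.54, 21.1, 16.91, 24.23, 27.37, 30.55, 28.47, 26.74, 40.37,
    36.55, 39.65, 45.58, 48.91, 37.82, 39.7, 36.09, 25.33, 23.64, 18.33,
    21.59, 22.4, 15.89, 18.94, 21.78, 19.38, 17.81, 21.33, 22.61, 27.11,
    26.48, 19.87, 18.57, 14.03, 18.82, 22.46, 22.33, 21.58, 22.66, 19.51,
    21.54, 20.58, 20.48,
])
x_t, x_next = train_series[:-1], train_series[1:]

# Least-squares line x_t+1 ~ a + b * x_t (a stand-in for the SVR, not the SVR itself).
# np.polyfit returns coefficients highest degree first: [slope, intercept].
b, a = np.polyfit(x_t, x_next, deg=1)


def predict(x):
    return a + b * x


# Where the fitted line crosses the diagonal x_t+1 = x_t
fixed_point = a / (1 - b)
print(f"slope={b:.3f} intercept={a:.3f} fixed point={fixed_point:.2f}")
print("lowest input :", x_t.min(), "->", round(float(predict(x_t.min())), 2))
print("highest input:", x_t.max(), "->", round(float(predict(x_t.max())), 2))
```

The same shrinkage shows up in the SVR plot: the red curve follows the blue one one step late and with slightly damped peaks and troughs.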

With only one lag feature, this seems to be the best guess the model can make.
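One way to check whether a one-lag model has learned more than "repeat the last value" is to compare it with the persistence (naive) forecast x̂_t+1 = x_t. The sketch below computes that baseline MSE on the question's test data, with the test pairs chained back into a 28-value series; `persistence_mse` is a hypothetical helper name.

```python
import numpy as np


def persistence_mse(series):
    """MSE of the naive forecast x_hat[t+1] = x[t] over a 1-D series."""
    series = np.asarray(series, dtype=float)
    y_true, y_naive = series[1:], series[:-1]
    return float(np.mean((y_true - y_naive) ** 2))


# Test series rebuilt from the (x_t, x_t+1) pairs printed in the question
test_series = np.array([
    20.48, 25.78, 21.89, 19.61, 22.95, 21.67, 26.03, 21.96, 21.81, 21.91,
    21.82, 19.6, 24.61, 30.97, 18.29, 19.84, 20.81, 29.17, 24.01, 21.3,
    25.08, 27.18, 26.59, 25.99, 28.74, 25.32, 27.56, 28.69,
])
print("persistence MSE on test:", round(persistence_mse(test_series), 2))
```

If the SVR's test MSE, computed on the same unscaled values, is not clearly below this number, the model is effectively copying its input. Note that the MSE printed in the question mixes train and test predictions, so only the test portion is comparable here.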

To improve it, you could add 5-10 additional lag features and let the model learn the ongoing pattern.
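A minimal sketch of that suggestion, assuming scikit-learn and NumPy 1.20 or later (for `sliding_window_view`): build `n_lags` lagged inputs per target, and tune an RBF SVR with `TimeSeriesSplit` so every validation fold lies after its training data. The question's `GridSearchCV` uses the default K-fold split, which in some folds trains on later data and validates on earlier data. `make_lagged`, `fit_lagged_svr`, the grid values and `n_lags=5` are illustrative choices, not tuned recommendations, and the target is left unscaled here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def make_lagged(series, n_lags):
    """Return X with rows [x(t-n_lags), ..., x(t-1)] and y with the matching x(t)."""
    series = np.asarray(series, dtype=float).ravel()
    windows = sliding_window_view(series, n_lags + 1)  # shape (len - n_lags, n_lags + 1)
    # copy() because sliding_window_view returns a read-only view
    return windows[:, :-1].copy(), windows[:, -1].copy()


def fit_lagged_svr(series, n_lags=5):
    X, y = make_lagged(series, n_lags)
    # The scaler sits inside the pipeline, so each CV fold fits it on its own training part
    pipe = make_pipeline(MinMaxScaler(), SVR(kernel="rbf", epsilon=0.1))
    param_grid = {
        "svr__C": np.logspace(-2, 3, 6),      # 0.01 ... 1000 on a log scale
        "svr__gamma": np.logspace(-3, 0, 4),  # 0.001 ... 1
    }
    # TimeSeriesSplit: each validation fold comes after its training folds
    search = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=3),
                          scoring="neg_mean_squared_error")
    return search.fit(X, y)
```

To use it on the question's data, fit on the training part of the raw series only (for example `fit_lagged_svr(train_part, n_lags=5)`, where `train_part` is a hypothetical name for that slice) and evaluate on the remaining values, comparing against the persistence baseline.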

For a more complex model, or if SVM does not work, you could try an RNN for the prediction.