How to configure a very simple LSTM with Keras / Theano for regression

Time: 2016-05-19 10:29:19

Tags: regression theano keras lstm

I am struggling to configure a Keras LSTM for a simple regression task. There is some very basic explanation on the official page: Keras RNN documentation

But to fully understand it, an example configuration with example data would be extremely helpful.

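I have hardly found any examples of regression with a Keras LSTM. Most examples are about classification (text or images). I have studied the LSTM examples that come with the Keras distribution and one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, although the author concedes that the approach is quite memory-inefficient, because the data samples have to be stored very redundantly.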

A commenter (Taha) introduced an improvement, but the data storage is still redundant, and I doubt that this is what the Keras developers intended.
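To put a number on that redundancy, here is a small NumPy sketch that materializes every overlapping window as its own copy and compares the memory footprint with the raw table. The sizes are assumptions: about 8900 days (roughly the size of the Apple table described below), 5 feature columns, and a hypothetical window of 50 previous days per sample.

```python
import numpy as np

# Assumed sizes: ~8900 days, 5 feature columns, hypothetical 50-day window
n_days, n_features, look_back = 8900, 5, 50
raw = np.zeros((n_days, n_features), dtype=np.float32)

# Redundant layout: every overlapping window is stored as its own copy,
# so each day ends up duplicated roughly look_back times.
copied = np.stack([raw[i:i + look_back] for i in range(n_days - look_back + 1)])

ratio = copied.nbytes / raw.nbytes
print(copied.shape, raw.nbytes, copied.nbytes, round(ratio, 2))
# → (8851, 50, 5) 178000 8851000 49.72
```

The copied layout costs almost exactly look_back times the memory of the original table, which is why the approach scales badly as the window grows.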

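I have downloaded some simple sequential example data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance: Data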

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

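The table contains more than 8900 such rows of Apple stock data. Each day has 7 columns (data points). The value to be predicted is "AdjClose", the value at the end of the day.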

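So the goal is to predict the next day's AdjClose from the sequence of previous days. (This is probably close to impossible, but it is always good to see how a tool behaves under challenging conditions.)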
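A minimal NumPy sketch of that target definition, using only the four sample rows shown above (purely illustrative; the real file has thousands of rows). The CSV lists the newest day first, so the rows are flipped, and then each day's Open..Volume values are paired with the following day's Adj Close.

```python
import numpy as np

# The four sample rows above (Open, High, Low, Close, Volume, Adj Close),
# newest first as in the Yahoo Finance file; the Date column is dropped.
rows = np.array([
    [94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998],  # 2016-05-18
    [94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998],  # 2016-05-17
    [92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997],  # 2016-05-16
    [90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997],  # 2016-05-13
], dtype=np.float64)

rows = np.flipud(rows)         # oldest day first
features = rows[:-1, :5]       # Open..Volume of day t (the newest day has no "tomorrow")
next_adj_close = rows[1:, 5]   # Adj Close of day t+1, the regression target

print(features.shape, next_adj_close)
```

Each input row is one day earlier than its target, so the model never sees the value it is asked to predict.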

I think this should be a very standard prediction/regression case for an LSTM, and it should transfer easily to other problem domains.

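So, how should the data (X_train, y_train) be formatted to keep redundancy to a minimum, and how do I initialize a Sequential model with just one LSTM layer and a few hidden neurons?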

Kind regards, Theo

PS: I have started coding:

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

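So far, the data is prepared, and no redundancy has been introduced. The question now is how to describe a Keras LSTM model and training process for this data.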
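One thing the printout above shows is that the columns differ by orders of magnitude: Volume is around 1e8, while the prices are around 1e1 to 1e2 and the early Adj Close values are below 1. Networks with sigmoid/tanh units usually train far better on inputs scaled to a common range. The following is a minimal per-column min-max sketch (the same idea as scikit-learn's MinMaxScaler with feature_range=(0, 1)); the helper names are hypothetical, and in practice the minimum and range would be fitted on the training split only and then reused for validation, test and y.

```python
import numpy as np


def fit_min_max(train):
    """Per-column minimum and range, computed on the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant columns
    return lo, span


def apply_min_max(data, lo, span):
    """Map each column to [0, 1] relative to the training range."""
    return (data - lo) / span


def invert_min_max(scaled, lo, span):
    """Undo the scaling, e.g. to turn scaled predictions back into prices."""
    return scaled * span + lo


# First and last rows of the X_train printout above
# (Open, High, Low, Close, Volume, Adj Close).
X = np.array([
    [28.75,      28.875,     28.75,      28.75,      117258400.0, 0.43135801],
    [94.1600037, 95.2099991, 93.8899994, 94.5599976, 41923100.0,  94.5599976],
])

lo, span = fit_min_max(X)
X_scaled = apply_min_max(X, lo, span)
print(X_scaled)
```

Keeping lo and span around is what makes it possible to invert the scaling on the model's predictions later.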

EDIT 3:

Here is the updated code with the 3D data structure that recurrent networks require (see Lorrit's answer). But it does not work.

EDIT 4: Removed the extra comma after Activation('sigmoid') and shaped Y_train the right way. Still the same error.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

There still seems to be a problem with the data. Keras says:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

3 answers:

Answer 0 (score: 2)

In your model definition you placed a Dense layer before the LSTM layer. You need to wrap that Dense layer in a TimeDistributed layer.

Try changing this part of your model:

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
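Why the error says "expected ndim=3, found ndim=2": Dense(8, input_dim=nb_features) declares a 2D input of shape (batch, nb_features), so its output is 2D as well, and the LSTM after it finds no time axis. TimeDistributed applies one Dense layer, with a single shared set of weights, separately at every time step, so a (batch, timesteps, features) input stays 3D. The NumPy sketch below (no Keras involved; all sizes hypothetical) shows what that wrapper computes, compared with a plain Dense applied to 2D rows.

```python
import numpy as np

# Hypothetical sizes echoing the question: 4 time steps, 5 features, Dense(8)
rng = np.random.default_rng(1)
batch, timesteps, n_features, n_hidden = 2, 4, 5, 8
x = rng.normal(size=(batch, timesteps, n_features))

# One Dense layer = one weight matrix and one bias vector
W = rng.normal(size=(n_features, n_hidden))
b = rng.normal(size=n_hidden)

# TimeDistributed(Dense): the SAME W and b applied separately at every time step
per_step = np.stack([x[:, t, :] @ W + b for t in range(timesteps)], axis=1)

# Equivalent vectorized form: matmul broadcasts over the (batch, time) axes
vectorized = x @ W + b

# A plain Dense fed (batch, n_features) rows yields (batch, n_hidden): no time axis left
plain = x[:, 0, :] @ W + b

print(per_step.shape, plain.shape)  # → (2, 4, 8) (2, 8)
```

The 3D result is what a following LSTM layer can consume; the 2D one reproduces the ndim mismatch from the traceback.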

Answer 1 (score: 1)

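You are still missing one preprocessing step before feeding the data to the LSTM. You have to decide how many previous data samples (previous days) to include when computing the current day's AdjClose. See my answer here for how to do this. Your data should be three-dimensional, with shape (nb_samples, nb_included_previous_days, features).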
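One way to build that (nb_samples, nb_included_previous_days, features) array without storing overlapping copies, the memory concern raised in the question, is NumPy's sliding_window_view (available since NumPy 1.20), which returns a read-only strided view of the original array. This is a sketch under that assumption, not the method from the linked answer; make_windows and the toy data are hypothetical. Keras may still copy each mini-batch while training, but the full windowed dataset is never materialized.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def make_windows(features, target, look_back):
    """Pair each run of `look_back` consecutive days with the following day's target.

    features: (n_days, n_features), oldest day first
    target:   (n_days,)
    returns   X: (n_days - look_back, look_back, n_features), y: (n_days - look_back,)
    """
    # The window axis is appended last: (n_days - look_back + 1, n_features, look_back)
    windows = sliding_window_view(features, look_back, axis=0)
    # Move it into the middle -> (n_windows, look_back, n_features); still a view, no copy
    windows = windows.transpose(0, 2, 1)
    X = windows[:-1]        # the newest window has no "next day" to predict
    y = target[look_back:]  # y[i] is the day right after window X[i]
    return X, y


# Hypothetical toy data: 10 days, 2 features; column 1 plays the role of AdjClose
features = np.arange(20, dtype=np.float32).reshape(10, 2)
X, y = make_windows(features, features[:, 1], look_back=3)
print(X.shape, y.shape)               # → (7, 3, 2) (7,)
print(np.shares_memory(X, features))  # → True
```

For the Apple data, features would be the (n_days, 5) Open..Volume block in oldest-first order and target the Adj Close column; look_back is exactly the "how many previous days" decision described above.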

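Then you can feed the 3D input into a standard LSTM layer with a single output. Compare this value with y_train and minimize the error. Remember to choose a loss function suited to regression, e.g. mean squared error.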
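To make "a standard LSTM layer with one output" concrete, here is a NumPy sketch of the forward pass and loss that such a model computes; in Keras it would be written as LSTM(4) followed by Dense(1), compiled with loss='mse'. The weights are random and untrained, the gate order (input, forget, candidate, output) is just the choice made for this sketch, and all sizes are hypothetical. The output unit is left linear: a final Activation('sigmoid'), as in the question's model, can only produce values between 0 and 1, so an unscaled target such as 94.56 would be unreachable.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_hidden(X, W, U, b):
    """One LSTM layer over X (samples, timesteps, features); returns the last hidden state."""
    n_samples, timesteps, _ = X.shape
    units = U.shape[0]
    h = np.zeros((n_samples, units))
    c = np.zeros((n_samples, units))
    for t in range(timesteps):
        z = X[:, t, :] @ W + h @ U + b       # (samples, 4 * units)
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                    # update the cell state
        h = o * np.tanh(c)                   # new hidden state
    return h


# Hypothetical sizes: 8 windows of 4 days x 5 features, 4 LSTM units
rng = np.random.default_rng(0)
n_samples, timesteps, n_features, units = 8, 4, 5, 4
X = rng.normal(size=(n_samples, timesteps, n_features))
y = rng.normal(size=(n_samples, 1))

# Random, untrained weights (training would adjust these to minimize the MSE)
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
W_out = rng.normal(scale=0.1, size=(units, 1))
b_out = np.zeros(1)

h_last = lstm_last_hidden(X, W, U, b)  # plays the role of LSTM(4) with return_sequences=False
y_hat = h_last @ W_out + b_out         # Dense(1) with a linear output
mse = np.mean((y_hat - y) ** 2)        # the mean squared error to minimize
print(h_last.shape, y_hat.shape)       # → (8, 4) (8, 1)
```

Only the hidden state after the last time step reaches the Dense layer, which is why a 3D (samples, timesteps, features) input ends up as one prediction per sample.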

Answer 2 (score: 0)

Not sure if this is still relevant, but there is a good example by Dr. Jason Brownlee of how to predict time series with an LSTM network here

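I have prepared an example with three noisy, phase-shifted sine curves of different amplitudes. It is not market data, but I assume your premise is that one stock tells you something about another.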

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Reshape
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# generate a noisy sine wave
def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
    x = numpy.arange(_start, _stop, step = _step)
    noise = numpy.random.uniform(-0.1, 0.1, size = len(x))
    y = gain*0.5*numpy.sin(x+_phase_shift)
    y = numpy.add(noise, y)
    return x, y
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, look_ahead=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        b = dataset[(i + look_back):(i + look_back + look_ahead), :]
        dataY.append(b)
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# generate sine wave
# 1.0/24 keeps the step a float under Python 2 as well (1/24 would be 0 there)
x1, y1 = make_sine_with_noise(0, 200, 1.0/24, 0, 1)
x2, y2 = make_sine_with_noise(0, 200, 1.0/24, math.pi/4, 3)
x3, y3 = make_sine_with_noise(0, 200, 1.0/24, math.pi/2, 20)
# plt.plot(x1, y1)
# plt.plot(x2, y2)
# plt.plot(x3, y3)
# plt.show()
#transform to pandas dataframe
dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
#split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 10
look_ahead = 5
trainX, trainY = create_dataset(train, look_back, look_ahead)
testX, testY = create_dataset(test, look_back, look_ahead)
print(trainX.shape)
print(trainY.shape)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], trainX.shape[2]))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], testX.shape[2]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
# input_shape is only needed on the first layer of a Sequential model
model.add(LSTM(look_ahead))
model.add(Dense(trainY.shape[1]*trainY.shape[2]))
model.add(Reshape((trainY.shape[1], trainY.shape[2])))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)
# make prediction
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

#save model
model.save('my_sin_prediction_model.h5')

trainPredictPlottable = trainPredict[::look_ahead]
trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))
# build one testPredict array by concatenating every look_ahead-th prediction window
testPredictPlottable = testPredict[::look_ahead]
testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
# testPredictPlottable = testPredictPlottable[:-look_ahead]
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredictPlottable)+look_back, :] = trainPredictPlottable
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(dataset)-len(testPredictPlottable):len(dataset), :] = testPredictPlottable
# plot baseline and predictions
dataset = scaler.inverse_transform(dataset)
plt.plot(dataset, color='k')
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()