Keras: why is adding layers to the model so slow?

Asked: 2016-05-17 14:07:11

Tags: python neural-network theano keras

I am trying to build a very large model in Keras with 3 LSTM layers of 4096 hidden units each. Previously each layer had 1024 hidden units, and the build time for that network was reasonable: each layer was added in about 1 to 2 seconds. Now that the model has 4096 hidden units per layer, adding each layer takes about 5 minutes. What I find strange is that the slowdown happens during the three calls to model.add(LSTM(...)), not during model.compile(...). I need to use the larger network, but this wait is getting hard to tolerate. It isn't so bad for training, since that takes far longer anyway, but I don't want to sit through it every time I want to generate test output. Why does adding the layers take so much time? Doesn't add merely define the layer, so that all the time should be spent in the compile function? Is there anything I can do about it?

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import SGD

print('Building Model')
model = Sequential()
model.add(LSTM(lstm_size, batch_input_shape = (batch_size, 1, len(bytes_set)), stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_1')
model.add(LSTM(lstm_size, stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_2')
model.add(LSTM(lstm_size, stateful = True, return_sequences = False, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_3')
model.add(Dense(len(bytes_set), activation = 'softmax'))

print('Compiling Model')
model.compile(optimizer = SGD(lr = 0.3, momentum = 0.9, decay = 1e-5, nesterov = True),
              loss = 'categorical_crossentropy', 
              metrics = ['accuracy'])

Here is my .theanorc:

[global]
floatX = float32
mode = FAST_RUN
device = gpu
exception_verbosity = high

[nvcc]
fastmath = 1

Here is the model summary that was requested. Unfortunately, this new version has been running for the past few hours, so I don't want to make any fresh changes to it. This model has 4 LSTM layers, each of size 1500.

Layer (type)                       Output Shape        Param #     Connected to                     
====================================================================================================
lstm_1 (LSTM)                      (64, 1, 1500)       9774000     lstm_input_1[0][0]               
____________________________________________________________________________________________________
dropout_1 (Dropout)                (64, 1, 1500)       0           lstm_1[0][0]                     
____________________________________________________________________________________________________
lstm_2 (LSTM)                      (64, 1, 1500)       18006000    dropout_1[0][0]                  
____________________________________________________________________________________________________
dropout_2 (Dropout)                (64, 1, 1500)       0           lstm_2[0][0]                     
____________________________________________________________________________________________________
lstm_3 (LSTM)                      (64, 1, 1500)       18006000    dropout_2[0][0]                  
____________________________________________________________________________________________________
dropout_3 (Dropout)                (64, 1, 1500)       0           lstm_3[0][0]                     
____________________________________________________________________________________________________
lstm_4 (LSTM)                      (64, 1500)          18006000    dropout_3[0][0]                  
____________________________________________________________________________________________________
dropout_4 (Dropout)                (64, 1500)          0           lstm_4[0][0]                     
____________________________________________________________________________________________________
dense_1 (Dense)                    (64, 128)           192128      dropout_4[0][0]                  
====================================================================================================
Total params: 63984128
____________________________________________________________________________________________________

1 Answer:

Answer 0 (score: 1)

It is slow because you are trying to allocate a matrix that needs at least 0.5 GB of memory. 4096 units * 4097 weights is already a huge number, and an LSTM carries additional internal weights for the input, output, and forget gates. As you can see, this adds up to a lot.
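The parameter counts in the summary above can be reproduced with the standard LSTM formula: four gate weight sets, each with an input kernel, a recurrent kernel, and a bias. A minimal sketch (the 128-dimensional input is inferred from the Dense layer's output shape in the summary; the helper names are my own):

```python
def lstm_params(units, input_dim):
    # 4 gates, each with: input kernel (units x input_dim),
    # recurrent kernel (units x units), and bias (units).
    return 4 * (units * input_dim + units * units + units)

def dense_params(units, input_dim):
    # Weight matrix plus bias vector.
    return units * input_dim + units

# Figures from the 1500-unit summary in the question:
print(lstm_params(1500, 128))    # lstm_1: 9774000
print(lstm_params(1500, 1500))   # lstm_2..4: 18006000 each
print(dense_params(128, 1500))   # dense_1: 192128

# One 4096-unit LSTM fed 4096-dim input, stored as float32:
gb = lstm_params(4096, 4096) * 4 / 2**30
print(round(gb, 2))              # ~0.5 GB, matching the claim above
```

These match the summary exactly (9,774,000 + 3 * 18,006,000 + 192,128 = 63,984,128 total), and the last line shows where the "at least 0.5 GB" estimate comes from.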

Update: I wrote my answer from my phone and typed TB instead of GB. You can easily check the size of your model by adding the following:

model.summary()

in both cases (1024 and 4096). Please share your results in the comments, as I'm curious :)
