Keras CUDA_ERROR_OUT_OF_MEMORY with a small dataset

Time: 2019-05-12 15:57:51

Tags: python tensorflow keras lstm

Model:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_6 (LSTM)                (900, 30)                 4560      
_________________________________________________________________
dense_6 (Dense)              (900, 8)                  248       
=================================================================
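
As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a minimal sketch that assumes a standard Keras LSTM layer with bias and an input width of 7 features (taken from the shape of X given in the question); the helper names `lstm_params` and `dense_params` are hypothetical and not part of Keras.

```python
def lstm_params(units, input_dim):
    # An LSTM has 4 gates; each gate has an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias vector (units).
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # A dense layer has one weight matrix (input_dim x units) plus a bias (units).
    return input_dim * units + units


lstm_count = lstm_params(units=30, input_dim=7)
dense_count = dense_params(units=8, input_dim=30)
print(lstm_count, dense_count)
```

Both values match the Param # column, so the model itself holds fewer than 5,000 weights.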

Training code:

import time

for epoch in epochs:
    print('epoch: ', epoch)

    start_time_day = time.time()

    for d in days:
        X, y = split_sequence(features, labels, n_steps)
        X = X.reshape(X.shape[0], X.shape[1], inputs_n)
        # train_on_batch runs a single gradient update on all of X and y
        history = model.train_on_batch(X, y)

X has shape (900, 11250, 7) with dtype float32, which is about 280 MB.
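
The size estimate can be checked with a few lines of arithmetic. This is a small sketch that assumes only that float32 takes 4 bytes per element; the variable names are illustrative.

```python
import math

shape = (900, 11250, 7)   # (sequences, time steps, features), as stated in the question
bytes_per_float32 = 4     # float32 is 4 bytes per element

n_elements = math.prod(shape)
n_bytes = n_elements * bytes_per_float32

print(n_elements)         # number of float32 values in X
print(n_bytes / 1e6)      # size in MB (10**6 bytes)
print(n_bytes / 2**20)    # size in MiB (2**20 bytes)
```

That gives 283,500,000 bytes, i.e. 283.5 MB or roughly 270 MiB, consistent with the figure quoted.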

I tried this on a GCP VM with a K80 (11 GB RAM), but I get a CUDA out-of-memory error (the error repeats in a loop):

name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.562
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-12 15:40:43.694393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-12 15:40:45.602743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-12 15:40:45.602828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-12 15:40:45.602839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-12 15:40:45.603245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10754 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2019-05-12 15:40:56.928097: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-05-12 15:41:05.702532: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-05-12 15:41:05.702600: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 4294967296

…
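
To put the failing request in perspective, the size in the error line can be converted to GiB. The sketch below just parses the log line quoted above; the regular expression is an illustrative assumption about the message format, not an official TensorFlow log schema.

```python
import re

log_line = (
    "2019-05-12 15:41:05.702532: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] "
    "failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory"
)

# Pull out the requested size and where the allocation was attempted.
match = re.search(r"failed to alloc (\d+) bytes on (\w+)", log_line)
requested_bytes = int(match.group(1))
location = match.group(2)

requested_gib = requested_bytes / 2**30
print(location, requested_gib)
```

The request is exactly 4 GiB, and both log lines describe it as a host (pinned host memory) allocation rather than an allocation on the GPU, whose 11.10 GiB are reported free. It is also roughly 15 times the size of X.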

0 answers:

No answers