即使批量非常小,分配器也会耗尽内存

时间:2018-04-09 14:13:33

标签: memory tensorflow keras gpu

此问题从未发生过,但从今天开始,即使使用非常小的批量大小,Tensorflow也会尝试分配大量内存。

我遵循了本教程:

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

"使用预先训练的网络的瓶颈功能:一分钟内准确度达到90%"

这是我的代码:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications


img_width, img_height = 150, 150

top_model_weights_path = 'bottleneck_fc_model.h5'
train_data_dir = 'C:\\ImageData\\Augmented\\Train'
validation_data_dir = 'C:\\ImageData\\Augmented\\Validate'
#train_data_dir = 'C:\\Users\\NSA\\flower_photos\\Train'
#validation_data_dir = 'C:\\Users\\NSA\\flower_photos\\Validate'
nb_train_samples = 25
nb_validation_samples = 5
epochs = 10
my_batch_size = 10

def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1./255)

    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_train = model.predict_generator(
        generator, 
        steps=nb_train_samples // my_batch_size,
        max_queue_size=10,
        workers=1,
        use_multiprocessing=False,
        verbose=1)
    np.save(open('bottleneck_features_train.npy', 'w'),
            bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // my_batch_size)
    np.save(open('bottleneck_features_validation.npy', 'w'),
        bottleneck_features_validation)

def train_top_model():
    train_data = np.load(open('bottleneck_features_train.npy'))
    train_labels = np.array(
        [0] * (nb_train_samples / 2) + [1] * (nb_train_samples / 2))

    validation_data = np.load(open('bottleneck_features_validation.npy'))
    validation_labels = np.array(
        [0] * (nb_validation_samples / 2) + [1] * (nb_validation_samples / 2))

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=my_batch_size,
              validation_data=(validation_data, validation_labels))
    model.save_weights(top_model_weights_path)

save_bottleneck_features()
train_top_model()

这是我得到的错误:

PS C:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts> cd 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts'; ${env:PYTHONIOENCODING}='UTF-8'; ${env:PYTHONUNBUFFERED}='1'; & 'C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe' 'C:\Users\NSA\.vscode\extensions\ms-python.python-2018.3.1\pythonFiles\PythonTools\visualstudio_py_launcher.py' 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts' '50490' '34806ad9-833a-4524-8cd6-18ca4aa74f14' 'RedirectOutput,RedirectOutput' 'c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py'
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will
be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Bottleneck Features saven
2018-04-09 16:02:08.772206: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-04-09 16:02:09.345010: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:02:00.0
totalMemory: 2.00GiB freeMemory: 1.66GiB
2018-04-09 16:02:09.356147: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-09 16:02:10.108947: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1429 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:02:00.0, compute capability: 5.0)
Found 109 images belonging to 2 classes.
2018-04-09 16:02:16.979539: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:17.441196: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:17.792983: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-09 16:02:18.122577: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2/2 [==============================] - 4s 2s/step
Traceback (most recent call last):
  File "c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py", line 94, in <module>
    save_bottleneck_features()
  File "c:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts\first_try_real_transfer_learning_keras_vgg16.py", line 56, in save_bottleneck_features
    bottleneck_features_train)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 511, in save
    pickle_kwargs=pickle_kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\format.py", line 565, in write_array
    version)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\format.py", line 335, in _write_array_header
    fp.write(header_prefix)
TypeError: write() argument must be str, not bytes
PS C:\Users\NSA\ownCloud\Documents\Tensorflow\Skripts>

在调用 model.predict_generator()

时会发生错误

起初我认为它的内存耗尽因为我的批量太大,但即使我使用的批量大小为1,它也需要超过2GiB的内存。我使用TensorFlow后端安装了CUDA 9.0,cuDNN 7.0,Tensorflow 1.6.0和Keras 2.1.5。这曾经没有问题,但它突然开始给我这个错误。我使用的是NVIDIA GeForce 940MX

1 个答案:

答案 0 :(得分:0)

您的问题与内存或张量流无关。以文本形式打开的文件正在写入字节。 而不是以文本形式打开文件:

open('bottleneck_features_train.npy', 'w')

以字节为单位打开它:

open('bottleneck_features_train.npy', 'wb')

这适用于您打给open的所有电话。