Question

I am running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it hits a CUBLAS_STATUS_ALLOC_FAILED error. A Google search turned up nothing.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Answer 1

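The location of the session config's "allow_growth" property seems to be different now. It is explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you have to set it like this: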

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

Answer 2

For TensorFlow 2.2, none of the other solutions worked when I hit the CUBLAS_STATUS_ALLOC_FAILED problem. I found a solution at https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

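I ran this code before doing any further computation, and the same code that had produced the CUBLAS error earlier then worked in the same session. The sample above is specifically about setting memory growth consistently across several physical GPUs, but it also solves the memory allocation problem.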

Answer 3

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Answer 4

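On Windows, tensorflow currently does not allocate all available memory as the documentation says it does. You can work around this error by allowing dynamic memory growth, as follows:

tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))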

Answer 5

This code works for me with tensorflow >= 2.0:

import tensorflow as tf
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

Answer 6

In my case, a stale python process was consuming the memory. I killed it through Task Manager and everything went back to normal.
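If you prefer checking from a script instead of Task Manager, here is a minimal sketch that asks `nvidia-smi` which processes currently hold GPU memory. It assumes `nvidia-smi` is on your PATH; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for this example. On Windows the per-process memory column is often reported as `[N/A]`, so the parser returns `None` for it in that case.

```python
import shutil
import subprocess

# Query flags supported by nvidia-smi for listing compute processes.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, name, used_mib_or_None) tuples."""
    rows = []
    for line in csv_text.splitlines():
        try:
            # Split from both ends so a comma inside the process path is kept.
            pid, rest = line.split(",", 1)
            name, used = rest.rsplit(",", 1)
        except ValueError:
            continue  # not a "pid, name, memory" row
        pid, name, used = pid.strip(), name.strip(), used.strip()
        if not pid.isdigit():
            continue
        rows.append((int(pid), name, int(used) if used.isdigit() else None))
    return rows


def list_gpu_processes():
    """Return the processes nvidia-smi reports, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            QUERY, capture_output=True, text=True, check=False, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, used_mib in list_gpu_processes():
        print(pid, name, used_mib)
```

Any leftover python.exe in that list that you are not actively using is a candidate for killing before you rerun the script.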

Answer 7

A bit late to the party, but this solved my problem on tensorflow 2.4.0 with a gtx 980ti. Before limiting the memory, I got an error like this:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this piece of code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu

Answer 8

For Keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Answer 9

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly, there are two methods you can try to achieve this.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest you try both and see if it helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

Answer 10

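None of these fixes worked for me, as the structure of the tensorflow libraries seems to have changed. For Tensorflow 2.0, the only fix that worked for me was the one under "Limiting GPU memory growth" on this page: https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here is the solution from the docs. I imagine some people may need to change memory_limit; 1 GB was fine in my case.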

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
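If you are unsure what value to pass as memory_limit, one rough way to pick it is to take a fraction of the free memory that TensorFlow prints at startup (3.31GiB in the question's log). This is only a sketch: the 0.8 fraction is an assumption borrowed from the per_process_gpu_memory_fraction=0.8 used in other answers, and `suggest_memory_limit_mib` and `cap_first_gpu` are names invented for this example. The `tf.config.experimental` calls are the same ones shown above.

```python
GIB_TO_MIB = 1024


def suggest_memory_limit_mib(free_gib, fraction=0.8):
    """Return a memory_limit in MiB that leaves some of the free memory unused."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(free_gib * GIB_TO_MIB * fraction)


def cap_first_gpu(memory_limit_mib):
    """Cap TensorFlow's allocation on the first GPU; call before any GPU work."""
    import tensorflow as tf  # imported lazily so the helper above runs without TF

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return False
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(
            memory_limit=memory_limit_mib)],
    )
    return True


# Free memory reported in the question's log: 3.31GiB
print(suggest_memory_limit_mib(3.31))  # prints 2711
```

Treat the result as a starting point and lower it if the error persists, since other programs may grab GPU memory after TensorFlow starts.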

Answer 11

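There are at least 2 distinct issues here. The first is when a python process from a previous run is run again and has not released the GPU memory it acquired earlier. You can tell this is happening because the python process grabs a large amount of RAM as soon as it starts, and then fails when it tries to acquire more. In the attached screenshot, ~6GB is acquired on startup. Check GPU memory in the Windows Task Manager, under the "Details" tab, using the "Dedicated GPU memory" column. In this case, restart the PC, since the problem is caused by the GPU running out of memory. TF is designed not to release memory during a session because doing so would cause fragmentation, so it looks like the IPython / Python session is holding on to the TF instance and not releasing the memory from the previous run. In my case, using Pycharm with an IPython session, running it repeatedly eventually led to all of my RAM being grabbed statically on startup, leaving almost no room for dynamic growth.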

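The second issue is a misconfigured GPU device. Depending on the TF version and the number of devices in use, you may need to set GPU memory to follow the same policy across multiple devices. The policy is either to allow GPU memory to grow during the session, or to grab as much as possible on startup. Various fixes are listed above; choose the one that fits your TF version and whether you have more than one device.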
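As a rough illustration of choosing the fix by TF version, here is a sketch that only looks at the major version number. The function names and the returned labels are invented for this example, and pre-releases such as 2.0 alpha (which used different calls, per the answer above) are not handled specially. The TF calls themselves are the ones already shown in the other answers.

```python
def pick_memory_growth_fix(tf_version):
    """Map a TF version string to the style of fix used in this thread."""
    major = int(tf_version.split(".")[0])
    if major < 2:
        return "v1_session_config"   # tf.ConfigProto + gpu_options.allow_growth
    return "v2_set_memory_growth"    # tf.config.experimental.set_memory_growth


def apply_memory_growth():
    """Enable memory growth; returns a Session on TF 1.x, otherwise None."""
    import tensorflow as tf  # imported lazily so the picker runs without TF

    if pick_memory_growth_fix(tf.__version__) == "v1_session_config":
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        return tf.Session(config=config)

    # Apply the same policy to every GPU, before any of them is initialized.
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    return None


print(pick_memory_growth_fix("1.15.0"))  # prints v1_session_config
print(pick_memory_growth_fix("2.4.0"))   # prints v2_set_memory_growth
```

Setting the policy in a loop over all GPUs covers the multi-device case, since memory growth has to be the same across GPUs.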

Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED
