Question

Is it possible to optimize through the sparse_tensor_dense_matmul operation in TensorFlow on a GPU? I am using TensorFlow 1.2.1 with CUDA 8. An example that fails:

import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    ) 
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session() as sess:
    print(sess.run(trainer))

This results in the following error:

Traceback (most recent call last):
  File "test_tf3.py", line 18, in <module>
    print(sess.run(trainer))
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/media/awork/home/astepochkin/drecs/repo/env/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
     [[Node: gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1 = StridedSlice[Index=DT_INT32, T=DT_INT64, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2, _device="/device:GPU:0"](Const, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_1, gradients/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/strided_slice_1/stack_2)]]

Answer 1

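It might make sense to disable hard device placement by setting allow_soft_placement=True, so that ops without a GPU kernel fall back to the CPU: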

import tensorflow as tf

with tf.device('/gpu:0'):
    st = tf.SparseTensor(
        tf.constant([[0, 0], [1, 1]], dtype=tf.int64),
        tf.constant([1.2, 3.4], dtype=tf.float32),
        tf.constant([2, 2], dtype=tf.int64)
    ) 
    v = tf.Variable([[1.0, 0.0], [0.0, 1.0]], dtype=tf.float32)
    st = tf.sparse_tensor_dense_matmul(st, v)
    st = tf.reduce_min(st)
    optimizer = tf.train.AdamOptimizer()
    trainer = optimizer.minimize(st)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(trainer))

You can also log device placements, which may help determine whether the kernels you care about are actually running on the GPU.
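As a rough illustration of the logging option, a session configuration along these lines should report the device chosen for each op. This is a sketch against the TensorFlow 1.x API used in the question; the small graph is only a placeholder, and the placement messages typically go to the log output (stderr) rather than to the value printed by the script.

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in the question

# Placeholder graph for illustration only; any ops would do here.
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back to CPU when an op has no GPU kernel
    log_device_placement=True,   # log which device each op was assigned to
)

with tf.Session(config=config) as sess:
    print(sess.run(b))
```

Ops that were requested on the GPU but logged on the CPU are the ones that fell back under soft placement.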

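There are host-memory "fake" GPU kernels registered for int32 strided slice, but not for int64. If you need or want hard device placement, I'd suggest opening a pull request or feature request on GitHub to add int64 host-memory kernels (literally just copying the int32 version).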

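For background, strided slice is used in the gradient of SparseTensorDenseMatMul. There is generally no benefit to running these kinds of indexing operations on the GPU, so they are registered as GPU kernels that actually run on the CPU, which avoids the kind of hard device placement bookkeeping issue you ran into.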
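To make the role of the index slicing concrete, here is a plain-Python sketch of the gradient math for Y = A @ B with a sparse A in coordinate form. It is an illustration of the formulas only, not TensorFlow's implementation; the function name and the all-ones upstream gradient are assumptions made for the example. The step that splits the index pairs into rows and columns corresponds to the slicing of the int64 indices tensor that the error message points at.

```python
def sparse_dense_matmul_grads(indices, values, dense, grad_out):
    """Gradients of Y = A @ B for a COO-format sparse A and a dense B.

    indices:  list of [row, col] pairs for the nonzero entries of A
    values:   nonzero values of A, aligned with indices
    dense:    B as a list of rows
    grad_out: dL/dY as a list of rows (same shape as Y)
    """
    # Index-slicing step: split the index pairs into row and column lists.
    rows = [pair[0] for pair in indices]
    cols = [pair[1] for pair in indices]
    n_out_cols = len(dense[0])

    # dL/dvalues[k] = sum_j grad_out[rows[k]][j] * dense[cols[k]][j]
    grad_values = [
        sum(grad_out[r][j] * dense[c][j] for j in range(n_out_cols))
        for r, c in zip(rows, cols)
    ]

    # dL/dB = A^T @ grad_out, accumulated one nonzero entry at a time.
    grad_dense = [[0.0] * n_out_cols for _ in dense]
    for r, c, v in zip(rows, cols, values):
        for j in range(n_out_cols):
            grad_dense[c][j] += v * grad_out[r][j]
    return grad_values, grad_dense


# Data from the question: A = diag(1.2, 3.4), B = 2x2 identity.
indices = [[0, 0], [1, 1]]
values = [1.2, 3.4]
dense = [[1.0, 0.0], [0.0, 1.0]]
grad_out = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical upstream gradient of all ones
grad_values, grad_dense = sparse_dense_matmul_grads(indices, values, dense, grad_out)
```

The rows and cols lists are small integer arrays used only for lookups, which is why there is little to gain from computing them on the GPU.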
