Transform mLSTM - running it on multiple GPUs

Date: 2017-12-03 14:34:49

Tags: tensorflow deep-learning gpu lstm multi-gpu

I am running the mLSTM (multiplicative LSTM) transform (based on the mLSTM by OpenAI; just the transform, the model is already trained), but transforming more than 100,000 documents takes a very long time.

I want to run it on multiple GPUs. I have seen some examples, but I don't know how to apply them to this mLSTM transform code.

The specific part I want to run on multiple GPUs is:

        def transform(xs):
            tstart = time.time()
            xs = [preprocess(x) for x in xs]
            lens = np.asarray([len(x) for x in xs])
            # Sort inputs by length so short sequences finish (and drop out) early.
            sorted_idxs = np.argsort(lens)
            unsort_idxs = np.argsort(sorted_idxs)
            sorted_xs = [xs[i] for i in sorted_idxs]
            maxlen = np.max(lens)
            offset = 0
            n = len(xs)
            # State buffer: (cell state / hidden state, example, hidden units).
            smb = np.zeros((2, n, hps.nhidden), dtype=np.float32)
            # Walk over the sequences in chunks of nsteps timesteps.
            for step in range(0, ceil_round_step(maxlen, nsteps), nsteps):
                start = step
                end = step+nsteps
                xsubseq = [x[start:end] for x in sorted_xs]
                # Drop examples that are already exhausted.
                ndone = sum([x == b'' for x in xsubseq])
                offset += ndone
                xsubseq = xsubseq[ndone:]
                sorted_xs = sorted_xs[ndone:]
                nsubseq = len(xsubseq)
                xmb, mmb = batch_pad(xsubseq, nsubseq, nsteps)
                # Run the remaining examples through the model in minibatches.
                for batch in range(0, nsubseq, nbatch):
                    start = batch
                    end = batch+nbatch
                    batch_smb = seq_rep(
                        xmb[start:end], mmb[start:end],
                        smb[:, offset+start:offset+end, :])
                    smb[:, offset+start:offset+end, :] = batch_smb
            # Restore the original document order before returning.
            features = smb[0, unsort_idxs, :]
            print('%0.3f seconds to transform %d examples' %
                  (time.time() - tstart, n))
            return features
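
`preprocess`, `ceil_round_step`, `batch_pad`, and `seq_rep` are helpers defined elsewhere in the full code. For orientation only, here is a hypothetical sketch of what `batch_pad` plausibly does, under the assumption that the inputs are byte strings that get left-padded to `nsteps`, with a mask that zeroes out the padding:

```python
import numpy as np

def batch_pad(xs, nbatch, nsteps):
    """Hypothetical sketch, not the author's actual helper: left-pad each
    byte string to nsteps and build a mask that is 0 on padding positions."""
    xmb = np.zeros((nbatch, nsteps), dtype=np.int32)
    mmb = np.ones((nbatch, nsteps, 1), dtype=np.float32)
    for i, x in enumerate(xs):
        npad = nsteps - len(x)
        xmb[i, npad:] = list(x)   # byte values, right-aligned
        mmb[i, :npad] = 0         # mask out the padding positions
    return xmb, mmb

xmb, mmb = batch_pad([b'ab', b'abcd'], 2, 4)
```

With left padding, every sequence ends at the same timestep, so the final hidden state corresponds to the last real character of each document.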

This is only a small part of the full code (I don't think it is feasible to copy the entire code here).
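
Since the 100,000+ documents are independent of one another, one data-parallel option that requires no graph changes is to shard the document list across one worker process per GPU, pinning each worker to its GPU via `CUDA_VISIBLE_DEVICES`. A minimal sketch, where `GPU_COUNT` and the commented-out model loader are hypothetical stand-ins for the real mLSTM code:

```python
import os
from multiprocessing import Pool

GPU_COUNT = 4  # assumption: four GPUs available

def shard(xs, n):
    """Split xs into n nearly equal contiguous chunks."""
    k, r = divmod(len(xs), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(xs[start:end])
        start = end
    return out

def worker(args):
    gpu_id, docs = args
    # Pin this process to a single GPU before any TensorFlow import,
    # so each worker builds its model on its own device.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    # model = load_mlstm()        # hypothetical: load weights in this process
    # return model.transform(docs)
    return docs  # placeholder so the sketch runs without a GPU

if __name__ == '__main__':
    docs = ['doc%d' % i for i in range(10)]
    with Pool(GPU_COUNT) as pool:
        parts = pool.map(worker, list(enumerate(shard(docs, GPU_COUNT))))
    features = [f for part in parts for f in part]  # original order preserved
```

Because `shard` keeps chunks contiguous and `Pool.map` preserves argument order, concatenating the per-worker results restores the original document order.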

1 Answer:

Answer 0 (score: 1):

The part you are referring to is not where the computation is split across GPUs; it only transforms the data (on the CPU!) and runs the session.

The right place is where the computation graph is defined, i.e. the `mlstm` method. There are many ways to split the graph; for example, you can place the LSTM cells on different GPUs so that the input sequences can be processed in parallel:

        def mlstm(inputs, c, h, M, ndim, scope='lstm', wn=False):
            [...]
            for idx, x in enumerate(inputs):
                with tf.device('/gpu:' + str(idx % GPU_COUNT)):
                    m = tf.matmul(x, wmx) * tf.matmul(h, wmh)
                    z = tf.matmul(x, wx) + tf.matmul(m, wh) + b
                    [...]

By the way, tensorflow has a useful configuration option, `log_device_placement`, that helps to see the execution details in the output. Here is an example:

        import tensorflow as tf

        # Creates a graph.
        with tf.device('/gpu:0'):
            a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='a')
            b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='b')
            c = tf.add(a, b)

        # Creates a session with log_device_placement set to True.
        with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
            # Prints the following:
            # Device mapping:
            # /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: <GPU name>, pci bus id: 0000:01:00.0, compute capability: 6.1
            # Add: (Add): /job:localhost/replica:0/task:0/device:GPU:0
            # b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
            # a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
            print(sess.run(c))