Question

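I have been trying to use the Python subprocess module while training a neural network in PyTorch, but I noticed that subprocess calls run many times slower if I initialize the network on the GPU. Here is an example script I used, with a very simple linear network, profiled with line_profiler, that loops over a simple subprocess call 100 times: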

import torch
import torch.nn as nn
import subprocess
from line_profiler import LineProfiler

class TestNN(nn.Module):
    def __init__(self, device):
        super(TestNN, self).__init__()
        self.fc1 = nn.Linear(5,16)
        self.device=device
        self.to(self.device)

def test_subprocess():
    device = torch.device('cuda:0')
    testNet=TestNN(device)

    for i in range(100):
        subprocess.run(["ls",  "-l"], capture_output=True)

if __name__ == '__main__':

    lprofiler = LineProfiler()
    lp_wrapper = lprofiler(test_subprocess)

    lp_wrapper()
    lprofiler.print_stats()

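Simply moving the small network to the GPU makes subprocess.run() more than 4x slower (roughly 14.6 ms versus 65.5 ms per call in the results below).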

The results I get from line_profiler when the network is on the CPU:

Total time: 1.46088 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        172.0    172.0      0.0      device = torch.device('cpu')
    15         1        806.0    806.0      0.1      testNet=TestNN(device)
    16                                           
    17       101       1235.0     12.2      0.1      for i in range(100):
    18       100    1458671.0  14586.7     99.8          subprocess.run(["ls",  "-l"], capture_output=True)

The results when I initialize the network on the GPU:

Timer unit: 1e-06 s

Total time: 8.63406 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        174.0    174.0      0.0      device = torch.device('cuda:0')
    15         1    2084937.0 2084937.0     24.1      testNet=TestNN(device)
    16                                           
    17       101       1163.0     11.5      0.0      for i in range(100):
    18       100    6547789.0  65477.9     75.8          subprocess.run(["ls",  "-l"], capture_output=True)
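
To rule out line_profiler overhead as a factor, the per-call cost can also be measured directly with time.perf_counter. The following is only a minimal sketch: the helper names time_subprocess and summarize are hypothetical (they are not part of the script above), and it assumes a Unix-like system where `ls` is available. It would need to be run once without and once with the network on the GPU to reproduce the comparison.

```python
import statistics
import subprocess
import sys
import time


def time_subprocess(cmd, repeats=100):
    """Return the wall-clock duration in seconds of each subprocess.run(cmd) call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, capture_output=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations (seconds) to mean and median in milliseconds."""
    return {
        "mean_ms": 1000 * statistics.mean(durations),
        "median_ms": 1000 * statistics.median(durations),
    }


if __name__ == "__main__":
    # Defaults to the same command as in the question; another command
    # can be passed on the command line instead.
    cmd = sys.argv[1:] or ["ls", "-l"]
    print(summarize(time_subprocess(cmd)))
```

Calling time_subprocess before and after constructing TestNN(torch.device('cuda:0')) in the same process would show whether the slowdown starts once the GPU is initialized. The median is reported alongside the mean so that a few slow outlier calls do not dominate the comparison.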

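Does anyone know what causes this slowdown, and how to speed up the subprocess calls when the network is initialized on the GPU? I am confused as to why initializing a neural network on the GPU would have any effect on the speed of subprocess.run(). Any help is much appreciated!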

Why is the Python subprocess module so slow when PyTorch uses the GPU?

0 Answers: