Question

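I need to fetch very large chunks of data by iterating over the data, a few million iterations in total. I thought multiprocessing would speed up the process, and it almost does. I use multiprocessing.Queue to coordinate the different worker processes, which actually works fine, but when I call multiprocessing.Queue.get(), the program takes forever to fetch the results. Maybe I am doing something wrong. Here is my minimal example: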

def get_losses(self, tags=None):
    return_dict = {}
    output_list = multiprocessing.Queue()
    process_list = []

    # Create the process definitions
    for experiment, path in self.tf_board_dicts.items():
        t = multiprocessing.Process(target=self._load_vec_from_tfboard, args=(path, tags, experiment))
        process_list.append(t)
    print("Starting subprocesses with a total of {} workers.\nThese are {}".format(
        len(process_list), process_list))
    # Run processes
    for p in process_list:
        p.start()

    # Wait for the processes to finish
    for p in process_list:
        p.join()
    print("All subprocesses are terminated")

    # Get results
    results = [output_list.get() for p in process_list]
    print("All losses are gathered: {}".format([tup[0] for tup in results]))

    # Create dict
    for experiment_losses in results:
        return_dict[experiment_losses[0]] = experiment_losses[1]

    return return_dict

Answer 1

You can find the answer to why the queue takes forever here: Python Processes not joining

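This happens because the Queue uses a buffer internally when a lot of data is pushed into it. The process writing to the Queue cannot exit until that buffer has been flushed, which does not happen until you start pulling items off the Queue. So, because you are trying to join all the processes before pulling anything off the Queue they are writing to, they cannot exit, and the join hangs. You can fix the issue by draining the Queue before calling join on the processes. – dano, Sep 25, 2014 at 16:16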
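The fix described in that comment can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual code: `load_losses` is a hypothetical stand-in for `_load_vec_from_tfboard`, the experiment names are made up, and it assumes each worker puts exactly one item on the queue. The only change that matters is the order: call `get()` once per worker first, then `join()`.

```python
import multiprocessing


def load_losses(experiment, n_values, output_queue):
    """Hypothetical worker standing in for _load_vec_from_tfboard."""
    losses = [float(i) for i in range(n_values)]
    # put() hands the data to a background feeder thread; the process cannot
    # exit until that data has been flushed into the underlying pipe.
    output_queue.put((experiment, losses))


def get_losses(experiments, n_values=200_000):
    output_queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=load_losses,
                                args=(name, n_values, output_queue))
        for name in experiments
    ]
    for p in processes:
        p.start()

    # Drain the queue first: one get() per worker, because each worker
    # puts exactly one (experiment, losses) tuple.
    results = [output_queue.get() for _ in processes]

    # Join only afterwards; the workers have flushed their buffers by now
    # and are able to exit.
    for p in processes:
        p.join()

    return dict(results)


if __name__ == "__main__":
    losses = get_losses(["exp_a", "exp_b", "exp_c"])
    print({name: len(values) for name, values in losses.items()})
```

Note that if a worker crashes before calling `put()`, a plain `get()` would block forever, so in real code passing a `timeout` to `get()` is probably worth considering.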

multiprocessing.Queue.get() takes a very long time in Python
