Question

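How do I write a Python multiprocessing script that uses two queues?

One serves as the working queue: it starts with some data and, depending on conditions in the function being parallelized, receives further tasks on the fly.
The other collects the results and is used to write them out once processing has finished.

I basically need to add more tasks to the working queue depending on what I find in its initial items. The example I post below is silly (I could transform each item however I like and put it straight into the output queue), but its mechanics are clear and reflect part of the concept I need to develop.

Hence my attempt: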

import multiprocessing as mp

def worker(working_queue, output_queue):
    item = working_queue.get() #I take an item from the working queue
    if item % 2 == 0:
        output_queue.put(item**2) # If I like it, I do something with it and conserve the result.
    else:
        working_queue.put(item+1) # If there is something missing, I do something with it and leave the result in the working queue 

if __name__ == '__main__':
    static_input = range(100)    
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())] #I am running as many processes as CPU my machine has (is this wise?).
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    for result in iter(output_q.get, None):
        print result #alternatively, I would like to (c)pickle.dump this, but I am not sure if it is possible.

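This never finishes and never prints any result.

At the end of the whole process, I want to be sure that the working queue is empty and that all the parallel functions have finished writing to the output queue before I iterate over it to take out the results. Do you have suggestions on how to make this work?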

Answer 1

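The following code achieves the expected result. It follows the suggestions made by @tawmas.

This code lets the processing use multiple cores in a setup where the queue that feeds data to the workers can be updated by the workers themselves while they run: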

import multiprocessing as mp
def worker(working_queue, output_queue):
    while True:
        if working_queue.empty():
            break # exit once the working queue looks empty (a plain emptiness check, not a true 'poison pill')
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)    
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    while True:
        if output_q.empty():
            break
        results_bank.append(output_q.get_nowait())
    print len(results_bank) # should equal len(static_input), the range used to populate the input queue; this tells whether every item placed for processing was actually processed.
    results_bank.sort()
    print results_bank
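
One caveat with the version above: the Python documentation describes Queue.empty() as unreliable, and a worker can block forever if another worker takes the last item between its empty() check and its get(). Joining the workers before draining output_q can also deadlock once the results no longer fit in the underlying pipe buffer. Below is a minimal sketch of one way around both. It is written for Python 3 (unlike the Python 2 code above), and the 1-second idle timeout, the DONE marker and the run helper are my own illustrative assumptions rather than part of the original solution:

```python
import multiprocessing as mp
import queue  # only needed for the queue.Empty exception

DONE = None  # hypothetical marker a worker puts on the output queue when it exits

def worker(working_queue, output_queue, idle_timeout=1.0):
    while True:
        try:
            # Block for a while instead of trusting empty().
            picked = working_queue.get(timeout=idle_timeout)
        except queue.Empty:
            break  # nothing arrived for idle_timeout seconds: assume the work is done
        if picked % 2 == 0:
            output_queue.put(picked)
        else:
            working_queue.put(picked + 1)
    output_queue.put(DONE)  # tell the parent that this worker has finished

def run(inputs, n_workers):
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in inputs:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q))
                 for _ in range(n_workers)]
    for proc in processes:
        proc.start()
    # Drain the output queue *before* joining, until every worker reported DONE.
    results, finished = [], 0
    while finished < n_workers:
        result = output_q.get()
        if result is DONE:
            finished += 1
        else:
            results.append(result)
    for proc in processes:
        proc.join()
    return sorted(results)

if __name__ == '__main__':
    results_bank = run(range(100), mp.cpu_count())
    print(len(results_bank))
    print(results_bank)
```

The timeout is a heuristic: it has to be longer than the slowest gap between tasks. A worker that adds a task always loops back to get() afterwards, so a task added late is still picked up by at least that worker.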

Answer 2

You have a typo in the line that creates the processes. It should be mp.Process, not mp.process. That is what causes the exception you are getting.

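Also, you are not looping in your workers, so each of them consumes just a single item from the queue and then exits. Without knowing more about the required logic it is not easy to give specific advice, but you will probably want to enclose the body of your worker function in a while True loop and add a condition in the body to exit when the work is done.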

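Note that if you do not add a condition to exit the loop explicitly, your workers will simply stall forever once the queue is empty. You might consider using the so-called poison pill technique to signal the workers that they may exit. You will find an example and some useful discussion in the PyMOTW article on Communication Between processes.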
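
Because your workers also add tasks, the parent cannot know up front when it is safe to send the pills. One possible approach, sketched below under my own assumptions, is to track outstanding tasks with a multiprocessing.JoinableQueue, wait until every task has been marked done, and only then send one pill per worker. The sketch targets Python 3, uses None as the pill, and the run helper and variable names are illustrative; it is not the example from the PyMOTW article:

```python
import multiprocessing as mp

def worker(working_queue, output_queue):
    # iter(get, None) keeps calling get() until the poison pill (None) arrives.
    for item in iter(working_queue.get, None):
        if item % 2 == 0:
            output_queue.put(item ** 2)
        else:
            # Enqueue the follow-up task *before* marking this one done, so the
            # count of unfinished tasks never reaches zero too early.
            working_queue.put(item + 1)
        working_queue.task_done()
    output_queue.put(None)  # tell the parent that this worker has exited

def run(inputs, n_workers):
    working_q = mp.JoinableQueue()
    output_q = mp.Queue()
    for i in inputs:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q))
                 for _ in range(n_workers)]
    for proc in processes:
        proc.start()
    working_q.join()           # returns once every task, old or new, is done
    for _ in processes:
        working_q.put(None)    # one poison pill per worker
    # Collect results until every worker has reported its exit, then join.
    results, finished = [], 0
    while finished < n_workers:
        result = output_q.get()
        if result is None:
            finished += 1
        else:
            results.append(result)
    for proc in processes:
        proc.join()
    return sorted(results)

if __name__ == '__main__':
    print(run(range(100), mp.cpu_count()))
```

Draining the output queue before calling join() on the processes matters: the multiprocessing documentation warns that a process which has put items on a queue may not terminate until those items have been consumed.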

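As for the number of processes to use, you will need to benchmark a bit to find what works for you, but in general one process per core is a good starting point when your workload is CPU bound. If your workload is IO bound, you may get better results with a higher number of workers.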
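
A rough way to run such a benchmark is sketched below. It assumes Python 3, and busy_square, run_job, time_worker_counts and fastest are hypothetical names with a stand-in CPU-bound workload, so substitute your real job before trusting the numbers:

```python
import multiprocessing as mp
import time

def busy_square(n):
    # Stand-in CPU-bound task (hypothetical workload).
    return sum(i * i for i in range(n))

def run_job(n_workers):
    # Run the same batch of work with the given number of worker processes.
    with mp.Pool(processes=n_workers) as pool:
        pool.map(busy_square, [20000] * 64)

def time_worker_counts(job, candidate_counts):
    """Time job(n) once for each candidate worker count n."""
    timings = {}
    for n in candidate_counts:
        start = time.perf_counter()
        job(n)
        timings[n] = time.perf_counter() - start
    return timings

def fastest(timings):
    # Worker count with the smallest measured wall-clock time.
    return min(timings, key=timings.get)

if __name__ == '__main__':
    cores = mp.cpu_count()
    timings = time_worker_counts(run_job, sorted({1, cores, 2 * cores}))
    for n, seconds in sorted(timings.items()):
        print(n, 'workers:', round(seconds, 3), 'seconds')
    print('fastest:', fastest(timings))
```

A single timing per count is noisy, so repeat the measurement a few times on a realistic input before settling on a number.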

Python multiprocessing with an updating queue and an output queue
