How can I limit the maximum number of concurrently running processes?

Time: 2014-06-04 08:36:32

Tags: python process multiprocessing

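There are 5 files: main.py, worker.py, cat.py, dog.py, and rabbit.py. cat, dog, and rabbit inherit from Worker and implement worker_run().

In main.py I set up 3 processes to run, but I don't know how to limit how many of them run at the same time (for example, at most 2 processes).

I tried using multiprocessing.Pool, but it seems to support only functions defined outside a class (?).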

main.py

from multiprocessing import Process
from cat import *
from dog import *
from rabbit import *

p1 = cat()
p2 = dog()
p3 = rabbit()
# start all three processes, then wait for each to finish
p1.start()
p2.start()
p3.start()
p1.join()
p2.join()
p3.join()

worker.py

import abc
import multiprocessing

class Worker(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        print("Init")
        self.value = None

    def run(self):
        print("Running")
        self.worker_run()

    @abc.abstractmethod
    def worker_run(self):
        """ implement """
        return

cat.py

from worker import *

class cat(Worker):
    def worker_run(self):
        for i in range(10000):
            print("cat run")

dog.py

from worker import *

class dog(Worker):
    def worker_run(self):
        for i in range(10000):
            print("dog run")

rabbit.py

from worker import *

class rabbit(Worker):
    def worker_run(self):
        for i in range(10000):
            print("rabbit run")

1 answer:

Answer 0 (score: 2)

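If you want two of the methods to run concurrently and the third to block until one of them finishes, you have to use a Semaphore.

You must pass the semaphore to the objects so that they can acquire it. In the main file, create the semaphore and pass it to the objects: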

from multiprocessing import Process, Semaphore
from cat import *
from dog import *
from rabbit import *

semaphore = Semaphore(2)   # at most 2 processes running concurrently
p1 = cat(semaphore)
p2 = dog(semaphore)
p3 = rabbit(semaphore)
p1.start()
p2.start()
p3.start()
p1.join()
p2.join()
p3.join()

Then you can modify the Worker class so that it acquires the semaphore before running worker_run:

class Worker(multiprocessing.Process):
    def __init__(self, semaphore):
        multiprocessing.Process.__init__(self)
        print("Init")
        self.value = None
        self.semaphore = semaphore  # shared with the other workers

    def run(self):
        # blocks here while 2 other workers already hold the semaphore
        with self.semaphore:
            print("Running")
            self.worker_run()

    @abc.abstractmethod
    def worker_run(self):
        """ implement """
        return

This should ensure that at most 2 worker_run methods are running at the same time.
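One rough way to check that claim is to have every process record how many processes are inside the semaphore at once. The sketch below is not part of the original answer: limited_task and measure_peak are hypothetical names, time.sleep stands in for worker_run(), and it assumes the "fork" start method (POSIX only) so that no `if __name__ == "__main__":` guard is needed.

```python
import multiprocessing
import time


def limited_task(semaphore, running, peak, lock):
    """Hold the semaphore while 'working' and record the peak concurrency."""
    with semaphore:
        with lock:
            running.value += 1
            peak.value = max(peak.value, running.value)
        time.sleep(0.2)  # stand-in for worker_run()
        with lock:
            running.value -= 1


def measure_peak(n_processes=3, limit=2):
    """Run n_processes workers and return the highest observed concurrency."""
    # Assumption: "fork" is available (Linux/macOS); not valid on Windows.
    ctx = multiprocessing.get_context("fork")
    semaphore = ctx.Semaphore(limit)
    lock = ctx.Lock()
    # Raw shared integers; access is protected by the explicit lock above.
    running = ctx.Value("i", 0, lock=False)
    peak = ctx.Value("i", 0, lock=False)
    procs = [
        ctx.Process(target=limited_task, args=(semaphore, running, peak, lock))
        for _ in range(n_processes)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return peak.value
```

Calling measure_peak(3, 2) should return a value no greater than 2, since the third process cannot enter the `with semaphore:` block until one of the first two leaves it.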


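In fact, I believe you are making things more complex than they need to be. You do not have to subclass Process. You can achieve exactly the same functionality using the target argument: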

from multiprocessing import Process, Semaphore
from cat import Cat
from dog import Dog
from rabbit import Rabbit

semaphore = Semaphore(2)

cat = Cat()
dog = Dog()
rabbit = Rabbit()

def run(animal, sema):
    with sema:
        animal.worker_run()

cat_proc = Process(target=run, args=(cat, semaphore))
dog_proc = Process(target=run, args=(dog, semaphore))
rabbit_proc = Process(target=run, args=(rabbit, semaphore))

cat_proc.start()
dog_proc.start()
rabbit_proc.start()

cat_proc.join()
dog_proc.join()
rabbit_proc.join()

In fact, with a few small changes you can get rid of the Semaphore altogether and simply use a Pool object:

from multiprocessing import Pool
from cat import Cat
from dog import Dog
from rabbit import Rabbit


cat = Cat()
dog = Dog()
rabbit = Rabbit()

def run(animal):
    animal.worker_run()


pool = Pool(2)
pool.map(run, [cat, dog, rabbit])

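The problem you ran into is that you cannot pass a method as the target argument, or as the callable for Pool.map, because methods cannot be pickled in Python 2, which your code uses (see What can be pickled and unpickled?). The multiprocessing module uses the pickle protocol to communicate between processes, so everything it handles should be picklable.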
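A quick way to see what pickle accepts is to try serializing the object before handing it to multiprocessing. This is only a sketch: is_picklable is a hypothetical helper, not something provided by the pickle or multiprocessing modules, and it catches every exception for brevity.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:  # pickle raises different error types depending on the object
        return False
    return True


builtin_ok = is_picklable(len)          # functions importable by name pickle by reference
lambda_ok = is_picklable(lambda x: x)   # a lambda has no importable name
```

Here builtin_ok is True and lambda_ok is False. Whether a bound method such as cat.worker_run passes the same check depends on the Python version, so testing it directly is safer than assuming.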

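In particular, the standard workaround for the unpicklable-method problem is to use a global function to which you explicitly pass the instance as the first argument, as I did above. This is exactly what happens in a method call, except that there the interpreter does it for you automatically. In this case you have to handle it explicitly.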
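The equivalence can be seen without any processes involved. In this minimal sketch, Cat is a plain class (an assumption: it does not subclass Process here) and worker_run returns a string instead of printing, purely so the three calls can be compared.

```python
class Cat(object):
    def worker_run(self):
        return "cat run"


def run(animal):
    """Module-level wrapper: the instance is passed explicitly."""
    return animal.worker_run()


cat = Cat()
via_method = cat.worker_run()      # the interpreter passes cat as self
via_class = Cat.worker_run(cat)    # the same call with self written out
via_wrapper = run(cat)             # the form that Pool.map or target can use
```

All three variables hold the same result. Only run is a module-level name, which is what lets pickle refer to it by name when multiprocessing sends work to another process.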