Question

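After running into some possible memory leaks in a long-running multithreaded script, I found out about maxtasksperchild, which can be used with a multiprocessing pool like this: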

import multiprocessing

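with multiprocessing.Pool(processes=32, maxtasksperchild=x) as pool:
    # imap is lazy: consume it inside the with block, otherwise the pool
    # is terminated on exit before the tasks have finished
    results = list(pool.imap(function, stuff))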

Is there something similar for a thread pool (multiprocessing.pool.ThreadPool)?

Answer 1

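As noxdafox's answer says, the parent class offers no way to do this, but you can use the threading module to control how many tasks each thread handles. Since you want to use multiprocessing.pool.ThreadPool, the threading module is a close equivalent, so...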

import threading


def split_processing(yourlist, num_splits=4):
    '''
    yourlist = list which you want to pass to function for threading.
    num_splits = number of threads to start; each thread handles
                 roughly len(yourlist) // num_splits items.
    '''
    # `function` is your own worker, called as function(yourlist, start, end);
    # it should process yourlist[start:end].
    split_size = len(yourlist) // num_splits
    threads = []
    for i in range(num_splits):
        start = i * split_size
        # the last thread also takes any leftover items
        end = len(yourlist) if i+1 == num_splits else (i+1) * split_size
        threads.append(threading.Thread(target=function, args=(yourlist, start, end)))
        threads[-1].start()

    # wait for all threads to finish
    for t in threads:
        t.join()

Say your list has 100 items; then:

if num_splits = 10; then threads = 10, each thread has 10 tasks.
if num_splits = 5; then threads = 5, each thread has 20 tasks.
if num_splits = 50; then threads = 50, each thread has 2 tasks.
and so on.
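The split arithmetic above can be checked with a small pure function. `split_bounds` is a hypothetical helper that reproduces only the index math from `split_processing` (no threads are started), which makes it easy to see what each thread receives, including the remainder case and what happens when num_splits exceeds the list length.

```python
def split_bounds(n_items, num_splits):
    """Return the (start, end) bounds split_processing hands to each thread.

    Every chunk gets n_items // num_splits items and the last chunk
    absorbs any remainder, exactly as in the answer's loop.
    """
    split_size = n_items // num_splits
    bounds = []
    for i in range(num_splits):
        start = i * split_size
        end = n_items if i + 1 == num_splits else (i + 1) * split_size
        bounds.append((start, end))
    return bounds


# Tasks per thread for a 100-item list
for num_splits in (10, 5, 50):
    sizes = [end - start for start, end in split_bounds(100, num_splits)]
    print(num_splits, "threads:", sizes[0], "tasks each")

# With 103 items and 10 splits, the last thread gets 13 items.
print(split_bounds(103, 10)[-1])
# If num_splits > len(yourlist), split_size is 0 and only the last thread gets work.
print(split_bounds(3, 5))
```

Note that this scheme fixes the number of tasks per thread up front; each thread simply exits after its slice, which is how it approximates a per-thread task limit.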

Answer 2

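Looking at the multiprocessing.pool.ThreadPool implementation, it is clear that the maxtasksperchild parameter is not propagated to the parent multiprocessing.Pool class. The multiprocessing.pool.ThreadPool implementation was never completed, so it lacks features (as well as tests and documentation).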
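The claim is easy to verify from the standard library itself. This sketch inspects the constructor signatures with `inspect.signature` and then tries passing the argument; on the CPython 3 versions I'm aware of, `Pool.__init__` lists `maxtasksperchild` while `ThreadPool.__init__` only accepts `processes`, `initializer` and `initargs`, so the call fails with a `TypeError` before any threads are created.

```python
import inspect
from multiprocessing.pool import Pool, ThreadPool

# Does each constructor declare a maxtasksperchild parameter?
pool_has_it = "maxtasksperchild" in inspect.signature(Pool.__init__).parameters
thread_pool_has_it = "maxtasksperchild" in inspect.signature(ThreadPool.__init__).parameters
print("Pool:", pool_has_it, "ThreadPool:", thread_pool_has_it)

# Passing it anyway is rejected at call time.
rejected = False
try:
    ThreadPool(processes=2, maxtasksperchild=1)
except TypeError as exc:
    rejected = True
    print("ThreadPool rejected the argument:", exc)
```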

The pebble package implements a ThreadPool that supports restarting workers after they have processed a given number of tasks.
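If adding a dependency is not an option, the worker-recycling idea itself can be sketched with the standard library alone. This is not pebble's implementation, just an illustration under simple assumptions: `map_with_max_tasks` is a hypothetical function, each worker thread takes at most `max_tasks` items from a `queue.Queue` and then exits, and a supervisor loop replaces finished workers with brand-new threads. If `func` raises, that worker dies, its traceback is printed by the threading module, and the item's result stays `None`.

```python
import queue
import threading


def map_with_max_tasks(func, items, num_workers=4, max_tasks=2):
    """Apply func to every item with recycled worker threads.

    Each worker handles at most max_tasks items, then ends; the loop below
    keeps up to num_workers fresh threads alive while work remains.
    Returns (results, handled_by): results in input order, plus the serial
    number of the worker thread that processed each item.
    """
    pending = list(enumerate(items))
    tasks = queue.Queue()
    for entry in pending:
        tasks.put(entry)
    results = [None] * len(pending)
    handled_by = [None] * len(pending)

    def worker(worker_id):
        # Process up to max_tasks items, then let the thread finish.
        for _ in range(max_tasks):
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return
            results[index] = func(item)
            handled_by[index] = worker_id

    active = []
    next_id = 0
    while True:
        # Forget workers that used up their quota (or found no work).
        active = [t for t in active if t.is_alive()]
        if tasks.empty() and not active:
            break
        # Top the pool back up with brand-new threads.
        while len(active) < num_workers and not tasks.empty():
            t = threading.Thread(target=worker, args=(next_id,))
            next_id += 1
            t.start()
            active.append(t)
        if active:
            # Wait briefly on the oldest worker instead of busy-looping.
            active[0].join(timeout=0.05)
    return results, handled_by


results, handled_by = map_with_max_tasks(lambda x: x * x, range(10),
                                         num_workers=3, max_tasks=2)
print(results)
```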

Answer 3

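I wanted a thread pool that starts a new task as soon as another task in the pool finishes (i.e. maxtasksperchild=1). I decided to write a small "ThreadPool" class that creates a new thread for each task. As soon as a task in the pool finishes, another thread is created for the next value from the iterable passed to the map method. The map method blocks until every value in the iterable has been processed and its thread has returned.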

import threading
import time


class ThreadPool():

    def __init__(self, processes=20):
        self.processes = processes
        self.threads = [Thread() for _ in range(0, processes)]

    def get_dead_threads(self):
        dead = []
        for thread in self.threads:
            if not thread.is_alive():
                dead.append(thread)
        return dead

    def is_thread_running(self):
        return len(self.get_dead_threads()) < self.processes

    def map(self, func, values):
        values_iter = iter(values)
        exhausted = False
        # loop until every value has been handed to a thread and
        # all threads are finished running (works for any iterable,
        # not only ones that support len())
        while not exhausted or self.is_thread_running():
            for thread in self.get_dead_threads():
                try:
                    value = next(values_iter)
                except StopIteration:
                    exhausted = True
                    break
                # run a brand-new thread with the next value
                thread.run(func, value)
            # poll at a modest rate instead of spinning a CPU core
            time.sleep(0.01)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        pass


class Thread():

    def __init__(self):
        self.thread = None

    def run(self, target, *args, **kwargs):
        self.thread = threading.Thread(target=target,
                                       args=args,
                                       kwargs=kwargs)
        self.thread.start()

    def is_alive(self):
        if self.thread:
            return self.thread.is_alive()
        else:
            return False

You can use it like this:

# plain function (no self): the pool calls run_job(value)
def run_job(value, mp_queue=None):
    # do something with value
    value += 1


with ThreadPool(processes=2) as pool:
    pool.map(run_job, [1, 2, 3, 4, 5])
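The class above polls for dead threads in a loop. The same one-fresh-thread-per-task behaviour (maxtasksperchild=1) can also be sketched with a `threading.BoundedSemaphore` that caps how many tasks run at once, so the caller blocks on the semaphore instead of polling. `map_fresh_threads` is a hypothetical name, and unlike the class above this sketch also collects return values; a task that raises simply yields `None`.

```python
import threading


def map_fresh_threads(func, values, max_concurrent=2):
    """Run func(value) in a brand-new thread per value, with at most
    max_concurrent tasks running at once. Returns results in input order."""
    slots = threading.BoundedSemaphore(max_concurrent)
    lock = threading.Lock()
    results = {}
    threads = []

    def run(index, value):
        try:
            result = func(value)
            with lock:
                results[index] = result
        finally:
            # Free the slot even if func raised, so the loop never deadlocks.
            slots.release()

    for index, value in enumerate(values):
        slots.acquire()  # blocks while max_concurrent tasks are running
        thread = threading.Thread(target=run, args=(index, value))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
    return [results.get(i) for i in range(len(threads))]


print(map_fresh_threads(lambda v: v + 1, [1, 2, 3, 4, 5]))
```

Because every task gets its own thread, any per-thread state (thread-locals, leaked references held by the thread) is discarded after each task, which is the point of maxtasksperchild=1.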

Is it possible to set maxtasksperchild for a ThreadPool?
