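I'm running a multiprocessing pool inside a for loop. It gets through two iterations and hangs on the third. If I reduce the size of each chunk, it may hang after the fourth or fifth iteration instead. In the program where I found the problem I'm running a more extensive function, but this code reproduces the error.
Is there a proper way to terminate the pool once it has finished, so that I can start fresh on the next iteration?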
import pandas as pd
import numpy as np
from multiprocess import Pool

df = pd.read_csv('paths.csv')

def do_something(user):
    v = df[df['userId'] == user]
    return v

if __name__ == '__main__':
    users = df['userId'].unique()
    n_chunks = round(len(users)/40)
    subsets = [users[i:i+n_chunks] for i in range(0, len(users), n_chunks)]
    chunk_counter = 0
    for user_subset in subsets:
        chunk_counter += 1
        print(f'Beginning to process chunk {chunk_counter}...')
        with Pool() as pool:
            frames = pool.map(do_something, user_subset)
            pool.close()
            pool.terminate()
        print(f'Completed processing chunk {chunk_counter}.')
Answer 0 (score: 0)
I was able to stop the hang with the code below:
with Pool(maxtasksperchild=1) as pool:
    frames = pool.map_async(do_something, user_subset).get()
    pool.terminate()
    pool.join()
I don't understand why using map_async prevents the hang. If I get the chance, I'll dig into it further to understand the reason.
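
On the original question of shutting a pool down cleanly, here is one possible structure, offered as a sketch rather than a confirmed fix for the hang: create the pool once, reuse it for every chunk, and finish with close() followed by join(). Leaving a `with Pool()` block already calls terminate() on the pool, so explicit close()/terminate() calls inside the block are not needed for cleanup. This sketch assumes the standard-library multiprocessing module rather than multiprocess, and make_chunks, process_in_chunks, and the doubling do_something are hypothetical stand-ins for the DataFrame code in the question.

```python
from multiprocessing import Pool


def make_chunks(items, chunk_size):
    """Split a sequence into consecutive slices of at most chunk_size items."""
    chunk_size = max(1, chunk_size)  # avoid a zero step in range()
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def do_something(user):
    # Hypothetical stand-in for the question's DataFrame filter.
    return user * 2


def process_in_chunks(items, chunk_size, processes=2):
    """Map do_something over each chunk, reusing a single pool throughout."""
    results = []
    pool = Pool(processes=processes)
    try:
        for number, chunk in enumerate(make_chunks(items, chunk_size), start=1):
            print(f'Beginning to process chunk {number}...')
            results.append(pool.map(do_something, chunk))
            print(f'Completed processing chunk {number}.')
    finally:
        pool.close()  # orderly shutdown: no further tasks are accepted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == '__main__':
    print(process_in_chunks(list(range(10)), chunk_size=4))
```

Reusing one pool avoids starting and tearing down worker processes on every iteration. If a fresh pool per chunk is really required, the same close() then join() pair can be moved inside the loop.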