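I'm trying to convert one of my programs to use multiprocessing, preferably multiprocessing pools, since those seem simpler. At a high level, the process creates an array of patches from an image and then passes them to the GPU for object detection. The CPU and GPU parts each take about 4 seconds, but the CPU has 8 cores and doesn't need to wait for the GPU, because nothing further is done to the data once it has passed through the GPU.
To help me along, I'd like a demonstration based on a high-level version of my implementation. Suppose we are looping over a list of images from a folder containing 10 images. We resize the images 4 at a time, then convert them to black and white 2 at a time; the conversion can stand in for the GPU part of the process. Here is what the code looks like: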
import glob
import os

from PIL import Image

def im_resize(im, num1, num2):
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
    return im.resize((num1, num2), Image.LANCZOS)

def convert_bw(im):
    return im.convert('L')

def read_images(path):
    imlist = []
    for pathAndFileName in glob.iglob(os.path.join(path, "*")):
        if pathAndFileName.endswith((".jpg", ".JPG")):
            imlist.append(Image.open(pathAndFileName))
    return imlist

img_list = read_images("path/to/images/")
final_img_list = []
for image in img_list:
    # Resize needs to run concurrently on 4 processes so that the next img_tmp is always ready to go for convert
    img_tmp = im_resize(image, 100, 100)
    # Convert is limited, need to run on 2 processes
    img_tmp = convert_bw(img_tmp)
    final_img_list.append(img_tmp)
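The specific process counts come from performance measurements on my system; those are the numbers that reduce the run time. I just want to make sure the GPU never has to wait for the CPU to finish processing images, so I'd like a queue that is constantly kept full of preprocessed images for the GPU to work on. I'd like to cap that queue at roughly 4-10 preprocessed images. If you can show me how to achieve this with the simplified example, I'm sure I can figure out how to translate it into what I need.
Thanks!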
Answer 0 (score: 2)
Here's an attempt at implementing what you want:
...
from multiprocessing import Pool

# Mapping functions can only take one arg, so we pass a tuple and unpack it
def img_resize_splat(a):
    return im_resize(*a)

if __name__ == "__main__":
    # Make a CPU pool and a GPU pool
    cpu = Pool(4)
    gpu = Pool(2)
    # Hopefully this returns an iterable, and not a list with all images read into memory
    img_list = read_images("path/to/images/")
    # I'm assuming you want images to be processed as soon as ready, order doesn't matter
    resized = cpu.imap_unordered(img_resize_splat, ((img, 100, 100) for img in img_list))
    converted = gpu.imap_unordered(convert_bw, resized)
    # This is an iterable with your results, slurp them up one at a time
    for bw_img in converted:
        pass  # do something
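One caveat about the 4-10 image cap from the question: as far as I can tell, the pool's imap variants put no limit on how many finished results pile up while the consumer is busy, so the resize stage can run arbitrarily far ahead of the convert stage. Below is a minimal sketch of one way to bound that, assuming you are fine with `concurrent.futures` instead of `multiprocessing.Pool`. `bounded_map`, `fake_resize`, `fake_convert` and `run_demo` are names I made up for illustration; the fake functions stand in for `im_resize` and `convert_bw` so the sketch runs without Pillow, and threads are used only to keep it self-contained.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bounded_map(executor, fn, iterable, max_pending=8):
    """Yield fn(item) in input order, with at most max_pending tasks
    submitted but not yet handed to the consumer."""
    pending = deque()
    for item in iterable:
        pending.append(executor.submit(fn, item))
        if len(pending) >= max_pending:
            # Buffer is full: wait for the oldest task before reading more input.
            yield pending.popleft().result()
    # Input exhausted: drain whatever is still in flight.
    while pending:
        yield pending.popleft().result()


# Hypothetical stand-ins for im_resize and convert_bw (not the real functions).
def fake_resize(name):
    return name + ":resized"


def fake_convert(name):
    return name + ":bw"


def run_demo(names):
    # 4 workers for the "CPU" stage, 2 for the "GPU" stage, as in the question.
    with ThreadPoolExecutor(max_workers=4) as cpu, \
            ThreadPoolExecutor(max_workers=2) as gpu:
        # At most 8 resized items are buffered ahead of the convert stage.
        resized = bounded_map(cpu, fake_resize, names, max_pending=8)
        converted = bounded_map(gpu, fake_convert, resized, max_pending=2)
        return list(converted)


if __name__ == "__main__":
    for result in run_demo(["img%d" % i for i in range(10)]):
        print(result)
```

Because `bounded_map` only relies on `executor.submit`, swapping in `ProcessPoolExecutor` should work the same way, provided the functions and items can be pickled and the pools are created under an `if __name__ == "__main__":` guard. Unlike `imap_unordered`, this sketch returns results in input order, so one slow image can hold up the ones behind it.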