Accessing shared memory takes longer than loading from a file?

Date: 2018-10-02 13:44:39

Tags: python multiprocessing shared-memory

I load a large file in my main process. My goal is to have multiple processes read it from memory at the same time, to avoid memory limits and to make things faster (as far as I know, reading from RAM is much faster than reading from disk).

According to this answer, I should use Shared ctypes Objects:

"Manager types are built for flexibility, not efficiency ... this necessarily means copying whatever object is in question. .... If you want to share physical memory, I suggest using Shared ctypes Objects. These actually do point to a common location in memory, and are therefore faster and lighter on resources."
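
For reference, a shared ctypes object is created directly with multiprocessing.Value or multiprocessing.Array, without a Manager. A minimal sketch (not the question's code) of what the docs mean by a common location in memory:

import ctypes
import multiprocessing

def reader(arr):
    # arr refers to the same shared buffer; nothing is copied into this process
    print(len(arr), arr[:8])

if __name__ == '__main__':
    data = b'aaabbbaa' * 1000
    # a raw (unlocked) char array living in shared memory
    arr = multiprocessing.Array(ctypes.c_char, len(data), lock=False)
    arr.raw = data  # one bulk copy of the bytes into the shared buffer
    p = multiprocessing.Process(target=reader, args=(arr,))
    p.start()
    p.join()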

So I did this:

import time
import pickle
import multiprocessing
from functools import partial

def foo(_, v):
    tp = time.time()
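    # reading .value pulls the entire string out of the manager process (a full copy)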
    v = v.value
    print(hex(id(v)))
    print(f'took me {time.time()-tp} in process')

if __name__ == '__main__':
    # creates a file which is about 800 MB
    with open('foo.pkl', 'wb') as file:
        pickle.dump('aaabbbaa'*int(1e8), file, protocol=pickle.HIGHEST_PROTOCOL)

    t1 = time.time()
    with open('foo.pkl', 'rb') as file:
        contract_conversion = pickle.load(file)
    print(f'load took {time.time()-t1}')

    m = multiprocessing.Manager()
    vm = m.Value(str, contract_conversion, lock=False)  # not locked because I only read from it, so it's safe
    foo_p = partial(foo, v=vm)

    tpo = time.time()
    with multiprocessing.Pool() as pool:
        pool.map(foo_p, range(4))
    print(f'took me {time.time()-tpo} for pool stuff')

But I can see that the processes copy it (the RAM usage of each process is very high), and it is much slower than just reading from disk.


It prints:

load took 0.8662333488464355
0x1c736ca0040
took me 2.286606550216675 in process
0x15cc0404040
took me 3.178203582763672 in process
0x1f30f049040
took me 4.179721355438232 in process
0x21d2c8cc040
took me 4.913192510604858 in process
took me 5.251579999923706 for pool stuff

The ids are also different, though I'm not sure whether id is just a Python identifier or the memory location.

1 Answer:

Answer 0 (score: 3)

You are not using shared memory. That would be multiprocessing.Value, not multiprocessing.Manager().Value. You are storing the string in the manager's server process and sending pickles over a TLS connection to access the value. On top of that, the server process is limited by its own GIL when serving requests.

I don't know how much each of these factors contributes to the overhead, but taken together it is more expensive than reading shared memory.
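
A minimal sketch of the shared-ctypes version, assuming the data can be held as raw bytes; note that shared ctypes objects should be inherited by the workers (here via the Pool initializer) rather than passed through map:

import time
import ctypes
import multiprocessing

_shared = None  # populated in each worker by the initializer

def init_worker(arr):
    global _shared
    _shared = arr

def foo(_):
    tp = time.time()
    chunk = _shared[:100]  # touches the common buffer; no per-process 800 MB copy
    print(f'took me {time.time()-tp} in process')

if __name__ == '__main__':
    data = b'aaabbbaa' * int(1e8)  # roughly 800 MB of raw bytes
    # lock=False since the workers only read; the bytes are copied into shared memory once
    arr = multiprocessing.Array(ctypes.c_char, len(data), lock=False)
    arr.raw = data
    with multiprocessing.Pool(initializer=init_worker, initargs=(arr,)) as pool:
        pool.map(foo, range(4))

Every worker then maps the same physical pages, so per-process memory stays flat and there is no pickling round-trip on each access.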