Why is my multiprocessing counter slower than collections.Counter?

Time: 2016-11-02 06:22:42

Tags: python performance collections multiprocessing counter

I wrote a multiprocessing counter and compared it against the native collections.Counter.

Why is my multiprocessing counter slower than collections.Counter?

[multi-count.py]:

import io
from collections import Counter
from multiprocessing import Process, Manager, Lock
import random
import time

class MultiProcCounter(object):
    def __init__(self):
        # The shared dict lives in the Manager's server process; every access
        # goes through a proxy over a pipe/socket.
        self.dictionary = Manager().dict()
        self.lock = Lock()

    def increment(self, item):
        with self.lock:
            self.dictionary[item] = self.dictionary.get(item, 0) + 1

def func(counter, item):
    counter.increment(item)

def multiproc_count(inputs):
    counter = MultiProcCounter()
    # One Process per input item.
    procs = [Process(target=func, args=(counter, _in)) for _in in inputs]
    for p in procs: p.start()
    for p in procs: p.join()
    return counter.dictionary

inputs = [random.randint(1,10) for _ in range(1000)]
start = time.time()
print (multiproc_count(inputs))
print (time.time() - start)
start = time.time()
print (Counter(inputs))
print (time.time() - start)

[OUT]:

{1: 88, 2: 95, 3: 99, 4: 98, 5: 102, 6: 111, 7: 99, 8: 103, 9: 97, 10: 108}
4.128664016723633
Counter({6: 111, 10: 108, 8: 103, 5: 102, 3: 99, 7: 99, 4: 98, 9: 97, 2: 95, 1: 88})
0.0006728172302246094
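
A rough micro-benchmark sketch (assuming the gap comes mainly from spawning one process per item plus a round-trip to the Manager's server process for every update) that compares a single proxied dict update against a plain dict update:

from multiprocessing import Manager
import time

def time_updates(d, n=1000):
    # Perform n read-modify-write updates on the given mapping and time them.
    start = time.time()
    for i in range(n):
        d[i % 10] = d.get(i % 10, 0) + 1
    return time.time() - start

if __name__ == '__main__':
    print('plain dict  :', time_updates({}))
    print('Manager dict:', time_updates(Manager().dict()))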

I ran it with Python 3:

$ ulimit -n 2048
$ python3 multi-count.py

To make the task harder, I increased the number of inputs to 10000, and I got an OSError:

  File "multi-count.py", line 29, in <module>
    print (multiproc_count(inputs))
  File "multi-count.py", line 23, in multiproc_count
Process Process-2043:
    for p in procs: p.start()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
Traceback (most recent call last):
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/managers.py", line 709, in _callmethod
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/process.py", line 93, in run
  File "bpe-multi.py", line 18, in func
  File "bpe-multi.py", line 15, in increment
  File "<string>", line 2, in get
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/managers.py", line 713, in _callmethod
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/managers.py", line 700, in _connect
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/connection.py", line 487, in Client
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/connection.py", line 612, in SocketClient
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 134, in __init__
OSError: [Errno 24] Too many open files
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/popen_fork.py", line 66, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

I can't increase the ulimit on my laptop:

$ ulimit -n 4096
-bash: ulimit: open files: cannot modify limit: Operation not permitted
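
Since the limit cannot be raised, one possible workaround (only a sketch, assuming the OSError comes from the thousands of pipes and Manager sockets that are open at the same time; it reuses MultiProcCounter and func from multi-count.py above) is to start the Process objects in fixed-size batches:

def multiproc_count_batched(inputs, batch_size=100):
    # Reuses MultiProcCounter and func from multi-count.py above.
    counter = MultiProcCounter()
    for i in range(0, len(inputs), batch_size):
        batch = inputs[i:i + batch_size]
        procs = [Process(target=func, args=(counter, item)) for item in batch]
        for p in procs:
            p.start()
        # Joining each batch releases its pipes before the next batch starts,
        # keeping the number of open file descriptors bounded.
        for p in procs:
            p.join()
    return counter.dictionary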

Using multiprocessing.Pool:

import io
from collections import Counter
from multiprocessing import Process, Manager, Lock, Pool
import random
import time


def func(counter, x):
    # No lock here: get() + assignment on the shared proxy dict is a
    # non-atomic read-modify-write.
    counter[x] = counter.get(x, 0) + 1


inputs = [random.randint(1,10) for _ in range(10000)]

manager = Manager()
counter = manager.dict()

pool = Pool(4)
for x in inputs:
    # One task per input item.
    pool.apply_async(func, [counter, x])
pool.close()
pool.join()

print(counter)

[OUT]:

$ time python multi-count.py 
{1: 978, 2: 978, 3: 997, 4: 982, 5: 958, 6: 1033, 7: 1044, 8: 1008, 9: 1007, 10: 1004}

real    0m16.187s
user    0m18.817s
sys 0m14.055s
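
Note that the counts above sum to 9989 rather than 10000: without the Lock from the first version, counter[x] = counter.get(x, 0) + 1 is a non-atomic read-modify-write on the shared dict, so concurrent workers can overwrite each other's updates. An alternative pattern (again only a sketch; I haven't timed it here) is to let each worker count a whole chunk with a local collections.Counter and merge the partial results, so there is no shared state at all:

from collections import Counter
from multiprocessing import Pool
import random

def count_chunk(chunk):
    # Each worker counts its own chunk locally; nothing is shared.
    return Counter(chunk)

if __name__ == '__main__':
    inputs = [random.randint(1, 10) for _ in range(10000)]
    n_workers = 4
    chunk_size = (len(inputs) + n_workers - 1) // n_workers
    chunks = [inputs[i:i + chunk_size] for i in range(0, len(inputs), chunk_size)]

    with Pool(n_workers) as pool:
        partials = pool.map(count_chunk, chunks)

    # Merge the per-chunk Counters into one.
    print(sum(partials, Counter()))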

Using the native collections.Counter:

$ time python3 -c 'import random; from collections import Counter; inputs = [random.randint(1,10) for _ in range(10000)]; print (Counter(inputs))'
Counter({6: 1067, 4: 1048, 3: 1021, 5: 1010, 9: 992, 7: 985, 8: 983, 1: 969, 2: 964, 10: 961})

real    0m0.099s
user    0m0.059s
sys 0m0.018s

$ time python3 -c 'import random; from collections import Counter; inputs = [random.randint(1,10) for _ in range(100000)]; print (Counter(inputs))'
Counter({9: 10159, 10: 10114, 8: 10046, 3: 10028, 7: 9998, 6: 9994, 2: 9982, 4: 9951, 1: 9898, 5: 9830})

real    0m0.236s
user    0m0.206s
sys 0m0.016s

0 Answers:

No answers yet.