Question

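I'm currently using a function to build extremely long dictionaries (for comparing DNA strings), and sometimes I get a MemoryError. Is there a way to allocate more memory to Python so that it can handle more data at once?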

Answer 1

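Python doesn't limit your program's memory usage. It allocates as much memory as your program needs until your computer runs out of memory. The most you can do is lower the limit to a fixed upper cap. That can be done with the resource module, but it isn't what you're looking for.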

You need to look at making your code more memory-efficient and better performing.
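For reference, here is a minimal sketch of what setting a cap with the resource module could look like. It assumes a Unix-like system (the module is not available on Windows), and the helper name cap_address_space is made up for illustration. As noted above, this can only restrict memory, not add any.

```python
import resource

def cap_address_space(max_bytes):
    """Set the soft limit on this process's virtual address space.

    Hypothetical helper for illustration. Returns the previous
    (soft, hard) limits so the caller can inspect or restore them.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may never exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return soft, hard

# Example (not executed here): cap_address_space(2 * 1024**3)
# After that call, allocations that would push the process past
# roughly 2 GiB of address space fail with MemoryError.
```

Once such a cap is in place, Python raises MemoryError when an allocation would exceed it, even if the machine still has free RAM.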

Answer 2

If you're on Linux, you can try extending memory with swap, a simple way to run programs that need more memory than is installed in the machine.

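However, the better approach is to update the program so that it processes the data in chunks where possible, or to add more memory to the machine, because swapping costs performance (it uses the much slower disk device).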
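A rough sketch of what processing the data in chunks might look like follows. The function names iter_chunks and count_matching are invented for this example, and the pairwise equality check only stands in for whatever comparison the real program performs.

```python
from itertools import islice

def iter_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

def count_matching(pairs, chunk_size=1000):
    """Count pairs whose two elements are equal, one chunk at a time.

    Only one chunk is held in memory at once, so pairs can be a
    generator that reads from a file instead of a fully built list.
    """
    total = 0
    for chunk in iter_chunks(pairs, chunk_size):
        total += sum(1 for left, right in chunk if left == right)
    return total

print(count_matching([("A", "A"), ("C", "G"), ("T", "T")], chunk_size=2))  # 2
```

The benefit only appears if the input is itself produced lazily, for example by iterating over a file line by line rather than loading it into a dictionary first.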

Answer 3

Python raises MemoryError when it hits the limit of your system RAM, unless you have set a lower limit manually with the resource package.

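Defining your class with __slots__ tells the Python interpreter that the attributes/members of the class are fixed, and this can save a lot of memory!

You can use __slots__ to avoid the per-instance dict that the Python interpreter would otherwise create. It tells the interpreter not to create a dict internally and to reuse the same fixed set of variables.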
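As an illustration, here is a small sketch comparing a regular class with a slotted one. The class and attribute names are made up for the example.

```python
class DnaPlain:
    """Regular class: every instance carries its own __dict__."""
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

class DnaSlotted:
    """Slotted class: attributes are fixed and no per-instance __dict__ exists."""
    __slots__ = ("name", "sequence")

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

plain = DnaPlain("sample1", "ACGT")
slotted = DnaSlotted("sample1", "ACGT")

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

try:
    slotted.extra = 1  # not listed in __slots__
except AttributeError:
    print("cannot add attributes that are not in __slots__")
```

The saving is per instance, so it matters when the program creates a very large number of small objects. It does not shrink a dictionary that the program builds itself.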

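If the memory consumed by your Python process keeps growing over time, this seems to be a combination of:

How the C memory allocator in Python works. This is essentially memory fragmentation, because the allocator cannot call "free" unless the entire memory chunk is unused, and chunk usage is usually not perfectly aligned with the objects you are creating and using.
Using many small strings to compare data. A process called interning is used internally, but creating many small strings still puts load on the interpreter.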
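To make the interning point concrete, the sketch below uses sys.intern from the standard library so that equal strings created at run time share one object. The helper name intern_all is hypothetical, and whether this helps depends on how many duplicate strings the data actually contains.

```python
import sys

def intern_all(strings):
    """Return the same strings, with equal values sharing one object each."""
    return [sys.intern(s) for s in strings]

# Build equal strings at run time so they start out as separate objects.
fragments = ["".join(["AC", "GT"]) for _ in range(3)]
shared = intern_all(fragments)

# After interning, every entry refers to the same string object.
print(all(s is shared[0] for s in shared))  # True
```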

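The best way is to create a worker thread or a single-threaded pool to do your work, and then invalidate or kill the worker to free up the resources attached to or used in the worker thread.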

The code below creates a single-threaded worker:

import concurrent.futures
import logging
import threading

logger = logging.getLogger(__name__)
lock = threading.Lock()
errorResultMap = []

def process_dna_compare(dna1, dna2):
    # max_workers=1 creates a single-threaded pool; the with block
    # shuts the pool down once all submitted tasks have finished.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key for dna_key in dna1}
    dna_differences_map = {}
    count = 0
    for future in concurrent.futures.as_completed(futures):
        result_dict = future.result()
        if result_dict:
            count += 1
            # Do your processing XYZ here
    logger.info('Total dna keys processed %d', count)

def getDnaDict(lock, dna_key):
    # Process dna_key here and return the item
    try:
        dataItem = dna_key[0]  # placeholder lookup, replace with your own
        return dataItem
    except Exception:
        # Record the failure under the lock so the shared list stays consistent
        with lock:
            errorResultMap.append({'dna_key': dna_key, 'error': 'No data for dna found'})
        logger.error('Error in processing dna: %s', dna_key)
        return None

if __name__ == "__main__":
    dna1 = '''get data for dna1'''  # placeholder input
    dna2 = '''get data for dna2'''  # placeholder input
    process_dna_compare(dna1, dna2)
    if errorResultMap:
        print(errorResultMap)  # or write it to a file

The following code will help you understand memory usage:

import objgraph
import random

class Dna(object):
    def __init__(self):
        self.val = None
    def __str__(self):
        return "dna - val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        # print("id of dna: {0}".format(id(dna)))
        # print("dna is: {0}".format(dna))
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
    filename="dna_refs.png")

    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()

Memory usage output:

list l has 3 objects of type Dna()
function                   2021
wrapper_descriptor         1072
dict                       998
method_descriptor          778
builtin_function_or_method 759
tuple                      667
weakref                    577
getset_descriptor          396
member_descriptor          296
type                       180
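objgraph is a third-party package. If installing it is not an option, the standard-library tracemalloc module can give a rough picture of how much memory a piece of code allocates. The sketch below is an alternative technique, not part of the code above, and the helper name peak_memory_of is made up for illustration.

```python
import tracemalloc

def peak_memory_of(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated while it ran).

    Only allocations made through Python's memory allocator are
    counted, so treat the number as an estimate.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def build_table(n):
    # Stand-in for the function that builds the large dictionary.
    return {i: str(i) for i in range(n)}

table, peak = peak_memory_of(build_table, 10000)
print("entries:", len(table), "peak bytes:", peak)
```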

For more information on slots, see: https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/

Answer 4

Try updating your Python from 32-bit to 64-bit.

Just type python at the command line and you will see which one your Python is. A 32-bit Python can address only a few gigabytes of memory at most.
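If the interpreter's startup message does not make it obvious, the check can also be done from inside Python. This is a small sketch using only the standard library; the function name python_bitness is made up for the example.

```python
import struct
import sys

def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8

print("This Python is", python_bitness(), "bit")
# On a 64-bit build sys.maxsize is larger than 2**32.
print("64-bit build:", sys.maxsize > 2**32)
```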

Increasing the memory limit in Python?
