Python 3 MemoryError when modifying a large dataset

Time: 2019-06-28 12:53:56

Tags: python-3.x memory out-of-memory

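I am trying to implement SymSpell in Python 3 (64-bit). I have a 20 MB txt file containing words with their frequencies, and I can successfully load it into a dictionary named originalDictionary. As the next step, for each word in that dictionary I need to delete one character at a time and add the modified word to another dictionary named editedDictionary. However, this raises a MemoryError. I am running this on Windows 10 (x64) with 16 GB of RAM. How can I solve this?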

    for word in originalDictionary:
        for i in range(len(word)):
            edit1 = word[0:i] + word[i + 1:]
            if edit1 not in editedDictionary:
                editedDictionary[edit1] = [word]
            else:
                editedDictionary[edit1].append(word)

Here is the error:

    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1741, in <module>
        main()
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1735, in main
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1135, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/ee/PycharmProjects/SymSpell/spellCorrector.py", line 98, in <module>
        createDictionaries()
      File "C:/Users/ee/PycharmProjects/SymSpell/spellCorrector.py", line 40, in createDictionaries
        editedDictionary[edit1] = [word]
    MemoryError

1 answer:

Answer 0 (score: 0)

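In recent versions of the SymSpell algorithm you can define a prefix length. Deletes are generated only within this prefix. A shorter prefix length significantly reduces memory consumption, at the cost of slower lookups. A prefix length of 5 is usually a good choice.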
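The prefix idea can be sketched roughly as follows. This is a minimal illustration, not the actual SymSpell implementation: it only generates single-character deletes (edit distance 1), and the function name `build_delete_index` and its `prefix_length` parameter are hypothetical names chosen for this example. A set is used for each entry, which is an assumption that also avoids storing the same word twice when two deletions yield the same string.

    from collections import defaultdict


    def build_delete_index(words, prefix_length=5):
        """Map each single-character delete of a word's prefix to the source words.

        Hypothetical helper for illustration; covers edit distance 1 only.
        """
        index = defaultdict(set)
        for word in words:
            # Deletes are generated only inside the first prefix_length characters.
            prefix = word[:prefix_length]
            for i in range(len(prefix)):
                delete = prefix[:i] + prefix[i + 1:]
                index[delete].add(word)
        return index


    words = ["hello", "helloworld", "help"]
    index = build_delete_index(words, prefix_length=5)

Because "hello" and "helloworld" share the same five-character prefix, they contribute the same set of keys, so long words no longer add one key per character. The original loop could be adapted in the same way by slicing `word` before iterating over it.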

There is a Python port of SymSpell that supports setting the prefix length: https://github.com/mammothb/symspellpy