gensim LDA: permission denied when I try to save the model

Asked: 2018-07-19 19:36:06

Tags: python text-mining gensim lda

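I have been getting some erratic behavior from my LDA topic-model program, and now it seems it will not save the LDA model it creates... I'm really not sure why.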

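Here is a code snippet. Writing fully reproducible code would take a lot more time, because I'm really just trying to load some files that were created earlier.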

import gensim
from gensim import corpora

def naive_LDA_implementation(name_of_lda, create_dict=False, remove_low_freq=False):

    LDA_MODEL_PATH = "lda_dir/" + str(name_of_lda) + "/model_dir/" # for some reason this location doesn't work entirely... and yes, I have made a directory in the folder of this name.
    # This ends up saving the .state, .id2word, and .expElogbeta.npy files... But normally when saving an lda model actually works, a fourth file is included that's to my understanding the model itself.
    # LDA_MODEL_PATH = "models/" # This is what I originally had as the location for LDA_MODEL_PATH. I was using a directory called models for multiple lda models. This no longer works.

    doc_df = getCorpus(name_of_lda, cleaned=True) # returns a dataframe containing a row for each text record and an extra column that contains the tokenized version of the text's post/string of words.
    dict_path = "lda_dir/" + str(name_of_lda) + "/dict_of_tokens.dict"
    docs_of_tokens = convert_cleaned_tokens_entries(doc_df['cleaned_tokens'])
    if create_dict:
        doc_dict = corpora.Dictionary(docs_of_tokens)
        if remove_low_freq:
            doc_dict.filter_extremes(no_below=5, no_above=0.6)
        doc_dict.save(dict_path)
        print("Finished saving")
    else:
        doc_dict = corpora.Dictionary.load(dict_path)
    doc_term_matrix = [doc_dict.doc2bow(doc) for doc in docs_of_tokens] # converts each document into a bag-of-words list of (token id, count) pairs

    Lda = gensim.models.ldamodel.LdaModel
    ldamodel = Lda(doc_term_matrix, num_topics=15, id2word=doc_dict, passes=20, chunksize=10000)
    ldamodel.save(LDA_MODEL_PATH)

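In short... I don't know why I get permission denied when I try to save the LDA model to a particular location. Now even the original models/ directory gives me this "Permission denied" error. It seems that none of the directories I could use work. This is strange behavior, and I can't find any questions that discuss this error in the same context. I did find people who got this error when they tried to save to a location that did not exist, but that is not really the problem in my case.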

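When I first ran into this error, I began to wonder whether it was because I had another LDA topic model, which I had named topic_model_1 and stored in the models/ subdirectory. I suspected the name might be the cause, so I changed it to lda_model_topic_1 to see whether that would change the result... but it had no effect.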

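Even if you can't work out which solution applies to my situation (especially since I have no reproducible code right now, only my own work)... can someone tell me what this error message means? When does it occur, and why? Maybe that would be a start.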

Traceback (most recent call last):
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 679, in save
    _pickle.dump(self, fname_or_handle, protocol=pickle_protocol)
TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "text_mining.py", line 461, in <module>
    main()
  File "text_mining.py", line 453, in main
    naive_LDA_implementation(name_of_lda="lda_model_topic_1", create_dict=True, remove_low_freq=True)
  File "text_mining.py", line 411, in naive_LDA_implementation
    ldamodel.save(LDA_MODEL_PATH)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\models\ldamodel.py", line 1583, in save
    super(LdaModel, self).save(fname, ignore=ignore, separately=separately, *args, **kwargs)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 682, in save
    self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 538, in _smart_save
    pickle(self, fname, protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 1337, in pickle
    with smart_open(fname, 'wb') as fout:  # 'b' for binary, needed on Windows
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 181, in smart_open
    fobj = _shortcut_open(uri, mode, **kw)
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 287, in _shortcut_open
    return io.open(parsed_uri.uri_path, mode, **open_kwargs)
PermissionError: [Errno 13] Permission denied: 'lda_dir/lda_model_topic_1/model_dir/'

1 answer:

Answer 0 (score: 0)

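It looks like, because you are using a relative path, you may be trying to save to SCRIPT_LAUNCH_PATH + lda_dir/lda_model_topic_1/model_dir/, a location that is not writable (SCRIPT_LAUNCH_PATH may actually be your PYTHONPATH, the installation directory of the Python interpreter).

You can check your launch directory:

    import os
    print(os.path.dirname(os.path.abspath(__file__)))

or (better) save your files to an absolute path, e.g. C:\Users\<youruser>\Documents\... (on Windows, remember to swap <youruser> for your login name), where you should have full write permissions.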
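A minimal sketch of the absolute-path suggestion follows. The helper name `build_model_path` and the file name `lda.model` are hypothetical, chosen only for illustration. The sketch also appends a file name to the directory: the path in the traceback ends in `/`, so it names a directory, while gensim's `save` takes a file name. Opening a directory for writing fails, and on Windows that typically surfaces as `PermissionError: [Errno 13]`, so this may be what is happening here.

```python
import os
import tempfile


def build_model_path(base_dir, name_of_lda, file_name="lda.model"):
    """Return an absolute file path for the model, creating its directory.

    `file_name` is a hypothetical default; any file name would do.
    """
    model_dir = os.path.join(
        os.path.abspath(base_dir), "lda_dir", name_of_lda, "model_dir"
    )
    os.makedirs(model_dir, exist_ok=True)  # no error if it already exists
    # Point at a file inside the directory, not at the directory itself.
    return os.path.join(model_dir, file_name)


# Demonstration in a temporary directory; swap in your own base directory.
with tempfile.TemporaryDirectory() as base:
    model_path = build_model_path(base, "lda_model_topic_1")
    print(model_path)
    # ldamodel.save(model_path)  # the gensim call from the question would go here
```

With a file name as the prefix, the companion files mentioned in the question (`.state`, `.id2word`, `.expElogbeta.npy`) would be written next to it as `lda.model.state` and so on.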

Another possible reason is that you are running the script as a different user from the one that created the directories.
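One way to test the different-user possibility is to print which account the script runs under and to try creating a file in the target directory. This is only a sketch: `current_user` and `can_write_to` are hypothetical helper names, and the check uses a throwaway temporary file rather than `os.access`, on the assumption that an actual write attempt is the more reliable probe.

```python
import getpass
import os
import tempfile


def current_user():
    """Best-effort name of the account running the script."""
    try:
        return getpass.getuser()
    except Exception:
        # Some environments have no resolvable user name.
        return "<unknown>"


def can_write_to(directory):
    """Return True if a temporary file can be created inside `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories as well as permission problems.
        return False


print("running as:", current_user())
print("current directory writable:", can_write_to(os.getcwd()))
```

Calling `can_write_to` on the directory that holds the model (for example `lda_dir/lda_model_topic_1/model_dir`) shows whether the account running the script can create files there at all.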