Question

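I trained an LDA model with the gensim Mallet wrapper and then converted it to a native gensim LDA model using the malletmodel2ldamodel function that ships with the wrapper. The topic-word distributions before and after the conversion are completely different: after conversion, the model returns very strange topic-word distributions.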

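import pickle

import gensim
import pyLDAvis
import pyLDAvis.gensim

# Train with the Mallet wrapper (mallet_path, corpus and dictionary are defined earlier)
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=13, id2word=dictionary)
# Convert to a native gensim LdaModel and save it
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet)
model.save('ldamallet.gensim')

# Later: reload the dictionary, corpus and converted model, then visualise
dictionary = gensim.corpora.Dictionary.load('dictionary.gensim')
with open('corpus.pkl', 'rb') as f:
    corpus = pickle.load(f)
# The saved object is an LdaModel, so load it with LdaModel.load (not LdaMallet.load)
lda_mallet = gensim.models.LdaModel.load('ldamallet.gensim')
lda_display = pyLDAvis.gensim.prepare(lda_mallet, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)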

Here is the output of the original gensim implementation:

I have seen that a bug related to this issue was fixed in an earlier version of gensim. I am using gensim 3.7.1.
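To put a number on "completely different", one option is to compare the two topic-word matrices row by row. The sketch below is a hypothetical helper (`topic_word_gap` is not part of gensim) that computes the per-topic total variation distance with NumPy. It assumes both inputs are (num_topics, vocab_size) arrays, such as those returned by `ldamallet.get_topics()` and `model.get_topics()`.

```python
import numpy as np


def topic_word_gap(before, after):
    """Per-topic total variation distance between two topic-word matrices.

    Hypothetical helper: `before` and `after` are (num_topics, vocab_size)
    arrays. Each row is renormalised to sum to 1 first, so raw counts work too.
    """
    p = np.asarray(before, dtype=np.float64)
    q = np.asarray(after, dtype=np.float64)
    if p.shape != q.shape:
        raise ValueError(f"shape mismatch: {p.shape} vs {q.shape}")
    p = p / p.sum(axis=1, keepdims=True)
    q = q / q.sum(axis=1, keepdims=True)
    # 0 means identical topic rows, 1 means the rows share no probability mass.
    return 0.5 * np.abs(p - q).sum(axis=1)
```

For example, `topic_word_gap(ldamallet.get_topics(), model.get_topics())` should be close to 0 for every topic if the conversion preserved the distributions, assuming the conversion keeps the topic order (it copies topics row by row). Values near 1 mean the converted topics barely overlap with the Mallet ones.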

Answer 1

Here is an alternative function to use instead of malletmodel2ldamodel (which has been reported to be buggy):

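from gensim.models.ldamodel import LdaModel
import numpy

def ldaMalletConvertToldaGen(mallet_model):
    # Build an LdaModel with the same vocabulary, topic count and Mallet-estimated alpha;
    # eta=0 so that lambda (eta + sstats) equals the copied counts exactly
    model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, eta=0, iterations=1000, gamma_threshold=0.001, dtype=numpy.float32)
    # Copy Mallet's topic-word counts into the sufficient statistics,
    # which LdaModel's topic reports (get_topics, show_topics) are computed from
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    # Recompute expElogbeta from the updated state so inference stays consistent
    model_gensim.sync_state()
    return model_gensim

# ldamallet is the trained LdaMallet model from the question
converted_model = ldaMalletConvertToldaGen(ldamallet)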

I used it and it worked well.
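As far as I can tell from the gensim 3.x source, LdaModel derives its topic-word probabilities (get_topics(), show_topics(), and therefore pyLDAvis) from state.get_lambda(), i.e. eta + state.sstats, while the built-in malletmodel2ldamodel in that release only overwrites expElogbeta. The state would then keep its random initial values, which would explain the odd distributions. The NumPy-only sketch below mimics that arithmetic under these assumptions. It does not import gensim, and `topics_from_state`, `mallet_topics` and the toy counts are hypothetical.

```python
import numpy as np


def topics_from_state(sstats, eta=0.0):
    """Mimic LdaModel.get_topics(): row-normalise lambda = eta + sstats."""
    lam = eta + np.asarray(sstats, dtype=np.float64)
    return lam / lam.sum(axis=1, keepdims=True)


def mallet_topics(word_topics):
    """Mimic LdaMallet.get_topics(): row-normalise the raw word-topic counts."""
    counts = np.asarray(word_topics, dtype=np.float64)
    return counts / counts.sum(axis=1, keepdims=True)


# Toy word-topic counts: 2 topics over a 4-word vocabulary (made-up numbers).
wordtopics = np.array([[8.0, 1.0, 1.0, 0.0],
                       [0.0, 2.0, 3.0, 5.0]])

# The answer's approach: counts copied into sstats, with eta=0.
fixed = topics_from_state(wordtopics, eta=0.0)

# The suspected bug: sstats is never overwritten, so it keeps its random
# initialisation (assumed here to be gamma(100, 1/100) draws, as in gensim 3.x).
rng = np.random.default_rng(0)
stale_sstats = rng.gamma(100.0, 1.0 / 100.0, size=wordtopics.shape)
buggy = topics_from_state(stale_sstats, eta=0.0)

print(np.round(fixed, 2))  # same as mallet_topics(wordtopics)
print(np.round(buggy, 2))  # roughly uniform: about 0.25 for every word
```

Because the gamma draws all have mean 1 and small spread, the stale state yields nearly flat topics, whereas copying the counts into sstats reproduces Mallet's distributions exactly.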

Topic-word distribution problem after malletmodel2ldamodel in Gensim
