How to see all the documents of each topic in LDA?

Time: 2018-08-05 13:13:04

Tags: python python-3.x scikit-learn lda topic-modeling

I am using LDA to find the topics of a set of texts. I managed to print the topics, but I would like to print each text together with its topic.

Data:

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

I tried it with sklearn and I can print the topics, but I want to print all the phrases that belong to each topic:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np
import pandas

dataset = pandas.read_csv('data.csv', encoding = 'utf-8')
comments = dataset['comments']
comments_list = comments.values.tolist()

vect = CountVectorizer()
X = vect.fit_transform(comments_list)

lda = LatentDirichletAllocation(n_topics = 2, learning_method = "batch", max_iter = 25, random_state = 0)

document_topics = lda.fit_transform(X)

sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
feature_names = np.array(vect.get_feature_names())

docs = np.argsort(comments_list[:, 1])[::-1]
for i in docs[:4]:
    print(' '.join(i) + '\n')

Desired output:

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

1 answer:

Answer 0 (score: 1):

If you want to print the documents, you need to look them up by index: in your loop, i is only the document number, so use it to index comments_list instead of joining i itself.

print(" ".join(comments_list[i].split(",")[:2]) + "\n")