Returning None in function: TypeError: object of type 'NoneType' has no len()

Asked: 2018-08-31 10:21:51

Tags: python lda nonetype

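I am trying to print the topics, and the texts belonging to each topic, from an LDA model. But a None that shows up after the topics are printed breaks my script. I can print the topics, but not the texts.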

import pandas
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_top_words = 5
n_components = 5

def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic #%d: " % topic_idx
        message += " ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])

        return message

text = pandas.read_csv('text.csv', encoding = 'utf-8')
text_list = text.values.tolist()

tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(text_list)

lda = LatentDirichletAllocation(n_components=n_components, learning_method='batch', max_iter=25, random_state=0)

doc_distr = lda.fit_transform(tf)

tf_feature_names = tf_vectorizer.get_feature_names()
print (print_top_words(lda, tf_feature_names, n_top_words))

doc_distr = lda.fit_transform(tf)
topics = print_top_words(lda, tf_feature_names, n_top_words)
for i in range(len(topics)):
    print ("Topic {}:".format(i))
    docs = np.argsort(doc_distr[:, i])[::-1]
    for j in docs[:10]:
       print (" ".join(text_list[j].split(",")[:2]))

My output:

Topic 0: no order mail received back 

Topic 1: cancel order wishes possible wish 

Topic 2: keep current informed delivery order 

Topic 3: faulty wooden box present side 

Topic 4: delivered received be produced urgent 

Topic 5: good waiting day response share 

It is followed by this error:

  File "lda.py", line 41, in <module>

    for i in range(len(topics)):

TypeError: object of type 'NoneType' has no len()

3 answers:

Answer 0 (score: 2)

There are (at least) four issues with your print_top_words() function.

The first one, which causes your current problem, is that if model.components_ is empty, the body of the for loop is never executed, and your function then (implicitly) returns None.
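The implicit None can be reproduced without scikit-learn. The following is a minimal sketch; `first_message` is a hypothetical stand-in that only mirrors the shape of the question's function (its single return sits inside the loop), and the empty list stands in for an empty model.components_.

```python
def first_message(components):
    # The only return statement is inside the loop, as in the question.
    for idx, _ in enumerate(components):
        return "Topic #%d" % idx
    # No return here: if the loop body never runs, Python returns None.


result = first_message([])
print(result)  # None

try:
    len(result)
except TypeError as exc:
    # Same error as in the question's traceback.
    print(exc)  # object of type 'NoneType' has no len()
```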

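The second one is subtler: if model.components_ is not empty, the function returns only the first message and then exits. That is what the return statement does by definition: it returns a value (or None if no value is given) and exits the function.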

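The third issue is that (when model.components_ is not empty) the function returns a string where the calling code clearly expects a list. This is a subtle bug because strings have a length, so the for loop over range(len(topics)) appears to work, but len(topics) is certainly not the value you expect.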
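One possible reading of what the calling code expects is a list with one message per topic. The sketch below is only an illustration under that assumption: `top_words_per_topic` is a hypothetical name, the toy `components` and `feature_names` values are made up, and a plain `sorted` call replaces the question's `argsort` slice so that the example runs without NumPy.

```python
def top_words_per_topic(components, feature_names, n_top_words):
    """Build one message per topic and return them all as a list."""
    messages = []
    for topic_idx, topic in enumerate(components):
        # Word indices ordered by weight, largest first.
        ranked = sorted(range(len(topic)), key=lambda i: topic[i], reverse=True)
        words = " ".join(feature_names[i] for i in ranked[:n_top_words])
        messages.append("Topic #%d: %s" % (topic_idx, words))
    # Returned after the loop, so an empty input yields [] rather than None.
    return messages


# Toy data (assumed): two topics over a three-word vocabulary.
components = [[0.1, 0.7, 0.2], [0.5, 0.1, 0.4]]
feature_names = ["order", "mail", "cancel"]

topics = top_words_per_topic(components, feature_names, 2)
print(topics)          # ['Topic #0: mail cancel', 'Topic #1: order cancel']
print(len(topics))     # 2, the number of topics
print(len(topics[0]))  # number of characters in one message, not a topic count
```

With a list, range(len(topics)) iterates once per topic; with a single string it would iterate once per character.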

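Finally, the function is badly named, since it does not "print" anything. This may seem trivial next to the first three issues, and it will not stop the code from working (assuming those three are fixed), but reasoning about code is hard enough as it is, so proper naming matters: it greatly reduces cognitive load and simplifies maintenance and debugging.

Long story short: think about what you really want this function to do and fix it accordingly. Since I am not sure what you are trying to do, I won't post a "corrected" version here, but the explanation above should help.

NB: you also call lda.fit_transform(tf) and print_top_words(lda, tf_feature_names, n_top_words) twice each, with exactly the same arguments. That is either useless and a pure waste of processor cycles (best case), or a smell pointing to another bug if the second calls give different results.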

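Answer 1 (score: 1)

You did not provide the full code, but the most likely cause is that the variable topics is None. The only way that can happen is if model.components_ inside print_top_words is an empty collection: the loop never runs, and the function (implicitly) returns None. Check the value of that collection. Better yet, decide what the function should return in that case.

Another, unrelated point: you initialize the message variable on every iteration and then return it on the very first one. Check what you actually meant.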

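Answer 2 (score: 1)

This is hard to answer without knowing the inner workings of LatentDirichletAllocation. However, it seems to be related to components_, since iterating over it repeatedly appears to give different results.

You can most likely avoid the error by changing this: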

print (print_top_words(lda, tf_feature_names, n_top_words))

doc_distr = lda.fit_transform(tf)
topics = print_top_words(lda, tf_feature_names, n_top_words)

to:

temp = print_top_words(lda, tf_feature_names, n_top_words)
print (temp)

doc_distr = lda.fit_transform(tf)
topics = temp

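The second time the function is called, model.components_ appears to yield nothing, so the loop is skipped and the function returns None.

However, I am not sure that this is what the code is actually meant to do. It looks like you may want print_top_words to be a generator? You return inside the for loop, so it never reaches a second iteration. That is probably not the purpose of the loop.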
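If a generator is indeed what was intended, it could look roughly like the sketch below. The name `iter_top_words` and the toy data are assumptions made for illustration, and `sorted` again stands in for the `argsort` slice so the example has no dependencies. Replacing return with yield lets the loop continue to the next topic instead of exiting.

```python
def iter_top_words(components, feature_names, n_top_words):
    """Yield one message per topic instead of returning the first one."""
    for topic_idx, topic in enumerate(components):
        # Word indices ordered by weight, largest first.
        ranked = sorted(range(len(topic)), key=lambda i: topic[i], reverse=True)
        words = " ".join(feature_names[i] for i in ranked[:n_top_words])
        yield "Topic #%d: %s" % (topic_idx, words)


# Toy data (assumed): two topics over a three-word vocabulary.
components = [[0.1, 0.7, 0.2], [0.5, 0.1, 0.4]]
feature_names = ["order", "mail", "cancel"]

for message in iter_top_words(components, feature_names, 2):
    print(message)

# A generator object has no len(); materialize it first if a count is needed.
topics = list(iter_top_words(components, feature_names, 2))
print(len(topics))  # 2
```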
