Wrong sentiment prediction, need help

Time: 2019-05-02 17:17:36

Tags: machine-learning nlp

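I have 3 sentences containing positive and negative words, and I have applied the necessary/standard preprocessing techniques. When all three sentences/lists and their corresponding token lists are fed to the predict function together as TF-IDF weighted word2vec features, 60% of the predictions for the positive and negative sentences are correct.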
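To make the feature concrete, here is a minimal sketch of a TF-IDF-weighted average of word vectors for one tokenized review. The toy vectors, the IDF values, and the name `tfidf_weighted_average` are hypothetical and only for illustration; the actual `avg_tf_idf_w2vec` function is not shown in this post and may work differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical 2-dimensional word vectors and IDF weights (illustration only;
# a real setup would take these from a trained word2vec model and a fitted
# TF-IDF vectorizer).
TOY_VECTORS: Dict[str, List[float]] = {
    "smile": [1.0, 0.0],
    "amazing": [0.5, 0.5],
    "inappropriate": [-1.0, 0.0],
}
TOY_IDF: Dict[str, float] = {"smile": 1.0, "amazing": 2.0, "inappropriate": 2.0}


def tfidf_weighted_average(tokens: List[str]) -> List[float]:
    """Return one TF-IDF-weighted average vector for one tokenized review."""
    dim = len(next(iter(TOY_VECTORS.values())))
    weighted_sum = [0.0] * dim
    weight_total = 0.0
    for word, count in Counter(tokens).items():
        if word not in TOY_VECTORS or word not in TOY_IDF:
            continue  # skip out-of-vocabulary words
        weight = (count / len(tokens)) * TOY_IDF[word]  # tf * idf
        for i, value in enumerate(TOY_VECTORS[word]):
            weighted_sum[i] += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return weighted_sum  # all-zero vector when no word is known
    return [value / weight_total for value in weighted_sum]


single = tfidf_weighted_average(["smile"])
mixed = tfidf_weighted_average(["amazing", "husband", "inappropriate"])
print(single, mixed)
```

Note that a review whose words are all out of vocabulary ends up as an all-zero vector in this sketch, so the classifier would then see no signal from the words at all.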

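However, when the reviews are sent to the predict function one at a time (one per product ID), their polarity is predicted as negative, even though the sentences/lists contain words such as smile, good, and amazing.

I am wondering why, when all 3 reviews are sent to the predict function in one shot, it predicts both positive and negative reviews, but when the same 3 reviews are sent one by one, each of them is predicted as negative.

Can someone tell me what I am missing here?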

Actual reviews

smile
Did amazing on my husband. but the medication test was inappropriate.
Overall experience is wonderful.

Formatted reviews stored in a list of lists

[['smile'], 
['amazing', 'husband', 'medication', 'test', 'inappropriate'], 
['overall', 'experience', 'good']]

Output for all 3 reviews.

polarity_cnt_logistic: [1 0 1]
-> Here the first and third values ('1') refer to the first and third formatted reviews; the '0' refers to the second review.

Predicted output for each review

First Review:
polarity_cnt_logistic :[0]

The values below are printed just to verify the inputs to the predict function.

length of lst_sent: 1
Review: ['smile']
List of Sentence: ['smile']

Second Review:
polarity_cnt_logistic: [0 0 0 0 0 0]

The values below are printed just to verify the inputs to the predict function.

length of lst_of_sentance: 5
Actual Review: ['Did amazing on my husband. but the medication test was inappropriate.']
Formatted Review: ['amazing', 'husband', 'medication', 'test', 'inappropriate']

# test_revs_df, vectorizer_tst, list_of_sentance_tst, avg_tf_idf_w2vec and
# trained_model are not shown in this post.
for idx, row in test_revs_df.iterrows():
    # TF-IDF transform of the current review only
    tf_idf = vectorizer_tst.transform(row[["formated_reviews"]])
    # newer scikit-learn versions use get_feature_names_out() instead
    tfidf_features = vectorizer_tst.get_feature_names()
    feature_counts = tf_idf.sum(axis=0).A1
    feature_dict = dict(zip(tfidf_features, feature_counts))
    # call the avg_tf_idf_w2vec function
    text_feature_avg_tf_idf_w2v = avg_tf_idf_w2vec(tfidf_features, list_of_sentance_tst[idx])
    # predict polarity for each review
    polarity_cnt_logistic = trained_model.predict(text_feature_avg_tf_idf_w2v)
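
One thing that may be worth checking in the loop above is the shape of what is passed to the feature function. The sketch below assumes (this is not confirmed by the post, since `avg_tf_idf_w2vec` is not shown) that the feature function iterates over its input as a list of tokenized reviews and emits one row per item. Under that assumption, passing one review's flat token list produces one row per word rather than one row per review, and a classifier would then return one label per row. `featurize` and the toy vocabulary are hypothetical stand-ins, and this may or may not be what happens in the real code.

```python
from typing import List, Sequence

# Hypothetical stand-in vocabulary with 1-dimensional "vectors".
TOY_VECTORS = {"amazing": 1.0, "inappropriate": -0.5}


def featurize(reviews: Sequence[Sequence[str]]) -> List[List[float]]:
    """Emit one feature row per item of `reviews` (mean of known word vectors)."""
    rows = []
    for tokens in reviews:
        known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
        rows.append([sum(known) / len(known)] if known else [0.0])
    return rows


review = ["amazing", "husband", "medication", "test", "inappropriate"]

# Flat token list: each word is treated as a "review", and iterating over a
# string yields single characters, so nothing matches the vocabulary.
per_word_rows = featurize(review)

# Wrapped in an outer list: the whole review becomes a single row.
per_review_rows = featurize([review])

print(len(per_word_rows), len(per_review_rows))  # prints "5 1"
```

Printing the shape of the feature matrix right before calling `predict`, for both the batch call and the per-review call, would show whether the two paths really feed the model the same kind of input.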

Expected prediction output for each formatted review.

Formatted Review: ['amazing', 'husband', 'medication', 'test', 'inappropriate']
polarity_cnt_logistic=[1 1 1 1 0 ] <- expected predicted output

Formatted Review=['smile']
polarity_cnt_logistic=[1]<- expected predicted output

0 answers:

No answers