Question

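I need to write a vanilla Python program (no libraries) that computes the text similarity between different documents.

The program takes documents as input and builds a dictionary (matrix) from the words in the given input. Each document contains a single sentence. When a new document comes in, we need to compare it against the other documents to find the similar ones. See the example below:

Input text: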

input_text = ["Why I like music", "Beer and music is my favorite combination",
               "The sun is shining", "How to dance in GTA5", ]

The sentences must be converted into vectors, see the example:

I hope you can help.

Answer 1

Here are some ideas:

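word_list = []
for sentence in input_text:
    for word in sentence.split():
        # Add each word to the vocabulary only once.
        if word not in word_list:
            word_list.append(word)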
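
Building on the word list idea, here is a minimal sketch of one way to go from sentences to vectors and then to a similarity score. It assumes word-count (bag-of-words) vectors and cosine similarity, neither of which the question prescribes, and the helper names (tokenize, build_vocabulary, to_vector, cosine_similarity, most_similar) are made up for illustration. Tokenization is a plain lowercase whitespace split, so punctuation is not handled.

```python
def tokenize(text):
    # Assumption: lowercase and split on whitespace; punctuation is not stripped.
    return text.lower().split()


def build_vocabulary(documents):
    # Collect each distinct word once, in order of first appearance.
    vocabulary = []
    for document in documents:
        for word in tokenize(document):
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary


def to_vector(text, vocabulary):
    # One count per vocabulary word: how often it occurs in the text.
    words = tokenize(text)
    return [words.count(term) for term in vocabulary]


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(new_text, documents):
    # Include the new text in the vocabulary so its words are all counted.
    vocabulary = build_vocabulary(documents + [new_text])
    new_vector = to_vector(new_text, vocabulary)
    best_doc, best_score = None, -1.0
    for document in documents:
        score = cosine_similarity(new_vector, to_vector(document, vocabulary))
        if score > best_score:
            best_doc, best_score = document, score
    return best_doc, best_score


input_text = ["Why I like music", "Beer and music is my favorite combination",
              "The sun is shining", "How to dance in GTA5"]

match, score = most_similar("I like music and beer", input_text)
print(match, round(score, 3))
```

With raw counts, common words such as "is" or "and" weigh as much as distinctive ones. If that skews the results, removing stop words or weighting terms by how rare they are across documents would be a possible next step.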