Question

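I am working on an NLP problem.

I have downloaded premade embedding weights to use in an embedding layer. Before the embedding layer, I need to tokenize my dataset, which is currently in the form of sentence strings. I want to tokenize it using the same indices as the premade embedding layer.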

Is there a way to initialize the Keras tokenizer (tensorflow.keras.preprocessing.text.Tokenizer) with a premade index dictionary, so that it does not decide for itself which index to assign to each word?

Answer 1

You can initialize a Tokenizer object and manually assign the word index to it. You can then use it to convert sentences to index sequences.

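from tensorflow.keras.preprocessing import text

token = text.Tokenizer()
# Assign the premade word-to-index mapping directly instead of calling fit_on_texts
token.word_index = {"the": 1, "elephant": 2}
token.texts_to_sequences(["the elephant"])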

This returns [[1, 2]].
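
If the premade embeddings come with a vocabulary list, the dictionary can be built from it so that each word's index matches its row in the weight matrix. The following is only a rough sketch in plain Python: the vocab list and the texts_to_sequences helper are hypothetical stand-ins, and it assumes row 0 of the premade weights is reserved for padding, which you should check against your own embedding file. The helper imitates the lookup only loosely (lowercase, split on whitespace, skip unknown words) and is not the Keras implementation.

```python
# Hypothetical vocabulary, assumed to be in the same order as the rows
# of the premade embedding matrix (row 0 assumed reserved for padding).
vocab = ["the", "elephant", "walks"]

# Start at 1 so that index 0 stays free for padding.
word_index = {word: i for i, word in enumerate(vocab, start=1)}


def texts_to_sequences(texts, word_index):
    """Simplified stand-in for the lookup step: lowercase each text,
    split on whitespace, and skip words missing from word_index."""
    return [
        [word_index[w] for w in t.lower().split() if w in word_index]
        for t in texts
    ]


sequences = texts_to_sequences(
    ["The elephant walks", "the purple elephant"], word_index
)
print(sequences)  # [[1, 2, 3], [1, 2]]
```

A dictionary built this way can then be assigned to token.word_index as in the answer above. Words that are not in the dictionary ("purple" here) are simply skipped in this sketch.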

Using the Keras tokenizer with a premade index dictionary
