Improving LSTM performance on text classification (a 3-class problem)

Time: 2018-11-22 23:59:25

标签: machine-learning deep-learning lstm sentiment-analysis text-classification

My problem is a 3-class sentiment analysis classification task with 4,000 reviews, each averaging about 500 words. The class distribution of the dataset is 1,800 negative, 1,700 neutral and 500 positive. I am trying the LSTM below, but while looking for ways to improve its performance by changing the parameters, I have not found any specific rules on how to choose them; most answers I find boil down to "it depends on the problem". Since I am new to deep learning, I really don't know where to start. My model reaches about 63% accuracy, tested with k=5 cross-validation. Thanks in advance. Here is the code I have so far:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.callbacks import ModelCheckpoint

data = pd.read_csv("nopreall.csv", header=0, encoding='UTF-8')
X = data['text']
Y = data['polarity']   # assumed to be integer labels 0/1/2, as required by sparse_categorical_crossentropy

x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2,random_state=0)  #split train/test data

batch_size = 64
epochs = 5
max_len = 500      # pad/truncate every review to 500 tokens
max_words = 5000   # vocabulary size kept by the tokenizer

tokenizer = Tokenizer(max_words)
tokenizer.fit_on_texts(x_train)

# convert the reviews to integer sequences and pad/truncate them to max_len
# (pad_sequences accepts lists directly, so no np.array conversion is needed)
x_train = tokenizer.texts_to_sequences(x_train)
x_test = tokenizer.texts_to_sequences(x_test)

x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

# create the model
embedding_vector_length = 64
model = Sequential()
model.add(Embedding(max_words, embedding_vector_length, input_length=max_len))
model.add(LSTM(100))
model.add(Dense(3, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

model.fit(x_train, y_train, validation_split=0.1, epochs=epochs, batch_size=batch_size, callbacks=callbacks_list)

# reload the best weights saved by the checkpoint
print("Loading Best Model Overall")
model.load_weights("weights.best.hdf5")
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#Final evaluation of the model
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
