Keras LSTM, more epochs decrease accuracy and recall

Date: 2019-04-01 21:38:17

Tags: python machine-learning keras time-series lstm

I am working on a multi-label problem with 38 input variables and an output in the form of 6 one-hot encoded vectors. There are 7 different time series that show the different behaviours I want the model to learn. The data is imbalanced, in the sense that roughly the first 20% of each time series has 'False' on all labels, while at least one label is 'Positive' for the last 80% of the data points in each series (the first time series is all 'False').
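
For reference, a minimal sketch of how the per-label imbalance could be quantified for each series, assuming the same CSV files and column layout as in the script further down (the last six columns are the one-hot labels):

import pandas as pd

# Fraction of positive values per label, for each of the seven time series.
files = ['trainingdata_nullhypothesis.csv'] + ['trainingdata%d.csv' % i for i in range(1, 7)]
for name in files:
    df = pd.read_csv(name, encoding="ISO-8859-1")
    print(name, df.iloc[:, -6:].mean().round(3).to_dict())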

I have tried many different approaches to improve my model, but the basic problem I keep struggling with remains: the longer it trains, the worse the model performs with respect to recall (it usually ends up labelling every data point as false / the null hypothesis). In addition, I see a decreasing loss for the first two or three time series, after which the remaining ones show a flat/unchanging loss across all epochs, even when I run it for 300 epochs.
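
To make that degradation visible per epoch, one option would be a small callback that logs recall on held-out data after every epoch. This is a sketch only; RecallLogger is a hypothetical helper, and the 0.5 threshold mirrors the prediction threshold used further down:

from keras.callbacks import Callback
from sklearn.metrics import recall_score

class RecallLogger(Callback):
    """Log micro-averaged recall on a held-out set after each epoch."""
    def __init__(self, x_val, y_val, batch_size):
        super(RecallLogger, self).__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.batch_size = batch_size

    def on_epoch_end(self, epoch, logs=None):
        y_pred = (self.model.predict(self.x_val, batch_size=self.batch_size) > 0.5).astype(int)
        rec = recall_score(self.y_val, y_pred, average='micro')
        print("epoch %d - held-out recall: %.3f" % (epoch + 1, rec))

It could be passed as callbacks=[RecallLogger(x_test, y_test.values, batch_size_c)] in the fit call below; with a stateful model the held-out set length still has to be divisible by the batch size.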

Does anyone see any serious mistakes in my code that I should fix? I suspect it might be a combination of the imbalanced data and the fact that I statically pick the training and test sets as the first 70% of the data points of each time series and test on the last 30%. However, short of shuffling the time series, which would destroy the temporal order of the data, I am not sure how else to set this up.
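
For comparison, a split that keeps chronological order without being a single static 70/30 cut would be walk-forward validation, e.g. along the lines of sklearn's TimeSeriesSplit. A sketch only, where x and y are placeholder arrays standing in for one time series:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

x = np.random.rand(120, 38)              # placeholder: one series, 38 input variables
y = np.random.randint(0, 2, (120, 6))    # placeholder: matching 6-label output

# Every test fold lies strictly after its training fold, so temporal order is preserved.
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(x):
    x_train, x_test = x[train_idx], x[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print("train up to row %d, test rows %d-%d" % (train_idx[-1], test_idx[0], test_idx[-1]))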

I would be very grateful if someone could nudge me in the right direction!

import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, LeakyReLU
from keras.optimizers import RMSprop


# NB: this class name shadows keras.layers.LSTM once the class is defined.
class LSTM:

    def __init__(self): 
        self.x_testsets = {}
        self.y_testsets = {} 
        self.variable_split = 0
        self.results_total = pd.DataFrame(columns=['NoRot_P', 'NoThrust_P', 'NoRot_B', 'NoThrust_B','NoRot_S','NoThrust_S','pred_NoRot_P','pred_NoThrust_P', 
                                                   'pred_NoRot_B','pred_NoThrust_B','pred_NoRot_S','pred_NoThrust_S'])



    def model_initialize(self, datasets):
        num_datapoints = 0 
        input_dim = 38
        # One timestep at a time, ensure that statefulness is enabled. 
        timesteps = 1
        batch_size_c = 12
        epochs = 7
        iterate = 1
        total_epochs = epochs * len(datasets)
        epochs_done = 0
        set_names = ['NullHypothesis','behaviour1','behaviour2','behaviour3',
               'behaviour4','behaviour5','behaviour6']
        for k in datasets:
            break_flag = 0
            # Uncomment to shuffle
            #df = df.sample(frac=1).reset_index(drop=True)


            num_datapoints = datasets[k].shape[0] + num_datapoints


            split_num = 0.7
            train_elements = 0
            test_elements = 0 

            # Delete rows until both the test and training sets are divisible with batch size
            if(self.variable_split == 0): 
                if(break_flag):
                    break_flag = 0
                    break
                deletedRows = 0
                split = int(len(datasets[k])*split_num)
                train_elements = split
                test_elements = len(datasets[k])-(split)
                while True: 
                    if ((train_elements%batch_size_c == 0) and (test_elements%batch_size_c == 0)):
                        #print("Sets are now divisble by batch size - deleted " + str(deletedRows) +" rows")
                        break_flag = 1
                        break
                    datasets[k].drop((datasets[k].shape[0]-1),inplace = True)
                    split = int(len(datasets[k])*split_num)
                    train_elements = split
                    test_elements = len(datasets[k])-(split)
                    deletedRows = deletedRows + 1

            x = datasets[k].iloc[:, :-6] # Prediction variables
            y = datasets[k].iloc[:, -6:] # Labels (last six one-hot columns)

            x_train, x_test, y_train, y_test = x[:split], x[split:], y[:split], y[split:]


            # Feature Scaling

            scaler = StandardScaler()
            x_train = scaler.fit_transform(x_train)
            x_test = scaler.transform(x_test)

            # LSTM expect a 3D matrix, so we reshape the training data with an extra
            # singular dimension to satisfy the dimension requirements
            x_train = np.reshape(x_train, (x_train.shape[0],1,x_train.shape[1]))
            x_test = np.reshape(x_test, (x_test.shape[0],1,x_test.shape[1]))

            self.x_testsets[k] = x_test
            self.y_testsets[k] = y_test


            if(hasattr(self,'model')):
                self.model.reset_states()
            else:
                self.model = Sequential()
                self.model.add(LSTM(batch_size_c, return_sequences=False, batch_size = batch_size_c,stateful=True,
                               input_shape=(timesteps, input_dim)))  
                self.model.add(Dense(512, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(512, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(512, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(512, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(32, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(32, activation='linear'))
                self.model.add(LeakyReLU(alpha=0.3))
                #self.model.add(LeakyReLU(alpha=0.3))
                self.model.add(Dense(6, activation='sigmoid'))
                opt = RMSprop(lr=0.001)
                self.model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['binary_accuracy'])

            history = np.zeros(epochs)
            for i in range(epochs):
                acc = self.model.fit(x_train, y_train, epochs=1, batch_size=batch_size_c, verbose=0, shuffle=False)
                epochs_done = epochs_done + 1
                if(epochs_done/total_epochs > 0.1*iterate):
                    print("Percent done: ",int(100.0*epochs_done/total_epochs))
                    iterate = iterate + 1                
                history[i] = acc.history['loss'][0]
                self.model.reset_states()    
            print("Plot for set: ", set_names[k])
            plt.plot(range(0,epochs),history)
            plt.title('model loss')
            plt.ylabel('loss')
            plt.xlabel('epoch')
            plt.legend(['train'], loc='upper left')
            plt.show()



            y_pred = self.model.predict(x_test,batch_size=batch_size_c)
            y_pred = (y_pred > 0.5)

            data = {'NoRot_P': y_test.iloc[:,0], 'NoThrust_P': y_test.iloc[:,1],'NoRot_B': y_test.iloc[:,2], 'NoThrust_B': y_test.iloc[:,3],
                    'NoRot_S': y_test.iloc[:,4], 'NoThrust_S': y_test.iloc[:,5],'pred_NoRot_P': y_pred[:,0], 'pred_NoThrust_P': y_pred[:,1],
                    'pred_NoRot_B': y_pred[:,2],'pred_NoThrust_B': y_pred[:,3],'pred_NoRot_S': y_pred[:,4],'pred_NoThrust_S': y_pred[:,5]}
            results = pd.DataFrame(data = data)
            self.results_total = self.results_total.append(results,ignore_index=True)
        self.model.summary()




if (__name__ == '__main__'):
    random.seed(42)
    df_dictionary = {}

    df_dictionary[0] = pd.read_csv('trainingdata_nullhypothesis.csv', encoding = "ISO-8859-1")
    df_dictionary[1] = pd.read_csv('trainingdata1.csv', encoding = "ISO-8859-1")
    df_dictionary[2] = pd.read_csv('trainingdata2.csv', encoding = "ISO-8859-1")
    df_dictionary[3] = pd.read_csv('trainingdata3.csv', encoding = "ISO-8859-1")
    df_dictionary[4] = pd.read_csv('trainingdata4.csv', encoding = "ISO-8859-1")
    df_dictionary[5] = pd.read_csv('trainingdata5.csv', encoding = "ISO-8859-1")
    df_dictionary[6] = pd.read_csv('trainingdata6.csv', encoding = "ISO-8859-1")

    ann = LSTM()
    ann.model_initialize(df_dictionary)

    final = ann.results_total
    wrong_prediction = 0
    right_prediction = 0
    true_negative = 0
    true_positive = 0
    false_negative = 0 
    false_positive = 0
    total_num_predictions = int(final.shape[0]*(final.shape[1]*0.5))
    for i in range(0,len(final)):
        for k in range(0,int(final.shape[1]/2)):
            if(final.iloc[i,k] != final.iloc[i,k+6]):
                wrong_prediction = wrong_prediction + 1
                if(final.iloc[i,k] == 1 and final.iloc[i][k+6] == 0):
                    false_negative = false_negative + 1
                else: 
                    false_positive = false_positive + 1
            else:
                right_prediction = right_prediction + 1
                if(final.iloc[i,k] == 1 and final.iloc[i][k+6] == 1):
                    true_positive = true_positive + 1
                else: 
                    true_negative = true_negative + 1


    accuracy = right_prediction/total_num_predictions
    print("Accuracy: ", accuracy)
    recall = true_positive/(true_positive+false_negative)
    print("False positives: ", false_positive)
    print("Recall: ", recall)`

Edit: I have been experimenting with the learning rate of the optimizer (RMSprop) and get some strange behaviour. Sometimes my recall is zero while the accuracy is very high, which means the model does not mark a single data point as positive. Could it be that more epochs drive the solution into a local optimum?
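
One thing that might push back against that all-negative collapse is weighting rows by label frequency. A sketch of inverse-frequency sample weights; model, x_train, y_train and batch_size_c stand for the same objects as in the training loop above (self.model there), and the weighting scheme itself is only an illustration:

import numpy as np

# y_train as in the training loop above: a (rows, 6) array/DataFrame of 0/1 labels.
labels = np.asarray(y_train, dtype=float)
pos_row = labels.sum(axis=1) > 0             # rows with at least one positive label
pos_frac = max(pos_row.mean(), 1e-6)

# Inverse-frequency weights: whichever kind of row is rarer gets the larger weight.
sample_weight = np.where(pos_row, 0.5 / pos_frac, 0.5 / max(1.0 - pos_frac, 1e-6))

model.fit(x_train, y_train, epochs=1, batch_size=batch_size_c,
          sample_weight=sample_weight, shuffle=False, verbose=0)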

Is it at all possible to shuffle time series for an LSTM and still get good results, or does that defeat the whole point of including the time dimension in the machine learning?
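
One compromise in that direction would be to shuffle whole fixed-length windows rather than individual timesteps, so the order inside each window is preserved. A sketch only; the window length of 32 and the array names are arbitrary placeholders:

import numpy as np

def make_windows(x, y, window=32):
    """Cut one time series into non-overlapping windows of `window` timesteps."""
    n = (len(x) // window) * window
    xw = x[:n].reshape(-1, window, x.shape[-1])
    yw = y[:n].reshape(-1, window, y.shape[-1])
    return xw, yw

x_series = np.random.rand(500, 38)            # placeholder: one series, 38 features
y_series = np.random.randint(0, 2, (500, 6))  # placeholder: matching labels

xw, yw = make_windows(x_series, y_series)
perm = np.random.permutation(len(xw))
xw, yw = xw[perm], yw[perm]   # shuffle whole windows; order inside each window is intact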

Edit 2: I have been experimenting with different loss functions to see whether the problem of a non-decreasing loss for some of the sets goes away; it does not seem to. Binary and categorical cross-entropy have both been tried, with sigmoid and softmax respectively as the activation of the final layer. (Image: non-decreasing loss, despite running 200+ epochs.)

0 Answers:

There are no answers yet.