Why did the number of images decrease after extracting the bottleneck features?

Date: 2019-05-04 06:58:40

Tags: python keras conv-neural-network vgg-net transfer-learning

I am trying to build a simple 5-class object detector by extracting bottleneck features with a pre-trained VGG16 (trained on ImageNet). I have 10000 training images (2000 per class) and 2500 test images (500 per class). However, once I extract the bottleneck features, the validation tensor has 2496 samples, while the expected size is 2500. I checked the data folders and confirmed that the total number of validation images is 2500, yet I still get an error when I run the code: "ValueError: Input arrays should have the same number of samples as target arrays. Found 2496 input samples and 2500 target samples." I have attached the code below; can someone help me understand why the number of input samples dropped to 2496?

I have already counted the images in the train and test data to make sure none were missing. It turns out that no images are actually missing.

Here is the code that extracts the bottleneck features.

import numpy as np
from datetime import datetime as dt
from keras import applications
from keras.preprocessing.image import ImageDataGenerator

global_start = dt.now()

#Dimensions of our Flickr images are 256 x 256
img_width, img_height = 256, 256

#Declaration of parameters needed for training and validation
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
epochs = 50
batch_size = 16

#Get the bottleneck features by Weights.T * Xi
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1./255)

    #Load the pre-trained VGG16 model from Keras; keep only the convolutional layers and drop the top (fully connected) layers.
    model = applications.VGG16(include_top=False, weights='imagenet')

    generator_tr = datagen.flow_from_directory(train_data_dir,
                                            target_size=(img_width, img_height),
                                            batch_size=batch_size,
                                            class_mode=None, #class_mode=None means the generator won't load the class labels.
                                            shuffle=False) #No shuffling, so the samples stay in class order.
    nb_train_samples = len(generator_tr.filenames) #10000: 2000 training samples per class
    bottleneck_features_train = model.predict_generator(generator_tr, nb_train_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_train.npy', bottleneck_features_train) #bottleneck_features_train is a numpy array

    generator_ts = datagen.flow_from_directory(validation_data_dir,
                                            target_size=(img_width, img_height),
                                            batch_size=batch_size,
                                            class_mode=None,
                                            shuffle=False)
    nb_validation_samples = len(generator_ts.filenames) #2500: 500 validation samples per class
    bottleneck_features_validation = model.predict_generator(generator_ts, nb_validation_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_validation.npy', bottleneck_features_validation)
    print("Got the bottleneck features in time: ", dt.now()-global_start)

    num_classes = len(generator_tr.class_indices)

    return nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts

nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts = save_bottleneck_features()

Here is the output of the snippet above:

Found 10000 images belonging to 5 classes.
Found 2500 images belonging to 5 classes.
Got the bottleneck features in time:  1:56:44.166846

Now, if I run validation_data.shape, I get (2496, 8, 8, 512), whereas the expected output should be (2500, 8, 8, 512). The train_data output is fine. What could be the problem? I am new to debugging Keras, and I really can't figure out what is causing this.
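One detail worth sanity-checking is the step count passed to `predict_generator`: `nb_validation_samples // batch_size` is integer (floor) division, so any final partial batch would be dropped. A minimal arithmetic sketch of that calculation (plain Python, no Keras needed), matching the values from the question:

```python
# Floor division on the step count silently drops the last partial batch:
nb_validation_samples = 2500
batch_size = 16

steps = nb_validation_samples // batch_size      # 156 full batches
samples_processed = steps * batch_size           # 156 * 16 = 2496
leftover = nb_validation_samples - samples_processed

print(steps, samples_processed, leftover)  # 156 2496 4
```

The 2496 here matches the observed validation tensor size, so this is where I would look first.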

Any help would be greatly appreciated!

0 answers:

No answers yet.