The region proposal network proposes too many wrong region proposals

Date: 2018-04-13 14:46:51

Tags: tensorflow deep-learning object-detection

I am trying to implement a Feature Pyramid Network for object detection on a gland dataset (basically replacing the single feature map in Faster R-CNN with pyramid features). For now I want to make sure the RPN, the first stage of Faster R-CNN, works correctly, so I trained only the RPN: 40001 iterations, an initial lr of 0.001, and ResNet_V2_50 as the backbone. I ran into several problems:

  1. I found that after about 5000 iterations the weights are essentially no longer updated and the gradients are really close to zero (see the "weird weight" screenshot). I tried a lower lr, and the epsilon I use in tf.train.AdamOptimizer is 0.0001 (the optimizer setup is sketched below), but the weights still do not seem to update.
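
  The optimizer setup and the gradient check look roughly like this (a minimal sketch; total_loss stands for my combined RPN loss and the summary name is only illustrative):

    optimizer = tf.train.AdamOptimizer(learning_rate = 0.001, epsilon = 0.0001)
    grads_and_vars = optimizer.compute_gradients(total_loss)
    #Global norm over all gradients; this is the quantity that stays close to zero after ~5000 iterations
    grad_norm = tf.global_norm([g for g, v in grads_and_vars if g is not None])
    tf.summary.scalar("rpn_grad_norm", grad_norm)
    train_op = optimizer.apply_gradients(grads_and_vars)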

  2. As for the predictions, I think the bounding-box regression is fairly good (green is the ground truth, red are the predicted boxes, which are taken from anchors whose overlap with a ground-truth box is larger than 0.7; see the "predicted bounding box" screenshot). So at least the bounding-box regression loss can push the predictions close to the ground truth. But as for rpn_cls_loss, it classifies far too many wrong proposals as objects (see the "wrong proposals" screenshot). I considered the class-imbalance problem (in the gland cell dataset the average number of positive anchors per image is only about 20, and there are only 85 training images), so I randomly select 128 positive and 128 negative anchors per image (sketched below; if there are fewer than 128 positives I random_shuffle the indices, so a positive anchor can appear several times), but it still does not work well. One thing I do not understand: when we compute the loss we only consider this mini-batch, yet at the same time the network classifies every generated anchor as object or non-object. What about the anchors that do not contribute to the loss? Can the network still make the correct decision for them?
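
  The mini-batch selection works roughly as follows (a simplified numpy sketch; the function name and the label convention of 1 = positive, 0 = negative are only how I describe it here):

    import numpy as np

    def sample_anchor_minibatch(anchor_labels, num_per_class = 128):
        """Return the anchor indices that go into the RPN loss for one image."""
        pos_idx = np.where(anchor_labels == 1)[0]
        neg_idx = np.where(anchor_labels == 0)[0]
        #If there are fewer than 128 positives, sample with replacement so some positives repeat
        pos_sample = np.random.choice(pos_idx, num_per_class, replace = len(pos_idx) < num_per_class)
        neg_sample = np.random.choice(neg_idx, num_per_class, replace = len(neg_idx) < num_per_class)
        return pos_sample, neg_sample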

  3. Here is the code for extracting the pyramid feature maps:

    P5 = conv_layer_obj(conv5_3,"fpn_c5p5",[1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
    P4_ = conv_layer_obj(conv4_3, "fpn_c4p4", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
    P3_ = conv_layer_obj(conv3_3, "fpn_c3p3", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
    P2_ = conv_layer_obj(conv2_2, "fpn_c2p2", [1,256], training_state = training_state, activation_func = None, bias_state = True, padding = 'valid')
    #P4 = tf.add(tf.image.resize_bilinear(P5,[P4_.shape.as_list()[1],P4_.shape.as_list()[1]]),P4_)
    P4 = tf.add(utils.nearest_neighbor_upsampling(P5, scale = 2), P4_)  
    P3 = tf.add(utils.nearest_neighbor_upsampling(P4, scale = 2),P3_)
    P2 = tf.add(utils.nearest_neighbor_upsampling(P3, scale = 2),P2_)
    
    P5 = conv_layer_obj(P5,"fpn_p5", [3,256], training_state = training_state, activation_func = None, bias_state = True)
    
    P4 = conv_layer_obj(P4,"fpn_p4", [3,256], training_state = training_state, activation_func = None, bias_state = True)    
    P3 = conv_layer_obj(P3,"fpn_p3", [3,256], training_state = training_state, activation_func = None, bias_state = True)
    
    P2 = conv_layer_obj(P2,"fpn_p2", [3,256], training_state = training_state, activation_func = None, bias_state = True)
    #P6 is used for the 5th anchor scale in the RPN, generated by subsampling from P5 with stride 2,
    #so its spatial size is half that of P5.
    P6 = tf.nn.max_pool(P5,[1,2,2,1],strides = [1,2,2,1], padding = 'VALID', name = "fpn_p6")
    

    Here is the code for the RPN:

    anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES, config.RPN_ANCHOR_RATIOS, config.BACKBONE_SHAPES,
                                             config.BACKBONE_STRIDES, config.RPN_ANCHOR_STRIDE)
    #Basically, the anchors generated in this step are all the possible anchors in the image. RPN_ANCHOR_SCALES is
    #(32, 64, 128, 256, 512), i.e. there are 5 levels, and the returned anchors are arranged in that order. The numbers
    #mean that, if RPN_ANCHOR_RATIOS is 1, the anchor covers 32*32 pixels at the first scale level. This is similar to
    #selective search: we enumerate all the possible anchors.
    
    layer_outputs = [] #list of lists
    #rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state)
    #The RPN weights are shared across the pyramid levels, so the variables are reused after the first level.
    for index, single_feature in enumerate(rpn_feature_maps):
        layer_outputs.append(rpn_graph(single_feature, len(config.RPN_ANCHOR_RATIOS), config.RPN_ANCHOR_STRIDE,
                                       reuse_state = (index > 0), training_state = training_state))
    #Then concatenate the layer_outputs from [[a1,b1,c1],[a2,b2,c2]] to [[a1,a2],[b1,b2],[c1,c2]]
    output_names = ["rpn_class_logits","rpn_class","rpn_bbox"]
    outputs = list(zip(*layer_outputs))
    output_concate = []
    for o, n in zip(outputs, output_names):
        output_concate.append(tf.concat(list(o),axis=1,name = n))
    
    rpn_class_logits, rpn_class_prob, rpn_bbox = output_concate
    
    #Then we need to filter out the bounding boxes that do not satisfy the criterion.
    proposal_count =  tf.cond(training_state,
                              lambda: config.POST_NMS_ROIS_TRAINING,
                              lambda: config.POST_NMS_ROIS_INFERENCE)
    
    #proposal_count is the number of boxes kept after the proposal step; during training we assume roughly 2000
    #boxes per image.
    proposallayer = ProposalLayer(proposal_count, config.RPN_NMS_THRESHOLD, anchors = anchors, config = config)
    rpn_rois = proposallayer.call([rpn_class_prob, rpn_bbox])
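    #For context, the ProposalLayer (not shown here) essentially does the usual Faster R-CNN proposal step: take the
    #top-scoring anchors, apply the rpn_bbox deltas, clip the boxes to the image, and run NMS with
    #config.RPN_NMS_THRESHOLD, keeping at most proposal_count boxes. A one-image sketch of the NMS step
    #(refined_boxes and fg_scores are illustrative names for the decoded [y1, x1, y2, x2] boxes and the foreground
    #probabilities):
    #    fg_scores = rpn_class_prob[0, :, 1]
    #    keep = tf.image.non_max_suppression(refined_boxes, fg_scores,
    #                                        max_output_size = proposal_count,
    #                                        iou_threshold = config.RPN_NMS_THRESHOLD)
    #    rpn_rois = tf.gather(refined_boxes, keep)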
    

    rpn_graph is:

    def rpn_graph(feature_map, anchor_per_location, anchor_stride, reuse_state, training_state):
        """This function builds the region proposal graph.

        Args:
        feature_map: One of the pyramid feature maps. They have different heights and widths, but the same batch size and number of channels.
        anchor_per_location: The number of anchors per pixel of the feature map.
        anchor_stride: Controls the density of the anchors; it is a plain int, and we set it to 1 in our case.
        training_state: The training processes for the RPN and the Fast R-CNN head are different, so the training state is a dynamic variable.

        Returns:
        rpn_class_logits. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_class_prob. Shape [Batch_Size, Num_of_Anchors, 2]
        rpn_bbox. Shape [Batch_Size, Num_of_Anchors, 4] (dx, dy, log(dw), log(dh))
        """

        #First, build the shared feature map; it is used for rpn_class_logit as well as for rpn_bbox.
        #conv_layer_obj(bottom, name, shape, training_state, strides = (1,1), activation_func = tf.nn.relu, padding='same', dilation_rate = (1,1), bias_state = True, reuse_state = False):
        shared = conv_layer_obj(feature_map, 'rpn_conv_shared', shape = [3,512], strides = anchor_stride, bias_state = True,
                                activation_func = tf.nn.relu, padding = 'same', reuse_state = reuse_state, training_state = training_state)

        #Then the rpn_class_logit; the output depth of the classifier is [anchor_per_location*2].
        #2 because each anchor is either an object or background.
        x = conv_layer_obj(shared, 'rpn_class_classifier', shape = [1,anchor_per_location*2], strides = (1,1), bias_state = True,
                           activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
        rpn_class_logit = tf.reshape(x, [x.shape.as_list()[0], -1, 2])
        rpn_class_prob = tf.nn.softmax(rpn_class_logit)

        #Then the bounding-box refinement; the output depth is [anchor_per_location*4].
        #4 because the outputs are the refinement deltas (dx, dy, log(dw), log(dh)).
        x = conv_layer_obj(shared, 'rpn_box_classifier', shape = (1,anchor_per_location*4), strides = (1,1), bias_state = True,
                           activation_func = None, padding = 'same', reuse_state = reuse_state, training_state = training_state)
        rpn_bbox = tf.reshape(x, [x.shape.as_list()[0], -1, 4])

        return [rpn_class_logit, rpn_class_prob, rpn_bbox]
    

    I would really appreciate any help. Thanks in advance!

0 Answers:

No answers