Question

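I am interested in reproducing the steps in the ZF net prototxt file. The part I am unsure about is the softmax layer. rpn_cls_score is created here with dimensions (1, 18, h, w):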

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}

It is then reshaped to dimensions (1, 2, 9*h, w):

layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
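
To make the shape bookkeeping concrete, here is a minimal NumPy sketch of what this Reshape does. The feature-map size (h = 3, w = 4) and the array contents are made up for illustration; the only assumption carried over from Caffe is that blobs are stored row-major, which matches NumPy's default order.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
h, w, num_anchors = 3, 4, 9

# Stand-in for the rpn_cls_score blob: shape (1, 18, h, w).
score = np.arange(2 * num_anchors * h * w, dtype=np.float32)
score = score.reshape(1, 2 * num_anchors, h, w)

# `dim: 0` copies the corresponding input dimension, `dim: -1` is inferred.
reshaped = score.reshape(score.shape[0], 2, -1, score.shape[3])
print(reshaped.shape)  # (1, 2, 27, 4), i.e. (1, 2, 9*h, w)

# Original channel c ends up at [c // 9, (c % 9) * h + y] after the reshape,
# so channels 0..8 form the first group and channels 9..17 the second.
c, y, x = 13, 2, 1
assert reshaped[0, c // num_anchors, (c % num_anchors) * h + y, x] == score[0, c, y, x]
```

No data is moved by the reshape; only the indexing changes, so the first 9 of the 18 channels become channel 0 and the last 9 become channel 1.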

Finally it is passed to softmax:

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}

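My question is this. According to the Caffe online documentation, softmax takes a one-dimensional input, but rpn_cls_score_reshape has dimensions (1, 2, 9*h, w). Does softmax sum over all indices, or does it pick a canonical axis and sum only over the remaining indices (as the C++ code suggests)? In this case that would mean it splits rpn_cls_score_reshape into two arrays, (1, channel = 1, 9*h, w) and (1, channel = 2, 9*h, w), one for each value of the channel index, performs a softmax within each by summing the exponentials of its 9*h*w components, and then recombines them into an array with the original dimensions (1, 2, 9*h, w), which it returns as rpn_cls_prob. If not, how does softmax handle an input array with more than one dimension?

Thanks.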

Answer 1

SoftmaxParameter is documented in caffe.proto, and its axis parameter defaults to 1:

// The axis along which to perform the softmax -- may be negative to index
// from the end (e.g., -1 for the last axis).
// Any other axes will be evaluated as independent softmaxes.
optional int32 axis = 2 [default = 1];
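
The axis semantics described in that comment can be sketched in NumPy. This is an illustration, not Caffe's implementation: the softmax helper, the random input, and the sizes h = 3, w = 4 are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=1):
    """Softmax along `axis`; every other index is normalised independently."""
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical feature-map size and random scores, for illustration only.
h, w = 3, 4
rng = np.random.default_rng(0)
score_reshape = rng.normal(size=(1, 2, 9 * h, w))  # stand-in for rpn_cls_score_reshape

prob = softmax(score_reshape, axis=1)

# One two-way softmax per (anchor*row, column) position: the two channel
# values at each position sum to 1.
assert np.allclose(prob.sum(axis=1), 1.0)

# By contrast, a single channel does not sum to 1 over its 9*h*w entries.
print(prob[0, 0].sum(), prob[0, 1].sum())
```

With axis = 1 and two channels, each output pair is just the normalised (bg, fg) score at one position, so the result has the same shape as the input.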

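So the C++ code is the right guide: Softmax picks one axis (axis 1 by default) and, for an N-D input with N > 1, evaluates a separate softmax along that axis for every combination of the other indices. For rpn_cls_score_reshape this means each of the 9*h*w positions gets its own two-way (bg/fg) softmax over the channel axis; the sum does not run over the 9*h*w entries of a channel.

As for Faster R-CNN, if you are only interested in the foreground boxes, you can split the rpn_cls_score blob and use only its second half (i.e. set num_output: 9 # instead of 18 after training your network, or use a Slice layer during training to take only the second half). If you train as usual and change num_output after training, remember to change the caffemodel accordingly.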
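
As a rough illustration of "using only the second half", the sketch below pulls the foreground entries out of a blob shaped like rpn_cls_prob. It assumes, following the 2(bg/fg) comment in the prototxt, that channel 0 is background and channel 1 is foreground; the sizes and the random values are hypothetical.

```python
import numpy as np

# Hypothetical sizes and random values, for illustration only.
h, w, num_anchors = 3, 4, 9
rng = np.random.default_rng(0)
prob = rng.random(size=(1, 2, num_anchors * h, w))  # stand-in for rpn_cls_prob

# Assumption: channel 1 of the reshaped blob holds the foreground entries.
fg = prob[:, 1].reshape(1, num_anchors, h, w)

# Equivalent view: undo the reshape to (1, 18, h, w) and keep the last 9 channels.
fg_alt = prob.reshape(1, 2 * num_anchors, h, w)[:, num_anchors:]

assert np.array_equal(fg, fg_alt)
print(fg.shape)  # (1, 9, 3, 4)
```

Both routes give one foreground value per anchor and spatial position, which is the "second half" of the 18 channels referred to above.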
