Question

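System information

What is the top-level directory of the model you are using: object_detection / ssd_inception_v2
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.2.1
Bazel version (if compiling from source): No
CUDA / cuDNN version: cuda 8.0
GPU model and memory: Quadro M6000 24GB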

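After training an ssd_inception_v2 model on my custom dataset, I want to use it for inference. Since inference will later run on a device without a GPU, I switched to the CPU for inference only. I adapted object_detection_tutorial.ipynb to measure the inference time and ran the following code on a series of images from a video.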

import datetime

import cv2
import numpy as np
import tensorflow as tf

# detection_graph, category_index, vis_util, vidcap and count are assumed to
# be set up earlier, as in the tutorial notebook.
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Look up the input and output tensors once, outside the frame loop.
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes_t = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for the corresponding object.
    # The score is shown on the result image, together with the class label.
    scores_t = detection_graph.get_tensor_by_name('detection_scores:0')
    classes_t = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections_t = detection_graph.get_tensor_by_name('num_detections:0')
    while True:
      # Read the next frame and stop cleanly at the end of the video.
      success, image = vidcap.read()
      if not success:
        break
      # Resize the frame to 711 x 400.
      image = cv2.resize(image, (711, 400))
      # Crop columns 11..690, which leaves a 680 x 400 image.
      image = image[:, 11:691]
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image, axis=0)
      before = datetime.datetime.now()
      # Actual detection; only this call is timed.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes_t, scores_t, classes_t, num_detections_t],
          feed_dict={image_tensor: image_np_expanded})
      print("This took : " + str(datetime.datetime.now() - before))
      vis_util.visualize_boxes_and_labels_on_image_array(
          image,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)

      #cv2.imwrite("converted/frame%d.jpg" % count, image)     # save frame as JPEG file
      count += 1

The output is as follows:
This took : 0:00:04.289925
This took : 0:00:00.909071
This took : 0:00:00.917636
This took : 0:00:00.908391
This took : 0:00:00.896601
This took : 0:00:00.908698
This took : 0:00:00.890018
This took : 0:00:00.896373
.....
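
For reference, the timings printed above can be condensed into a steady-state latency and frame rate. The snippet below is only a sketch: it copies the eight printed values by hand, treats the first call as warm-up (an assumption, based on it being several times slower than the rest), and uses names such as timings_s that are not part of the original script.

```python
from statistics import mean

# Per-call times in seconds, copied by hand from the printed output.
timings_s = [4.289925, 0.909071, 0.917636, 0.908391,
             0.896601, 0.908698, 0.890018, 0.896373]

# Assumption: the first call includes one-off setup cost, so it is excluded
# from the steady-state figures.
warmup_s, steady_s = timings_s[0], timings_s[1:]

steady_latency_s = mean(steady_s)
steady_fps = 1.0 / steady_latency_s

print("warm-up call: %.3f s" % warmup_s)
print("steady-state latency: %.3f s" % steady_latency_s)
print("steady-state rate: %.2f fps" % steady_fps)
```

Excluding the first call keeps the one-off startup cost from skewing the per-frame figure that matters for video.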

Of course, 900 ms per image is not fast enough for video processing. After reading many threads, I see two possible ways to improve this:

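Graph Transform Tool: to obtain a faster frozen inference graph. (I am hesitant to try this because, as far as I understand, I would have to build TF from source, and I am generally happy with my current installation.)
Replace feeding: feed_dict={image_tensor: image_np_expanded} does not seem to be a good way to supply data to the TF graph. A QueueRunner object might help here.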
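
To make the second idea concrete, here is a minimal sketch of queue-based feeding in plain Python: a producer thread prepares frames while the main thread runs inference. It is not the TensorFlow QueueRunner API. The names read_frames, run_pipeline and fake_infer are hypothetical, and fake_infer merely stands in for the sess.run call. Note that the timing in the question wraps only sess.run, so overlapping frame preparation with inference would not by itself change the reported per-call time.

```python
import queue
import threading


def read_frames(frames, frame_queue):
    """Producer: push each frame into the queue, then a None sentinel."""
    for frame in frames:
        frame_queue.put(frame)  # blocks while the queue is full
    frame_queue.put(None)       # tells the consumer there is no more input


def run_pipeline(frames, infer, max_queued=4):
    """Consumer: run infer() on frames that a background thread prepares."""
    frame_queue = queue.Queue(maxsize=max_queued)
    producer = threading.Thread(
        target=read_frames, args=(frames, frame_queue), daemon=True)
    producer.start()

    results = []
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        results.append(infer(frame))
    producer.join()
    return results


def fake_infer(frame):
    # Hypothetical stand-in for the real detection call.
    return frame * 2


outputs = run_pipeline(range(5), fake_infer)
print(outputs)
```

The bounded queue keeps memory use predictable: the producer waits once max_queued frames are pending.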

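So my question is whether the two improvements above could speed inference up enough for real-time use (10-20 fps), or whether I am on the wrong path and should try something else. Any suggestions are welcome.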
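
To put the 10-20 fps target in numbers, the per-frame time budget and the speedup needed over the roughly 900 ms measured above can be worked out as follows. This is a small arithmetic sketch; required_speedup is a hypothetical helper, and 0.9 s is a rounded value taken from the printed timings.

```python
def required_speedup(current_latency_s, target_fps):
    """Return how many times faster a frame must be processed to reach target_fps."""
    frame_budget_s = 1.0 / target_fps
    return current_latency_s / frame_budget_s


# Assumption: about 0.9 s per image, rounded from the steady-state timings.
current_latency_s = 0.9

for target_fps in (10, 20):
    budget_ms = 1000.0 / target_fps
    speedup = required_speedup(current_latency_s, target_fps)
    print("%d fps: budget %.0f ms per frame, needs about %.0fx speedup"
          % (target_fps, budget_ms, speedup))
```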

Tensorflow object detection inference is slow on CPU

0 answers: