Exception when writing TFRecords with multiple threads

Time: 2017-10-29 10:29:52

Tags: multithreading exception tensorflow python-multithreading tfrecord

I have a huge video dataset; for each video I have a folder containing its frames. I am writing one TFRecord per video, using a SequenceExample whose FeatureLists are the frames of the video.
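A minimal sketch of the per-video layout described above, assuming each video folder holds its frames as `.jpg`/`.jpeg` files whose names sort in playback order (for example zero-padded `frame_0001.jpg`). `get_frames` mirrors the helper name used in the question, but this two-argument body is a hypothetical implementation, not the author's; `tfrecord_path` is likewise a hypothetical stand-in for the question's `tfrecord_name`:

```python
import os
from typing import List


def get_frames(dset_dir: str, video_id: str) -> List[str]:
    """Hypothetical helper: return the JPEG frame paths of one video, in order.

    Assumes frame file names sort lexicographically in playback order,
    e.g. zero-padded names such as frame_0001.jpg, frame_0002.jpg, ...
    """
    video_dir = os.path.join(dset_dir, video_id)
    names = sorted(
        name for name in os.listdir(video_dir)
        if name.lower().endswith((".jpg", ".jpeg"))  # skip non-frame files
    )
    return [os.path.join(video_dir, name) for name in names]


def tfrecord_path(out_dir: str, video_id: str) -> str:
    """Hypothetical naming scheme: one TFRecord file per video."""
    return os.path.join(out_dir, video_id + ".tfrecord")
```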

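I am using a Python thread pool to iterate over the list of videos, with each thread working on one video. Inside each thread, I then use TensorFlow queues to read and process the frames.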
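One thing worth checking with this setup: the script below stores the futures in a dict but never reads them, so an exception raised inside `main_loop` itself is captured by its future and never shown (the traceback further down comes from TensorFlow's queue-runner threads, which print on their own). A minimal stdlib sketch, with a hypothetical `process_video` standing in for `main_loop`, that reads every future via `concurrent.futures.as_completed` and collects each video's result or exception:

```python
import concurrent.futures
from typing import Any, Callable, Dict, Iterable, Tuple


def run_pool(
    process_video: Callable[[str], Any],
    video_ids: Iterable[str],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, BaseException]]:
    """Run process_video on every id; return (results, errors) keyed by video id.

    Unlike discarding the futures, this inspects each one, so an exception
    raised inside a worker is reported instead of being silently stored.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, BaseException] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_video = {
            executor.submit(process_video, video): video for video in video_ids
        }
        for future in concurrent.futures.as_completed(future_to_video):
            video = future_to_video[future]
            exc = future.exception()  # None if the worker returned normally
            if exc is not None:
                errors[video] = exc
            else:
                results[video] = future.result()
    return results, errors
```

For example, `run_pool(main_loop, videos_id)` would return the per-video errors instead of dropping them, which makes it easier to see which video first fails.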

My script is structured as follows:

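import os
import concurrent.futures

import tensorflow as tf  # TensorFlow 1.x (queue-based input pipeline)

# dset_dir, get_frames, make_example and tfrecord_name are defined elsewhere
videos_id = os.listdir(dset_dir)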

def main_loop(video):
    frames_list = get_frames(video)
    filename_queue = tf.train.string_input_producer(frames_list)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    my_img = tf.image.decode_jpeg(value)
    # resize, etc ...

    init_op = tf.global_variables_initializer()
    sess = tf.InteractiveSession()
    with sess.as_default():
        sess.run(init_op)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # accumulating images of 1 video
    image_list = []
    for i in range(len(frames_list)):
        image_list.append(my_img.eval(session=sess))

    coord.request_stop()
    coord.join(threads)

    writer = tf.python_io.TFRecordWriter(tfrecord_name)
    ex = make_example(image_list)
    writer.write(ex.SerializeToString())
    writer.close()
    sess.close()

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    future = {executor.submit(
        main_loop, video): video for video in videos_id}

After roughly a thousand videos, I get the following exception (repeated many times, for different thread ids):

Exception in thread Thread-344395:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 254, in _run
    coord.request_stop(e)
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 211, in request_stop
    six.reraise(*sys.exc_info())
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/desktop/Documents/tensorflow-py3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
     [[Node: input_producer_319/input_producer_319_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](input_producer_319, input_producer_319/Identity)]]

Any idea why this happens? Thanks in advance.

1 answer:

Answer 0 (score: 0)

I'm using this noticeably cleaner way of stopping the coordinator. Not sure whether it helps.

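# ....
# num_epochs=1 makes the queue raise OutOfRangeError after 1 epoch, i.e. one video
filename_queue = tf.train.string_input_producer(frames_list, num_epochs=1)

# ....

# num_epochs is tracked in a local variable, so initialize local variables too
sess.run(tf.local_variables_initializer())

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# ...

# After everything is built, start the loop.
image_list = []
try:
    while not coord.should_stop():
        # read your frame
        image_list.append(sess.run(my_img))
except tf.errors.OutOfRangeError:
    # means the loop has finished: every frame of this video has been read
    # write your TFRecord here (e.g. from image_list)
    pass
finally:
    # When done, ask the threads to stop.
    coord.request_stop()
coord.join(threads)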
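As a rough analogy in plain Python (this is not how TensorFlow's queue runners are implemented): without `num_epochs`, `string_input_producer` cycles through the file names indefinitely, so its enqueue thread is typically still blocked on a full queue when the reader stops, and that pending enqueue then has to be cancelled on shutdown. With `num_epochs=1` the producer enqueues each name once and then closes the queue, so the reading loop ends by itself. The sketch below models only the second case with `queue.Queue`; the `_END` sentinel is a hypothetical stand-in for `OutOfRangeError`:

```python
import queue
import threading
from typing import List, Sequence

_END = object()  # sentinel meaning "no more items", playing the role of OutOfRangeError


def produce_once(items: Sequence[str], q: "queue.Queue") -> None:
    """Enqueue every item exactly once (like num_epochs=1), then signal the end."""
    for item in items:
        q.put(item)  # blocks while the bounded queue is full
    q.put(_END)


def read_all(items: Sequence[str], capacity: int = 4) -> List[str]:
    """Consume items through a bounded queue until the producer signals the end."""
    q: "queue.Queue" = queue.Queue(maxsize=capacity)
    producer = threading.Thread(target=produce_once, args=(items, q), daemon=True)
    producer.start()
    out: List[str] = []
    while True:
        item = q.get()
        if item is _END:  # analogous to catching tf.errors.OutOfRangeError
            break
        out.append(item)
    producer.join()  # returns promptly: the producer has already finished
    return out
```

Because the producer stops on its own after one pass, nothing is left blocked on a full queue when the consumer returns, so there is no pending enqueue for a shutdown to cancel.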