Splitting a TFRecord file

Time: 2018-01-10 10:05:25

Tags: tensorflow tfrecord

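I have a tfrecord file of about 8 GB. I want to split it into 4 files of about 2 GB each. How can I do this directly? Can I do it within TensorFlow? Is there any application for splitting tfrecord data?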

1 Answer:

Answer 0: (score: 0)

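I don't know of a way to specify the resulting size of a tfrecord file. You can, however, limit the number of examples written to each tfrecord file. I know this isn't exactly what you asked for, but it gets the same job done.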

Here is sample code I used to handle this situation in the past (see the full code here):

(fragment_size is the number of videos, each stored as one tf.train.Example, written to a single tfrecord file.)

writer = None  # no output file is open yet

for video_count in range(num_videos):

    # Start a new tfrecord file every fragment_size videos.
    if video_count % fragment_size == 0:
        if writer is not None:
            writer.close()
        filename = os.path.join(destination_path, name + str(
            current_batch_number) + '_of_' + str(
            total_batch_number) + '.tfrecords')
        print('Writing', filename)
        # TF 1.x API; later versions expose this as tf.io.TFRecordWriter.
        writer = tf.python_io.TFRecordWriter(filename)

    feature = {}  # features of the current video
    for image_count in range(num_images):
        path = 'blob' + '/' + str(image_count)
        image = data[video_count, image_count, :, :, :]
        image = image.astype(color_depth)
        image_raw = image.tobytes()  # tostring() is deprecated in NumPy

        feature[path] = _bytes_feature(image_raw)
        feature['height'] = _int64_feature(height)
        feature['width'] = _int64_feature(width)
        feature['depth'] = _int64_feature(num_channels)

    # One Example per video, appended to the currently open file.
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

if writer is not None:
    writer.close()
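
If the tfrecord file already exists and regenerating it is not an option, it can also be split after the fact. The following is only a sketch, not part of the code above. It assumes the file is uncompressed and relies on the TFRecord framing, where each record is an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. Records are copied byte for byte, so TensorFlow is not needed and the CRCs are not re-checked. The function name split_tfrecord, the out_prefix naming scheme and the max_shard_bytes parameter are made up for illustration.

```python
import struct

# TFRecord framing per record (little-endian):
#   uint64 length | uint32 CRC of length | payload[length] | uint32 CRC of payload
_HEADER_BYTES = 12  # 8-byte length + 4-byte CRC of the length
_FOOTER_BYTES = 4   # 4-byte CRC of the payload


def split_tfrecord(src_path, out_prefix, max_shard_bytes):
    """Copy records from src_path into shards of roughly max_shard_bytes each.

    Assumes an uncompressed tfrecord file. Records are copied verbatim, so a
    record is never cut in half; a record larger than max_shard_bytes simply
    gets a shard of its own. Returns the list of shard paths written.
    """
    shard_paths = []
    out = None
    shard_bytes = 0
    try:
        with open(src_path, 'rb') as src:
            while True:
                header = src.read(_HEADER_BYTES)
                if not header:
                    break  # clean end of file
                if len(header) < _HEADER_BYTES:
                    raise ValueError('truncated record header')
                (length,) = struct.unpack('<Q', header[:8])
                body = src.read(length + _FOOTER_BYTES)
                if len(body) < length + _FOOTER_BYTES:
                    raise ValueError('truncated record body')
                record_bytes = len(header) + len(body)

                # Open a new shard when this record would exceed the limit.
                too_big = shard_bytes > 0 and shard_bytes + record_bytes > max_shard_bytes
                if out is None or too_big:
                    if out is not None:
                        out.close()
                    path = '%s-%05d.tfrecords' % (out_prefix, len(shard_paths))
                    out = open(path, 'wb')
                    shard_paths.append(path)
                    shard_bytes = 0

                out.write(header)
                out.write(body)
                shard_bytes += record_bytes
    finally:
        if out is not None:
            out.close()
    return shard_paths
```

For the 8 GB file in the question, a call such as split_tfrecord('data.tfrecords', 'data_part', 2 * 1024**3) should yield about four shards, with the exact count depending on where the record boundaries fall.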