How to generate a dynamic number of samples from a TensorFlow dataset

Time: 2018-09-06 22:41:29

Tags: python tensorflow tensorflow-datasets

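My goal is to let my TensorFlow dataset pipeline accept inputs of nearly arbitrary size and convert each one into samples of a uniform size (known at "compile" time), so that there are more samples than original inputs. To that end I have a py_func (similar in idea to 1, i.e. a one-to-many mapping) that is meant to return a dataset for use with flat_map: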

import random
from functools import reduce

import tensorflow as tf

def split_fn(x, y):
    """ Splits X into a number of subsamples, each labeled y"""
    full_width = x.shape[1]
    full_height = x.shape[0]
    print(full_width)
    print(full_height)

    slice_width = SLICE_WIDTH
    slice_height = SLICE_HEIGHT

    # The splits created by these offsets cover the complete input image
    # (the +1 keeps the last slice when the width is an exact multiple)
    offsets1 = [[x,0] for x in range(0, full_width-slice_width+1, slice_width)]
    if full_width % slice_width != 0:
        offsets1.append([full_width-slice_width, 0])

    # The splits from these offsets are random, intended for data augmentation
    offsets2 = [[x,0] for x in random.sample(range(0,full_width-slice_width), 5)]

    #Combine the two lists of offsets
    offsets = offsets1 + offsets2


    image = x.reshape(1, full_height, full_width, 1)

    #This creates a list of the slices corresponding to the offsets
    ts = list(map(lambda offset: tf.image.crop_to_bounding_box(image,
                                                               offset[1],
                                                               offset[0],
                                                               slice_height,
                                                               slice_width),
                  offsets))
    #Create and concatenate a dataset for each of the samples
    datasets = map(lambda d: tf.data.Dataset.from_tensors((d, y)), ts)
    ds = reduce((lambda x, y: x.concatenate(y)), datasets)
    return ds

However, the line where offsets1 is defined fails with:

TypeError: __index__ returned non-int (type NoneType)
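
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: when split_fn is called directly from the dataset pipeline, x is a symbolic tensor whose width is not known statically, so x.shape[1] has no integer value and range() cannot use it. The offset arithmetic itself only needs plain integers. Below is a minimal sketch of that arithmetic in pure Python, with a hypothetical helper name (slice_offsets) and hypothetical parameters (num_random, rng) that are not part of the original code. It also caps the number of random offsets so that narrow images do not make random.sample fail.

```python
import random


def slice_offsets(full_width, slice_width, num_random=5, rng=random):
    """Return [x, y] offsets for fixed-width slices of an image.

    Hypothetical helper: it needs plain Python ints, so it can only run
    where the image width is concretely known (e.g. on a NumPy array).
    """
    if full_width < slice_width:
        raise ValueError("image is narrower than one slice")

    last_start = full_width - slice_width

    # Regular grid of slices; the +1 keeps the final slice when the
    # width is an exact multiple of slice_width.
    covering = [[x, 0] for x in range(0, last_start + 1, slice_width)]
    if full_width % slice_width != 0:
        # One right-aligned slice so the tail of the image is covered.
        covering.append([last_start, 0])

    # Random extra slices for augmentation, capped by the number of
    # distinct start positions that actually exist.
    candidates = range(0, last_start + 1)
    k = min(num_random, len(candidates))
    augmented = [[x, 0] for x in rng.sample(candidates, k)]

    return covering + augmented
```

For example, a width of 10 with slices of width 4 gives the covering offsets 0, 4 and 6, followed by the random ones.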

I tried to solve this by wrapping the function in a py_func that returns a dataset:

dataset = dataset.flat_map(
    lambda image, label: tuple(tf.py_func(
        split_fn, [image, label], [tf.data.Dataset])))

But I can't seem to get this to work:

TypeError: Expected DataType for argument 'Tout' not
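
The message suggests that Tout only accepts tensor dtypes, so a py_func presumably cannot hand back a Dataset object. One workaround that might apply here, untested against this pipeline, is to let the Python function return plain arrays with a leading dimension over the slices, and to build the per-sample dataset outside the py_func. The sketch below shows only the NumPy side. The name split_to_arrays, the offsets argument and the constant values are hypothetical, chosen for illustration.

```python
import numpy as np

SLICE_HEIGHT = 4  # assumed values, for illustration only
SLICE_WIDTH = 3


def split_to_arrays(x, y, offsets):
    """Cut fixed-size slices out of a 2-D array and repeat the label.

    offsets is a list of [left, top] pairs, as in the question.
    Returns (slices, labels): slices has shape
    (n, SLICE_HEIGHT, SLICE_WIDTH, 1) and labels has shape (n,).
    """
    slices = [
        # np.newaxis appends the trailing channel dimension.
        x[top:top + SLICE_HEIGHT, left:left + SLICE_WIDTH, np.newaxis]
        for left, top in offsets
    ]
    return np.stack(slices), np.full(len(slices), y)
```

In the pipeline, a function like this would be passed to tf.py_func with a list of dtypes as Tout, and the flat_map lambda would then wrap the two returned tensors in tf.data.Dataset.from_tensor_slices. Static shapes are lost through py_func, so they would probably have to be restored with set_shape before batching.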

What can I do to make this work?

Thanks

0 answers:

No answers