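My goal is to let my TensorFlow dataset pipeline accept inputs of nearly arbitrary size and turn them into samples of a uniform size (known at "compile" time), producing more samples than there were original inputs. To do that I have a function (similar in idea to 1, i.e. a one-to-many mapping) that is meant to return a dataset for use with flat_map: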
def split_fn(x, y):
    """ Splits X into a number of subsamples, each labeled y"""
    full_width = x.shape[1]
    full_height = x.shape[0]
    print(full_width)
    print(full_height)
    slice_width = SLICE_WIDTH
    slice_height = SLICE_HEIGHT
    # The splits created by these offset cover the complete input image
    offsets1 = [[x,0] for x in range(0, full_width-slice_width, slice_width)]
    if full_width % slice_width != 0:
        offsets1.append([full_width-slice_width, 0])
    # The splits from these offsets are random, intended for data augmentation
    offsets2 = [[x,0] for x in random.sample(range(0,full_width-slice_width), 5)]
    #Combine the two lists of offsets
    offsets = offsets1 + offsets2
    image = x.reshape(1, full_height, full_width, 1)
    #This creates a list of the slices corresponding to the offsets
    ts = list(map(lambda offset: tf.image.crop_to_bounding_box(image,
                                                               offset[1],
                                                               offset[0],
                                                               slice_height,
                                                               slice_width),
                  offsets))
    #Create and concatenate a dataset for each of the samples
    datasets = map(lambda d: tf.data.Dataset.from_tensors((d, y)), ts)
    ds = reduce((lambda x, y: x.concatenate(y)), datasets)
    return ds
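However, this fails at the line where offsets1 is defined, with
TypeError: __index__ returned non-int (type NoneType)
I tried to work around this by wrapping the function in a py_func that returns a dataset: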
dataset = dataset.flat_map(
    lambda image, label: tuple(tf.py_func(
        split_fn, [image, label], [tf.data.Dataset])))
But I can't seem to get that to work either:
TypeError: Expected DataType for argument 'Tout' not
What can I do to get this working?
Thanks
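
For reference, here is a minimal sketch of the offset logic I am after, written in plain Python. It assumes the width is an ordinary int (as it should be for a NumPy array inside a py_func) rather than a graph-time dimension that may be None, which is probably what range() is tripping over. The helper name compute_offsets and its parameters are hypothetical and not part of TensorFlow. Unlike split_fn above, the sketch also includes the right-most aligned offset when the width is an exact multiple of the slice width, and it caps the number of random offsets at the number of available positions.

```python
import random


def compute_offsets(full_width, slice_width, n_random=5, rng=random):
    """Return [x, y] crop offsets for slices of width slice_width.

    Hypothetical helper: works on plain ints only, no TensorFlow involved.
    """
    if full_width < slice_width:
        raise ValueError("input is narrower than one slice")

    last = full_width - slice_width  # right-most valid x offset

    # Aligned offsets tile the input from left to right.
    covering = [[x, 0] for x in range(0, last + 1, slice_width)]

    # If the width is not a multiple of slice_width, add one extra
    # slice flush with the right edge so the whole input is covered.
    if full_width % slice_width != 0:
        covering.append([last, 0])

    # Random offsets for data augmentation, sampled without replacement.
    candidates = range(0, last + 1)
    k = min(n_random, len(candidates))
    augmented = [[x, 0] for x in rng.sample(candidates, k)]

    return covering + augmented


# Width 5 with slices of width 2: two aligned slices plus one edge slice.
print(compute_offsets(5, 2, n_random=0))  # [[0, 0], [2, 0], [3, 0]]
```

With offsets computed this way, the slicing itself could presumably be done with NumPy indexing inside the py_func, so that the function returns a stacked array of slices with a regular dtype instead of a tf.data.Dataset.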