Question

When I train a model with .fit(), the shuffle argument defaults to True.

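Say my dataset has 100 samples and the batch size is 10. When I set shuffle=True, Keras first shuffles the samples randomly (so the 100 samples are now in a different order), and then builds batches from the new order: batch 1 is samples 1-10, batch 2 is samples 11-20, and so on.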
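To make the shuffle=True case concrete, here is a minimal NumPy sketch of the behavior described above: permute the sample indices once, then cut the permuted order into consecutive batches. The helper name sample_shuffle_batches and the seeded rng are made up for illustration; this is not Keras's actual code.

```python
import numpy as np

def sample_shuffle_batches(num_samples, batch_size, rng):
    """Hypothetical helper: shuffle individual samples, then slice into batches."""
    # One random ordering of all sample indices (0-based here).
    indices = rng.permutation(num_samples)
    # Consecutive slices of the shuffled order become the batches.
    return [indices[start:start + batch_size]
            for start in range(0, num_samples, batch_size)]

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
batches = sample_shuffle_batches(100, 10, rng)
print(len(batches))   # number of batches
print(batches[0])     # first batch: 10 samples drawn from anywhere in the dataset
```

Under this scheme every sample still appears exactly once per epoch, but the members of each batch change every time the indices are permuted.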

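If I set shuffle='batch', how is it supposed to work in the background? Intuitively, using the same 100-sample dataset with a batch size of 10, my guess is that Keras first assigns the samples to batches in the dataset's original order (batch 1: samples 1-10, batch 2: samples 11-20, batch 3: ..., and so on) and then shuffles the order of the batches. The model would then be trained on the batches in random order, for example: 3 (samples 21-30), 4 (samples 31-40), 7 (samples 61-70), 1 (samples 1-10), ... (I made up this batch order).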

Is my thinking right, or am I missing something?

Thanks!

Answer 1

Judging by the implementation at this link (training.py, line 349), the answer appears to be yes.

Try checking it with the following code:

import numpy as np


def batch_shuffle(index_array, batch_size):
    """Shuffles an array in a batch-wise fashion.
    Useful for shuffling HDF5 arrays
    (where one cannot access arbitrary indices).
    # Arguments
        index_array: array of indices to be shuffled.
        batch_size: integer.
    # Returns
        The `index_array` array, shuffled in a batch-wise fashion.
    """
    batch_count = len(index_array) // batch_size
    # to reshape we need to be cleanly divisible by batch size
    # we stash extra items and reappend them after shuffling
    last_batch = index_array[batch_count * batch_size:]
    index_array = index_array[:batch_count * batch_size]
    # one row per batch
    index_array = index_array.reshape((batch_count, batch_size))
    # shuffles the rows in place (first axis only); each row's contents stay in order
    np.random.shuffle(index_array)
    index_array = index_array.flatten()
    return np.append(index_array, last_batch)


x = np.arange(100)
# pass a copy: the in-place row shuffle would otherwise reorder x itself
x_s = batch_shuffle(x.copy(), 10)
# each printed row is one batch: the rows come out in random order,
# but every row still holds 10 consecutive samples
print(x_s.reshape(-1, 10))
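
Rather than inspecting the output by eye, the property the question asks about can be tested directly. The sketch below uses a hypothetical helper, is_batchwise_shuffle, which is not part of Keras. It assumes the input was originally 0, 1, 2, ... as in the example above, and it checks that every full batch is a run of consecutive indices starting on a multiple of the batch size.

```python
import numpy as np

def is_batchwise_shuffle(shuffled, batch_size):
    """Hypothetical check: True if every full batch is still an intact,
    aligned run of consecutive indices."""
    num_full = len(shuffled) // batch_size
    # Ignore a trailing partial batch, then view one batch per row.
    rows = np.asarray(shuffled)[:num_full * batch_size].reshape(num_full, batch_size)
    starts_aligned = np.all(rows[:, 0] % batch_size == 0)
    consecutive = np.all(np.diff(rows, axis=1) == 1)
    return bool(starts_aligned and consecutive)

rng = np.random.default_rng(0)
# Stand-in for a batch-wise shuffle result: permute whole rows of 10.
example = rng.permutation(np.arange(100).reshape(10, 10)).flatten()
print(is_batchwise_shuffle(example, 10))               # batches kept intact
print(is_batchwise_shuffle(rng.permutation(100), 10))  # sample-level shuffle
```

Calling is_batchwise_shuffle(x_s, 10) on the result above should return True if the guess in the question is correct.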
