Question

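I recently stumbled upon variational autoencoders and tried to get them running on MNIST using Keras. I found a tutorial on GitHub.

My question concerns the following lines of code: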

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')

Why is add_loss used instead of specifying the loss as a compile option? Something like vae.compile(optimizer='rmsprop', loss=vae_loss) does not seem to work and throws the following error:

ValueError: The model cannot be compiled because it has no loss to optimize.

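What is the difference between this function and a custom loss function that I can add as an argument for Model.fit()?

Thanks in advance!

P.S.: I know there are several GitHub issues about this, but most of them are still open with no comments. If this has already been resolved, please share the link!

EDIT: I removed the line that adds the loss to the model and used the loss argument of the compile function instead. It now looks like this: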

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss
xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

This throws a TypeError:

TypeError: Using a 'tf.Tensor' as a Python 'bool' is not allowed. Use 'if t is not None:' instead of 'if t:' to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

EDIT 2: Workaround. Thanks to @MarioZ's efforts, I was able to find a workaround.

# Build model
vae = Model(x, x_decoded_mean)

# Calculate custom loss in separate function
def vae_loss(x, x_decoded_mean):
    xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    vae_loss = K.mean(xent_loss + kl_loss)
    return vae_loss

# Compile
vae.compile(optimizer='rmsprop', loss=vae_loss)

...

vae.fit(x_train, 
    x_train,        # <-- did not need this previously
    shuffle=True,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test, x_test))     # <-- worked with (x_test, None) before

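For some strange reason, I now have to explicitly specify y and y_test when fitting the model. Originally, I did not need to do this. The generated samples seem reasonable to me.

Although this works around the problem, I still do not know what the differences and (dis)advantages of the two approaches are (other than needing different syntax). Can someone give me more insight? Thanks!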

Answer 1

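I'll try to answer the original question of why model.add_loss() is used instead of specifying a custom loss function to model.compile(loss=...).

All loss functions in Keras take exactly two parameters, y_true and y_pred. Have a look at the definitions of the standard loss functions available in Keras: they all have these two parameters. They are the "targets" (the Y variable in many textbooks) and the actual output of the model. Most standard loss functions can be written as an expression of these two tensors. Some more complex losses, however, cannot be written that way. Your VAE example is such a case, because the loss also depends on additional tensors, namely z_log_var and z_mean, which are not available to the loss function. model.add_loss() has no such restriction and lets you write more complex losses that depend on many other tensors, but it has the inconvenience of being more tightly coupled to the model, whereas the standard loss functions work with just any model.

(Note: the code proposed in other answers here is somewhat cheating, in that it just uses global variables to sneak in the additional required dependencies. This makes the loss functions not true functions in the mathematical sense. I consider this less clean code, and I expect it to be more error-prone.)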
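To make the dependency on the extra tensors concrete, here is a minimal NumPy sketch of the two VAE loss terms from the question. NumPy stands in for the Keras backend ops, and the batch values, the helper names binary_crossentropy and kl_term, and the clipping constant eps are all made up for illustration; this is not the tutorial's code.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Per-sample mean binary cross-entropy over the last axis.
    # The clipping (eps) is an assumption added here to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred), axis=-1)

def kl_term(z_mean, z_log_var):
    # Same expression as kl_loss in the question, written with NumPy.
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean)
                         - np.exp(z_log_var), axis=-1)

# Hypothetical batch: 2 samples, original_dim = 4, latent_dim = 3.
x = np.array([[0., 1., 0., 1.],
              [1., 1., 0., 0.]])
x_decoded_mean = np.array([[0.1, 0.9, 0.2, 0.8],
                           [0.7, 0.6, 0.3, 0.4]])
z_mean = np.array([[0., 0., 0.],
                   [1., 1., 1.]])
z_log_var = np.zeros((2, 3))

original_dim = x.shape[1]
# This term only needs (y_true, y_pred) = (x, x_decoded_mean).
xent_loss = original_dim * binary_crossentropy(x, x_decoded_mean)
# This term needs z_mean and z_log_var, which a (y_true, y_pred)
# loss function never receives as arguments.
kl_loss = kl_term(z_mean, z_log_var)
vae_loss = np.mean(xent_loss + kl_loss)
```

The reconstruction term fits the two-argument signature, while the KL term is a function of tensors from the middle of the network, which is why it has to reach the optimizer some other way.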

Answer 2

Try this:

# Only the imports this script actually uses
import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda, Dense, Dropout
from keras import backend as K
from keras import objectives




np.random.seed(22)



data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')


data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

#train = np.zeros(shape=(992,54))
#test = np.zeros(shape=(921,54))

train = np.zeros(shape=(300,54))
test = np.zeros(shape=(300,54))

for n, i in enumerate(train):
    if (n<=100):
        train[n] = data1
    elif (n>100 and n<=200):
        train[n] = data2
    elif(n>200):
        train[n] = data3


for n, i in enumerate(test):
    if (n<=100):
        test[n] = data1
    elif(n>100 and n<=200):
        test[n] = data2
    elif(n>200):
        test[n] = data3


batch_size = 5
original_dim = train.shape[1]

intermediate_dim45 = 45
intermediate_dim35 = 35
intermediate_dim25 = 25
intermediate_dim15 = 15
intermediate_dim10 = 10
intermediate_dim5 = 5
latent_dim = 3
epochs = 50
epsilon_std = 1.0

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0.,
                              stddev=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon

x = Input(shape=(original_dim,), name = 'first_input_mario')

h1 = Dense(intermediate_dim45, activation='relu', name='h1')(x)
hD = Dropout(0.5)(h1)
h2 = Dense(intermediate_dim25, activation='relu', name='h2')(hD)
h3 = Dense(intermediate_dim10, activation='relu', name='h3')(h2)
h = Dense(intermediate_dim5, activation='relu', name='h')(h3)  # was relu
h = Dropout(0.1)(h)

z_mean = Dense(latent_dim, activation='relu')(h)
z_log_var = Dense(latent_dim, activation='relu')(h)

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(latent_dim, activation='relu')
decoder_h1 = Dense(intermediate_dim5, activation='relu')
decoder_h2 = Dense(intermediate_dim10, activation='relu')
decoder_h3 = Dense(intermediate_dim25, activation='relu')
decoder_h4 = Dense(intermediate_dim45, activation='relu')

decoder_mean = Dense(original_dim, activation='sigmoid')


h_decoded = decoder_h(z)
h_decoded1 = decoder_h1(h_decoded)
h_decoded2 = decoder_h2(h_decoded1)
h_decoded3 = decoder_h3(h_decoded2)
h_decoded4 = decoder_h4(h_decoded3)

x_decoded_mean = decoder_mean(h_decoded4)

vae = Model(x, x_decoded_mean)


def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    loss = xent_loss + kl_loss
    return loss

vae.compile(optimizer='rmsprop', loss=vae_loss)

vae.fit(train, train, batch_size = batch_size, epochs=epochs, shuffle=True,
        validation_data=(test, test))


vae = Model(x, x_decoded_mean)

encoder = Model(x, z_mean)

decoder_input = Input(shape=(latent_dim,))

_h_decoded = decoder_h  (decoder_input)
_h_decoded1 = decoder_h1  (_h_decoded)
_h_decoded2 = decoder_h2  (_h_decoded1)
_h_decoded3 = decoder_h3  (_h_decoded2)
_h_decoded4 = decoder_h4  (_h_decoded3)

_x_decoded_mean = decoder_mean(_h_decoded4)
generator = Model(decoder_input, _x_decoded_mean)
generator.summary()

Answer 3

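JIH's answer is of course right, but it may be useful to add the following:

model.add_loss() has no restrictions, but it also removes the convenience of using targets in model.fit().

If your loss depends on additional parameters of the model, of other models, or on external variables, you can still use a Keras-style encapsulated loss function by writing a wrapping function to which you pass all the extra parameters: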

def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        # x = complicated math involving extra_param1, extra_param2, y_true, y_pred
        # Remember to use tensor operations, for example K.sum, K.square, K.mean.
        # Also remember that if extra_param1 and extra_param2 are variable tensors instead of simple floats,
        # you need to declare them as inputs=(main, extra_param1, extra_param2) when instantiating the Keras model,
        # and define them as keras.Input or tf.placeholder with the right shape.
        return x
    return loss

# Call the wrapper so that compile() receives the inner two-argument function.
model.compile(optimizer='adam', loss=loss_carrier(extra_param1, extra_param2))

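The trick is the final return loss line of the wrapper, where you return a function, because Keras expects loss functions to take only two parameters, y_true and y_pred.

It may look more complicated than the model.add_loss version, but the loss stays modular.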
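The wrapper pattern itself can be tried without Keras. The following is a small sketch in which NumPy arrays stand in for tensors; make_weighted_mse and weight are hypothetical names chosen for this illustration, not part of any Keras API.

```python
import numpy as np

def make_weighted_mse(weight):
    # Outer function: receives the extra parameter.
    def loss(y_true, y_pred):
        # Inner function: has the two-argument signature Keras expects,
        # and still sees `weight` through the closure.
        return weight * np.mean(np.square(y_true - y_pred))
    return loss

loss_fn = make_weighted_mse(weight=2.0)
value = loss_fn(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
```

With real Keras tensors, the object passed to compile would be the result of calling the outer function, so the extra parameter is fixed at that point.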

Answer 4

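I was also wondering about the same question and some related things, such as how to add a loss function in intermediate layers. Here I share some observations, hoping they help others. It is true that standard Keras loss functions take only two arguments, y_true and y_pred. But during experimentation there may be cases where we need some external parameter or coefficient while computing with these two values (y_true, y_pred). This may be needed at the last layer as usual, or somewhere in the middle of the model's layers.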

`model.add_loss()`

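The accepted answer correctly describes the model.add_loss() function. It can depend on layer inputs (tensors). According to the official documentation, when writing the call method of a custom layer or a subclassed model, we may want to compute scalar quantities that we want to minimize during training (e.g. regularization losses). We can use the add_loss() layer method to keep track of such loss terms. For instance, activity regularization losses depend on the inputs passed when calling a layer. Here is an example of a layer that adds a sparsity regularization loss based on the L2 norm of the inputs: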

import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
  """Layer that creates an activity sparsity regularization loss."""

  def __init__(self, rate=1e-2):
    super(MyActivityRegularizer, self).__init__()
    self.rate = rate

  def call(self, inputs):
    # We use `add_loss` to create a regularization loss
    # that depends on the inputs.
    self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
    return inputs

Loss values added via add_loss can be retrieved from the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):

from tensorflow.keras import layers

class SparseMLP(Layer):
  """Stack of Linear layers with a sparsity regularization loss."""

  def __init__(self, output_dim):
      super(SparseMLP, self).__init__()
      self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
      self.regularization = MyActivityRegularizer(1e-2)
      self.dense_2 = layers.Dense(output_dim)

  def call(self, inputs):
      x = self.dense_1(inputs)
      x = self.regularization(x)
      return self.dense_2(x)


mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar

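Also note that when using model.fit(), such loss terms are handled automatically. When writing a custom training loop, we should retrieve these terms by hand from model.losses, like this: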

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)
        # Add extra loss terms to the loss value.
        loss_value += sum(model.losses) # < ------------- HERE ---------

    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

`Custom losses`

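With model.add_loss(), (AFAIK), we can use it somewhere in the middle of the network. Here we are no longer bound to only two parameters, i.e. y_true and y_pred. But what if we also want to pass an external parameter or coefficient to the loss function at the last layer of the network? Nric's answer is correct. But it can also be done by subclassing the tf.keras.losses.Loss class and implementing the following two methods:

__init__(self): accepts parameters to pass during the call of your loss function
call(self, y_true, y_pred): uses the targets (y_true) and the model predictions (y_pred) to compute the model's loss

Here is an example of a custom MSE implemented by subclassing the tf.keras.losses.Loss class. Here, too, we are no longer bound to only two parameters, i.e. y_true and y_pred.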

class CustomMSE(tf.keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

model.compile(optimizer=..., loss=CustomMSE())
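As a quick sanity check, a loss object like this can also be called directly on tensors, outside of compile(). The sketch below repeats the class so that it is self-contained and assumes TensorFlow 2.x is installed; the input values are made up for illustration.

```python
import tensorflow as tf

class CustomMSE(tf.keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        # Penalizes predictions that are far from 0.5.
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

# Hypothetical targets and predictions for one sample with two outputs.
y_true = tf.constant([[0.0, 1.0]])
y_pred = tf.constant([[1.0, 1.0]])

# mse = mean([1, 0]) = 0.5 and reg = mean([0.25, 0.25]) = 0.25,
# so the result should be about 0.5 + 0.1 * 0.25 = 0.525.
loss_value = float(CustomMSE(regularization_factor=0.1)(y_true, y_pred))
```

The extra coefficient is stored on the object in __init__, so call keeps the two-argument signature while still having access to it.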

Answer 5

You need to change the compile line to vae.compile(optimizer='rmsprop', loss=vae_loss)
