Using `model.set_weights()` yields incorrect behavior when MLC is enabled #261

faustomorales · 2021-05-14T16:05:28Z

EDIT: This issue is related to any use of model.set_weights() (see next comment).

When using tf.keras.callbacks.EarlyStopping with restore_best_weights=True, model inference after training does not work properly. I would expect a problem like this to show up in mainline TensorFlow as well, but I have been unable to reproduce it with a plain install on Linux. Consider the following trivial example, where we train a linear model to learn the function y = 1.5x - 1.

import numpy as np
import tensorflow as tf

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
X_train, y_train, X_val, y_val, X_test, y_test = X[:500], y[:500], X[500:750], y[500:750], X[750:], y[750:]

print("tf:", tf.__version__)
for restore_best_weights in [True, False]:
    print("restore_best_weights:", restore_best_weights)
    model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
    model.compile(loss="mse", optimizer="rmsprop")
    model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        callbacks=[
            tf.keras.callbacks.EarlyStopping(
                monitor="val_loss",
                min_delta=1e-3,
                patience=50,
                restore_best_weights=restore_best_weights
            ),
        ],
        epochs=500,
        verbose=0
    )
    print(
        "mse:", round(model.evaluate(X_test, y_test)),
        "weights:", [a.flatten()[0] for a in model.get_weights()]
    )

Here is the output using tensorflow_macos.

tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 284us/step - loss: 387007348736.0000
mse: 387007348736 weights: [1.4947473, -0.967283]
restore_best_weights: False
8/8 [==============================] - 0s 289us/step - loss: 1.6617e-06
mse: 0 weights: [1.4997954, -1.0002267]

Here is the output using mainline TensorFlow on a Google Colab instance (using pip install tensorflow==2.4.0rc0 to make it apples-to-apples).

tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 2ms/step - loss: 3.1445e-04
mse: 0 weights: [1.4946454, -0.9670778]
restore_best_weights: False
8/8 [==============================] - 0s 2ms/step - loss: 7.9514e-08
mse: 0 weights: [1.499929, -0.9998748]

This is pretty puzzling to me! The macOS version is getting the right weights (approximately 1.5 and -1 for the kernel and bias, respectively) at the end of training regardless of the value of restore_best_weights. But at inference time, we seem to be getting garbage out of the model. 🤔 Any ideas on what to investigate first?
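As a sanity check that does not depend on TensorFlow, the reported weights can be plugged into the linear model by hand. This is only a minimal NumPy sketch: the names kernel, bias and manual_mse are illustrative, and the weight values are copied from the tensorflow_macos output above for restore_best_weights=True.

```python
import numpy as np

# Same data and test split as in the reproduction script above.
X = np.tile(np.arange(10), reps=100).astype(np.float64)
y = 1.5 * X - 1
X_test, y_test = X[750:], y[750:]

# Weights reported by tensorflow_macos with restore_best_weights=True.
kernel, bias = 1.4947473, -0.967283

# Evaluate the linear model by hand, bypassing model.evaluate() entirely.
y_pred = kernel * X_test + bias
manual_mse = float(np.mean((y_pred - y_test) ** 2))
print("manual mse:", manual_mse)
```

The hand-computed value is a small positive number of the same order as the Colab loss, nowhere near 387007348736. That suggests the values returned by get_weights() are fine and the problem lies in what evaluate() actually computes with.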


faustomorales · 2021-05-14T16:41:26Z

The plot thickens ... I was curious, so I added model.set_weights(model.get_weights()) just before the call to model.evaluate(), hoping that would knock something loose in the model's internal state and fix things. Instead, it made model inference produce garbage regardless of the value of restore_best_weights. This makes reproducing the issue even simpler.

import numpy as np
import tensorflow as tf

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1

print("tf:", tf.__version__)
model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y, epochs=200, verbose=0)
print("before set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
model.set_weights(model.get_weights())
print("after set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])

Output using tensorflow_macos ...

tf: 2.4.0-rc0
32/32 [==============================] - 0s 215us/step - loss: 1.8945e-06
before set_weights, mse: 1.8945354440802475e-06 weights: [1.4997802, -1.000234]
32/32 [==============================] - 0s 231us/step - loss: inf
after set_weights, mse: inf weights: [1.4997802, -1.000234]

Output using Google Colab ...

tf: 2.4.0-rc0
32/32 [==============================] - 0s 874us/step - loss: 2.0693e-07
before set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
32/32 [==============================] - 0s 785us/step - loss: 2.0693e-07
after set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]

So it seems that any use of model.set_weights() (which is probably being used under the hood by restore_best_weights) results in a broken model state.
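The before/after comparison above could be wrapped in a small reusable check. This is a sketch, not part of TensorFlow: the helper name set_weights_roundtrip_ok and its tolerances are made up for illustration, and it assumes a Keras-style model compiled without extra metrics, so that evaluate() returns a single scalar loss.

```python
import numpy as np


def set_weights_roundtrip_ok(model, X, y, rtol=1e-3, atol=1e-6):
    """Return True if the loss is finite and unchanged after a no-op set_weights().

    `model` is assumed to expose Keras-style get_weights(), set_weights()
    and evaluate(X, y, verbose=0) returning a scalar loss.
    """
    before = float(model.evaluate(X, y, verbose=0))
    # Writing back the exact same weights should not change the model.
    model.set_weights(model.get_weights())
    after = float(model.evaluate(X, y, verbose=0))
    return bool(np.isfinite(after) and np.isclose(before, after, rtol=rtol, atol=atol))
```

On the outputs shown above, a check like this would be expected to pass on Colab (loss unchanged) and fail on tensorflow_macos (loss becomes inf).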

faustomorales · 2021-05-14T16:46:19Z

The issue can be avoided (albeit defeating the purpose of the fork) using:

import os
os.environ["TF_DISABLE_MLC"] = "1"
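A slightly more defensive version of the workaround is sketched below. It rests on an assumption not confirmed in this thread: that TF_DISABLE_MLC is read when tensorflow is imported, so it has to be set beforehand. The helper name disable_mlc is hypothetical.

```python
import os
import sys


def disable_mlc():
    """Set TF_DISABLE_MLC=1 and report whether this happened before tensorflow was imported.

    Assumption: the variable only takes effect if set before the import.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["TF_DISABLE_MLC"] = "1"
    return not already_imported


set_before_import = disable_mlc()
if not set_before_import:
    print("warning: tensorflow was already imported; TF_DISABLE_MLC may be ignored")

# import tensorflow as tf  # import only after the variable has been set
```

Calling this at the very top of a script, before any module that pulls in tensorflow, keeps the ordering explicit.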

faustomorales changed the title ~~Using restore_best_weights=True yields incorrect behavior~~ Using model.set_weights() yields incorrect behavior May 14, 2021

faustomorales changed the title ~~Using model.set_weights() yields incorrect behavior~~ Using model.set_weights() yields incorrect behavior when MLC is enabled. May 14, 2021

faustomorales changed the title ~~Using model.set_weights() yields incorrect behavior when MLC is enabled.~~ Using model.set_weights() yields incorrect behavior when MLC is enabled May 14, 2021
