This repository was archived by the owner on Jun 9, 2021. It is now read-only.

Using model.set_weights() yields incorrect behavior when MLC is enabled #261

Open
faustomorales opened this issue May 14, 2021 · 2 comments

faustomorales commented May 14, 2021

EDIT: This issue is related to any use of model.set_weights() (see next comment).

When using tf.keras.callbacks.EarlyStopping with restore_best_weights=True, model inference after training does not work properly. This looks like something that would also affect mainline TensorFlow, but I could not reproduce it with a plain install on Linux. Consider the following trivial example, where we train a linear model to learn the function y = 1.5x - 1.

import numpy as np
import tensorflow as tf

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
X_train, y_train = X[:500], y[:500]
X_val, y_val = X[500:750], y[500:750]
X_test, y_test = X[750:], y[750:]

print("tf:", tf.__version__)
for restore_best_weights in [True, False]:
    print("restore_best_weights:", restore_best_weights)
    model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
    model.compile(loss="mse", optimizer="rmsprop")
    model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        callbacks=[
            tf.keras.callbacks.EarlyStopping(
                monitor="val_loss",
                min_delta=1e-3,
                patience=50,
                restore_best_weights=restore_best_weights
            ),
        ],
        epochs=500,
        verbose=0
    )
    print(
        "mse:", round(model.evaluate(X_test, y_test)),
        "weights:", [a.flatten()[0] for a in model.get_weights()]
    )

Here is the output using tensorflow_macos.

tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 284us/step - loss: 387007348736.0000
mse: 387007348736 weights: [1.4947473, -0.967283]
restore_best_weights: False
8/8 [==============================] - 0s 289us/step - loss: 1.6617e-06
mse: 0 weights: [1.4997954, -1.0002267]

Here is the output using mainline TensorFlow on a Google Colab instance (using pip install tensorflow==2.4.0rc0 to make it apples-to-apples).

tf: 2.4.0-rc0
restore_best_weights: True
8/8 [==============================] - 0s 2ms/step - loss: 3.1445e-04
mse: 0 weights: [1.4946454, -0.9670778]
restore_best_weights: False
8/8 [==============================] - 0s 2ms/step - loss: 7.9514e-08
mse: 0 weights: [1.499929, -0.9998748]

This is pretty puzzling to me! The macOS version is getting the right weights (1.5 and -1 for the kernel and bias, respectively) at the end of training regardless of the value of restore_best_weights. But at inference time, we seem to be getting garbage out of the model. 🤔 Any ideas on what to investigate first?
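
As a sanity check on the numbers above, the test-set MSE can be recomputed by hand from the printed weights, without going through Keras. This is a minimal NumPy sketch. It assumes the model is exactly y_hat = kernel * x + bias (a single Dense(1) layer with no activation, as in the script), and the weight values are copied from the tensorflow_macos output with restore_best_weights=True.

```python
import numpy as np

# Same data and test split as in the reproduction script.
X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1
X_test, y_test = X[750:], y[750:]

# Weights printed by tensorflow_macos with restore_best_weights=True.
kernel, bias = 1.4947473, -0.967283

# Manual forward pass for a single Dense(1) layer with no activation.
y_pred = kernel * X_test + bias
manual_mse = float(np.mean((y_pred - y_test) ** 2))
print("manual mse:", manual_mse)
```

With these weights the hand-computed MSE is on the order of 1e-4, in line with the 3.1445e-04 that Colab reports for nearly identical weights. The 387007348736 loss therefore cannot be explained by the weights that get_weights() returns.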

@faustomorales

The plot thickens ... I was curious, so I added model.set_weights(model.get_weights()) just before the call to model.evaluate(), hoping that it would knock something loose in the model's internal state and fix things. Instead, it made model inference produce garbage regardless of the value of restore_best_weights. This makes reproducing the issue even simpler.

import numpy as np
import tensorflow as tf

X = np.tile(np.arange(10), reps=100)
y = 1.5 * X - 1

print("tf:", tf.__version__)
model = tf.keras.models.Sequential([tf.keras.layers.Input((1, )), tf.keras.layers.Dense(1)])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y, epochs=200, verbose=0)
print("before set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])
model.set_weights(model.get_weights())
print("after set_weights, mse:", model.evaluate(X, y), "weights:", [a.flatten()[0] for a in model.get_weights()])

Output using tensorflow_macos ...

tf: 2.4.0-rc0
32/32 [==============================] - 0s 215us/step - loss: 1.8945e-06
before set_weights, mse: 1.8945354440802475e-06 weights: [1.4997802, -1.000234]
32/32 [==============================] - 0s 231us/step - loss: inf
after set_weights, mse: inf weights: [1.4997802, -1.000234]

Output using Google Colab ...

tf: 2.4.0-rc0
32/32 [==============================] - 0s 874us/step - loss: 2.0693e-07
before set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]
32/32 [==============================] - 0s 785us/step - loss: 2.0693e-07
after set_weights, mse: 2.0693499891422107e-07 weights: [1.4999211, -1.0000395]

So it seems that any use of model.set_weights() (which is probably being used under the hood by restore_best_weights) results in a broken model state.
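
One way to narrow this down is to compare what the model predicts against a forward pass computed directly from get_weights(). The sketch below is only a diagnostic idea, not a fix. The helper names manual_dense_predict and max_prediction_gap are hypothetical, and the code assumes a model with a single Dense(1) layer whose get_weights() returns [kernel, bias], as in the script above.

```python
import numpy as np


def manual_dense_predict(weights, X):
    """Forward pass for one Dense layer with no activation: X @ kernel + bias."""
    kernel, bias = weights
    X = np.asarray(X, dtype=np.float64).reshape(-1, kernel.shape[0])
    return X @ kernel + bias


def max_prediction_gap(model, X):
    """Largest absolute difference between model.predict and the manual pass.

    `model` only needs get_weights() and predict(), like a Keras model.
    """
    expected = manual_dense_predict(model.get_weights(), X)
    actual = np.asarray(model.predict(X)).reshape(expected.shape)
    return float(np.max(np.abs(actual - expected)))
```

Calling max_prediction_gap(model, X) once before and once after model.set_weights(model.get_weights()) would show whether the predictions drift away from the stored weights. If the observation above holds, the gap should be tiny before the call and large (or inf) after it on the affected build.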

@faustomorales faustomorales changed the title Using restore_best_weights=True yields incorrect behavior Using model.set_weights() yields incorrect behavior May 14, 2021
@faustomorales faustomorales changed the title Using model.set_weights() yields incorrect behavior Using model.set_weights() yields incorrect behavior when MLC is enabled. May 14, 2021
@faustomorales faustomorales changed the title Using model.set_weights() yields incorrect behavior when MLC is enabled. Using model.set_weights() yields incorrect behavior when MLC is enabled May 14, 2021
@faustomorales

The issue can be avoided (albeit defeating the purpose of the fork) using:

import os
os.environ["TF_DISABLE_MLC"] = "1" 
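
To compare the two configurations side by side, the reproduction script can be run in separate child processes, once with TF_DISABLE_MLC set and once without. This is a rough sketch using only the standard library. The helper name run_with_mlc_flag and the file name repro.py are hypothetical, and it assumes the variable is read from the process environment, as the snippet above suggests.

```python
import os
import subprocess
import sys


def run_with_mlc_flag(script_args, disable_mlc):
    """Run the current Python interpreter with TF_DISABLE_MLC set or unset."""
    env = dict(os.environ)
    if disable_mlc:
        env["TF_DISABLE_MLC"] = "1"
    else:
        # Make sure an inherited value does not leak into the "enabled" run.
        env.pop("TF_DISABLE_MLC", None)
    result = subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (repro.py is a hypothetical file holding the script above):
# print(run_with_mlc_flag(["repro.py"], disable_mlc=True))
# print(run_with_mlc_flag(["repro.py"], disable_mlc=False))
```

Starting a fresh process for each run avoids any question of whether the variable was set before TensorFlow was loaded.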
