CNN training on MNIST does not converge #145
Labels: bug (Something isn't working)
Comments
Here is an example of the output I am getting:
$ fpm run --example cnn_mnist --profile release --flag "-fno-frontend-optimize -I$CONDA_PREFIX/include -L$CONDA_PREFIX/lib -Wl,-rpath -Wl,$CONDA_PREFIX/lib"
Layer: input
------------------------------------------------------------
Output shape: 784
Parameters: 0
Layer: reshape
------------------------------------------------------------
Input shape: 784
Output shape: 1 28 28
Parameters: 0
Activation:
Layer: conv2d
------------------------------------------------------------
Input shape: 1 28 28
Output shape: 8 26 26
Parameters: 80
Activation: relu
Layer: maxpool2d
------------------------------------------------------------
Input shape: 8 26 26
Output shape: 8 13 13
Parameters: 0
Activation:
Layer: conv2d
------------------------------------------------------------
Input shape: 8 13 13
Output shape: 16 11 11
Parameters: 1168
Activation: relu
Layer: maxpool2d
------------------------------------------------------------
Input shape: 16 11 11
Output shape: 16 5 5
Parameters: 0
Activation:
Layer: flatten
------------------------------------------------------------
Input shape: 16 5 5
Output shape: 400
Parameters: 0
Activation:
Layer: dense
------------------------------------------------------------
Input shape: 400
Output shape: 10
Parameters: 4010
Activation: softmax
Epoch 1 done, Accuracy: 9.91 %
Epoch 2 done, Accuracy: 9.91 %
Epoch 3 done, Accuracy: 9.91 %
Epoch 4 done, Accuracy: 9.91 %
...
The accuracy stays at this percentage in all subsequent epochs.
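The shapes and parameter counts in the layer summary above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not code from the project: it assumes 3x3 kernels with stride 1 and no padding for conv2d, and 2x2 windows with stride 2 for maxpool2d. Those hyperparameters are not printed in the output; they are inferred from the reported shapes, and the helper names are hypothetical.

```python
# Assumed (not stated in the output): 3x3 conv kernels, stride 1, no padding;
# 2x2 max-pooling windows with stride 2. Shapes are (channels, height, width).

def conv2d_shape(in_shape, filters, kernel=3):
    _, height, width = in_shape
    return (filters, height - kernel + 1, width - kernel + 1)

def conv2d_params(in_shape, filters, kernel=3):
    channels = in_shape[0]
    # One kernel per (filter, input channel) pair, plus one bias per filter.
    return filters * channels * kernel * kernel + filters

def maxpool2d_shape(in_shape, pool=2):
    channels, height, width = in_shape
    return (channels, height // pool, width // pool)

def dense_params(inputs, outputs):
    # Weight matrix plus one bias per output.
    return inputs * outputs + outputs

reshaped = (1, 28, 28)
conv1 = conv2d_shape(reshaped, 8)
pool1 = maxpool2d_shape(conv1)
conv2 = conv2d_shape(pool1, 16)
pool2 = maxpool2d_shape(conv2)
flat = pool2[0] * pool2[1] * pool2[2]

print(conv1, conv2d_params(reshaped, 8))
print(pool1)
print(conv2, conv2d_params(pool1, 16))
print(pool2, flat, dense_params(flat, 10))
```

Under these assumptions the numbers agree with the summary (80, 1168, and 4010 parameters), which is consistent with the network being constructed as intended and the problem lying in training rather than in the layer setup.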
Git bisect points to #142 (merged).
milancurcic changed the title from "CNN training does not converge" to "CNN training on MNIST does not converge" on Apr 18, 2024.
Tests with a minimal CNN on randomly selected constant inputs/outputs converge fine (#174). The problem with training a CNN on MNIST may be elsewhere, or the bug may be more subtle than I previously suspected. More tests of intermediate complexity are needed to narrow it down.
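For readers unfamiliar with this kind of test, here is a minimal sketch of the idea in plain Python: train a tiny model repeatedly on one fixed input/target pair and confirm that the loss goes down. It is not the test from #174. The 1-D convolution, the kernel size of 3, the learning rate, and all names are assumptions chosen to keep the example short.

```python
import random

def conv1d(x, w, b):
    """Valid 1-D convolution (cross-correlation) with a single kernel."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]

def train_on_constant_pair(x, y, steps=500, lr=0.1, seed=0):
    """Gradient descent on the MSE between conv1d(x) and a fixed target y."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    b = 0.0
    losses = []
    for _ in range(steps):
        out = conv1d(x, w, b)
        err = [o - t for o, t in zip(out, y)]
        n = len(err)
        losses.append(sum(e * e for e in err) / n)
        # Analytic gradients of the MSE with respect to kernel and bias.
        grad_w = [2.0 / n * sum(err[i] * x[i + j] for i in range(n))
                  for j in range(len(w))]
        grad_b = 2.0 / n * sum(err)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return losses

# One fixed input and a target produced by a known kernel, so the target
# is exactly reachable by the model.
x = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8]
y = conv1d(x, [0.2, -0.5, 0.3], 0.1)

losses = train_on_constant_pair(x, y)
print("first loss:", losses[0], "last loss:", losses[-1])
```

A test like this passing only shows that the forward and backward passes agree on a trivial problem; it says little about behavior on real data with many samples, which is why tests of intermediate complexity are still useful.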
- The cnn_mnist example, which trains a CNN on MNIST data, stays at random (10%) accuracy over epochs;
- The cnn_from_keras example, which loads a pre-trained CNN from Keras, achieves the expected high accuracy (90.14%).
The above suggests that the forward passes of the conv2d, maxpool2d, and flatten layers are implemented correctly.
The culprit may be in the implementation of the backward methods for any of these layers, or in the backward flow of data.
This should be fixed before the release of v0.13.0.