| Optimizer | Wide Residual Networks: Test Accuracy (%) | Densely Connected Networks: Test Accuracy (%) |
| --------- | :--------------: | :--------------: |
| Adagrad | 86.07 | 87.32 |
| Adam | 84.86 | 88.44 |
| AMSGrad | 86.08 | 90.53 |
| BPGrad | 88.62 | **90.85** |
| DFW | **90.18** | 90.22 |
| SGD | 90.08 | **92.02** |


| Optimizer | Wide Residual Networks: Test Accuracy (%) | Densely Connected Networks: Test Accuracy (%) |
| --------- | :--------------: | :--------------: |
| Adagrad | 57.64 | 56.47 |
| Adam | 58.46 | 64.61 |
| AMSGrad | 60.73 | 68.32 |
| BPGrad | 60.31 | 59.36 |
| DFW | **67.83** | **69.55** |
| SGD | 66.78 | **70.33** |


| Optimizer | CE Loss: Test Accuracy (%) | SVM Loss: Test Accuracy (%) |
| --------- | :--------------: | :--------------: |
| Adagrad | 83.8 | 84.6 |
| Adam | 84.5 | 85.0 |
| AMSGrad | 84.2 | 85.1 |
| BPGrad | 83.6 | 84.2 |
| DFW | - | **85.2** |
| SGD | 84.7 | **85.2** |
| SGD* | 84.5 | - |