WARNING: The GPU version is experimental and may contain unknown bugs. It is not discussed in the original paper, and we have only tested it with small-scale models, which is by no means thorough.
DrMAD-Theano uses Lasagne to build a simple MLP.
Run:

```shell
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32 python simple_mlp.py
```
`simple_mlp.py` includes three phases:

- Phase 1: Algorithm 1.
- Phase 2: obtain the validation loss on the validation set. (Since there are multiple iterations, we output the gradients and take their average across the iterations.)
- Phase 3: Algorithm 2.
- We use `Lop()` to obtain the Hessian-vector products in lines 6-7 of Algorithm 2; this is defined in `hypergrad.py`.
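`Lop()` computes a Jacobian-transpose-vector product symbolically; applied to the gradient, it yields a Hessian-vector product without ever materializing the Hessian. A minimal numpy sketch of the same quantity for a quadratic loss, checked against a finite-difference approximation (the loss and all names here are illustrative, not from the repository):

```python
import numpy as np

# For f(w) = 0.5 * w^T A w, the gradient is A w and the Hessian is A,
# so the exact Hessian-vector product is A v. A central finite
# difference of the gradient along v approximates the same quantity,
# mirroring what Lop(grad, w, v) returns symbolically.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
grad = lambda w: A @ w

w = np.array([0.5, -1.0])   # current parameters
v = np.array([1.0, 2.0])    # direction to multiply the Hessian by

eps = 1e-6
hvp_fd = (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)
hvp_exact = A @ v
```

The point of the `Lop()` formulation is exactly that the middle quantity (the full Hessian) never needs to be built, which is what makes lines 6-7 of Algorithm 2 tractable for larger models.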
- `args.py`: configuration for DrMAD.
- `layers.py`: provides class `DenseLayerWithReg()` to build up a simple MLP.
- `models.py`: provides class `MLP()`.
- `updates.py`: provides update rules for different Theano functions.
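One common update rule of the kind `updates.py` provides is SGD with momentum. The sketch below is a plain numpy illustration under that assumption; the function name and defaults are hypothetical, not the repository's API.

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update (illustrative sketch).

    Keeps a running velocity that accumulates past gradients, then
    moves the parameters along it.
    """
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.zeros(2)
vel = np.zeros(2)
g = np.array([1.0, -1.0])
w, vel = sgd_momentum_step(w, vel, g)  # -> w == array([-0.01,  0.01])
```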