- Fix DDP (on 2 RTXA4000 Ada, runtime compared to a single RTX 4090 is decreased by at least
$25 %$ . - Use
hydra
both as orchestration tool and for setting up the logs. - Remove setup of manual logging, now that
hydra
is used. - Simplify calculation of losses, use as loss summed categorical cross entropy and then take mean for backprop.