
Neural Networks Tuning And Training


When a new prediction model is created, it must first be trained on the available data before it can be used to compute predictions. For neural-network-based predictions in IVIS, the training process consists of two stages: hyperparameter tuning and network training. The hyperparameters are properties of the network itself, such as the number of hidden layers and the number of neurons in them, while network training optimizes the weights of the connections inside the network.

Hyperparameter Tuning

Each of the available neural network architectures has a set of hyperparameters, such as the number of hidden layers and the number of neurons in each hidden layer. Some parameters of the training process itself, such as the learning rate, are also hyperparameters and can be tuned.
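As an illustration only (not the exact IVIS code), a hypermodel builder based on the Keras Tuner (described below) could expose such hyperparameters for a simple fully connected architecture; the parameter names, value ranges, and the single-output layer are assumptions:

```python
import tensorflow as tf
import keras_tuner as kt

def build_model(hp: kt.HyperParameters) -> tf.keras.Model:
    """Build a fully connected network whose shape is controlled by hyperparameters."""
    model = tf.keras.Sequential()
    # The number of hidden layers and the neurons in each of them are hyperparameters.
    for i in range(hp.Int("hidden_layers", min_value=1, max_value=3)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", min_value=8, max_value=64, step=8),
            activation="relu"))
    model.add(tf.keras.layers.Dense(1))  # single predicted value
    # The learning rate of the training is itself a tunable hyperparameter.
    learning_rate = hp.Float("learning_rate", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model
```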

Our approach uses the Keras Tuner [1] to search for the best available set of hyperparameters in terms of the RMSE loss of the trained network. The tuner runs several trials with different configurations of the hyperparameters and selects the best configuration. In each trial, a neural network is trained (see details below) and tested to assess its performance. The number of trials can be set by the Max tuning trials setting when creating the prediction.

We currently use the Bayesian optimization tuner from the Keras Tuner framework; support for the other available tuners can be added later.
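A minimal sketch of how such a tuner might be set up, assuming the illustrative build_model function above and placeholder training and validation arrays (x_train, y_train, x_val, y_val); the directory, project name, and max_trials value are stand-ins:

```python
# Bayesian optimization over the hyperparameters defined in build_model.
# max_trials corresponds to the "Max tuning trials" setting.
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_loss",      # validation loss of each trained network
    max_trials=20,
    directory="tuning",
    project_name="prediction_model")

# Each trial trains a network and reports its validation loss back to the tuner.
tuner.search(x_train, y_train, epochs=50, validation_data=(x_val, y_val))
```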

Network Training

For each set of hyperparameters provided by the tuner, a new neural network is created and must be trained. Training the network means setting the weights of its connections so that it provides good predictions. We use standard training approaches based on the gradient descent algorithm, and we create and train the network using the TensorFlow framework [2].

We use the Adam optimizer algorithm [3] to train the network, as it "compares favorably to other stochastic optimization methods". The Adam method is adaptive: "The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients". We therefore decided that it is not necessary to implement any learning rate decay (schedule) for now.
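Inside a single trial, the training step could look like the following sketch; the hyperparameter defaults, epoch count, batch size, and data variables are placeholders rather than the actual IVIS configuration:

```python
# Build a network with the default value of every hyperparameter and train it
# with the Adam optimizer configured in build_model (no learning rate schedule).
model = build_model(kt.HyperParameters())
history = model.fit(x_train, y_train,
                    epochs=50,
                    batch_size=32,
                    validation_data=(x_val, y_val))
```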

After the training, the network's predictions are evaluated on the validation data and the resulting loss is returned to the tuner. Based on these losses, the best set of hyperparameters is selected. To provide a better estimate of the validation loss, the same hyperparameters can be tested several times (by creating and training several neural networks with different random initial weights). This can be set by the Executions per trial setting when creating the prediction.
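Extending the tuner sketch above, repeated executions per trial and the final selection of hyperparameters might look like this; the value 3 merely stands in for the Executions per trial setting:

```python
# Passing executions_per_trial trains several networks (with different random
# initial weights) per configuration and averages their validation losses.
tuner = kt.BayesianOptimization(build_model, objective="val_loss",
                                max_trials=20, executions_per_trial=3,
                                directory="tuning", project_name="prediction_model",
                                overwrite=True)
tuner.search(x_train, y_train, epochs=50, validation_data=(x_val, y_val))

# The best configuration can then be used to build the final prediction model.
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
final_model = tuner.hypermodel.build(best_hp)
```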

References

[1] O’Malley, Tom, Elie Bursztein, James Long, François Chollet, Haifeng Jin, Luca Invernizzi, and others. “Keras Tuner,” 2019. https://github.com/keras-team/keras-tuner.

[2] TensorFlow Developers. TensorFlow. Zenodo, 2021. https://doi.org/10.5281/zenodo.5189249.

[3] Kingma, Diederik P., and Jimmy Ba. “Adam: A Method for Stochastic Optimization.” ArXiv:1412.6980 [Cs], January 29, 2017. http://arxiv.org/abs/1412.6980.
