These are the neural network layers supported by XNN. To see how to specify these layers and their parameters in the network architecture file, see the Architecture parameters document.
- Input layer
- Convolutional layer
- Response Normalization layer
- Max Pool layer
- Standard layer
- Dropout layer
- SoftMax layer
- Output layer
Additionally, for layers with weights, such as Convolutional and Standard layers, here is more about some common parameters:
- Weights initialization parameters
- Weights update parameters
- Activation function parameters
The Input layer is required as the first layer in the network. Its purpose is to load input data from disk, resize and normalize it if necessary, and serve it to the network. Data is loaded from disk in parallel with network propagation in order to save time.
Parameters:
- `dataType` - Type of input data. Currently only image data and data features in a text file are supported.
- `numChannels` - Number of input data channels.
- `originalDataWidth` - Width of the original data from disk.
- `originalDataHeight` - Height of the original data from disk.
- `inputDataWidth` - Width of the final input data. If it is different from `originalDataWidth`, data will be randomly cropped.
- `inputDataHeight` - Height of the final input data. If it is different from `originalDataHeight`, data will be randomly cropped.
- `doRandomFlips` - Whether input data should be randomly flipped.
- `normalizeInputs` - Whether input data should be normalized to the specified mean and standard deviation.
- `inputMeans` - List of means to which to normalize each channel of input data.
- `inputStDevs` - List of standard deviations to which to normalize each channel of input data.
- `numTestPatches` - Number of test data patches to take for generating test predictions. The final prediction will be the average of the predictions on each of the patches.
- `testOnFlips` - Whether flips of test data patches should also be included when generating test predictions.
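The cropping, normalization and patch averaging described above can be sketched as follows. This is a minimal illustration, not XNN's implementation: the function names are hypothetical, and it assumes that normalization means subtracting the channel mean and dividing by the channel standard deviation, which this document does not spell out.

```python
import random


def random_crop_origin(original_width, original_height,
                       input_width, input_height, rng=random):
    """Pick the top-left corner of a random crop of the requested size."""
    x = rng.randint(0, original_width - input_width)
    y = rng.randint(0, original_height - input_height)
    return x, y


def normalize_channels(pixel, input_means, input_st_devs):
    """Normalize one pixel, given as a list with one value per channel.

    Assumption: (value - mean) / standard deviation, applied per channel.
    """
    return [(v - m) / s for v, m, s in zip(pixel, input_means, input_st_devs)]


def average_predictions(patch_predictions):
    """Average the per-class predictions made on several test patches."""
    count = len(patch_predictions)
    return [sum(column) / count for column in zip(*patch_predictions)]
```

With `originalDataWidth` of 256 and `inputDataWidth` of 224, for example, the crop origin along the horizontal axis falls anywhere between 0 and 32.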
The Convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter. As a result, the network learns filters that activate when they see some specific type of feature at some spatial position in the input.
Parameters:
- `numFilters` - Number of filters.
- `filterWidth` - Filter width.
- `filterHeight` - Filter height.
- `paddingX` - Horizontal padding to apply to input.
- `paddingY` - Vertical padding to apply to input.
- `stride` - Controls how far the filters shift across the input during forward propagation.
- Weights initialization parameters
- Weights update parameters
- Activation function parameters
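To make the roles of the filter size, padding and stride concrete, here is a small sketch of the forward pass for a single filter on a single-channel input. The function names are hypothetical, and the output size formula is the commonly used convolution arithmetic; XNN's exact rounding behavior is an assumption here.

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Output length along one axis under the common convolution arithmetic."""
    return (input_size + 2 * padding - filter_size) // stride + 1


def convolve2d(image, kernel, stride=1):
    """Naive single-channel, unpadded forward pass producing one activation map."""
    filter_height, filter_width = len(kernel), len(kernel[0])
    out_height = conv_output_size(len(image), filter_height, 0, stride)
    out_width = conv_output_size(len(image[0]), filter_width, 0, stride)
    output = []
    for oy in range(out_height):
        row = []
        for ox in range(out_width):
            # Dot product between the filter and the input patch it covers.
            total = 0.0
            for fy in range(filter_height):
                for fx in range(filter_width):
                    total += kernel[fy][fx] * image[oy * stride + fy][ox * stride + fx]
            row.append(total)
        output.append(row)
    return output
```

A real layer repeats this for each of the `numFilters` filters and sums over all input channels; the sketch leaves that out to keep the sliding-window logic visible.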
Response normalization layer implements a form of lateral inhibition inspired by the type found in real neurons, creating competition for big activities amongst neuron outputs computed using different kernels. For details, see Krizhevsky et al., ImageNet classification with deep convolutional neural networks (NIPS 2012).
It is implemented by the following formula:
Where `A[i]` is the activation with index `i`, `P[i]` is the preactivation with index `i`, and `N` is the number of channels.
Parameters:
- `depth` - Depth of normalization.
- `bias` - Normalization bias.
- `alphaCoeff` - Normalization alpha coefficient (see the formula above).
- `betaCoeff` - Normalization beta coefficient (see the formula above).
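As a sketch of how these parameters interact, the code below assumes the layer follows the formulation in the cited paper: `A[i] = P[i] / (bias + alphaCoeff * sum(P[j]^2))^betaCoeff`, where `j` ranges over the `depth` channels centered on `i`, clipped to `0..N-1`. That mapping of parameters to the paper's symbols is an assumption, and the function name is hypothetical; XNN may, for instance, scale the alpha coefficient differently.

```python
def response_normalize(preactivations, depth, bias, alpha_coeff, beta_coeff):
    """Normalize the channel values at one spatial position.

    preactivations holds P[0..N-1], one value per channel.
    Assumption: window of `depth` channels centered on i, as in the cited paper.
    """
    num_channels = len(preactivations)
    half = depth // 2
    activations = []
    for i in range(num_channels):
        start = max(0, i - half)
        end = min(num_channels - 1, i + half)
        sum_of_squares = sum(p * p for p in preactivations[start:end + 1])
        scale = (bias + alpha_coeff * sum_of_squares) ** beta_coeff
        activations.append(preactivations[i] / scale)
    return activations
```

Channels with large neighbors are divided by a larger scale, which is the competition between kernels that the description refers to.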
The Max Pool layer partitions the input image into a set of regions and, for each region, outputs the maximum input activation. It helps to reduce dimensionality and also makes the model more invariant to translation.
Parameters:
- `filterWidth` - Pooling region width.
- `filterHeight` - Pooling region height.
- `paddingX` - Horizontal padding to apply to input.
- `paddingY` - Vertical padding to apply to input.
- `stride` - Controls how far the pooling region shifts across the input during forward propagation.
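The pooling operation can be illustrated with a short sketch for one channel without padding. The function name is hypothetical and the code only shows the idea, not XNN's implementation.

```python
def max_pool2d(image, filter_width, filter_height, stride):
    """Naive single-channel, unpadded max pooling."""
    out_height = (len(image) - filter_height) // stride + 1
    out_width = (len(image[0]) - filter_width) // stride + 1
    output = []
    for oy in range(out_height):
        row = []
        for ox in range(out_width):
            # Collect the values covered by the pooling region, then keep the largest.
            region = [
                image[oy * stride + fy][ox * stride + fx]
                for fy in range(filter_height)
                for fx in range(filter_width)
            ]
            row.append(max(region))
        output.append(row)
    return output
```

With a 2x2 region and a stride of 2, a 4x4 input shrinks to 2x2, which is the dimensionality reduction mentioned above.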
Standard fully connected neural network layer.
Parameters:
- `numNeurons` - Number of neurons.
- Weights initialization parameters
- Weights update parameters
- Activation function parameters
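For reference, the preactivations of a fully connected layer are one weighted sum plus bias per neuron. The sketch below uses a hypothetical function name and a weight layout chosen for readability, not XNN's memory layout.

```python
def fully_connected(inputs, weights, biases):
    """Return the preactivations of a fully connected layer.

    weights[n] holds the input weights of neuron n, so len(weights)
    corresponds to numNeurons.
    """
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]
```

The configured activation function is then applied to each returned value.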
The Dropout layer provides an efficient way to simulate combining multiple trained models in order to reduce test error and prevent overfitting. It works by dropping each neuron's activation with a certain probability, preventing complex co-adaptations between neurons.
Parameters:
- `dropProbability` - Probability of dropping a neuron activation.
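A minimal sketch of the dropping step during training is shown below. The function name is hypothetical. Dropout implementations also compensate for the dropped activations by rescaling, either at training or at test time; which variant XNN uses is not stated here, so the sketch leaves rescaling out.

```python
import random


def dropout(activations, drop_probability, rng=random):
    """Zero each activation independently with probability drop_probability."""
    return [0.0 if rng.random() < drop_probability else a for a in activations]
```

Because each neuron is dropped independently on every pass, the network effectively trains a different thinned model each time.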
The SoftMax layer calculates the soft maximum of the input activations, so that they sum to 1 and can be used as prediction probabilities.
This layer has no additional parameters.
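The softmax function itself can be sketched in a few lines. The function name is hypothetical; subtracting the maximum before exponentiating is a standard numerical-stability trick and not necessarily what XNN does internally.

```python
import math


def softmax(activations):
    """Map activations to positive values that sum to 1."""
    # Subtracting the maximum avoids overflow in exp without changing the result.
    largest = max(activations)
    exps = [math.exp(a - largest) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]
```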
The Output layer is required as the last layer in the network. It calculates training loss and training/testing accuracy.
These loss functions are supported:
- Logistic regression - Use this loss function for binary classification. It expects to be preceded by one Standard layer with a linear activation function and one neuron, which produces the single activation on which the logistic regression loss and accuracy are calculated.
- Cross entropy - Use this loss function for classification with more than 2 classes. It expects to be preceded by one SoftMax layer, which in turn is preceded by one Standard layer with a linear activation function and a number of neurons equal to the number of classes.
Parameters:
- `lossFunction` - Loss function to use.
- `numGuesses` - Number of guesses K the network is allowed to make when calculating top-K accuracy.
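The two loss functions and the top-K accuracy check can be sketched as follows. The function names are hypothetical, the formulas are the textbook definitions for a single example, and how XNN aggregates them over a batch is not covered here.

```python
import math


def cross_entropy_loss(probabilities, label):
    """Negative log of the probability that SoftMax assigned to the correct class."""
    return -math.log(probabilities[label])


def logistic_loss(activation, label):
    """Logistic regression loss on a single linear activation, label in {0, 1}."""
    probability = 1.0 / (1.0 + math.exp(-activation))
    return -math.log(probability if label == 1 else 1.0 - probability)


def is_top_k_correct(probabilities, label, num_guesses):
    """True if the correct class is among the num_guesses highest scores."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda c: probabilities[c], reverse=True)
    return label in ranked[:num_guesses]
```

For example, with scores `[0.1, 0.6, 0.3]` and correct class 2, the prediction counts as wrong for `numGuesses` of 1 and as correct for `numGuesses` of 2.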
These weights/biases initialization options and their parameters are supported by all layers with weights such as Convolutional and Standard layers:
- Constant initialization - Initializes all weights/biases to the constant value specified by parameters `weightsInitialValue` and `biasesInitialValue`.
- Normal (Gaussian) initialization - Initializes all weights/biases to values taken from a Normal (Gaussian) distribution with mean and standard deviation specified by parameters `weightsMean`, `weightsStdDev`, `biasesMean` and `biasesStdDev`.
- Uniform initialization - Initializes all weights/biases to values taken from a Uniform distribution in the range specified by parameters `weightsRangeStart`, `weightsRangeEnd`, `biasesRangeStart` and `biasesRangeEnd`.
- Xavier initialization - Initializes all weights to values taken from a Normal (Gaussian) distribution with mean `0` and standard deviation calculated as `sqrt(6.0 / (NumberOfActivationsInThisLayer + NumberOfActivationsInPreviousLayer))`.
- He initialization - Initializes all weights to values taken from a Normal (Gaussian) distribution with mean `0` and standard deviation calculated as `sqrt((ActivationType == ReLU ? 2.0 : 1.0) / NumberOfActivationsInPreviousLayer)`.
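The Xavier and He standard deviations listed above translate directly into code. The sketch below copies the two formulas as written in this document; the function names and the sampling helper are hypothetical.

```python
import math
import random


def xavier_std_dev(activations_in_this_layer, activations_in_previous_layer):
    """Standard deviation for Xavier initialization, per the formula above."""
    return math.sqrt(6.0 / (activations_in_this_layer + activations_in_previous_layer))


def he_std_dev(activations_in_previous_layer, is_relu):
    """Standard deviation for He initialization, per the formula above."""
    return math.sqrt((2.0 if is_relu else 1.0) / activations_in_previous_layer)


def normal_weights(count, mean, std_dev, rng=random):
    """Draw `count` weights from a Normal distribution."""
    return [rng.gauss(mean, std_dev) for _ in range(count)]
```

A layer with 8 inputs feeding a ReLU activation would, under the He rule, draw its weights with a standard deviation of 0.5.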
These parameters are supported by all layers with weights such as Convolutional and Standard layers and they control the weights/biases update after each backpropagation pass:
- `weightsMomentum` - Momentum to apply to weights updates.
- `weightsDecay` - Decay to apply to weights updates.
- `weightsStartingLR` - Starting learning rate for weights updates.
- `weightsLRStep` - Fraction of epochs after which to multiply the weights learning rate by `weightsLRFactor`.
- `weightsLRFactor` - Factor by which to multiply the weights learning rate after the specified fraction of epochs.
- `biasesMomentum` - Momentum to apply to biases updates.
- `biasesDecay` - Decay to apply to biases updates.
- `biasesStartingLR` - Starting learning rate for biases updates.
- `biasesLRStep` - Fraction of epochs after which to multiply the biases learning rate by `biasesLRFactor`.
- `biasesLRFactor` - Factor by which to multiply the biases learning rate after the specified fraction of epochs.
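One plausible reading of these parameters is a step learning-rate schedule combined with SGD with momentum and weight decay. The sketch below is written under that assumption: the function names are hypothetical, and both the exact schedule and the exact update rule used by XNN may differ.

```python
def learning_rate(starting_lr, lr_step, lr_factor, epoch, num_epochs):
    """Step schedule: multiply the rate by lr_factor after every
    lr_step fraction of the total number of epochs (assumed interpretation)."""
    epochs_per_step = max(1, int(lr_step * num_epochs))
    return starting_lr * lr_factor ** (epoch // epochs_per_step)


def momentum_update(weight, velocity, gradient, lr, momentum, decay):
    """One SGD step with momentum and L2 weight decay.

    Returns the new (weight, velocity) pair.
    """
    velocity = momentum * velocity - lr * (gradient + decay * weight)
    return weight + velocity, velocity
```

Under this reading, a step of 0.25 over 100 epochs changes the learning rate at epochs 25, 50 and 75. The same two functions apply to biases with the `biases*` parameters.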
These activation functions and their parameters are supported by all layers with weights such as Convolutional and Standard layers:
- Linear - Applies linear activation, i.e. just passes through the preactivation values.
- Sigmoid - Applies sigmoid activation.
- Tanh - Applies tanh (hyperbolic tangent) activation.
- ReLU - Applies ReLU activation.
- ELU - Applies ELU activation. The `activationAlpha` parameter gets multiplied with `exp(Preactivation) - 1` in case the preactivation is smaller than `0`.
- LeakyReLU - Applies LeakyReLU activation. The `activationAlpha` parameter gets multiplied with the preactivation in case the preactivation is smaller than `0`.
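The activation functions can be summarized in one small dispatcher. This is a sketch using the standard mathematical definitions of each function; the function name and the string keys are hypothetical and do not reflect how XNN selects activations.

```python
import math


def activate(kind, preactivation, activation_alpha=1.0):
    """Apply the named activation function to a single preactivation."""
    if kind == "linear":
        return preactivation
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-preactivation))
    if kind == "tanh":
        return math.tanh(preactivation)
    if kind == "relu":
        return max(0.0, preactivation)
    if kind == "elu":
        # Negative side saturates smoothly towards -activation_alpha.
        if preactivation >= 0:
            return preactivation
        return activation_alpha * (math.exp(preactivation) - 1.0)
    if kind == "leakyrelu":
        # Negative side keeps a small linear slope of activation_alpha.
        if preactivation >= 0:
            return preactivation
        return activation_alpha * preactivation
    raise ValueError("unknown activation: " + kind)
```

For non-negative preactivations, ReLU, ELU and LeakyReLU all pass the value through unchanged; they differ only in how they treat negative values.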