Quick explanation

This project performs binary classification with the network defined in NeuralNetwork.py, where an arbitrary architecture (depth and width) can be selected. Two datasets are provided, both with target labels t = ±1. The 2D dataset has a training set of s_t = 10000 data points x_i ∈ R², i = 1,...,s_t, and a validation set of s_v = 5000 points. The 3D dataset has a training set of s_t = 12000 data points and a validation set of s_v = 6000 points.

To run and test the network on the pre-defined configuration and data, just type

python run.py

The classification error is defined as

$\mathcal{C} = \frac{1}{2s}\sum_{\mu=1}^{s}\left|\text{sgn}\left(\mathcal{O}^{(\mu)}\right) - t^{(\mu)}\right|$

where $\mathcal{O}^{(\mu)}$ is the output of the network for input μ, $t^{(\mu)}$ is the corresponding target and s is the size of the dataset. The activation function is tanh, applied to the local field $b_i^{(l,\mu)}$ (weighted inputs minus threshold), so that the output of node i in layer l for input μ is

$\tiny V_i^{(l,\mu)} = \tanh\left(\sum_{j=1}^{M_{l-1}}w_{ij}^{(l)}V_j^{(l-1, \mu)} - \theta_i^{(l)}\right) \longrightarrow \mathbf{V}^{(l,\mu)} = \tanh\bigg(\mathbf{W}^{(l)}\mathbf{V}^{(l-1, \mu)} - \mathbf{\Theta}^{(l)}\bigg)$

where M_l is the number of nodes in layer l.
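The classification error and the layer-wise forward pass defined above can be sketched in NumPy as follows. This is a minimal illustration, not the code from NeuralNetwork.py: the function names `forward` and `classification_error`, and the representation of the network as plain lists of weight matrices and threshold vectors, are assumptions made for this example.

```python
import numpy as np

def forward(x, weights, thresholds):
    """Propagate one input pattern through all layers.

    Hypothetical helper: weights[k] has shape (nodes_out, nodes_in) and
    thresholds[k] has shape (nodes_out,). Returns the outputs of every
    layer, starting with the input itself.
    """
    outputs = [np.asarray(x, dtype=float)]
    for W, theta in zip(weights, thresholds):
        # V^(l) = tanh(W^(l) V^(l-1) - Theta^(l))
        outputs.append(np.tanh(W @ outputs[-1] - theta))
    return outputs

def classification_error(network_outputs, targets):
    """C = 1/(2s) * sum_mu |sgn(O^(mu)) - t^(mu)| for targets t = +/-1."""
    network_outputs = np.asarray(network_outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Each misclassified pattern contributes |(-t) - t| = 2, so C is the
    # fraction of misclassified patterns.
    return np.sum(np.abs(np.sign(network_outputs) - targets)) / (2 * targets.size)
```

Because each misclassified pattern contributes 2 to the sum, the factor 1/(2s) makes C the fraction of wrongly classified patterns, so C = 0 means every pattern is classified correctly.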

The network is trained with stochastic gradient descent in sequential mode, meaning that the parameters are updated after each individual training pattern as

$\begin{align*} \mathbf{W}^{(l)} &\longleftarrow \mathbf{W}^{(l)} + \eta\mathbf{{\Delta}}^{(l)}(\mathbf{V}^{(l-1)})^T \\ \mathbf{\Theta}^{(l)} &\longleftarrow \mathbf{\Theta}^{(l)} - \eta\mathbf{\Delta}^{(l)}, \end{align*}$

where η is the learning rate and $\mathbf{\Delta}^{(l)} = \left[\delta_1^{(l)},...,\delta_{M_l}^{(l)}\right]^T$ is the cost vector of layer l. Its components are evaluated by the chain rule (backpropagation) as

$\delta_{j}^{(l)} \longleftarrow \sum_{i=1}^{M_{l+1}}\frac{\partial V_j^{(l)}}{\partial b_j^{(l)}}\delta_i^{(l+1)}w_{ij}^{(l+1)},$

with

$\delta_i^{(L)} \longleftarrow \frac{\partial \mathcal{O}_i}{\partial b_i^{(L)}}\big(t_i - \mathcal{O}_i\big),$

being the cost value for the output layer.
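One sequential-learning update following the equations above could look like the sketch below. It is an illustration under assumptions, not the repository's implementation: the names `forward` and `sgd_step` and the list-based parameter layout are hypothetical, and the derivative of tanh is written as 1 - tanh(b)², which equals 1 - V².

```python
import numpy as np

def forward(x, weights, thresholds):
    """Return the outputs V^(0), ..., V^(L) for one input pattern."""
    V = [np.asarray(x, dtype=float)]
    for W, theta in zip(weights, thresholds):
        V.append(np.tanh(W @ V[-1] - theta))
    return V

def sgd_step(x, t, weights, thresholds, eta):
    """Update all weights and thresholds in place for one pattern (x, t)."""
    V = forward(x, weights, thresholds)

    # Output layer: delta^(L) = g'(b^(L)) (t - O), with g'(b) = 1 - tanh(b)^2.
    delta = (1.0 - V[-1] ** 2) * (np.asarray(t, dtype=float) - V[-1])

    # weights[k] maps V[k] to V[k + 1]; walk from the last layer to the first.
    for k in reversed(range(len(weights))):
        if k > 0:
            # Propagate the error one layer down using the old weights.
            delta_below = (1.0 - V[k] ** 2) * (weights[k].T @ delta)
        weights[k] += eta * np.outer(delta, V[k])   # W <- W + eta * Delta V^T
        thresholds[k] -= eta * delta                # Theta <- Theta - eta * Delta
        if k > 0:
            delta = delta_below
```

Training for one epoch then amounts to calling `sgd_step` once for every pattern in the training set, typically in random order.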

Moreover, the weights are initialized with a modified Glorot uniform initialization, in which standard normal samples are scaled by the Glorot uniform limit:

$\mathbf{W}^{(l)} \longleftarrow \mathcal{N}\big(\mu=0, \sigma=1\big)\sqrt{\frac{6}{M_{l} + M_{l+1}}},$

where $\mathcal{N}\big(\mu, \sigma\big)$ is the univariate normal (Gaussian) distribution with mean μ and standard deviation σ, and $M_l$ and $M_{l+1}$ are the numbers of nodes in layers l and l+1, respectively. The thresholds are initialized to zero.
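A minimal sketch of this initialization is given below. The function name `init_parameters`, its `layer_sizes` and `seed` arguments, and the list-based layout are assumptions for illustration and are not taken from NeuralNetwork.py. For each weight matrix, the sketch uses the sizes of the two layers that the matrix connects in the scaling factor.

```python
import numpy as np

def init_parameters(layer_sizes, seed=None):
    """Create weights and thresholds for a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first
    (hypothetical convention for this sketch).
    """
    rng = np.random.default_rng(seed)
    weights, thresholds = [], []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Standard normal samples scaled by sqrt(6 / (sum of the two layer sizes)).
        scale = np.sqrt(6.0 / (n_in + n_out))
        weights.append(rng.standard_normal((n_out, n_in)) * scale)
        # Thresholds start at zero.
        thresholds.append(np.zeros(n_out))
    return weights, thresholds
```

For example, `init_parameters([2, 5, 5, 1])` would create the parameters for 2D inputs, two hidden layers of five neurons each and, as an assumption here, a single output neuron.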

Results

Initialization of network

The network is initialized with two hidden layers of n₁ = n₂ = 5 hidden neurons each and trained for 300 epochs with an initial learning rate of $\eta = 2\cdot10^{-2}$. The weights are initialized with the modified Glorot initializer.

2D data

The 2D data looks like this:

(figure: the 2D dataset)

The results obtained with the above initialization on the 2D data are the following:

(figure: results on the 2D data)

And like that one can construct a custom machine learning classifier :)

olof98johansson/NeuralNetworkFromScratch
