---
layout: splash
permalink: binary-impl:output_ext
title: Implementations for binary classifiers
---

# Feedforward Neural Networks

This page explains various ways of implementing single-layer and multi-layer neural networks as supplementary material for this lecture. The implementations appear in order from explicit to abstract so that one can understand the black-boxed internal processing of deep learning frameworks.

In order to focus on the internals, this page uses a simple and classic example: threshold logic units. Regarding $x = 0$ as false and $x = 1$ as true, single-layer neural networks can realize logic units such as AND ($\wedge$), OR ($\vee$), NOT ($\neg$), and NAND ($\mid$). Multi-layer neural networks can realize logical compounds such as XOR.

| $x_1$ | $x_2$ | AND | OR | NAND | XOR |
|:-:|:-:|:-:|:-:|:-:|:-:|
| 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 |
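
To make the idea concrete, here is a small Python sketch of threshold logic units with hand-picked weights (the specific weight values are illustrative choices, not part of the lecture material):

```python
def g(z):
    """Heaviside step function with g(0) = 0."""
    return 1 if z > 0 else 0

def unit(w1, w2, b):
    """Build a threshold logic unit computing g(w1*x1 + w2*x2 + b)."""
    return lambda x1, x2: g(w1 * x1 + w2 * x2 + b)

AND = unit(1.0, 1.0, -1.5)
OR = unit(1.0, 1.0, -0.5)
NAND = unit(-1.0, -1.0, 1.5)

# Print the truth table realized by the three units.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2), NAND(x1, x2))
```

No single unit realizes XOR because it is not linearly separable; a two-layer composition such as `AND(OR(x1, x2), NAND(x1, x2))` does, which is why XOR requires a multi-layer network.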

## Single-layer perceptron

We consider a single-layer perceptron that predicts a binary label $\hat{y} \in \{0, 1\}$ for a given input vector $\boldsymbol{x} \in \mathbb{R}^d$ ($d$ denotes the number of dimensions of the input) by using the following formula,

$$\hat{y} = g(\boldsymbol{w} \cdot \boldsymbol{x} + b) = g(w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b)$$

Here, $\boldsymbol{w} \in \mathbb{R}^d$ is a weight vector; $b \in \mathbb{R}$ is a bias weight; and $g(\cdot)$ denotes the Heaviside step function (we assume $g(0) = 0$).

Let's train a NAND gate with two inputs ($d = 2$). More specifically, we want to find a weight vector $\boldsymbol{w}$ and a bias weight $b$ of a single-layer perceptron that realizes the truth table of the NAND gate: $\{0, 1\}^2 \to \{0, 1\}$.

We convert the truth table into a training set consisting of all mappings of the NAND gate,

$$
\begin{array}{ll}
\boldsymbol{x}_1 = (0, 0), & y_1 = 1 \\
\boldsymbol{x}_2 = (0, 1), & y_2 = 1 \\
\boldsymbol{x}_3 = (1, 0), & y_3 = 1 \\
\boldsymbol{x}_4 = (1, 1), & y_4 = 0
\end{array}
$$

In order to train the weight vector and bias weight with unified code, we include the bias term as an additional dimension of the inputs. More concretely, we append $1$ to each input,

$$
\begin{array}{ll}
\boldsymbol{x}_1 = (0, 0, 1), & y_1 = 1 \\
\boldsymbol{x}_2 = (0, 1, 1), & y_2 = 1 \\
\boldsymbol{x}_3 = (1, 0, 1), & y_3 = 1 \\
\boldsymbol{x}_4 = (1, 1, 1), & y_4 = 0
\end{array}
$$

Then, the formula of the single-layer perceptron becomes,

$$\hat{y} = g((w_1, w_2, w_3) \cdot \boldsymbol{x}) = g(w_1 x_1 + w_2 x_2 + w_3)$$

In other words, $w_1$ and $w_2$ are the weights for $x_1$ and $x_2$, respectively, and $w_3$ is the bias weight.
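
For example, the hand-picked weights $(w_1, w_2, w_3) = (-2, -2, 3)$ (an illustrative choice, not a value from the lecture) realize the NAND gate: the four augmented inputs yield $g(3) = 1$, $g(1) = 1$, $g(1) = 1$, and $g(-1) = 0$, matching the truth table. The training procedure below finds such a weight vector automatically.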

The code below implements Rosenblatt's perceptron algorithm with a fixed number of iterations (100). We use a constant learning rate of 0.5 for simplicity.
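
As a reminder, the standard per-instance update of Rosenblatt's algorithm (which the notebook is assumed to follow) is

$$\boldsymbol{w} \leftarrow \boldsymbol{w} + \eta (y_n - \hat{y}_n) \boldsymbol{x}_n$$

where $\eta = 0.5$ is the learning rate and $\hat{y}_n = g(\boldsymbol{x}_n \cdot \boldsymbol{w})$; the weights change only when the prediction for instance $n$ is wrong.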

{% include notebook/binary/slp_rosenblatt.md %}

## Single-layer perceptron with mini-batch

It is better to reduce the amount of computation executed by the Python interpreter, which is relatively slow. The common technique to speed up machine-learning code written in Python is to execute computations within a matrix library (e.g., numpy).

The single-layer perceptron makes predictions for four inputs,

$$\hat{y}_1 = g(\boldsymbol{x}_1 \cdot \boldsymbol{w}), \quad \hat{y}_2 = g(\boldsymbol{x}_2 \cdot \boldsymbol{w}), \quad \hat{y}_3 = g(\boldsymbol{x}_3 \cdot \boldsymbol{w}), \quad \hat{y}_4 = g(\boldsymbol{x}_4 \cdot \boldsymbol{w})$$

Here, we define $\hat{Y} \in \mathbb{R}^{4 \times 1}$ and $X \in \mathbb{R}^{4 \times d}$ as,

$$\hat{Y} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \hat{y}_4 \end{pmatrix}, \quad X = \begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \\ \boldsymbol{x}_3 \\ \boldsymbol{x}_4 \end{pmatrix}$$

Then, we can write the four predictions in one dot-product computation, $$ \hat{Y} = g(X \cdot \boldsymbol{w}) $$ where $g$ is applied elementwise.

The code below implements this idea. The function np.heaviside() yields the vector of the four predictions, applying the step function to every element of its argument.

This technique is frequently used in mini-batch training, where gradients for a small number (e.g., 4 to 128) of instances are computed.

{% include notebook/binary/slp_rosenblatt_batch.md %}
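
To see the vectorized prediction in isolation, here is a minimal sketch using the same hand-picked NAND weights as above (illustrative values, not the trained result):

```python
import numpy as np

# Augmented inputs (bias dimension appended) of the NAND training set.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
w = np.array([-2.0, -2.0, 3.0])  # hand-picked NAND weights (illustrative)

# One dot product plus an elementwise step function gives all four predictions.
Y_hat = np.heaviside(X @ w, 0)   # second argument: value returned at zero
print(Y_hat)                     # [1. 1. 1. 0.]
```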

## Stochastic gradient descent (SGD) with mini-batch

Next, we consider a single-layer feedforward neural network with a sigmoid activation function. In essence, we replace the Heaviside step function with the sigmoid function when predicting $\hat{Y}$, and use the stochastic gradient descent update rule when updating $\boldsymbol{w}$.
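
For concreteness, here is a minimal numpy sketch of one update step under common assumptions (sigmoid output, binary cross-entropy loss averaged over the mini-batch, constant learning rate); the notebook below gives the full training loop:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# NAND training set with the bias dimension appended, as above.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([1.0, 1.0, 1.0, 0.0])

w = np.zeros(3)
eta = 0.5                          # constant learning rate (assumed)

Y_hat = sigmoid(X @ w)             # sigmoid replaces the Heaviside step
grad = X.T @ (Y_hat - Y) / len(Y)  # gradient of the mean cross-entropy loss
w -= eta * grad                    # one mini-batch SGD update

# Repeating this update drives sigmoid(X @ w) toward the NAND outputs.
```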

{% include notebook/binary/slp_sgd_numpy.md %}

## Automatic differentiation

### autograd

{% include notebook/binary/ad_autograd.md %}

### PyTorch

{% include notebook/binary/ad_pytorch.md %}

### Chainer

{% include notebook/binary/ad_chainer.md %}

### TensorFlow

{% include notebook/binary/ad_tensorflow.md %}

### MXNet

{% include notebook/binary/ad_mxnet.md %}

## Single-layer neural network using automatic differentiation

{% include code.html tt1="PyTorch" tc1="notebook/binary/slp_ad_pytorch.md" tt2="Chainer" tc2="notebook/binary/slp_ad_chainer.md" tt3="TensorFlow" tc3="notebook/binary/slp_ad_tensorflow.md" tt4="MXNet" tc4="notebook/binary/slp_ad_mxnet.md" %}

## Multi-layer neural network using automatic differentiation

{% include code.html tt1="PyTorch" tc1="notebook/binary/mlp_ad_pytorch.md" tt2="Chainer" tc2="notebook/binary/mlp_ad_chainer.md" tt3="TensorFlow" tc3="notebook/binary/mlp_ad_tensorflow.md" tt4="MXNet" tc4="notebook/binary/mlp_ad_mxnet.md" %}

## Single-layer neural network with high-level NN modules

{% include code.html tt1="PyTorch" tc1="notebook/binary/slp_pytorch_sequential.md" tt2="Chainer" tc2="notebook/binary/slp_chainer_sequential.md" %}

## Multi-layer neural network with high-level NN modules

{% include code.html tt1="PyTorch" tc1="notebook/binary/mlp_pytorch_sequential.md" tt2="Chainer" tc2="notebook/binary/mlp_chainer_sequential.md" %}

## Single-layer neural network with an optimizer

{% include code.html tt1="PyTorch" tc1="notebook/binary/slp_pytorch_sequential_optim.md" tt2="Chainer" tc2="notebook/binary/slp_chainer_sequential_optimizers.md" tt3="TensorFlow" tc3="notebook/binary/slp_tensorflow_keras.md" tt4="MXNet" tc4="notebook/binary/slp_mxnet_sequential_trainer.md" %}

## Multi-layer neural network with an optimizer

{% include code.html tt1="PyTorch" tc1="notebook/binary/mlp_pytorch_sequential_optim.md" tt2="Chainer" tc2="notebook/binary/mlp_chainer_sequential_optimizers.md" tt3="TensorFlow" tc3="notebook/binary/mlp_tensorflow_keras.md" tt4="MXNet" tc4="notebook/binary/mlp_mxnet_trainer.md" %}

## Single-layer neural network with a customizable NN class

{% include code.html tt1="PyTorch" tc1="notebook/binary/slp_pytorch_class.md" tt2="Chainer" tc2="notebook/binary/slp_chainer_class.md" %}

## Multi-layer neural network with a customizable NN class

{% include code.html tt1="PyTorch" tc1="notebook/binary/mlp_pytorch_class.md" tt2="Chainer" tc2="notebook/binary/mlp_chainer_class.md" %}

<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({ tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]} });
</script>
<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML' async></script>
<script>
$(document).ready(function() {
  $('pre code[class="language-python"]').each(function(i, block) {
    hljs.highlightBlock(block);
  });
});
</script>