🙌 Seeking assistance! I'm looking for help to add support for missing languages.
If you can contribute, I'll gladly accept a PR and give proper credit 💫.
It's simpler than you might expect. Just take a look at one of the existing
implementations—it's mostly a few for loops. No need to worry about adding tests;
I can help with that part.
This repository aims to implement a vanilla neural network in all major
programming languages. It is the "hello world" of AI programming. We will
implement a fully connected network with a single hidden layer, using the
sigmoid activation function for both the hidden and the output layer. Such a
network can be used for handwriting recognition and other kinds of pattern
recognition, categorization, or prediction. This is intended as your entry
point into AI programming, i.e. for the enthusiast or hobby programmer (you and
me). More advanced use cases should look elsewhere, as there are far more
powerful methods available for the professional.
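The architecture described above can be sketched in a few lines. This is an illustrative sketch, not the repository's reference code; the function names and weight layout (one weight list per neuron) are assumptions for the example:

```python
import math

def sigmoid(x):
    # Logistic activation, used for both the hidden and the output layer.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_output, b_output):
    # One fully connected hidden layer followed by the output layer.
    # w_hidden/w_output hold one weight list per neuron (an assumed layout).
    hidden = [sigmoid(sum(w * i for w, i in zip(weights, inputs)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    output = [sigmoid(sum(w * h for w, h in zip(weights, hidden)) + b)
              for weights, b in zip(w_output, b_output)]
    return hidden, output
```

With all weights and biases at zero, every neuron outputs sigmoid(0) = 0.5, which is why random initialization (discussed later) matters.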
Disclaimer! Do not expect blazing-fast performance. If you have such
requirements or expectations, you should definitely look elsewhere. Stay
here if you want to learn more about implementing a neural network!
We do not aim to justify the math involved (see [1] if you're interested). We
prefer to focus on the code itself and will happily copy a solution from one
programming language to another without worrying about the theoretical
background.
2. Usage
These usage examples are taken directly from our test implementations. The
general flow is to prepare a dataset, create a trainer which contains an empty
neural network, and then train the network until the desired prediction accuracy
is achieved. All of these examples output the final predictions to the console.
For any larger dataset you will need to compute the prediction accuracy. One way
to do this is to compute the percentage of correct predictions and the average
"confidence" of the predictions.
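The accuracy and confidence computation described above can be sketched as follows; the function name and the list-of-lists input format are assumptions for this example:

```python
def evaluate(predictions, targets):
    # Accuracy: fraction of rows where the largest output matches the
    # target class. Confidence: average value of that largest output.
    correct = 0
    confidence = 0.0
    for output, target in zip(predictions, targets):
        guess = max(range(len(output)), key=lambda i: output[i])
        expected = max(range(len(target)), key=lambda i: target[i])
        if guess == expected:
            correct += 1
        confidence += output[guess]
    n = len(predictions)
    return correct / n, confidence / n
```

For example, two predictions where one picks the right class with output 0.9 and one picks the wrong class with output 0.8 give an accuracy of 0.5 and an average confidence of 0.85.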
For training and verifying our implementations we will use two datasets.
3.1. Logical Functions
The first is simple: the logical functions xor, xnor, or, nor,
and, and nand. This truth table represents the values that the network will
learn, given two inputs, $i_1$ and $i_2$:
This test is interesting as it shows how flexible a simple neural network can
be. There are two inputs and six outputs, and two hidden neurons are
sufficient. Such a network consists of a total of 24 weights:
💯 We expect each implementation to learn exactly the same network weights!
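The truth table and the weight count can be written out directly. The output column order below is an assumption for this example; the 24 weights follow from counting weights plus biases for both layers:

```python
# Truth table for the six logical functions, given inputs i1 and i2.
# Assumed output order: xor, xnor, or, nor, and, nand.
DATA = [
    # inputs    xor xnor or nor and nand
    ([0, 0],   [0,  1,  0, 1,  0,  1]),
    ([0, 1],   [1,  0,  1, 0,  0,  1]),
    ([1, 0],   [1,  0,  1, 0,  0,  1]),
    ([1, 1],   [0,  1,  1, 0,  1,  0]),
]

def weight_count(n_inputs, n_hidden, n_outputs):
    # Weights plus biases for a single-hidden-layer network:
    # hidden layer: n_inputs * n_hidden weights + n_hidden biases
    # output layer: n_hidden * n_outputs weights + n_outputs biases
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)
```

With 2 inputs, 2 hidden neurons, and 6 outputs this gives (4 + 2) + (12 + 6) = 24.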
3.1.1. Litmus Test
The logical functions example can be used as a "litmus test" of neural network
implementations. A proper implementation will be able to learn the 6 functions
using the 24 weights detailed above. An improper implementation (one that
doesn't implement biases correctly, for example) will likely need more hidden
nodes to learn successfully, if it succeeds at all. A larger network means more
mathematical operations, so keep this in mind when you evaluate other
implementations. You don't want to waste CPU cycles unnecessarily.
3.2. Handwritten Digits
The second dataset consists of thousands of handwritten digits. This is
actually also a "toy" dataset, but training a network to recognize all digits
correctly is still a bit of a challenge. This dataset was originally downloaded
from https://archive.ics.uci.edu/dataset/178/semeion+handwritten+digit.
Each line consists of 256 inputs (16x16 pixels) corresponding to one
handwritten digit. At the end of the line are 10 values which signify
which digit it is:
Parsing this dataset needs to be implemented for each language.
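In Python, parsing one such line could look like the sketch below. The function name is an assumption; the format (256 pixel values followed by a 10-element target) is as described above:

```python
def parse_line(line):
    # Each line: 256 pixel values, then a 10-element target that
    # marks which digit (0-9) the pixels represent.
    values = line.split()
    pixels = [float(v) for v in values[:256]]
    target = [int(v) for v in values[256:]]
    return pixels, target
```

The whole file is then just a loop over its lines, collecting (pixels, target) pairs.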
4. Learning
Our code will perform backpropagation to learn the weights. We update
the weights after each input. This is called stochastic learning, as
opposed to batch learning where multiple inputs are presented before
updating weights. Stochastic learning is generally preferred [2]. Note
that inputs need to be shuffled for effective learning.
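A stochastic training loop with shuffling can be sketched as below. The `network` object and its `forward`/`backpropagate` methods are hypothetical stand-ins for whatever API each implementation exposes:

```python
import random

def train(network, dataset, epochs, learning_rate):
    # Stochastic learning: shuffle the dataset each epoch and update
    # the weights after every single input (no batching).
    # `network` is a hypothetical object with forward/backpropagate methods.
    for _ in range(epochs):
        random.shuffle(dataset)
        for inputs, target in dataset:
            network.forward(inputs)
            network.backpropagate(target, learning_rate)
```

The key points are the per-input weight update and the reshuffle at the start of every epoch.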
5. Implementation Goals
One of our goals is to have as few dependencies as possible, ideally none. These
implementations should be easy to integrate, and that requires dependency-free
code. Another goal is to implement fast code. Nifty one-liners which look good
but perform badly should be avoided. It is fine to use for loops for matrix
multiplication, for example (i.e. no fancy linear algebra libraries are needed
unless one is available in the standard library of the programming language).
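As an example of the plain-for-loop style, matrix multiplication without any library looks like this (a generic sketch, not code from the repository):

```python
def matmul(a, b):
    # Plain triple-loop matrix multiplication on lists of lists.
    # No linear algebra library needed, and fast enough for networks
    # of this size.
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
    return result
```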
We strive for:
code that is easy to copy/paste for reuse
dependency-free code
straightforward code, with no excessive object orientation that makes the code
look like an OOAD exercise from the 90s
adequate performance over nifty (but slow) one-liners
making it easy to serialize weights for storing and loading, while leaving the
details to the user's own preference
implementations in all major languages
simple tests that verify our implementations and secure them for the future
having fun exploring neural networks!
5.1. Simple Random Number Generator
Now, a note about random number generation. Training a neural network requires
that the initial weights are randomly assigned. We will specify a simple random
number generator algorithm that should be used in all implementations. We
actually want each implementation to learn the same weights. This makes it
easier to verify the implementation. Of course, whoever wants to integrate into
their own solution is free to pick another random number generator.
This was chosen to avoid any complexity! There are widely used algorithms for
better random number generation, but quality isn't important in this case. We
simply need some starting values, and they don't have to be very random as long
as they are all different.
The code samples all contain an extension point where you can plug in your own
implementation, should you wish to do so (or just hardcode your choice!).
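A simple generator of this kind could look like the sketch below. Note that the constants here are the classic Numerical Recipes LCG parameters, chosen for illustration; the repository specifies its own algorithm, which may differ:

```python
class SimpleRng:
    # A minimal linear congruential generator (LCG). The constants are
    # an illustrative choice, not necessarily the ones this repository
    # specifies.
    def __init__(self, seed=1):
        self.state = seed

    def next_float(self):
        # Returns a pseudo-random float in [0, 1).
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32
```

Because the sequence is fully determined by the seed, every implementation that starts from the same seed draws the same initial weights, which is what makes the learned weights reproducible across languages.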
5.2. License
All code in this repository is licensed under the MIT license.
This is a permissive license: you can use this code in your
personal projects, and commercial ones as well, without needing to share
anything back. The MIT license is the most common license on GitHub.
If you would like to contribute to this repository, for example
by adding an implementation in another programming language,
then you must also license your implementation under the MIT license.
Please add a license header to every source file. No GPL allowed!
5.3. Language Implementations
This is the current status of the implementations available. We follow a maturity model based on these criteria:
Level 0: implement logical functions network
Level 1: use modules/files to make implementation easy to reuse by copy/paste
Level 2: implement a unit test to verify level 0 and make the code future safe
Level 3: implement digit recognition with the Semeion dataset
Level 4: implement a unit test to verify level 3 and make the code future safe
Note! The Python implementation is only here as a reference. If you are using Python you already
have access to all the AI tools and libraries you need.
5.3.1. Sample Output
Digit recognition is done using only 14 hidden neurons, 10 learning epochs (an
epoch is one run through the entire dataset), and a learning rate of 0.5. Using
these hyperparameters we are able to recognize 99.1% of the Semeion digits
accurately. You may be able to improve on this by adding more hidden neurons,
running more epochs, or annealing the learning rate (decreasing it slowly).
However, we then also risk overfitting, which decreases the network's ability
to generalize (it learns the data too specifically, i.e. it learns the noise in
the dataset).
This output shows accuracy in predicting the correct digit, and average
confidence i.e. score of the largest output value:
For reference we have a Python implementation which uses NumPy
and should be fairly easy to understand. Why Python? Because Python
has become the lingua franca of AI programming. It is also easy to
modify and fast to re-run, which makes it ideal for experiments.
We will now go through the reference implementation and include some math
diagrams for those who want to know what's going on. You'll see the how, but
not the why (see the references section for that).
Here, one forward and one backward propagation is shown. You can use these
values to verify your own calculations. The example uses the logical functions
shown earlier, with both inputs set to 1, i.e. 1 1. We will use 3 hidden
neurons and 6 outputs (xor, xnor, and, nand, or, nor).
6.1. Inputs and Randomized Starting Weights
These are the initial values for the input layer and the hidden layer. $w$
denotes the weights and $b$ the biases. Note that we are showing randomized
biases here to help understand the calculations. For the implementation we will
initialize biases to 0 per the recommendation here [3].
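The initialization scheme used in the implementations (random weights, zero biases) can be sketched as follows; the function name and the weight range are assumptions for this example:

```python
import random

def init_layer(n_inputs, n_neurons):
    # Weights start at small random values (range is an assumed choice);
    # biases start at zero, per the recommendation referenced above.
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases
```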
Now we have calculated the output. These values differ from the expected output,
and the purpose of the next step, backpropagation, is to correct the weights for
a slightly improved prediction in the next iteration. The first step of
backpropagation is to compute the error gradient ($\nabla$) of the output layer.
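For a sigmoid output layer trained with a squared-error loss, that gradient for each output neuron $k$ is $\nabla_k = (o_k - t_k)\,o_k(1 - o_k)$, where $o_k$ is the output and $t_k$ the target. This assumes a squared-error loss and the $(o - t)$ sign convention, which the implementations may define with the opposite sign:

```python
def output_gradient(output, target):
    # Error gradient of a sigmoid output layer with squared-error loss:
    #   grad_k = (o_k - t_k) * o_k * (1 - o_k)
    # where o_k * (1 - o_k) is the derivative of the sigmoid.
    return [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
```

For example, an output of 0.5 with a target of 1 gives a gradient of (0.5 - 1) * 0.5 * 0.5 = -0.125.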