JGallegoPerez/happy_birthday_rnn

happy_birthday_rnn

Simple Elman network that learns and plays the Happy Birthday song

I hope the script is self-explanatory; I included extensive comments. The model is an Elman network with two neurons for the input and two for the output. The network is taught the "Happy Birthday" melody: one input neuron receives the pitch of each tone in the melody (its frequency in hertz), and the other receives the duration of that tone. I could have represented the tones as discrete values (for instance, receiving an A# as input and outputting a C) or as continuous ("analog") values across the full frequency spectrum. I thought the latter could offer richer insights, so I went for that approach. It reminds me of learning the violin and other string instruments, which have no keys or frets to fix the exact pitch of each note. I wonder how it would all have worked out if the data had been represented in a discrete format.
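To make the architecture concrete, here is a minimal sketch of one Elman forward pass, written with only the standard library so it runs without torch. It is not the code from the script. It assumes that the previous hidden state (the context) is concatenated to the two external inputs, which would be consistent with the configurations listed under "Some results", where input_size equals hidden_size + 2. The function names, the random weights and the three scaled notes are hypothetical.

```python
import math
import random


def elman_step(x, context, w_in, w_out):
    """One forward step: x holds the external inputs, context the previous hidden state."""
    combined = x + context  # list concatenation: external inputs followed by context units
    hidden = [math.tanh(sum(w * v for w, v in zip(row, combined))) for row in w_in]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return output, hidden


def init_weights(rows, cols, rng):
    """Small random weights; the range is an arbitrary choice for this sketch."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_external, hidden_size, output_size = 2, 5, 2
input_size = n_external + hidden_size  # assumption: 2 external inputs + 5 context units

w_in = init_weights(hidden_size, input_size, rng)
w_out = init_weights(output_size, hidden_size, rng)

# Hypothetical scaled (pitch, duration) pairs, not taken from the script.
sequence = [[0.262, 0.05], [0.262, 0.05], [0.294, 0.1]]

context = [0.0] * hidden_size  # the context starts empty
outputs = []
for x in sequence:
    y, context = elman_step(x, context, w_in, w_out)
    outputs.append(y)  # each y is an untrained guess at the next (pitch, duration)
```

The hidden state returned by each step is fed back in as the context of the next step, which is what lets the network take earlier notes into account.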

Data transformations

The value ranges of the tone frequencies (e.g. an A at 440 Hz) and the durations (e.g. 1 second) are very different. I rescaled them in a simple way so that both fall between 0 and 1: I divided the frequencies by 1000 and the durations by 10. This put them on surprisingly similar scales while preserving some of their internal properties (for instance, dividing by a constant keeps the ratios between frequencies, so the logarithmic relationship between notes and their frequencies is unchanged).
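The two divisions described above can be sketched as follows. The scale factors 1000 and 10 come from the text; the function names and the two example notes are hypothetical, and the script may organize this differently.

```python
FREQ_SCALE = 1000.0      # frequencies in Hz are divided by 1000
DURATION_SCALE = 10.0    # durations in seconds are divided by 10


def scale_note(freq_hz, duration_s):
    """Map a (frequency, duration) pair into the network's range."""
    return freq_hz / FREQ_SCALE, duration_s / DURATION_SCALE


def unscale_note(pitch, duration):
    """Invert scale_note, e.g. to turn a prediction back into Hz and seconds."""
    return pitch * FREQ_SCALE, duration * DURATION_SCALE


# Illustrative notes: an A at 440 Hz for 1 s and the A one octave up for 0.5 s.
notes = [(440.0, 1.0), (880.0, 0.5)]
scaled = [scale_note(f, d) for f, d in notes]

# An octave is still a factor of two after scaling.
octave_ratio = scaled[1][0] / scaled[0][0]
```

The inverse function matters when the prediction is played back, since the audio needs frequencies in hertz and durations in seconds again.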

Some results

If I understood the model correctly, the network can predict the melody quite well. The three things I can vary in the model are the network size, the number of epochs and the learning rate (lr). All three need to be tweaked to yield a good fit, but I found the learning rate to be perhaps the most important. The default value I found in other tutorials (written for other tasks) was 0.1, which was too high for my task. With values greater than 0.03, the error sometimes climbed back up as training progressed through the epochs. I found that one "sweet spot" could be around 0.02. Below are two examples in which the number of epochs and the learning rate were kept constant and only the network size varied, followed by a third example with more epochs.
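Why a learning rate that is too high can make the error grow can be seen in a toy setting. The sketch below is not the network: it runs plain gradient descent on a one-dimensional quadratic error, and the curvature value is an arbitrary assumption chosen only so that 0.1 overshoots while 0.02 converges. The threshold at which this toy starts to diverge says nothing about the threshold for the actual network.

```python
def descend(lr, curvature=30.0, w0=1.0, epochs=20):
    """Gradient descent on error(w) = 0.5 * curvature * w**2; returns the error per epoch."""
    w = w0
    errors = []
    for _ in range(epochs):
        errors.append(0.5 * curvature * w * w)
        w -= lr * curvature * w  # the gradient of the error is curvature * w
    return errors


for lr in (0.1, 0.02):
    errors = descend(lr)
    trend = "grows" if errors[-1] > errors[0] else "shrinks"
    print(f"lr={lr}: error {trend} from {errors[0]:.3g} to {errors[-1]:.3g}")
```

Each update multiplies w by (1 - lr * curvature). When the magnitude of that factor exceeds 1, every step overshoots the minimum by more than the previous one, so the error increases instead of decreasing.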

A not so well predicted melody, although the network here was minimal in size. One neuron encodes the note pitch (frequency in Hz, upper graph) and the other encodes the duration in seconds (lower graph):
input_size, hidden_size, output_size = 7, 5, 2, epochs = 2000, lr = 0.02

[Figures: pitch (upper graph) and duration (lower graph)]

The next trained network predicts much better. In fact, a few thousand more epochs would yield a virtually perfect prediction (see the last example, after this one).
input_size, hidden_size, output_size = 27, 25, 2, epochs = 2000, lr = 0.02

[Figures: pitch (upper graph) and duration (lower graph)]

An almost perfect prediction: input_size, hidden_size, output_size = 27, 25, 2, epochs = 5000, lr = 0.02

[Figures: pitch (upper graph) and duration (lower graph)]

Instructions

I hope the comments in the script make it self-explanatory. In any case, the parameters you can vary are near the top of the code. At the bottom of the code, the actual Happy Birthday melody is generated as audio, followed immediately, for comparison, by the predicted melody. If you intend to run the program many times, I recommend commenting out or deleting the line that plays the actual Happy Birthday.
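For readers curious how a list of (frequency, duration) pairs becomes sound, here is a minimal sketch that renders sine tones as raw samples using only the standard library. It is not the script's playback code, which relies on pyaudio. The sample rate, the volume, the function names and the three example notes are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 44100  # samples per second; an assumed, commonly used value


def sine_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE, volume=0.5):
    """Return one sine tone as a list of floats in [-volume, volume]."""
    n_samples = int(sample_rate * duration_s)
    return [
        volume * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def render(notes):
    """Concatenate the tones for a sequence of (frequency in Hz, duration in s) pairs."""
    samples = []
    for freq_hz, duration_s in notes:
        samples.extend(sine_tone(freq_hz, duration_s))
    return samples


# Hypothetical notes, not the values used in the script.
melody = [(261.63, 0.25), (261.63, 0.25), (293.66, 0.5)]
samples = render(melody)
```

The same function can render both the actual melody and the predicted one, once the network's outputs have been converted back to hertz and seconds.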

Dependencies

torch

numpy

pylab (provided by matplotlib)

pyaudio
