Weekly report 5

irenenikk edited this page Apr 21, 2018 · 7 revisions

Hours used: 8

This was a lousy week. The little time I had I spent debugging why my network still doesn't work, without finding out the reason.

Apparently the old example problem (XOR with ReLU as activation and MSE as loss function) was not a good test case, so I changed it to linear regression. I read some PyTorch source code, as I suspected that the problem might lie in my forward pass. It turned out I had implemented the forward pass a bit differently than they had, but even after replicating the differences I still hadn't found what was wrong with my code. Their way certainly requires less transposing.
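To make the transposing remark concrete, here is a minimal sketch of two common layouts for a linear layer's forward pass. The first follows the convention of PyTorch's `nn.Linear` (samples as rows, weight of shape `(out, in)`); the second keeps samples as columns. The function names are hypothetical, and the column layout is only an assumption about what an alternative implementation might look like, not a reconstruction of the code discussed here.

```python
import numpy as np

def forward_rows(x, weight, bias):
    # x: (batch, in), weight: (out, in), bias: (out,)
    # Samples are rows, so the result is (batch, out).
    return x @ weight.T + bias

def forward_columns(x_cols, weight, bias):
    # x_cols: (in, batch), weight: (out, in), bias: (out,)
    # Samples are columns, so the result is (out, batch).
    return weight @ x_cols + bias[:, None]

rng = np.random.RandomState(0)
x = rng.normal(size=(5, 3))       # 5 samples, 3 features
weight = rng.normal(size=(2, 3))  # 2 outputs
bias = rng.normal(size=2)

rows_out = forward_rows(x, weight, bias)
cols_out = forward_columns(x.T, weight, bias)
```

Both compute the same numbers; the outputs differ only by a transpose. Sticking to one layout throughout the forward and backward pass is what keeps the number of explicit transposes down.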

I also toyed with testing forward and backward pass with pytorch, but didn't finish it.

// the next day:

What do you know, the neural network seems to be working now. It can overfit to a simple linear regression problem. There is still a weird bug where it sometimes starts going in the wrong direction right from the beginning and ends up spitting out nans and infs: the network's output drifts so far from the desired output that the derivative of the loss function ends up as infinity. I think this is due to initializing the weights with np.random.normal (samples from a normal distribution). I have been advised to use the normal distribution, but with plain np.random.random (uniform on [0, 1)) the awkward nan error doesn't appear. However, with the normal distribution, once the network learns something, it does so in fewer epochs and overfits to a more accurate result.
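The two initializers mentioned above can be compared side by side with a small sketch. `init_weights` and its parameters are hypothetical names, and the `scale` argument is an assumption added here: a smaller standard deviation for the normal initializer is a common thing to try when training diverges, but it has not been verified to be the fix for this bug.

```python
import numpy as np

def init_weights(n_in, n_out, method="normal", scale=1.0, seed=None):
    """Return an (n_in, n_out) weight matrix (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    if method == "normal":
        # Zero-mean Gaussian: weights can be negative and arbitrarily large.
        return rng.normal(loc=0.0, scale=scale, size=(n_in, n_out))
    if method == "uniform":
        # Same distribution as np.random.random: uniform on [0, 1).
        return rng.random_sample((n_in, n_out))
    raise ValueError("unknown method: " + method)

w_normal = init_weights(1000, 10, method="normal", seed=0)
w_small = init_weights(1000, 10, method="normal", scale=0.01, seed=0)
w_uniform = init_weights(1000, 10, method="uniform", seed=0)

print("normal :", w_normal.min(), w_normal.max())
print("small  :", w_small.min(), w_small.max())
print("uniform:", w_uniform.min(), w_uniform.max())
```

Printing the ranges shows the difference in spread: the uniform weights are bounded and all non-negative, while the unit-variance normal weights have both signs and occasional large magnitudes.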

The problem was actually with both the forward pass and backpropagation. Things were the right shape, but didn't do what they were supposed to. I transposed too much and tested too little. Managing to use PyTorch for testing got me on the right track. I corrected the backpropagation notes, and wrote a bit about my bug there.
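Bugs where shapes match but values are wrong can also be caught without PyTorch by a numerical gradient check. The sketch below is a different technique from the PyTorch comparison described above: it compares an analytic gradient against central finite differences for a single linear layer with MSE loss. All function names are hypothetical, and the layer is deliberately simpler than the network in this project.

```python
import numpy as np

def forward(x, w):
    # x: (batch, in), w: (in, out)
    return x @ w

def mse(pred, y):
    # Mean over every element of the prediction matrix.
    return np.mean((pred - y) ** 2)

def analytic_grad(x, w, y):
    # d(mse)/dw for pred = x @ w, derived by hand.
    pred = forward(x, w)
    return 2.0 * x.T @ (pred - y) / pred.size

def numerical_grad(x, w, y, eps=1e-6):
    # Central differences: nudge one weight at a time and watch the loss.
    w = w.copy()
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        loss_plus = mse(forward(x, w), y)
        w[idx] = original - eps
        loss_minus = mse(forward(x, w), y)
        w[idx] = original
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.RandomState(1)
x = rng.normal(size=(5, 3))
w = rng.normal(size=(3, 2))
y = rng.normal(size=(5, 2))

max_diff = np.abs(analytic_grad(x, w, y) - numerical_grad(x, w, y)).max()
print("largest gradient mismatch:", max_diff)
```

A misplaced transpose in `analytic_grad` either fails on non-square shapes or shows up as a large mismatch, so using different sizes for batch, input, and output makes the check more sensitive.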

The next step is backpropagating biases and implementing cross entropy loss. If I have time I'll implement some of the matrix operations used.
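For reference, a minimal sketch of what those two pieces might look like, assuming samples are stored as rows and cross entropy is applied to softmax outputs with integer class labels. The function names are hypothetical and this is one common formulation, not the planned implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # logits: (batch, classes), labels: (batch,) integer class indices.
    # Subtracting the row maximum keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -log_probs[np.arange(n), labels].mean()
    # Gradient of the mean loss w.r.t. the logits: softmax minus one-hot.
    grad = np.exp(log_probs)
    grad[np.arange(n), labels] -= 1.0
    return loss, grad / n

def bias_grad(upstream):
    # The bias is added to every sample, so its gradient is the
    # upstream gradient summed over the batch axis.
    return upstream.sum(axis=0)

logits = np.zeros((2, 3))
labels = np.array([0, 2])
loss, dlogits = softmax_cross_entropy(logits, labels)
db = bias_grad(dlogits)
```

With all-zero logits every class gets probability 1/3, so the loss equals log(3), which is a handy sanity check.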
