Implement Dropout #1
Comments
I forgot to reference this issue in the commit d3c2caa! I have created a separate branch to work on implementing this feature.

Modifications

For my first attempt at implementing this feature, I added an optional `dorate` parameter to

```julia
function fit(rbm::RBM, X::Mat{Float64};
             persistent=true, lr=0.1, n_iter=10, batch_size=100, n_gibbs=1, dorate=0.0)
```

From here, I tried to take the approach I quoted earlier (§8.2 of Srivastava et al., 2014) and apply a different dropout pattern to each training sample in the mini batch. I accomplish this within

```julia
function gibbs(rbm::RBM, vis::Mat{Float64}; n_times=1, dorate=0.0)
    suppressedUnits = rand(size(rbm.hbias,1), size(rbm.vis,2)) .< dorate
    ...
```

I then modify

```julia
function vis_means(rbm::RBM, hid::Mat{Float64}, suppressedUnits::Mat{Bool})
    hid[suppressedUnits] = 0.0  # Suppress dropped hidden units
    p = rbm.W' * hid .+ rbm.vbias
    return logistic(p)
end
```

This should, in total, accomplish the dropout.

Doubts

Now, what isn't clear to me is whether or not the dropout pattern should change from epoch to epoch. The paper seems to indicate that the pattern should change from mini batch to mini batch, but it doesn't specify anything about epochs. I am assuming, however, that the pattern is updated at every mini-batch computation. If anyone has references to other RBM dropout implementations, they might help clear this up.
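For context, here is a minimal, self-contained sketch (plain Julia, not the package's actual `gibbs`/`vis_means` code; the function and variable names are placeholders) of how a per-sample dropout mask can be threaded through one Gibbs step along the lines described above.

```julia
# Sketch of one Gibbs step with a per-sample dropout mask on the hidden units.
logistic(x) = 1.0 ./ (1.0 .+ exp.(-x))

function gibbs_step_with_dropout(W, hbias, vbias, vis; dorate=0.5)
    n_hid, n_samples = size(W, 1), size(vis, 2)
    suppressed = rand(n_hid, n_samples) .< dorate         # different pattern per sample
    h_means = logistic(W * vis .+ hbias)                  # hidden means given the data
    h_means[suppressed] .= 0.0                            # suppress dropped hidden units
    h_sample = float(rand(n_hid, n_samples) .< h_means)   # dropped units stay at zero
    v_means = logistic(W' * h_sample .+ vbias)            # reconstructed visible means
    return v_means, h_means, suppressed
end

# Example call with stand-in sizes (500 hidden, 784 visible, batch of 100):
W, hbias, vbias = 0.01 .* randn(500, 784), zeros(500), zeros(784)
v_means, h_means, mask = gibbs_step_with_dropout(W, hbias, vbias, rand(784, 100); dorate=0.5)
```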
Made a mistake: the hidden activations are matrices, not vectors, since we compute a whole mini batch at once.
Just putting this in to make sure that the package manager in Julia is working properly.
Wanted to check the size of the vis argument, not a parameter of the rbm class…
Okay, it had some issues (a few bugs I had introduced), but now it is building and passing the tests! I'll need to write a dropout test to ensure that everything is really working correctly.
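For what it's worth, a first sanity check of the kind mentioned above might simply verify the mask statistics. This is a hypothetical sketch, not a test that exists in the package.

```julia
using Test, Statistics

@testset "dropout mask sanity checks (sketch)" begin
    dorate = 0.3
    mask = rand(500, 200) .< dorate                  # same construction as in gibbs()
    @test isapprox(mean(mask), dorate; atol=0.02)    # roughly 30% of units suppressed
    @test !any(rand(500, 200) .< 0.0)                # dorate = 0.0 suppresses nothing
end
```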
Adding in a dropout rate to the `fit()` call in the experiment code.
Okay! It works! The issues I was having with the keywords not being recognised were due to the workspace not being cleared before running the code. Thanks @alaa-saade!
So, it seems like there is still something to be desired in the dropout performance. Currently there does not seem to be much difference between the pseudo-likelihood obtained with dropout and the one obtained without it, as shown in the following figure. I'm going to restructure where the dropout is enforced; perhaps I'm not doing it in the right manner. Referring to this Lua/torch7 implementation, it seems that we need to make sure to suppress these units on the gradient update as well.
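To make the idea concrete, here is a hedged sketch of one way to keep dropped hidden units out of the contrastive-divergence gradient, in the spirit of the torch7 implementation referenced above. The helper name and argument names are mine, not the package's API.

```julia
# Apply the same per-sample dropout mask to the positive- and negative-phase
# hidden statistics before forming the CD gradient, so dropped units never
# receive an update.
function masked_cd_gradient(v_pos, h_pos, v_neg, h_neg, suppressed)
    h_pos, h_neg = copy(h_pos), copy(h_neg)
    h_pos[suppressed] .= 0.0
    h_neg[suppressed] .= 0.0
    n = size(v_pos, 2)                               # mini-batch size
    dW     = (h_pos * v_pos' - h_neg * v_neg') ./ n
    dhbias = vec(sum(h_pos - h_neg, dims=2)) ./ n
    dvbias = vec(sum(v_pos - v_neg, dims=2)) ./ n
    return dW, dhbias, dvbias
end
```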
Interesting. But is it known that the effect of dropout can be seen in the pseudo-likelihood?
@krzakala: I don't truly know whether the effect can be seen in the PL or not. You could very well be right on this point. I'm now working on a demo which reports the estimated features (W) as well. I'll also include a histogram of the hidden activations, as was done in (Srivastava et al., 2014), to show the discrepancy between the approaches.
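As a rough sketch of what that comparison figure could look like: Plots.jl here is an assumption, not a dependency of the package, and random matrices stand in for the two trained models.

```julia
using Plots

logistic(x) = 1.0 ./ (1.0 .+ exp.(-x))
hidden_probs(W, hbias, X) = logistic(W * X .+ hbias)

X = rand(784, 100)                                   # stand-in mini batch
W_plain, W_drop = randn(500, 784), randn(500, 784)   # stand-ins for trained weights
h_plain = hidden_probs(W_plain, zeros(500), X)
h_drop  = hidden_probs(W_drop,  zeros(500), X)

# Two side-by-side panels of hidden activation probabilities.
histogram([vec(h_plain) vec(h_drop)], layout=(1, 2), legend=false,
          title=["without dropout" "with dropout"])
```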
I think I had done something stupid and wasn’t passing the dropout rate argument properly. Hence, I was getting nearly identical results between the dropout and non-dropout cases…because they were doing the exact same thing.
According to the original dropout paper, the use of dropout should induce a sparser distribution of activations.
It seems like the weight decays are working as they should.
After making the correction to the momentum update (no longer scaling the velocity by the learning rate), we see some odd effects on learning when the momentum is non-zero. With zero momentum we get training performance that seems to make sense, but for non-zero momentum values we don't get any kind of reasonable training.
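For reference, this is how I read the two momentum conventions being contrasted here (a sketch with hypothetical names, not the package's update code): the corrected form applies the learning rate only to the fresh gradient, while the old form scaled the accumulated velocity by it as well.

```julia
# Sketch: update the velocity buffer in place; the caller then does W .+= velocity.
function momentum_step!(velocity, grad; lr=0.1, momentum=0.5, scale_velocity_by_lr=false)
    if scale_velocity_by_lr
        velocity .= lr .* (momentum .* velocity .+ grad)   # old behaviour
    else
        velocity .= momentum .* velocity .+ lr .* grad     # corrected behaviour
    end
    return velocity
end
```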
Added in all the speedup and fix patches from the main branch.
Plot the activation histograms side-by-side instead of in a stacked format.
Adjusting the testing parameters for the dropout version of RBM training to be closer to the experiments shown in (Srivastava *et al.*, 2014).
Now showing the learned weights (receptive fields) and their distributions.
Changed the test script to run the dropout comparison on the full dataset in an attempt to match the results in the 2014 paper.
Goal
One of the most recent and most effective regularisation techniques for training RBMs is dropout. Unfortunately, the original Boltzmann.jl package does not implement this technique, so we should undertake it ourselves.
Technique
During the training phase of the RBM, each hidden node is present only with probability $p$. Training is performed on this reduced model, and the resulting trained models are then combined. The pertinent section is §8.2 of (Srivastava et al., 2014).
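As a rough illustration of the training-time masking only (plain Julia, not Boltzmann.jl code; the function name is a placeholder): each hidden unit is kept with probability $p$ and zeroed otherwise. The usual way to combine the thinned models at test time is the weight-scaling approximation from the paper, i.e. use all hidden units with the relevant weights multiplied by $p$.

```julia
logistic(x) = 1.0 ./ (1.0 .+ exp.(-x))

function dropout_hidden_probs(W, hbias, vis, p)
    h = logistic(W * vis .+ hbias)             # usual hidden probabilities
    keep = rand(size(h, 1), size(h, 2)) .< p   # each unit kept with probability p
    return h .* keep                           # dropped units contribute nothing
end
```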
References
Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," JMLR, vol. 15, 2014, pp. 1929-1958.