Thanks a lot for enhancing our understanding with this well-commented code. I have a query regarding the sparsity constraint imposed in the RBM, i.e., in the pretrainRBM function: when updating the hidden biases according to the sparsity constraint, why have you multiplied the gradients by 2.0?
dsW = dsW + SparseLambda * 2.0 * bsxfun(@times, (SparseQ-mH)', svdH)';  % sparsity term added to the weight gradient
dsB = dsB + SparseLambda * 2.0 * (SparseQ-mH) .* sdH;                   % sparsity term added to the hidden-bias gradient
This does not match any of the update equations given by Lee et al. Could you please elaborate on this?
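My own guess, which I would be grateful if you could confirm, is that the 2.0 comes from differentiating a squared-error sparsity penalty rather than the cross-entropy penalty used by Lee et al. Assuming a penalty of the form

L_s = \lambda \sum_j (SparseQ - \hat{q}_j)^2

where \hat{q}_j is the mean activation of hidden unit j over the batch, the chain rule gives

\partial L_s / \partial b_j = -2 \lambda (SparseQ - \hat{q}_j) \, \partial \hat{q}_j / \partial b_j

which would reproduce the SparseLambda * 2.0 * (SparseQ-mH) factor above, provided sdH and svdH hold the derivatives of the mean activations. Is that the intended derivation? Many thanks!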
Hi Yong Ho,
Thank you for your comment.
The linear mapping is just an option; you don't have to use it. But the training requires initial parameters, and I think the linear mapping is one candidate for setting them.
If you know of a better initial parameter setting, please let me know.
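In case it helps other readers, here is a minimal sketch of what such a least-squares linear-mapping initialization could look like. The variable names (H, Y, W0, b0) are illustrative, not the toolbox's actual variables:

H = rand(100, 50);               % stand-in for last hidden-layer activations (N x nHid)
Y = rand(100, 10);               % stand-in for TrainLabels (N x nLab)
Haug = [H, ones(size(H,1), 1)];  % append a bias column
Wb = pinv(Haug) * Y;             % least-squares fit via the pseudo-inverse
W0 = Wb(1:end-1, :);             % initial weights between the last hidden layer and the labels
b0 = Wb(end, :);                 % initial label biases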
Your code is very helpful to me.
While working through it, however, I wondered: why do you use a linear mapping to calculate the weights between TrainLabels and the last hidden layer?
Is there any advantage to using a linear mapping?