Code covered by the BSD License  

Highlights from
A Numerical Tour of Signal Processing

from A Numerical Tour of Signal Processing by Gabriel Peyre
A set of Matlab experiments that illustrates advanced computational signal and image processing.

Dictionary Learning

Dictionary Learning

Instead of using a fixed data representation such as wavelets or Fourier, one can learn the representation (the dictionary) to optimize the sparsity of the representation for a large class of exemplar.

Contents

Here we apply dictionary learning to the problem of image denoising. The method used in this Numerical Tour follow quite closely the K-SVD approach described in this paper

M. Elad and M. Aharon, Image Denoising Via Sparse and Redundant representations over Learned Dictionaries, IEEE Trans. on Image Processing, Vol. 15, no. 12, pp. 3736-3745, December 2006

Installing toolboxes and setting up the path.

You need to download the general purpose toolbox and the signal toolbox.

You need to unzip these toolboxes in your working directory, so that you have toolbox_general/ and toolbox_signal/ in your directory.

For Scilab user: you must replace the Matlab comment '%' by its Scilab counterpart '//'.

Recommandation: You should create a text file named for instance numericaltour.sce (in Scilab) or numericaltour.m (in Matlab) to write all the Scilab/Matlab commands you want to execute. Then, simply run exec('numericaltour.sce'); (in Scilab) or numericaltour; (in Matlab) to run the commands.

Execute this line only if you are using Matlab.

getd = @(p)path(path,p); % scilab users must *not* execute this

Then you can add these toolboxes to the path.

% Add some directories to the path
getd('toolbox_signal/');
getd('toolbox_general/');

Loading a Noisy Image

We aim at applying the dictionary learning method to denoising. We thus consider a noisy image obtained by adding Gaussian noise to a clean, unknown image M0.

Load the image.

n = 256;
M0 = load_image('barb');
M0 = rescale( crop(M0,n) );

Noise level.

sigma = .06;

Add some noise.

M = M0 + sigma*randn(n);

Display.

clf;
imageplot(M0, 'Original', 1,2,1);
imageplot(M,  'Noisy', 1,2,2);

Patch Extraction

Since the learning is computationnaly intensive, one can only apply it to small patches extracted from an image.

Size of the patches.

w = 12;

Number of patches.

m = 40*w^2;

Random patch location.

x = floor( rand(1,1,m)*(n-w) )+1;
y = floor( rand(1,1,m)*(n-w) )+1;

Extract lots of patches.

[dY,dX] = meshgrid(0:w-1,0:w-1);
Xp = repmat(dX,[1 1 m]) + repmat(x, [w w 1]);
Yp = repmat(dY,[1 1 m]) + repmat(y, [w w 1]);
P = M(Xp+(Yp-1)*n);

We remove mean, since we are going to learn a dictionary of zero-mean and unit norm atom. The mean of the patches is close to being noise free anyway.

P = P - repmat( mean(mean(P)), [w w] );

Reshape so that each P(:,i) corresponds to a patch.

P = reshape(P, [w^2 m]);

Display a few random patches.

clf;
plot_dictionnary(P, [], [8 12]);

Sparse Coding

Given a dictionary D, we compute a set of coefficients X so that D*X(:,i) is a sparse approximation of P(:,i).

Number of atoms in the dictionary.

p = 2*w^2;

The initial dictionary is computed by a random selection of patches.

sel = randperm(m); sel = sel(1:p);
D = P(:,sel);

Normalize the atoms.

D = D ./ repmat( sqrt(sum(D.^2)), [w^2, 1] );

The sparse coding is obtained by minimizing a L1 penalized optimization.

X(:,i) = min_x 1/2*norm(D*x - P(:,i))^2 + lambda*sum(abs(x))

This is achieved using an iterative thresholding method (forward-backward iterations).

x <- thresh_{lambda*mu}( x + mu*D'*(P(:,i)-D*x) )

The value of lambda controls the sparsity of the coefficients. Since L1 regularization is similar to soft thresholding, we use the usual 3/2*sigma value.

lambda = 1.5*sigma;

The gradient descent step size mu is related to the operator norm of the dictionary.

mu = 1.9/norm(D)^2;

Initialize the coefficients.

X = zeros(p,m);

One step of iteration, for all the patches together.

X = perform_thresholding( X + mu*D'*(P-D*X), lambda*mu );

Number of iterations of the iterative thresholding (you can use a larger number of iterations to improve the results).

niter = 150;

Exercice 1: (the solution is exo1.m) Perform the iterative thresholding for all the patches. Monitor the decay of the energy minimized by the method.

exo1;

Automatic Set of the lambda Value

The value of lambda was chosen arbitrary, and was the same for all the patches. Here we use an independant value of lambda(i) for each patch P(:,i).

Initial value of lambda.

lambda = zeros(m,1) + 1.5*sigma;

We let lambda(i) evolves during the optimization so that one has norm(P(:,i)-D*X(:,i), 'fro') = rho*w*sigma, where rho is a damping factor close to 1. The value w*sigma is approximately the amount of noise that contaminate P(:,i).

rho = 1.4;
error_target = rho*w*sigma;

Using an idea originaly developped by Antonin Chambolle, we use the following update rule during the iterative thresholding.

lambda = lambda * error_target ./ sqrt( sum( (P-D*X).^2 ) )';

Intuitively, if norm(P(:,i)-D*X(:,i)) is too large, then lambda(:,i) should be decreased.

Exercice 2: (the solution is exo2.m) Peform the iterative thresholding while monitoring the evolution of lambda(i) for a few i, and the error norm(P(:,i)-D*X(:,i), 'fro'). Here we update the value of lambda only each 5 iterations. Warning: the thresholding update is different for each X(:,i), since lambda(i) varies.

exo2;

Update the Dictionary

Once the sparse coefficients X have been computed, one can udpate the dictionary.

This is achieve by minimizing norm(D*X-P,'fro') over all possible D. The solution is given by a pseudo-inverse D=P*X^+ where |X^+=X'*(X*X')^(-1) |

D = P * pinv(X);

The atoms are then normalized.

D = D ./ repmat( sqrt(sum(D.^2)), [w^2, 1] );

Dictionary Learning

The full dictionary learning is achieved by iteratively computing the coefficients X and then updating the dictionary D.

This corresponds to an (approximate) block coordinate descent of the sparse coding energy on both the coefficients and the dictionary, under the constraint of having unit norm atoms.

Since this energy is non-convex, one can only hope a local convergence to a stationary point of this energy.

Exercice 3: (the solution is exo3.m) Perform the dictionary learning. You can use like 40 iterations for the learning (outer loop) and 30 iterations for the computation of the L1 regularization (inner loop). Warning: you must update the value of mu each time the dictionary is updated.

exo3;

Display the dictionary.

clf;
plot_dictionnary(D,X, [8 12]);

Denoising by Sparse Coding

The denoising of the image is obtained by sparse coding a large collection of patches (ideally all the patches).

Overlap parameter (q=w implies no overlap).

q = 4;

Regularly space positions for the extraction of patches.

[y,x] = meshgrid(1:q:n-w/2, 1:q:n-w/2);
m = size(x(:),1);
Xp = repmat(dX,[1 1 m]) + repmat( reshape(x(:),[1 1 m]), [w w 1]);
Yp = repmat(dY,[1 1 m]) + repmat( reshape(y(:),[1 1 m]), [w w 1]);

Ensure boundary conditions (reflexion).

Xp(Xp>n) = 2*n-Xp(Xp>n);
Yp(Yp>n) = 2*n-Yp(Yp>n);

Extract a large sub-set of regularly sampled patches.

P = M(Xp+(Yp-1)*n);
P = reshape(P, [w^2, m]);

Save the mean of the patches appart, and remove it.

a = mean(P);
P = P - repmat( a, [w^2 1] );

Set the target error for denoising.

rho = 1;
error_target = rho*w*sigma;

Exercice 4: (the solution is exo4.m) Perform the sparse coding of this large collection of patches. Update the value of lambda. Use a large number of iterations e.g. niter=200.

exo4;

Display the repartition of lambda values.

hist(lambda(lambda<.05),100);
axis('tight');

Approximated patches.

PA = reshape(D*X, [w w m]);

Insert back the mean.

PA = PA - repmat( mean(mean(PA)), [w w] );
PA = PA + reshape(repmat( a, [w^2 1] ), [w w m]);

To obtain the denoising, we average the value of the approximated patches PA that overlap.

W = zeros(n,n);
M1 = zeros(n,n);
for i=1:m
    x = Xp(:,:,i); y = Yp(:,:,i);
    M1(x+(y-1)*n) = M1(x+(y-1)*n) + PA(:,:,i);
    W(x+(y-1)*n) = W(x+(y-1)*n) + 1;
end
M1 = M1 ./ W;

Display the result.

clf;
imageplot(clamp(M), ['Noisy, SNR=' num2str(snr(M0,M),4) 'dB'], 1, 2, 1);
imageplot(clamp(M1), ['Denoised, SNR=' num2str(snr(M0,M1),4) 'dB'], 1, 2, 2);

Exercice 5: (the solution is exo5.m) Compare the obtained result with translation invariant wavelet hard thresholding.

exo5;

Exercice 6: (the solution is exo6.m) Study the influence of the parameter rho on the quality of the denoising. Study the influence of the number p of atoms.

exo6;

Contact us at files@mathworks.com