Using pca function to reconstruct vectors within and outside test set

I am working on a project where I have a set of vectors (images) from which I generate a vector basis (a set of principal components) using the pca function in MATLAB. I then use the basis vectors to attempt to reconstruct vectors both within the test set and outside it, but extremely close to the vectors within the set.
For this test run, I initialized an n x c matrix d with n row vectors, each with c components (in this example, n = 6 and c = 40,000).
I then subtracted the mean of each column of d from d (I believe this centers the data around zero, a preprocessing step I had read about while doing research). I then called pca(d) to obtain coeff, the matrix of principal component coefficients.
To attempt to reconstruct vec, I reshaped vec into a column vector and solved coeff*c1 = vec for c1 with the backslash operator (c1 = coeff\vec), which yielded 5 coefficients (coeff has 5 columns). I then summed the products of the principal components and their respective coefficients, and reshaped the sum to the original dimensions of vec.
If my logic is correct, I should be able to reconstruct vec with high accuracy, especially since I included a vector (vec2) in the set that is extremely close to vec. However, when I compare the reconstructed image with vec by dividing one by the other and taking the standard deviation of the resulting pixels, I get an extremely large number, and the ratio image has very distinct patterns indicating that the images are not similar at all.
My mentor and I were fairly sure that if we used a set of test vectors extremely close to the vector we were attempting to reconstruct, we should be able to reconstruct it with some accuracy, but so far the results have been far from what we wanted. So I would like to know whether my approach is reasonable, where errors in the code or my reasoning might be, or whether the method itself is not applicable to what we aim to do (reconstructing a vector close to the test vectors).
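Two facts about this step can be checked numerically: for a tall matrix, MATLAB's backslash returns the least-squares solution (it is not an element-wise division), and because the columns of coeff are orthonormal, that solution equals the projection coeff'*vec. The summation loop is then just the matrix product coeff*c1. A minimal sketch, where the small 10x3 basis is a hypothetical stand-in for the 40000x5 coeff:

```matlab
% Hypothetical stand-in for coeff: 10 "pixels", 3 orthonormal basis vectors
[A, ~] = qr(randn(10, 3), 0);   % economy QR gives orthonormal columns
b = randn(10, 1);               % stand-in for the reshaped vec

c1 = A \ b;                     % least-squares solution, as in the posted code
c2 = A' * b;                    % orthogonal projection coefficients

% The posted summation loop ...
sum1 = zeros(10, 1);
for i = 1:size(A, 2)
    sum1 = sum1 + c1(i) * A(:, i);
end
% ... equals the matrix product A*c1
fprintf('max |c1 - c2|        = %g\n', max(abs(c1 - c2)));
fprintf('max |sum1 - A*c1|    = %g\n', max(abs(sum1 - A*c1)));
% The residual b - sum1 is orthogonal to every basis vector
fprintf('max |A''*(b - sum1)| = %g\n', max(abs(A' * (b - sum1))));
```

Because the reconstruction is an orthogonal projection, it can only reproduce the part of vec that lies in the span of the basis vectors.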
std2(vec./vec2) = 0.0136 (vec and vec2 are extremely similar images)
std2(vec./sum1) = 204.3604 (the reconstructed image and vec are extremely different, indicating that the method did not work correctly)
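A pixel-wise ratio is a fragile similarity measure when pixel values cross zero, as 0.5*cos(...) does many times across vec: a tiny absolute error becomes a huge relative error wherever the denominator is near zero. A small illustration, where the three pixel values and the 1e-3 error are made up for the example:

```matlab
x    = [0.5, 0.2, 1e-4];   % hypothetical "true" pixel values; the last is near zero
xhat = x + 1e-3;           % reconstruction with the same small error everywhere
absErr = xhat - x          % 1e-3 at every pixel
ratio  = x ./ xhat         % about 0.998, 0.995 and 0.0909: the near-zero pixel dominates
```

A difference-based measure, such as the root-mean-square of vec - sum1, does not have this sensitivity to near-zero pixels.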
If needed, I can give more specific numbers, or attach what the difference between the reconstructed image and original looks like.
y = 0.001:0.005:1;
z = 1:200;
vec = .5*cos(.75*pi*z'*y); %image 1, try to reconstruct this
vec2 = .5*cos(.7499995*pi*z'*y); %image 2, extremely similar to image 1, added to test set
%generating test set of vectors to generate basis from
v1 = zeros(200, 200);
cons = abs(rand()*10);
for u = 1:200
    for v = 1:200
        v1(u, v) = cons;
    end
end
s = 0.001:0.005:1;
g = 1:200;
v2 = sin(rand()*pi*g'*s);
v3 = -sin(rand()*pi*g'*s);
v4 = cos(rand()*pi*g'*s);
v5 = -cos(rand()*pi*g'*s);
d = zeros(6, u*v); %after the loops u = v = 200, so u*v = 40000
d(1, :) = reshape(v1, [1 u*v]);
d(2, :) = reshape(v2, [1 u*v]);
d(3, :) = reshape(v3, [1 u*v]);
d(4, :) = reshape(v4, [1 u*v]);
d(5, :) = reshape(v5, [1 u*v]);
d(6, :) = reshape(vec2, [1 u*v]); %including extremely similar image in test set
nImages = 6; %renamed from numel, which shadowed the built-in numel function
d = d - mean(d); %subtract column means (implicit expansion, R2016b or later)
[coeff, score, ~, ~, explained] = pca(d); %explained is the 5th output; the 4th is tsquared
A = coeff;
b1 = reshape(vec, [u*v 1]);
%generating coefficients (least-squares solution of A*c1 = b1)
c1 = A\b1;
%reconstructing vec using product of generated coefficients and respective
%basis vectors
sum1 = zeros(u*v, 1);
for i = 1:nImages-1
    sum1 = sum1 + c1(i,:)*A(:, i);
end
sum1 = reshape(sum1, [u, v]);
signal = sum1./vec;
sd2 = std2(signal); %std2 requires the Image Processing Toolbox
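One thing worth checking in this script: the column means subtracted from d are never subtracted from vec before projecting, nor added back to sum1. Since pca also centers its input by default (it returns those means as its sixth output, mu), the principal components describe deviations from the mean image, and the standard PCA reconstruction is mu + coeff*(coeff'*(vec - mu)). The sketch below assumes this mean mismatch is the issue and compares both reconstructions on the posted images. It uses svd in place of pca so it runs without the Statistics and Machine Learning Toolbox (for centered data, the leading right singular vectors span the same subspace as pca's coeff, up to sign), and corrcoef in place of corr.

```matlab
% Posted images (values as in the original script)
y = 0.001:0.005:1;
z = 1:200;
vec  = 0.5*cos(0.75*pi*z'*y);        % image to reconstruct
vec2 = 0.5*cos(0.7499995*pi*z'*y);   % near-duplicate, included in the set

% Same kind of six training images as the posted script, one per row
d = zeros(6, numel(vec));
d(1, :) = 10*rand();                               % uniform image v1
d(2, :) = reshape( sin(rand()*pi*z'*y), 1, []);
d(3, :) = reshape(-sin(rand()*pi*z'*y), 1, []);
d(4, :) = reshape( cos(rand()*pi*z'*y), 1, []);
d(5, :) = reshape(-cos(rand()*pi*z'*y), 1, []);
d(6, :) = reshape(vec2, 1, []);

mu = mean(d, 1);                 % 1 x 40000 mean image
[~, ~, V] = svd(d - mu, 'econ'); % right singular vectors of the centered data
A = V(:, 1:5);                   % 6 centered images span at most 5 directions

b = reshape(vec, [], 1);

% Posted approach: project raw vec, never add the mean back
reconRaw = reshape(A * (A \ b), size(vec));

% Mean-aware approach: center with the SAME mu, project, add mu back
reconMean = reshape(mu' + A * (A' * (b - mu')), size(vec));

Rraw  = corrcoef(reconRaw(:),  vec(:));
Rmean = corrcoef(reconMean(:), vec(:));
fprintf('correlation, posted approach:     %.4f\n', Rraw(1, 2));
fprintf('correlation, mean-aware approach: %.6f\n', Rmean(1, 2));
```

Under this assumption the mean-aware correlation should be very close to 1: vec2 - mu lies exactly in the span of the basis, so the only error left is the part of the tiny difference vec - vec2 that falls outside it.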

Answers (1)

William Rose on 15 Aug 2022
The code
v1 = zeros(200, 200);
cons = abs(rand()*10);
for u = 1:200
    for v = 1:200
        v1(u, v) = cons;
    end
end
is equivalent to
v1 = 10*rand()*ones(200, 200);
because rand always draws from U(0,1), so abs(rand()) is unnecessary, and because the nested loop assigns the same value to every element.
Image v1 will be a white square 90% of the time. In the other 10% of cases, it will be a square of uniform gray. This happens because when MATLAB renders a floating-point array as an image, it assumes the values are grayscale values between 0 and 1: values <= 0 plot as black, and values >= 1 plot as white. The value 10*rand() is >= 1, and therefore renders as white, 90% of the time.
Images v2-v5 consist of numbers that vary between -1 and 1. All negative-valued pixels will be rendered as black, as described above. The PCA operates on both the negative and positive values, but all negative values appear black in the images, so there is a strong nonlinearity in the relationship between the numbers in the array and the rendered image. This reduces the effectiveness of PCA in reproducing the images as they appear on screen. For these reasons, I recommend generating basis images with values between 0 and 1 only. For example, replace cos(x) with (1+cos x)/2, and replace sin(x) with (1+sin x)/2.
Pixel values in vec and vec2 vary from -0.5 to +0.5. Is this intentional? I would choose a range from 0 to 1, to display the full dynamic range from black to pure white. This also has the benefit that the ranges of vec and vec2 will match the ranges of the basis images v1-v5, which vary from black to pure white.
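A sketch of the display mapping described above, assuming the images are shown with imshow, whose default display range for double data is [0 1]: out-of-range values are effectively clipped, and 10*rand() lands at or above 1 (pure white) about 90% of the time. The sample pixel values are hypothetical.

```matlab
v = [-0.7, 0, 0.25, 1, 3.4];     % hypothetical pixel values
shown = min(max(v, 0), 1)        % gray levels as displayed: [0 0 0.25 1 1]

% Fraction of draws of 10*rand() that render as pure white
samples = 10*rand(1, 1e5);
fracWhite = mean(samples >= 1)   % close to 0.9
```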
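Applied to the basis images, the suggested remapping could look like the sketch below. For v3 and v5, which the original code negates, mapping -sin x to (1 - sin x)/2 and -cos x to (1 - cos x)/2 is an assumption made here so that those images also stay in [0, 1].

```matlab
s = 0.001:0.005:1;
g = 1:200;
v2 = (1 + sin(rand()*pi*g'*s))/2;      % was  sin(...)
v3 = (1 - sin(rand()*pi*g'*s))/2;      % was -sin(...), assumed mapping
v4 = (1 + cos(rand()*pi*g'*s))/2;      % was  cos(...)
v5 = (1 - cos(rand()*pi*g'*s))/2;      % was -cos(...), assumed mapping
allPix = [v2(:); v3(:); v4(:); v5(:)];
fprintf('pixel range: [%.3f, %.3f]\n', min(allPix), max(allPix));
```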
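For vec and vec2, changing 0.5*cos x to 0.5 + 0.5*cos x moves the pixel range from [-0.5, 0.5] to [0, 1] without changing the pattern, since it is a pure offset. A quick check:

```matlab
y = 0.001:0.005:1;
z = 1:200;
vecOld  = 0.5*cos(0.75*pi*z'*y);            % range about [-0.5, 0.5]
vecNew  = 0.5 + 0.5*cos(0.75*pi*z'*y);      % range about [0, 1]
vec2New = 0.5 + 0.5*cos(0.7499995*pi*z'*y); % same shift for vec2
fprintf('vec old range: [%.3f, %.3f]\n', min(vecOld(:)), max(vecOld(:)));
fprintf('vec new range: [%.3f, %.3f]\n', min(vecNew(:)), max(vecNew(:)));
```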
Because negative numbers are rendered as black, I recommend that you not subtract the mean value from all the columns. I have obtained good results with PCA applied to images when I did not subtract the mean value. See here, for example.
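One detail to keep in mind when following this advice: pca centers its input by default, so simply skipping the manual d - mean(d) step does not turn centering off; pca(d, 'Centered', false) does. The sketch below shows the uncentered version using svd instead (to avoid a toolbox dependency) on a hypothetical set of three 8x8 images. Without centering, the basis spans the raw images themselves, and an image from the set is reconstructed as A*(A'*b), with no mean to add back.

```matlab
% Hypothetical training set: three 8x8 images with values in [0, 1]
[X, Y] = meshgrid(linspace(0, 1, 8));
imgs = {(1 + sin(2*pi*X))/2, (1 + cos(3*pi*Y))/2, 0.5*ones(8)};

d = zeros(numel(imgs), 64);        % one image per row
for k = 1:numel(imgs)
    d(k, :) = reshape(imgs{k}, 1, []);
end

[~, ~, V] = svd(d, 'econ');        % no centering: V spans the raw rows of d
A = V;                             % 64 x 3, orthonormal columns

b = reshape(imgs{2}, [], 1);       % an image from the set
recon = reshape(A * (A' * b), 8, 8);
err = max(abs(recon(:) - b));
fprintf('max reconstruction error: %g\n', err);
```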
It is good to view the images you are working with. Therefore I have added code to display some images. Images created by the original code are shown below. The appearance of these images is significantly affected by the size at which they are displayed. Moire patterns appear and disappear as one adjusts the image size.
Here (below) are images generated by the modified code. These images differ from the previous ones because (1) the random numbers are different with every run, (2) v1, which was formerly all white in 90% of cases, is now an intermediate shade of gray, (3) I have replaced sin x with (1+sin x)/2, etc., for v2-v5, and (4) I have replaced 0.5*cos x with 0.5 + 0.5*cos x for vec and vec2. Changes (3) and (4) result in fewer black pixels and eliminate extensive all-black regions.
At the end of your script, you compute sum1, which is supposed to be a reconstruction of vec, using the principal components. The figure below shows vec and sum1, as generated by your originally posted code.
The figure below shows vec and sum1, as generated by the modified code. vec differs from run to run, due to the random number used to generate it.
One way to compare the similarity of vec and sum1 is to convert each image to a column vector, then compute the Pearson correlation of the column vectors.
>> disp(corr(reshape(sum1,[40000,1]),reshape(vec,[40000,1])))
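corr belongs to the Statistics and Machine Learning Toolbox; base MATLAB's corrcoef gives the same Pearson coefficient as the (1,2) element of its output. Unlike a pixel-wise ratio, Pearson correlation is unaffected by a global gain and offset and does not blow up at near-zero pixels. A sketch with hypothetical random images:

```matlab
a = rand(200);                 % hypothetical image
b = 3*a + 2;                   % same image with a gain and an offset
c = rand(200);                 % unrelated image

Rab = corrcoef(a(:), b(:));    % 2x2 matrix; the off-diagonal entry is r
Rac = corrcoef(a(:), c(:));
fprintf('r(a, b) = %.4f\n', Rab(1, 2));   % 1 up to rounding
fprintf('r(a, c) = %.4f\n', Rac(1, 2));   % near 0
```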
Here are the correlations for five runs of the original code.
0.6986 0.7188 0.7145 0.7014 0.8616
Here are correlations for five runs of the modified code.
0.9941 0.7252 0.9533 0.5947 0.9959
The sample size is small, but the modified code seems to produce correlations with higher highs and lower lows.



