Find the principal components using the alternating
least squares (ALS) algorithm when there are missing values in the
data.

Load the sample data.

load hald

The ingredients data has 13 observations for 4 variables.

Perform principal component analysis on the complete data
and display the component coefficients.

[coeff,score,latent,tsquared,explained] = pca(ingredients);
coeff

coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844

Introduce missing values randomly.

y = ingredients;
rng('default'); % for reproducibility
ix = random('unif',0,1,size(y))<0.30;
y(ix) = NaN

y =
7 26 6 NaN
1 29 15 52
NaN NaN 8 20
11 31 NaN 47
7 52 6 33
NaN 55 NaN NaN
NaN 71 NaN 6
1 31 NaN 44
2 NaN NaN 22
21 47 4 26
NaN 40 23 34
11 66 9 NaN
10 68 8 12

About 27% of the entries (14 of 52) are now missing, indicated
by `NaN`.
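
As a quick check of that proportion, the following sketch counts the
missing entries. To stay self-contained it re-enters `y` exactly as
displayed above; the names `missingFraction` and `missingPerColumn` are
illustrative only.

```matlab
% y as displayed above (NaN marks a missing entry)
y = [  7  26   6 NaN
       1  29  15  52
     NaN NaN   8  20
      11  31 NaN  47
       7  52   6  33
     NaN  55 NaN NaN
     NaN  71 NaN   6
       1  31 NaN  44
       2 NaN NaN  22
      21  47   4  26
     NaN  40  23  34
      11  66   9 NaN
      10  68   8  12];

missingFraction  = mean(isnan(y(:)))   % share of all entries that are NaN
missingPerColumn = sum(isnan(y),1)     % NaN count for each variable
```

The per-column counts show how unevenly the missing entries are spread
across the four variables.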

Perform principal component analysis using the ALS algorithm
and display the component coefficients.

[coeff1,score1,latent,tsquared,explained,mu1] = pca(y,...
'algorithm','als');
coeff1

coeff1 =
-0.0362 0.8215 -0.5252 0.2190
-0.6831 -0.0998 0.1828 0.6999
0.0169 0.5575 0.8215 -0.1185
0.7292 -0.0657 0.1261 0.6694

Display the estimated mean.

mu1

mu1 =
8.9956 47.9088 9.0451 28.5515

Reconstruct the observed data.

t = score1*coeff1' + repmat(mu1,13,1)

t =
7.0000 26.0000 6.0000 51.5250
1.0000 29.0000 15.0000 52.0000
10.7819 53.0230 8.0000 20.0000
11.0000 31.0000 13.5500 47.0000
7.0000 52.0000 6.0000 33.0000
10.4818 55.0000 7.8328 17.9362
3.0982 71.0000 11.9491 6.0000
1.0000 31.0000 -0.5161 44.0000
2.0000 53.7914 5.7710 22.0000
21.0000 47.0000 4.0000 26.0000
21.5809 40.0000 23.0000 34.0000
11.0000 66.0000 9.0000 5.7078
10.0000 68.0000 8.0000 12.0000

The ALS algorithm estimates the missing values in the data: `t`
reproduces the observed entries of `y` and fills each `NaN` position
with an estimate.
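
One way to gauge these estimates, sketched here under the assumption
that the original `ingredients` values are treated as ground truth, is
to compare the reconstruction with them at the masked positions. The
names `obsErr` and `imputeRmse` are hypothetical, and the block repeats
the earlier steps so that it runs on its own.

```matlab
load hald                                   % provides ingredients (13-by-4)
y = ingredients;
rng('default');                             % same seed as above
ix = random('unif',0,1,size(y)) < 0.30;     % positions to mask
y(ix) = NaN;

[coeff1,score1,~,~,~,mu1] = pca(y,'algorithm','als');
t = score1*coeff1' + repmat(mu1,size(y,1),1);

% Largest deviation of t from the entries that were actually observed
obsErr = max(abs(t(~ix) - ingredients(~ix)))

% Root-mean-square error of the ALS estimates at the masked entries
imputeRmse = sqrt(mean((t(ix) - ingredients(ix)).^2))
```

A small `obsErr` together with a moderate `imputeRmse` would suggest
that the observed entries are reproduced while the filled-in values
remain approximations.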

One way to compare the results is to find the angle
between the two spaces spanned by the coefficient vectors. Find the
angle between the coefficients found for the complete data and for the
data with missing values using ALS.

subspace(coeff,coeff1)

ans =
2.2925e-16

This is a small value. It indicates that the result of `pca` with
the default `'Rows','complete'` name-value pair argument on data with
no missing values is close to the result of `pca` with the
`'algorithm','als'` name-value pair argument on data with missing values.
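
For intuition about what this angle measures, here is a minimal sketch
of the largest principal angle between two column spaces, computed from
orthonormal bases and a singular value decomposition. It is an
illustration in base MATLAB, not the implementation of `subspace`, and
the matrices `A` and `B` are made-up examples. Note that `acos` loses
accuracy for very small angles, so this form is for explanation only.

```matlab
% Two planes in R^3: the x-y plane and the x-z plane
A = [1 0; 0 1; 0 0];
B = [1 0; 0 0; 0 1];

QA = orth(A);              % orthonormal basis for range(A)
QB = orth(B);              % orthonormal basis for range(B)

% Singular values of QA'*QB are the cosines of the principal angles
c = svd(QA'*QB);
c = min(max(c,0),1);       % guard against rounding outside [0,1]
theta = acos(min(c))       % largest principal angle, in radians
```

When both inputs are square and full rank, as two 4-by-4 coefficient
matrices with orthonormal columns are, each spans the whole space and
the angle is zero up to rounding. Restricting the comparison to the
leading columns, for example `subspace(coeff(:,1:2),coeff1(:,1:2))`, is
one option for comparing only the dominant directions.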

Perform principal component analysis using the `'Rows','complete'`
name-value pair argument and display the component coefficients.

[coeff2,score2,latent,tsquared,explained,mu2] = pca(y,...
'Rows','complete');
coeff2

coeff2 =
-0.2054 0.8587 0.0492
-0.6694 -0.3720 0.5510
0.1474 -0.3513 -0.5187
0.6986 -0.0298 0.6518

In this case, `pca` removes the rows with missing
values, leaving only the four rows of `y` that have no missing
values. Four observations provide at most three degrees of freedom,
so `pca` returns only three principal components.
You cannot use the `'Rows','pairwise'` option here because
the resulting covariance matrix is not positive semidefinite, and
`pca` returns an error message.
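
Why three components? Four observations, once centered, span at most
three dimensions. The sketch below illustrates this with the four
complete rows of `y` shown earlier. It centers those rows on their own
mean, which is a simplification and not necessarily how `pca` centers
internally; `Xc` and `r` are illustrative names.

```matlab
% The four rows of y without NaN (rows 2, 5, 10, and 13)
X = [ 1 29 15 52
      7 52  6 33
     21 47  4 26
     10 68  8 12];

Xc = X - repmat(mean(X,1),size(X,1),1);   % subtract the column means
r  = rank(Xc)                             % centered rows sum to zero, so r <= 3
```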

Find the angle between the coefficients found for the complete
data and for the data with missing values using listwise deletion
(`'Rows','complete'`).

subspace(coeff(:,1:3),coeff2)

ans =
0.3576

The angle between the two spaces, about 0.36 radians, is
substantially larger than the ALS result. This indicates that these
two results are different.

Display the estimated mean.

mu2

mu2 =
7.8889 46.9091 9.8750 29.6000

In this case, the mean is just the sample mean of each column of `y`,
computed over the non-missing values in that column.
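
This can be checked against a column mean that skips `NaN` entries.
The sketch below repeats the setup so that it runs on its own; it
assumes a MATLAB release in which `mean` accepts the `'omitnan'` flag,
and `colMean` and `maxDiff` are illustrative names.

```matlab
load hald                                   % provides ingredients (13-by-4)
y = ingredients;
rng('default');                             % same seed as above
ix = random('unif',0,1,size(y)) < 0.30;
y(ix) = NaN;

[~,~,~,~,~,mu2] = pca(y,'Rows','complete');

colMean = mean(y,1,'omitnan');       % per-column mean over non-missing entries
maxDiff = max(abs(mu2 - colMean))    % expected to be near zero
```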

Reconstruct the observed data.

score2*coeff2'

ans =
NaN NaN NaN NaN
-7.5162 -18.3545 4.0968 22.0056
NaN NaN NaN NaN
NaN NaN NaN NaN
-0.5644 5.3213 -3.3432 3.6040
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
12.8315 -0.1076 -6.3333 -3.7758
NaN NaN NaN NaN
NaN NaN NaN NaN
1.4680 20.6342 -2.9292 -18.0043

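The reconstruction is `NaN` for every row that contains a missing
value, so only 4 of the 13 observations are recovered. This shows that
deleting rows containing `NaN` values does not work as well as the ALS
algorithm. Using ALS is better when the data has many missing values.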