nnmf

Nonnegative matrix factorization

Syntax

[W,H] = nnmf(A,k)

[W,H] = nnmf(A,k,Name,Value)

[W,H,D] = nnmf(___)

Description

[W,H] = nnmf(A,k) factors the n-by-m matrix A into nonnegative factors W (n-by-k) and H (k-by-m). The factorization is not exact; W*H is a lower-rank approximation to A. The factors W and H minimize the root mean square residual D between A and W*H.

D = norm(A - W*H,'fro')/sqrt(n*m)

The factorization uses an iterative algorithm starting with random initial values for W and H. Because the root mean square residual D might have local minima, repeated factorizations might yield different W and H. Sometimes the algorithm converges to a solution of lower rank than k, which can indicate that the result is not optimal.

example

[W,H] = nnmf(A,k,Name,Value) modifies the factorization using one or more name-value pair arguments. For example, you can request repeated factorizations by setting 'Replicates' to an integer value greater than 1.

[W,H,D] = nnmf(___) also returns the root mean square residual D using any of the input argument combinations in the previous syntaxes.

Examples

collapse all

Nonnegative Rank-Two Approximation and Biplot

Open Live Script

Load the sample data.

load fisheriris

Compute a nonnegative rank-two approximation of the measurements of the four variables in Fisher's iris data.

rng(1) % For reproducibility
[W,H] = nnmf(meas,2);
H

H = 2×4

    0.6945    0.2856    0.6220    0.2218
    0.8020    0.5683    0.1834    0.0149

The first and third variables in meas (sepal length and petal length, with coefficients 0.6945 and 0.6220, respectively) provide relatively strong weights to the first column of W . The first and second variables in meas (sepal length and sepal width, with coefficients 0.8020 and 0.5683, respectively) provide relatively strong weights to the second column of W.

Create a biplot of the data and the variables in meas in the column space of W.

biplot(H','Scores',W,'VarLabels',{'sl','sw','pl','pw'});
axis([0 1.1 0 1.1])
xlabel('Column 1')
ylabel('Column 2')

Change Algorithm

Open Live Script

Starting from a random array X with rank 20, try a few iterations at several replicates using the multiplicative algorithm.

rng default % For reproducibility
X = rand(100,20)*rand(20,50);
opt = statset('MaxIter',5,'Display','final');
[W0,H0] = nnmf(X,5,'Replicates',10,...
                   'Options',opt,...
                   'Algorithm','mult');

    rep	   iteration	   rms resid	  |delta x|
      1	       5	    0.560887	   0.0245182
      2	       5	     0.66418	   0.0364471
      3	       5	    0.609125	   0.0358355
      4	       5	    0.608894	   0.0415491
      5	       5	    0.619291	   0.0455135
      6	       5	    0.621549	   0.0299965
      7	       5	    0.640549	   0.0438758
      8	       5	    0.673015	   0.0366856
      9	       5	    0.606835	   0.0318931
     10	       5	    0.633526	   0.0319591
Final root mean square residual = 0.560887

Continue with more iterations from the best of these results using alternating least squares.

opt = statset('Maxiter',1000,'Display','final');
[W,H] = nnmf(X,5,'W0',W0,'H0',H0,...
                 'Options',opt,...
                 'Algorithm','als');

    rep	   iteration	   rms resid	  |delta x|
      1	      24	    0.257336	  0.00271859
Final root mean square residual = 0.257336

Input Arguments

collapse all

`A` — Matrix to factorize
real matrix

Matrix to factorize, specified as a real matrix.

Example: rand(20,30)

Data Types: single | double

`k` — Rank of factors
positive integer

Rank of factors, specified as a positive integer. The resulting factors W and H have k columns and rows, respectively.

Example: 3

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: [W,H] = nnmf(A,k,'Algorithm','mult','Replicates',10) chooses the multiplicative update algorithm and ten replicates to improve the result

`Algorithm` — Factorization algorithm
`'als'` (default) | `'mult'`

Factorization algorithm, specified as the comma-separated pair consisting of 'Algorithm' and 'als' (alternating least squares) or 'mult' (a multiplicative update algorithm).

The 'als' algorithm typically is more stable and converges in fewer iterations. Each iteration takes longer. Therefore, the default maximum is 50, which usually gives satisfactory results in internal testing.

The 'mult' algorithm typically has faster iterations and requires more of them. The default maximum is 100. This algorithm tends to be more sensitive to starting values and, therefore, seems to benefit more from running multiple replications.

Example: 'Algorithm','mult'

Data Types: char | string

`W0` — Initial value of `W`
n-by-`k` matrix

Initial value of W, specified as the comma-separated pair consisting of 'W0' and an n-by-k matrix, where n is the number of rows of A, and k is the second input argument of nnmf.

Data Types: single | double

`H0` — Initial value of `H`
`k`-by-m matrix

Initial value of H, specified as the comma-separated pair consisting of 'H0' and a k-by-m matrix, where k is the second input argument of nnmf, and m is the number of columns of A.

Data Types: single | double

`Options` — Algorithm options
`[]` (default) | structure returned by `statset`

Algorithm options, specified as the comma-separated pair consisting of 'Options' and a structure returned by the statset function. nnmf uses the following fields of the options structure.

Field	Description	Values
`Display`	Level of iterative display	`'off'` (default) — No display `'final'` — Display of final result `'iter'` — Iterative display of intermediate results
`MaxIter`	Maximum number of iterations	Positive integer. The default is `50` for the `'als'` algorithm and `100` for the `'mult'` algorithm. Unlike in optimization settings, reaching `MaxIter` iterations is treated as convergence.
`TolFun`	Termination tolerance on the change in size of the residual	Nonnegative value. The default is `1e-4`.
`TolX`	Termination tolerance on the relative change in the elements of `W` and `H`	Nonnegative value. The default is `1e-4`.
`UseParallel`	Indication to compute in parallel	Logical value. The default `false` indicates not to compute in parallel, and `true` indicates to compute in parallel. Computing in parallel requires a Parallel Computing Toolbox™ license.
`UseSubstreams`	Type of reproducibility when computing in parallel	`false` (default) — Do not compute reproducibly `'mlfg6331_64'` `'mrg32k3a'` For details, see Reproducibility in Parallel Statistical Computations.
`Streams`	A `RandStream` object or cell array of such objects	If you do not specify `Streams`, `nnmf` uses the default stream or streams. If `UseParallel` is `true` and `UseSubstreams` is `false`, specify a cell array of `RandStream` objects the same size as the Parallel pool. Otherwise, specify a single `RandStream` object.

Example: 'Options',statset('Display','iter','MaxIter',50)

Data Types: struct

`Replicates` — Number of times to repeat factorization
`1` (default) | positive integer

Number of times to repeat the factorization, specified as the comma-separated pair consisting of 'Replicates' and a positive integer. The algorithm chooses new random starting values for W and H at each replication, except at the first replication if you specify 'W0' and 'H0'. If you specify a value greater than 1, you can obtain better results by setting Algorithm to 'mult'. See Change Algorithm.

Example: 10

Data Types: single | double

Output Arguments

collapse all

`W` — Nonnegative left factor of `A`
n-by-`k` matrix

Nonnegative left factor of A, returned as an n-by-k matrix. n is the number of rows of A, and k is the second input argument of nnmf.

W and H are normalized so that the rows of H have unit length. The columns of W are ordered by decreasing length.

`H` — Nonnegative right factor of `A`
`k`-by-m matrix

Nonnegative right factor of A, returned as a k-by-m matrix. k is the second input argument of nnmf, and m is the number of columns of A.

W and H are normalized so that the rows of H have unit length. The columns of W are ordered by decreasing length.

`D` — Root mean square residual
nonnegative scalar

Root mean square residual, returned as a nonnegative scalar.

D = norm(A - W*H,'fro')/sqrt(n*m)

References

[1] Berry, Michael W., Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. “Algorithms and Applications for Approximate Nonnegative Matrix Factorization.” Computational Statistics & Data Analysis 52, no. 1 (September 2007): 155–73. https://doi.org/10.1016/j.csda.2006.11.006.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:

Options=statset(UseParallel=true)

For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

Version History

Introduced in R2008a

nnmf

Syntax

Description

Examples

Nonnegative Rank-Two Approximation and Biplot

Change Algorithm

Input Arguments

`A` — Matrix to factorize
real matrix

`k` — Rank of factors
positive integer

Name-Value Arguments

`Algorithm` — Factorization algorithm
`'als'` (default) | `'mult'`

`W0` — Initial value of `W`
n-by-`k` matrix

`H0` — Initial value of `H`
`k`-by-m matrix

`Options` — Algorithm options
`[]` (default) | structure returned by `statset`

`Replicates` — Number of times to repeat factorization
`1` (default) | positive integer

Output Arguments

`W` — Nonnegative left factor of `A`
n-by-`k` matrix

`H` — Nonnegative right factor of `A`
`k`-by-m matrix

`D` — Root mean square residual
nonnegative scalar

References

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

Version History

See Also

Topics

nnmf

Syntax

Description

Examples

Nonnegative Rank-Two Approximation and Biplot

Change Algorithm

Input Arguments

A — Matrix to factorize real matrix

k — Rank of factors positive integer

Name-Value Arguments

Algorithm — Factorization algorithm 'als' (default) | 'mult'

W0 — Initial value of W n-by-k matrix

H0 — Initial value of H k-by-m matrix

Options — Algorithm options [] (default) | structure returned by statset

Replicates — Number of times to repeat factorization 1 (default) | positive integer

Output Arguments

W — Nonnegative left factor of A n-by-k matrix

H — Nonnegative right factor of A k-by-m matrix

D — Root mean square residual nonnegative scalar

References

Extended Capabilities

Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

Version History

See Also

Topics

`A` — Matrix to factorize
real matrix

`k` — Rank of factors
positive integer

`Algorithm` — Factorization algorithm
`'als'` (default) | `'mult'`

`W0` — Initial value of `W`
n-by-`k` matrix

`H0` — Initial value of `H`
`k`-by-m matrix

`Options` — Algorithm options
`[]` (default) | structure returned by `statset`

`Replicates` — Number of times to repeat factorization
`1` (default) | positive integer

`W` — Nonnegative left factor of `A`
n-by-`k` matrix

`H` — Nonnegative right factor of `A`
`k`-by-m matrix

`D` — Root mean square residual
nonnegative scalar

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.