Thread Subject:
SVD with Missing Values

Subject: SVD with Missing Values

From: Samuel

Date: 4 Dec, 2008 04:06:04

Message: 1 of 10

In MATLAB, what is the best way to handle a singular value decomposition where k is much less than m or n (for an m-by-n matrix) for a data set with many missing values, such that the missing values have a minimal effect on the decomposition? Thanks.

Subject: SVD with Missing Values

From: BHUPALA

Date: 4 Dec, 2008 11:12:26

Message: 2 of 10

On Dec 4, 9:06 am, "Samuel" <sdods...@jhu.edu> wrote:
> In MATLAB, what is the best way to handle a singular value decomposition where k is much less than m or n (for an m-by-n matrix) for a data set with many missing values, such that the missing values have a minimal effect on the decomposition? Thanks.

Try using svd(X,0) or svd(X,'econ'), which will give you the
economy-size decomposition.
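
A minimal sketch of what the economy-size option changes, using an arbitrary 6-by-3 random matrix chosen only for illustration. The factor sizes shrink, but the product still reproduces X. For a square matrix the full and economy factors have the same sizes.

```matlab
% Economy-size SVD on a tall matrix (the 6x3 size is an arbitrary example).
X = randn(6, 3);

[U,  S,  V ] = svd(X);          % full: U is 6x6, S is 6x3, V is 3x3
[Ue, Se, Ve] = svd(X, 'econ');  % economy: Ue is 6x3, Se is 3x3, Ve is 3x3

% Both factorizations reproduce X up to rounding error.
errFull = norm(U  * S  * V'  - X);
errEcon = norm(Ue * Se * Ve' - X);
```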

bhupala

Subject: SVD with Missing Values

From: Samuel

Date: 5 Dec, 2008 01:39:02

Message: 3 of 10

For reference, the matrix is 241x241, so an economy SVD won't do the trick. I'm looking for an alternative to mean imputation for the missing values. Is there any method that lets me treat the missing values as true unknowns (NaN)?

Subject: SVD with Missing Values

From: Peter Perkins

Date: 5 Dec, 2008 14:20:20

Message: 4 of 10

Samuel wrote:
> In MATLAB, what is the best way to handle a singular value decomposition where k is much less than m or n (for an m-by-n matrix) for a data set with many missing values, such that the missing values have a minimal effect on the decomposition? Thanks.

Samuel, SVD is an algorithm in computational linear algebra. Many statistical models and methods use SVD as a computational tool, but it is not a statistical model per se. You're asking about missing data, which is a statistical issue. It's impossible to give advice about statistical issues without knowing what you're really doing, statistically. It may be Principal Components Analysis, or it may be something else entirely.

Subject: SVD with Missing Values

From: Samuel

Date: 5 Dec, 2008 16:23:02

Message: 5 of 10

Thanks, Peter, for the response. What I am doing is most analogous to the Netflix Competition, albeit with a much smaller, less sparse matrix. I have users and their ratings of products, with missing values for the products they have not rated. I am then trying to predict the missing ratings using a thin SVD approach. If I use a thin SVD (svds) in MATLAB, I have to use a numerical imputation for the missing values, which I think will disrupt the results when I multiply U*S*V' to generate the predictions (if I am understanding that correctly). I would really appreciate any thoughts on how to handle the missing values better for prediction purposes, or on whether I am missing the point altogether. Thanks.
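
A minimal sketch of the impute-then-reconstruct approach described above. The 5-by-4 ratings matrix and the rank k = 2 are made up for illustration, and plain svd is used to keep the sketch self-contained (svds(Rfill, k) should serve the same purpose for larger matrices).

```matlab
% Toy ratings matrix: rows are users, columns are products, NaN = not rated.
R = [5   3   NaN 1;
     4   NaN NaN 1;
     1   1   NaN 5;
     1   NaN 5   4;
     NaN 1   5   4];
miss = isnan(R);

% Column means computed over observed entries only.
R0 = R;
R0(miss) = 0;
colMean = sum(R0, 1) ./ sum(~miss, 1);

% Mean imputation: fill each missing entry with its column mean.
M = repmat(colMean, size(R, 1), 1);
Rfill = R;
Rfill(miss) = M(miss);

% Rank-k approximation built from the k largest singular triplets.
k = 2;                                   % assumed rank, for illustration only
[U, S, V] = svd(Rfill, 'econ');
Rhat = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';

% Predicted ratings are the reconstructed values at the missing positions.
pred = Rhat(miss);
```

The imputed means enter the factorization as if they were real ratings, which is exactly the distortion being asked about.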

Subject: SVD with Missing Values

From: Johan Carlson

Date: 5 Dec, 2008 16:37:02

Message: 6 of 10

Well, computing the SVD without replacing the missing values is, as far as I know, not possible, since it is really a factorization of a complete matrix.

So the question then becomes: with what should you replace the missing values? The answer is not that easy, and it is indeed more of a statistical question than a numerical one. Some statistics packages replace missing data with the column means, which may or may not be a good idea. Another approach would be to use the existing data to predict the missing values, using, for example, cross-validation. This is sometimes done in Principal Component Analysis (which is really nothing but an SVD, numerically speaking). Some variants of the NIPALS algorithm can handle this. I suggest you do a literature search on NIPALS and "missing data" and see what you come up with. Please let us know how it works out!
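
To give a rough idea of how a NIPALS variant can skip missing entries, here is a minimal sketch that extracts one component. The data matrix, the tolerance, and the iteration cap are made up for illustration, and published variants differ in their details. Each regression step sums over the observed entries only.

```matlab
% Small data matrix with missing entries (values are made up).
X = [2.0 4.1  NaN;
     1.0 NaN  3.1;
     3.0 6.2  8.9;
     NaN 8.0 12.2;
     5.0 9.9 15.0];
obs = ~isnan(X);
W   = double(obs);                 % 1 where observed, 0 where missing
n   = size(X, 1);

% Center each column using its observed entries; keep zeros where missing.
X0 = X;
X0(~obs) = 0;
mu = sum(X0, 1) ./ sum(W, 1);
Xc = (X0 - repmat(mu, n, 1)) .* W;

t = Xc(:, 1);                      % initial scores: first centered column
for it = 1:500
    % Loadings: regress each column on t over its observed rows.
    p = (Xc' * t) ./ (W' * (t.^2));
    p = p / norm(p);
    % Scores: regress each row on p over its observed columns.
    tNew = (Xc * p) ./ (W * (p.^2));
    done = norm(tNew - t) <= 1e-10 * norm(tNew);
    t = tNew;
    if done, break; end
end

% One-component reconstruction, including estimates for the missing cells.
Xhat = t * p' + repmat(mu, n, 1);
```

Further components would be obtained by subtracting t*p' from the observed entries of Xc and repeating the loop.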

/JC

Subject: SVD with Missing Values

From: Samuel

Date: 5 Dec, 2008 17:37:02

Message: 7 of 10

Thanks for your thoughts; they've been really helpful. I read a bit about NIPALS, and it led me to think that nearest-neighbor interpolation might be helpful for replacing the missing values. I'm not sure yet, though, as I'm very unfamiliar with the concept. I was also under the impression that some form of the Lanczos method could handle missing values, but I'm very uncertain how, or whether, to implement that.

Subject: SVD with Missing Values

From: Bruno Luong

Date: 5 Dec, 2008 18:00:20

Message: 8 of 10

"Samuel " <sdodson2@jhu.edu> wrote in message <ghboru$60u$1@fred.mathworks.com>...

> I was also under the impression that some form of the Lanczos method could handle missing values, but I'm very uncertain how, or whether, to implement that.

No. Lanczos is simply a specific algorithm for computing the eigenspaces of a symmetric matrix, and it can be used to compute the SVD (because the SVD of M is closely related to the eigenspaces of M'*M and M*M'). No more, no less. It is more suitable for sparse matrices.
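
A quick numerical check of that relationship, on an arbitrary 4-by-3 matrix chosen for illustration: the singular values of M are the square roots of the eigenvalues of M'*M.

```matlab
M = [2 0 1;
     1 3 0;
     0 1 4;
     1 1 1];                          % arbitrary full-column-rank example

s   = svd(M);                         % singular values, in descending order
lam = sort(eig(M' * M), 'descend');   % eigenvalues of the symmetric matrix M'*M

% Clamp at zero in case rounding pushes a tiny eigenvalue below zero.
gap = norm(s - sqrt(max(lam, 0)));    % expected to be at rounding-error level
```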

As Peter said, those are linear algebra *tools*.

Bruno

Subject: SVD with Missing Values

From: Johan Carlson

Date: 5 Dec, 2008 18:46:02

Message: 9 of 10

Depending on what type of problem you're dealing with, cross-validation might also be an option. That is, if you're trying to find A in a model like
AX = Y
where you're missing data in either X or Y. If you remove the rows and columns of X or Y where data is missing, you can try to predict the missing values using what's left of your system. For prediction you could use either a principal components approach (pseudoinverse) or PLS regression (which is somewhat messier, but has better predictive performance if parts of X and Y are correlated while other parts of either X or Y aren't).
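
A minimal sketch of the pseudoinverse route, under the assumption that each column of X and Y is one sample and that only Y has gaps. The matrix Atrue and the data are made up, and the data is noise-free, so the recovery here is exact; real data would only give an approximation.

```matlab
% Hypothetical "true" map, used only to generate example data for AX = Y.
Atrue = [1  2;
         0 -1;
         3  1];
X = [1 0 2 1 3 2;
     0 1 1 2 1 3];                % 2x6: six samples in the columns
Y = Atrue * X;                    % 3x6, noise-free
Y(2, 5) = NaN;                    % knock out two entries
Y(3, 6) = NaN;

% Keep only the samples (columns) of Y that have no missing entries.
ok = ~any(isnan(Y), 1);

% Least-squares estimate of A from the complete samples.
A = Y(:, ok) * pinv(X(:, ok));

% Predict every sample, then fill in only the missing entries.
Yhat  = A * X;
miss  = isnan(Y);
Yfill = Y;
Yfill(miss) = Yhat(miss);
```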

/JC

Subject: SVD with Missing Values

From: Peter Perkins

Date: 9 Dec, 2008 15:19:21

Message: 10 of 10

Samuel wrote:

> Thanks, Peter, for the response. What I am doing is most analogous to the Netflix Competition, albeit with a much smaller, less sparse matrix. I have users and their ratings of products, with missing values for the products they have not rated. I am then trying to predict the missing ratings using a thin SVD approach. If I use a thin SVD (svds) in MATLAB, I have to use a numerical imputation for the missing values, which I think will disrupt the results when I multiply U*S*V' to generate the predictions (if I am understanding that correctly). I would really appreciate any thoughts on how to handle the missing values better for prediction purposes, or on whether I am missing the point altogether. Thanks.

Samuel, I am not even remotely up on this subject, and may be completely misunderstanding what you've said. But it seems to me that what you describe is predicting the missing values in a matrix using an SVD of that same matrix after its missing values have somehow been filled in. And if it's like the Netflix case, most entries in that matrix are missing. This would indeed seem to depend crucially on the way you fill in those values.

I suspect (do a Google search on "missing svd netflix") that rather than thinking in terms of a single imputation and prediction, people use something like the E-M algorithm to do what you describe iteratively until convergence. Presumably the big question would be "What's the M step?". I don't know enough about this area to help.
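
One possible reading of "iteratively until convergence" is sketched below: fill the gaps, take a rank-k SVD, overwrite only the missing entries with the reconstruction, and repeat. This is an assumption about what such a scheme could look like, not a description of any particular Netflix entry. The ratings matrix, the rank k = 2, the tolerance, and the iteration cap are all made up for illustration.

```matlab
% Toy ratings matrix: rows are users, columns are products, NaN = not rated.
R = [5   3   NaN 1;
     4   NaN NaN 1;
     1   1   NaN 5;
     1   NaN 5   4;
     NaN 1   5   4];
miss = isnan(R);
k = 2;                                   % assumed rank

% Starting guess: column means over the observed entries.
R0 = R;
R0(miss) = 0;
colMean = sum(R0, 1) ./ sum(~miss, 1);
M = repmat(colMean, size(R, 1), 1);
Rfill = R;
Rfill(miss) = M(miss);

for it = 1:200
    % Fit a rank-k model to the current completed matrix.
    [U, S, V] = svd(Rfill, 'econ');
    Rhat = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';

    % Keep the observed ratings; update only the missing entries.
    Rnew = R;
    Rnew(miss) = Rhat(miss);

    change = norm(Rnew - Rfill, 'fro') / norm(Rfill, 'fro');
    Rfill = Rnew;
    if change < 1e-6, break; end
end

pred = Rfill(miss);                      % final estimates for the missing ratings
```

Unlike a single mean imputation, the filled-in values here are revised on every pass, so the starting guess matters less as the loop proceeds.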
