Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: SVD with Missing Values
Date: Fri, 5 Dec 2008 16:23:02 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 7
Message-ID: <ghbkh6$ski$1@fred.mathworks.com>
References: <gh7kvc$anh$1@fred.mathworks.com> <ghbdb4$2sf$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1228494182 29330 172.30.248.35 (5 Dec 2008 16:23:02 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Fri, 5 Dec 2008 16:23:02 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1626144
Xref: news.mathworks.com comp.soft-sys.matlab:505218


Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message <ghbdb4$2sf$1@fred.mathworks.com>...
> Samuel wrote:
> > In MATLAB, what is the best way to handle a singular value decomposition where k is much less than m or n (for an m-by-n matrix) for a data set with many missing values, such that the missing values have a minimal effect on the decomposition? Thanks.
> 
> Samuel, SVD is an algorithm in computational linear algebra.  Many statistical models/methods use SVD as a computational tool, but it is not a statistical model per se.  You're asking about missing data, which is a statistical issue.  It's impossible to give advice about statistical issues without knowing what you're really doing, statistically.  It may be Principal Components Analysis, it may be something else entirely.

Thanks for the response, Peter. What I am doing is most analogous to the Netflix Prize competition, albeit with a much smaller and less sparse matrix. I have users and their ratings of products, with missing values wherever a user has not rated a product. I am trying to predict those missing ratings using a thin (rank-k) SVD. If I use svds in MATLAB, I first have to impute a number for each missing value, and I think those imputed numbers will distort the results when I multiply U*S*V' to generate the predictions (if I am understanding that correctly). I would really appreciate any thoughts on how to handle the missing values better for prediction purposes, or on whether I am missing the point altogether. Thanks.
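
To make the concern about imputation concrete, here is a minimal sketch of one possible alternative: fit the rank-k factors to the observed ratings only, so that no value is ever imputed. It is written in plain Python rather than MATLAB, and it is a stochastic-gradient matrix factorization, not a true SVD (the factors are not orthogonal and there is no S matrix). The function names, the use of None for a missing rating, the toy data, and the learning-rate and regularization settings are all assumptions made for illustration.

```python
import random

def factorize_observed(ratings, k=1, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit ratings[i][j] ~ dot(P[i], Q[j]) using only observed entries.

    Missing ratings are marked with None and are simply skipped,
    so nothing has to be imputed before fitting.
    """
    rng = random.Random(seed)
    m, n = len(ratings), len(ratings[0])
    # Small random starting factors (hypothetical choice of scale).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    observed = [(i, j, ratings[i][j])
                for i in range(m) for j in range(n)
                if ratings[i][j] is not None]
    for _ in range(epochs):
        for i, j, r in observed:
            err = r - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # Gradient step on squared error with L2 shrinkage.
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted rating of product j by user i."""
    return sum(a * b for a, b in zip(P[i], Q[j]))

# Toy 4-user, 3-product matrix; the hidden truth is rank 1
# (user factors 1, 2, 3, 4 times product factors 1, 0.5, 1.25).
ratings = [
    [1.0, 0.5, None],
    [2.0, None, 2.5],
    [None, 1.5, 3.75],
    [4.0, 2.0, 5.0],
]
P, Q = factorize_observed(ratings, k=1)
missing = [(0, 2), (1, 1), (2, 0)]
predictions = {(i, j): predict(P, Q, i, j) for i, j in missing}
```

The predictions for the three missing cells come from the product of the fitted factors, which plays the role that U*S*V' would play after an imputed svds call. Whether this beats imputation on real ratings data depends on the data and on tuning k, lr, and reg, so treat it as a starting point rather than a recommendation.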