Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: SVD with Missing Values
Date: Fri, 5 Dec 2008 16:37:02 +0000 (UTC)
Organization: Lulea University of Technology
Lines: 16
Message-ID: <ghblbd$ahl$1@fred.mathworks.com>
References: <gh7kvc$anh$1@fred.mathworks.com> <ghbdb4$2sf$1@fred.mathworks.com> <ghbkh6$ski$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1228495022 10805 172.30.248.35 (5 Dec 2008 16:37:02 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Fri, 5 Dec 2008 16:37:02 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1595763
Xref: news.mathworks.com comp.soft-sys.matlab:505220


"Samuel " <sdodson2@jhu.edu> wrote in message <ghbkh6$ski$1@fred.mathworks.com>...
> Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message <ghbdb4$2sf$1@fred.mathworks.com>...
> > Samuel wrote:
> > > In MATLAB, what is the best way to handle a singular value decomposition where k is much less than m or n (MxN matrix) for a data set with many missing values, such that the missing values have a minimal effect on the decomposition? Thanks.
> > 
> > Samuel, SVD is an algorithm in computational linear algebra.  Many statistical models/methods use SVD as a computational tool, but it is not a statistical model per se.  You're asking about missing data, which is a statistical issue.  It's impossible to give advice about statistical issues without knowing what you're really doing, statistically.  It may be Principal Components Analysis, it may be something else entirely.
> 
> Thanks Peter for the response. What I am doing is most analogous to the Netflix Competition, albeit with a much smaller, less sparse matrix. I have users and their ratings of products, with missing values for the products they have not rated. I am then trying to predict the missing ratings using a thin SVD approach. If I use thin SVDS in MATLAB, I have to use a numerical imputation for the missing values, which I think will disrupt the results when I multiply U*S*V' to generate the predictions (if I am understanding that correctly). I would really appreciate any thoughts on how to handle the missing values better for prediction purposes, or if I am missing the point altogether. Thanks.



Well, as far as I know, you cannot compute the SVD without first replacing the missing values: it is a factorization of a complete matrix, so every entry has to be defined.

So the question becomes: with what should you replace the missing values? The answer is not easy, and it is indeed more of a statistical question than a numerical one. Some statistics packages replace missing data with the column means, which may or may not be a good idea. Another approach is to use the existing data to predict the missing values, for example with a low-rank model whose number of components is chosen by cross-validation. This is sometimes done in Principal Component Analysis (which, numerically speaking, is essentially an SVD of the mean-centered data matrix). Some variants of the NIPALS algorithm can handle missing data directly. I suggest you do a literature search on NIPALS and "missing data" and see what you come up with. Please let us know how it works out!
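
To make the two ideas above concrete, here is a rough sketch: fill the gaps with column means first, then refine them by repeatedly fitting a rank-k SVD and overwriting only the missing entries. This is not NIPALS itself, just a simple iterative imputation in the same spirit. The toy matrix X, the rank k, the tolerance tol and the iteration cap maxIter are all made up for illustration, and the code assumes every column has at least one observed value.

```matlab
% Hypothetical toy ratings matrix (users x products); NaN marks a missing rating.
X = [5   4   NaN 1;
     4   NaN 1   1;
     1   1   5   NaN;
     NaN 1   4   5;
     5   5   1   NaN];

k       = 2;      % assumed rank of the approximation
maxIter = 500;    % assumed iteration cap
tol     = 1e-6;   % assumed convergence tolerance

miss = isnan(X);  % logical mask of the missing entries

% Step 1: initialise each missing entry with the mean of the observed
% values in its column (assumes no column is entirely missing).
Xf = X;
Xf(miss) = 0;
colMean = sum(Xf, 1) ./ sum(~miss, 1);
M = repmat(colMean, size(X, 1), 1);
Xf(miss) = M(miss);

% Step 2: alternate between a rank-k SVD of the filled matrix and
% refilling the missing entries from the rank-k reconstruction.
for it = 1:maxIter
    [U, S, V] = svd(Xf, 'econ');
    Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';

    % Relative change in the imputed values since the last pass.
    change = norm(Xk(miss) - Xf(miss)) / max(norm(Xf(miss)), eps);

    Xf(miss) = Xk(miss);   % observed entries are never overwritten
    if change < tol
        break
    end
end

Xhat = Xf;   % observed ratings plus predictions for the missing ones
```

For a large sparse problem you would probably swap svd(Xf, 'econ') for svds(Xf, k), and you would still want to hold out some known ratings to check whether the predictions are any good for your choice of k.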

/JC