From: Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: SVD with Missing Values
Date: Tue, 09 Dec 2008 10:19:21 -0500
Organization: The MathWorks, Inc.
Message-ID: <ghm29p$npp$1@fred.mathworks.com>
References: <gh7kvc$anh$1@fred.mathworks.com> <ghbdb4$2sf$1@fred.mathworks.com> <ghbkh6$ski$1@fred.mathworks.com>


Samuel wrote:

> Thanks, Peter, for the response. What I am doing is most analogous to the Netflix Competition, albeit with a much smaller, less sparse matrix. I have users and their ratings of products, with missing values for the products they have not rated. I am then trying to predict those missing ratings using a thin SVD approach. If I use svds in MATLAB for the thin SVD, I have to impute numerical values for the missing entries, which I think will distort the results when I multiply U*S*V' to generate the predictions (if I am understanding that correctly). I would really appreciate any thoughts on how to handle the missing values better for prediction purposes, or on whether I am missing the point altogether. Thanks.

Samuel, I am not even remotely up on this subject, and may be completely misunderstanding what you've said. But it seems to me that what you describe is predicting the missing values in a matrix using an SVD of that same matrix, after its missing values have somehow been filled in. And if it's like the Netflix case, most entries in that matrix are missing, so the predictions would indeed seem to depend crucially on how you fill in those values.
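To make the dependence on the fill value concrete, here is a minimal sketch in Python rather than MATLAB. The helper names `rank_k_approx` and `predict_single_imputation` are hypothetical. The rank-k truncation U*S*V' is computed by plain power iteration with deflation, standing in for svds. The toy `ratings` matrix, with None marking an unrated product, is made up for illustration. The sketch fills every missing rating with one constant, takes the rank-k approximation, and reads the predictions off the filled cells.

```python
import math

def matvec(A, v):
    # A * v for A stored as a list of rows
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, u):
    # A' * u
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def rank_k_approx(A, k, iters=500):
    """Rank-k truncation U_k*S_k*V_k' of A, one singular triplet at a time:
    power iteration on R'*R finds the top right singular vector of the
    residual R, then that component is subtracted from R (deflation)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]               # residual after removing found components
    approx = [[0.0] * n for _ in range(m)]
    for _ in range(k):
        v = [1.0 + 0.01 * j for j in range(n)]  # arbitrary nonzero start vector
        for _ in range(iters):
            w = rmatvec(R, matvec(R, v))
            nw = norm(w)
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        Rv = matvec(R, v)
        sigma = norm(Rv)                     # singular value for this component
        if sigma < 1e-12:                    # residual is numerically zero
            break
        u = [x / sigma for x in Rv]
        for i in range(m):
            for j in range(n):
                c = sigma * u[i] * v[j]
                approx[i][j] += c
                R[i][j] -= c
    return approx

def predict_single_imputation(X, k, fill):
    """Replace every missing entry (None) with the same `fill` value,
    then return the rank-k reconstruction of the filled matrix."""
    filled = [[fill if x is None else float(x) for x in row] for row in X]
    return rank_k_approx(filled, k)

# Hypothetical toy data: rows are users, columns are products.
ratings = [
    [5, 4, None, 1],
    [4, None, 4, 1],
    [1, 1, None, 5],
    [None, 1, 2, 4],
]
p_zero = predict_single_imputation(ratings, 2, 0.0)
p_three = predict_single_imputation(ratings, 2, 3.0)
print(round(p_zero[0][2], 2), round(p_three[0][2], 2))
```

Filling with 0.0 versus 3.0 gives different predictions for the same unrated cell. That is exactly the sensitivity to the imputation that the single-pass approach suffers from.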

I suspect (try a Google search for "missing svd netflix") that rather than doing a single imputation followed by a single prediction, people use something like the E-M algorithm, repeating what you describe (fill in, fit the SVD, refill) until convergence. Presumably the big question would be "What's the M step?". I don't know enough about this area to help.
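One EM-flavoured scheme of this kind is often called iterative SVD imputation, or "hard impute". It is sketched below under stated assumptions, as one possible answer rather than necessarily what Netflix entrants used. The loop starts from a guess for the missing cells and takes a rank-k approximation of the completed matrix. It then overwrites only the missing cells with that approximation and repeats until they stop changing. Refitting the truncated SVD plays the role of the M step. The function names are hypothetical, and starting from the global mean of the observed ratings is an assumption. The rank-k helper again uses power iteration with deflation in place of svds.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rmatvec(A, u):
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def rank_k_approx(A, k, iters=500):
    """Rank-k truncation of A via power iteration on R'*R plus deflation."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    approx = [[0.0] * n for _ in range(m)]
    for _ in range(k):
        v = [1.0 + 0.01 * j for j in range(n)]  # arbitrary nonzero start vector
        for _ in range(iters):
            w = rmatvec(R, matvec(R, v))
            nw = norm(w)
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        Rv = matvec(R, v)
        sigma = norm(Rv)
        if sigma < 1e-12:
            break
        u = [x / sigma for x in Rv]
        for i in range(m):
            for j in range(n):
                c = sigma * u[i] * v[j]
                approx[i][j] += c
                R[i][j] -= c
    return approx

def iterative_svd_impute(X, k, tol=1e-6, max_iter=1000):
    """Alternate between (a) refitting a rank-k approximation to the
    completed matrix and (b) refilling ONLY the missing cells (None)
    from that approximation; observed ratings are never changed."""
    m, n = len(X), len(X[0])
    observed = [x for row in X for x in row if x is not None]
    start = sum(observed) / len(observed)    # assumed initial fill: global mean
    Z = [[start if x is None else float(x) for x in row] for row in X]
    for _ in range(max_iter):
        L = rank_k_approx(Z, k)              # "M-like" step: refit low-rank model
        change = 0.0
        for i in range(m):
            for j in range(n):
                if X[i][j] is None:          # "E-like" step: update missing cells
                    change = max(change, abs(L[i][j] - Z[i][j]))
                    Z[i][j] = L[i][j]
        if change < tol:
            break
    return Z

# A rank-1 matrix with one hidden entry (its true value would be 9).
X = [[1, 2, 3], [2, 4, 6], [3, 6, None]]
Z = iterative_svd_impute(X, k=1)
print(round(Z[2][2], 3))
```

On this exactly rank-1 toy matrix, the loop moves the hidden cell from its initial guess toward the value that makes the matrix rank 1. For real ratings, the rank k and the stopping tolerance would still have to be chosen, for example by holding out some known ratings and checking how well they are predicted.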