Features of this implementation of LDA:
- Allows for >2 classes
- Permits user-specified prior probabilities
- Requires only base MATLAB (no toolboxes needed)
- Assumes that the data is complete (no missing values)
- Has been verified against statistical software
- "help LDA" provides usage and an example, including conditional probability calculation
Note: This routine always includes the prior probability adjustment to the linear score functions. (Some other LDA software drops this when the user specifies equal prior probabilities.)
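To illustrate the note above, here is a minimal sketch (not the routine's actual code) of linear score functions with and without the log-prior term. The means, covariance, priors and the variable names Mu, Sigma, Prior and x are all made up for the example. With equal priors the log-prior term adds the same constant to every class, so the posterior probabilities are unchanged; with unequal priors they differ.

```matlab
% Toy two-feature, three-class setup; every number here is illustrative.
Mu    = [0 0; 2 1; 1 3];        % class means, one row per class (k-by-p)
Sigma = [1.0 0.3; 0.3 2.0];     % pooled covariance (p-by-p)
Prior = [1/3 1/3 1/3];          % equal prior probabilities
x     = [1.2 0.8];              % one observation (1-by-p)

k = size(Mu, 1);
Lbare = zeros(1, k);            % scores without the log-prior term
Lfull = zeros(1, k);            % scores with the log-prior term
for i = 1:k
    b = Sigma \ Mu(i, :)';              % linear coefficients for class i
    c = -0.5 * Mu(i, :) * b;            % constant term, prior not included
    Lbare(i) = c + x * b;
    Lfull(i) = Lbare(i) + log(Prior(i));
end

% Posterior probabilities from each set of scores
Pbare = exp(Lbare) ./ sum(exp(Lbare));
Pfull = exp(Lfull) ./ sum(exp(Lfull));
disp(max(abs(Pfull - Pbare)));  % essentially zero when priors are equal

% With unequal priors the adjustment does change the result
Prior2 = [0.7 0.2 0.1];
L2 = Lbare + log(Prior2);
P2 = exp(L2) ./ sum(exp(L2));
disp([Pbare; P2]);
```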
But I agree with maryam: once you have found the coefficients from the training data, they should be used to test and classify other data whose class labels are missing.
It would be great if it contained a testing part.
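A rough sketch of what such a testing step could look like, assuming the coefficient matrix W has one row per class with the constant term in column 1 and the linear coefficients in the remaining columns. The values of W and NewInput below are invented for illustration; in practice W would come from fitting the training data.

```matlab
% Hypothetical coefficient matrix W (k-by-(p+1)): column 1 is the constant
% term, columns 2:end are the linear coefficients. Values are made up.
W = [ 0.0   0.0  0.0;
     -2.0   1.9  0.2;
     -2.4   0.6  1.4];

% New, unlabeled observations (m-by-p), also made up.
NewInput = [0.1 -0.2;
            2.5  0.9;
            0.8  3.1];

m = size(NewInput, 1);
k = size(W, 1);

% Linear scores: one row per observation, one column per class
L = [ones(m, 1) NewInput] * W';

% Subtract each row's maximum before exponentiating, for numerical stability
Lmax = max(L, [], 2);
E = exp(L - repmat(Lmax, [1 k]));
P = E ./ repmat(sum(E, 2), [1 k]);   % class probabilities, rows sum to 1

% Predicted class = column index of the largest score in each row
[~, Predicted] = max(L, [], 2);
disp(Predicted');
```

Predicted holds column indices into W. If the class labels are not simply 1..k, they would need to be mapped back through the sorted list of unique training labels.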
It might be worth mentioning Will's blogspot entry, where he explains the code in some more detail and also answers some interesting questions:
http://matlabdatamining.blogspot.de/2010/12/linear-discriminant-analysis-lda.html
I'm an utter beginner with LDA, but I'm getting quite different class probability results using this vs. the 'classify' routine from the statistics toolbox. Perhaps it's the prior probability adjustment, but it would be nice if this had a literature reference and/or comparable results to classify.
I'm oberstein, a PhD student at the University of Paris.
Thank you very much for sharing your LDA (discriminant analysis) code. I found it on MATLAB Central and it is very useful to me; yours is more intelligent than mine o(∩_∩)o
But there are some things in your code that I don't understand. Can I ask you three questions about your LDA code?
Thank you in advance!
1 When accumulating the pooled covariance information, why do you use ((nGroup(i) - 1) / (n - k) ) in “PooledCov = PooledCov + ((nGroup(i) - 1) / (n - k) ).* cov(Input(Group,:))”? Why isn't it nGroup(i)/n or nGroup(i)/(n-1), which we often use in probability? Can you tell me the reason or the theory behind ((nGroup(i) - 1) / (n - k) )?
2 I don't quite understand your matrix W.
2-1) In LDA, we first find Sw (the within-class scatter matrix) and Sb (the between-class scatter matrix), and then we find the eigenvectors of inv(Sw)*Sb, don't we? What is your matrix W? Is it the eigenvectors? I don't think so. Is it the matrix inv(Sw)*Sb? If so, why do you add the term log(PriorProb(i))?
2-2) Can you tell me something about the term log(PriorProb(i))? I don't understand why it appears in W(:,1). Is it for the linear regression?
3 When calculating the class probabilities at the end, why do you use the exponent? You have P=exp(L)./repmat(sum(exp(L),2),[1 2]); couldn't it be L./repmat(sum(L,2),[1 2])? I don't understand why we must use the exponent to calculate the probabilities.
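On question 3, one way to see why the exponent is needed: the linear scores are on a log scale (the log of prior times class density, up to a term shared by all classes), so they can be negative and have to be exponentiated before normalising. A small sketch with made-up scores, using size(L,2) in place of the hard-coded 2 so it also works for more than two classes:

```matlab
% Made-up linear scores for 2 observations (rows) and 2 classes (columns)
L = [-1.5  0.5;
      2.0 -3.0];
k = size(L, 2);

% Normalising the raw scores: rows sum to 1, but the entries are not
% valid probabilities (some are negative, some exceed 1)
Praw = L ./ repmat(sum(L, 2), [1 k]);

% Exponentiating first undoes the log, giving values strictly between
% 0 and 1 that sum to 1 across each row
P = exp(L) ./ repmat(sum(exp(L), 2), [1 k]);

disp(Praw);
disp(P);
```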
Input and Target are both from the training data. "Input" is a matrix containing the independent variables, while "Target" contains the dependent variable.
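With Input and Target laid out this way (assuming one observation per row and Target as a column of class labels), the pooled covariance weighting asked about in question 1 can be checked numerically. Each class covariance from cov() carries nGroup - 1 degrees of freedom, and these add up to n - k over all classes, so the weighted sum equals the within-class scatter matrix divided by n - k. The data below are made up, and Sw and Xc are names introduced only for this sketch.

```matlab
% Made-up training data: 7 observations, 2 variables, 3 classes
Input  = [1.0 2.0; 1.5 1.8; 0.8 2.4;
          4.0 4.5; 4.2 5.1;
          7.0 1.0; 6.5 1.4];
Target = [1; 1; 1; 2; 2; 3; 3];

ClassLabel = unique(Target);
k = length(ClassLabel);
[n, p] = size(Input);

PooledCov = zeros(p, p);   % weighted sum of per-class covariances
Sw        = zeros(p, p);   % within-class scatter (sums of squared deviations)
for i = 1:k
    Group  = (Target == ClassLabel(i));       % logical index of class i
    nGroup = sum(Group);
    X  = Input(Group, :);
    Xc = X - repmat(mean(X, 1), [nGroup 1]);  % centre on the class mean
    PooledCov = PooledCov + ((nGroup - 1) / (n - k)) .* cov(X);
    Sw = Sw + Xc' * Xc;
end

% The two ways of computing the pooled covariance agree
disp(max(max(abs(PooledCov - Sw / (n - k)))));
```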