On Apr 27, 9:26 pm, "John G" <a...@yahoo.com> wrote:
> Peter Perkins <Peter.Perk...@MathRemoveThisWorks.com> wrote in message <hr81mf$nn...@fred.mathworks.com>...
> > On 4/27/2010 8:04 PM, John G wrote:
> > > The LDA built into the stats toolbox appears to assume covariances equal
> > > & classes distributed normally, unlike Fisher LDA,
>
> > That _is_ Fisher LDA. How do you define it? If you want _unequal_ cov
> > matrices, that's quadratic discriminant analysis.
>
> I guess I was wrong then. I thought Fisher's LDA was a bit different (Wikipedia says it doesn't necessarily make the same assumptions as regular LDA).
>
> How do you use the MATLAB LDA then?
>
> [C,err,P,logp,coeff] = classify(sample,training,group,'linear')
>
> But what would you use for group and training? The example is kind of unclear. I'm uncertain what a training data set is: is it a particular subset of the m x m array you're working with, or can it be generalized to something else?
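To make the roles of the arguments concrete: in classify(sample,training,group,'linear'), each row of training is an observation whose class is known, group gives the class label of each training row, and sample holds the rows you want classified (same number of columns as training). Below is a minimal Python sketch of the same equal-covariance linear rule, not MATLAB's own implementation: class means and a pooled within-class covariance are estimated from the training rows, and each sample row goes to the class with the largest linear discriminant score. It assumes equal prior probabilities, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_linear_discriminant(training, group):
    """Estimate class means and a pooled within-class covariance (equal-covariance LDA)."""
    rows_by_class = defaultdict(list)
    for x, g in zip(training, group):
        rows_by_class[g].append(x)
    d = len(training[0])
    means = {g: [sum(c) / len(rows) for c in zip(*rows)]
             for g, rows in rows_by_class.items()}
    # Pooled covariance: within-class scatter divided by (n - number of classes).
    cov = [[0.0] * d for _ in range(d)]
    for g, rows in rows_by_class.items():
        for x in rows:
            dev = [xi - mi for xi, mi in zip(x, means[g])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += dev[i] * dev[j]
    dof = len(training) - len(rows_by_class)
    cov = [[c / dof for c in row] for row in cov]
    # Linear score for class k: x . w_k + b_k, with w_k = S^-1 mu_k, b_k = -0.5 mu_k . w_k
    model = {}
    for g, mu in means.items():
        w = solve(cov, mu)
        model[g] = (w, -0.5 * sum(m_i * w_i for m_i, w_i in zip(mu, w)))
    return model


def classify_linear(sample, model):
    """Assign each row of sample to the class with the largest linear score (equal priors)."""
    labels = []
    for x in sample:
        scores = {g: sum(xi * wi for xi, wi in zip(x, w)) + b
                  for g, (w, b) in model.items()}
        labels.append(max(scores, key=scores.get))
    return labels


training = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # class "a"
            [6.0, 8.0], [6.5, 7.5], [7.0, 8.5]]   # class "b"
group = ["a", "a", "a", "b", "b", "b"]
sample = [[1.2, 2.1], [6.8, 8.0]]
model = fit_linear_discriminant(training, group)
print(classify_linear(sample, model))  # ['a', 'b']
```

Because one covariance matrix is shared by all classes, the quadratic terms cancel between classes and the decision boundaries are linear; estimating a separate covariance per class instead would give the quadratic discriminant mentioned above.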
In general, the total data set is partitioned into three subsets:
training    design data used to directly determine the weights,
            given the training parameters (e.g., the % of data used
            for each of the subsets and/or the prior probability
            weighting and misclassification costs for each class)
validation  nontraining design data used repeatedly to estimate
            predictive performance so that the training parameters
            can be optimized
test        nondesign data used once and only once to obtain an
            unbiased estimate of predictive performance on unseen
            data
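A minimal Python sketch of such a random three-way split. The 60/20/20 fractions are only an illustrative assumption (they are one of the training parameters you would choose), the function name is hypothetical, and the split is not stratified by class.

```python
import random


def partition(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split case indices 0..n-1 into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so a split can be reproduced
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]  # reused while tuning training parameters
    test = idx[n_train + n_val:]        # held back and scored only once
    return train, val, test


train, val, test = partition(100)
print(len(train), len(val), len(test))  # 60 20 20
```

Repartitioning the data amounts to calling it again with a different seed.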
If you wish to retrain because the test set performance is significantly
worse than the validation set performance, you should repartition the data
so that the new test results stay as unbiased as possible. Typically, when
this happens, I find that 10-fold cross-validation is a better alternative.
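For the 10-fold alternative, a minimal Python sketch of the index bookkeeping, assuming a simple shuffled, unstratified split (the function name is hypothetical): every case is used for validation exactly once and for training in the other nine folds.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, val) index lists for k-fold cross-validation over n cases."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled cases into k folds; fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


splits = list(kfold_indices(23))
print([len(val) for _, val in splits])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

In each fold you would fit the classifier on train and count errors on val; averaging the 10 error rates gives the performance estimate used to compare training parameters.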
Hope this helps.
Greg