Thread Subject:
the math in classify.m

Subject: the math in classify.m

From: marquito

Date: 9 Aug, 2005 08:38:35

Message: 1 of 8

Hi!

In my thesis I'm using the Matlab 'classify' function with linear
discrimination to classify data. To see the math behind it, I looked
into the code and found this:

=====================================================
    % Pooled estimate of covariance
    [Q,R] = qr(training - gmeans(gindex,:), 0);
    R = R / sqrt(n - ngroups); % SigmaHat = R'*R
    s = svd(R);
    if any(s <= eps^(3/4)*max(s))
        error('The pooled covariance matrix of TRAINING must be positive definite.');
    end

    % MVN relative log posterior density, by group, for each sample
    for k = 1:ngroups
        A = (sample - repmat(gmeans(k,:), mm, 1)) / R;
        D(:,k) = log(prior(k)) - .5*sum(A .* A, 2);
    end
======================================================

I don't know exactly what is going on there. I expected to see
something like a multivariate Gaussian distribution, like:

p(x|class) = 1/sqrt((2*pi)^d * |Sigma|) * exp(-1/2 * (x-mu)' * Sigma^-1 * (x-mu))

or something similar to this. Could somebody verify this or explain
what kind of magic the programmer used (why QR decomposition)?
I'm also very interested in how 'D' is calculated, since I use this
value to show the distances of a sample to the different classes.
Shouldn't this be the probability density function of x for the
different classes?

Thanks in advance!

Subject: the math in classify.m

From: Peter Perkins

Date: 9 Aug, 2005 09:48:38

Message: 2 of 8

marquito wrote:

> =====================================================
>     % Pooled estimate of covariance
>     [Q,R] = qr(training - gmeans(gindex,:), 0);
>     R = R / sqrt(n - ngroups); % SigmaHat = R'*R
>     s = svd(R);
>     if any(s <= eps^(3/4)*max(s))
>         error('The pooled covariance matrix of TRAINING must be positive definite.');
>     end
>
>     % MVN relative log posterior density, by group, for each sample
>     for k = 1:ngroups
>         A = (sample - repmat(gmeans(k,:), mm, 1)) / R;
>         D(:,k) = log(prior(k)) - .5*sum(A .* A, 2);
>     end
> ======================================================
>
> I don't know exactly what is going on there. I expected to see
> something like a multivariate Gaussian distribution, like:
>
> p(x|class) = 1/sqrt((2*pi)^d * |Sigma|) * exp(-1/2 * (x-mu)' * Sigma^-1 * (x-mu))
>
> or something similar to this. Could somebody verify this or explain
> what kind of magic the programmer used (why QR decomposition)?

You're right that the density formally involves the inverse of a covariance
matrix. But explicitly inverting a potentially large matrix is usually not a
good idea. Write down what the estimate SigmaHat would be (X0'*X0, where X0 is
the centered data matrix, bearing in mind that the centering is different for
each class), then write X0 as Q*R. Just as the comment says, SigmaHat is R'*R,
because Q is orthonormal. Now substitute that into the quadratic form
(x-mu)'*Sigma^-1*(x-mu), and you find that you can compute that quadratic form
using a backsolve on the triangular R.
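
For example, here is a minimal sketch of that identity (the data and the
variable names are illustrative stand-ins, not taken from classify.m):

X0 = randn(100, 3);                 % stand-in for the training data
X0 = X0 - repmat(mean(X0), 100, 1); % center it (one class, for simplicity)
[Q, R] = qr(X0, 0);                 % economy-size QR
R = R / sqrt(size(X0,1) - 1);       % scale so that SigmaHat = R'*R
SigmaHat = R' * R;                  % equals cov(X0), up to roundoff

x = randn(1, 3);                    % a sample (the mean is zero here)
d1 = x * inv(SigmaHat) * x';        % explicit inverse (avoid in practice)
A = x / R;                          % triangular backsolve: A = x*inv(R)
d2 = sum(A .* A, 2);                % equals x*inv(R)*inv(R')*x' = d1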


> I'm also very interested in how 'D' is calculated, since I use this
> value to show the distances of a sample to the different classes.
> Shouldn't this be the probability density function of x for the
> different classes?

It is the log of that density multiplied by the class prior probability, up to
an additive constant that is the same for every group. It's computed on the
log scale because multivariate probabilities get very, very small.

With a flat prior, I think the third output POSTERIOR is what you are asking for.
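
For example, in Statistics Toolbox versions that provide it:

[cls, err, posterior] = classify(sample, training, group, 'linear');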

Hope this helps.

- Peter Perkins
   The MathWorks, Inc.

Subject: the math in classify.m

From: marquito

Date: 9 Aug, 2005 12:06:05

Message: 3 of 8

Thanks for the quick answer! I fully understand the code now.

> With a flat prior, I think the third output POSTERIOR is what you
> are asking for.

Yes, I need the posterior p(class|x). Unfortunately my version of
classify.m (Matlab 6.5) has neither the option nor the code to
return it. To fully calculate p(k|x) = p(x|k)*p(k)/p(x), I replaced
the for-loop with:

=====================================================
    % MVN posterior probability, by group, for each sample
    Sigma = R'*R;
    Sigma_det = det(2*pi*Sigma);
    help1 = 1/sqrt(Sigma_det);       % normalizing constant of the MVN pdf
    px = 0;                          % accumulates p(x); 'norm' would shadow a builtin
    for k = 1:ngroups
        A = (sample - repmat(gmeans(k,:), mm, 1)) / R;
        help2 = exp(-.5*sum(A.*A, 2));
        pdf = help1 * help2;         % p(x|k)
        px = px + prior(k)*pdf;      % p(x) = sum over k of p(x|k)*p(k)
        D(:,k) = prior(k) .* pdf;    % p(x|k)*p(k)
    end

    D = D ./ repmat(px, 1, ngroups); % p(k|x)
====================================================

If I haven't overlooked something important, this should be fine,
right?

Subject: the math in classify.m

From: Peter Perkins

Date: 9 Aug, 2005 12:51:11

Message: 4 of 8

marquito wrote:

> Yes, I need the posterior p(class|x). Unfortunately my version of
> classify.m (Matlab 6.5) doesn't have the option nor the code to
> return it.

It almost does -- D(i,k) is log(prior(k)*p(x_i|g_k)), up to an additive
constant. So if you exponentiate D and normalize each row, you have
p(g_k|x_i).
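
A minimal sketch of that step (the row-max subtraction is my addition for
numerical safety; any additive constant in D cancels in the normalization):

Dmax = repmat(max(D, [], 2), 1, size(D, 2));
P = exp(D - Dmax);                         % exponentiate the log scores
P = P ./ repmat(sum(P, 2), 1, size(P, 2)); % each row of P is now p(g_k|x_i)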

Subject: the math in classify.m

From: Greg Heath

Date: 9 Aug, 2005 10:56:29

Message: 5 of 8


marquito wrote:
> Hi!
>
> In my thesis I'm using the Matlab 'classify' function with linear
> discrimination to classify data. To see the math behind it, I looked
> into the code and found this:
>
> =====================================================
>     % Pooled estimate of covariance
>     [Q,R] = qr(training - gmeans(gindex,:), 0);
>     R = R / sqrt(n - ngroups); % SigmaHat = R'*R
>     s = svd(R);
>     if any(s <= eps^(3/4)*max(s))
>         error('The pooled covariance matrix of TRAINING must be positive definite.');
>     end
>
>     % MVN relative log posterior density, by group, for each sample
>     for k = 1:ngroups
>         A = (sample - repmat(gmeans(k,:), mm, 1)) / R;
>         D(:,k) = log(prior(k)) - .5*sum(A .* A, 2);
>     end
> ======================================================
>
> I don't know exactly what is going on there. I expected to see
> something like a multivariate Gaussian distribution, like:
>
> p(x|class) = 1/sqrt((2*pi)^d * |Sigma|) * exp(-1/2 * (x-mu)' * Sigma^-1 * (x-mu))
>
> or something similar to this. Could somebody verify this or explain
> what kind of magic the programmer used (why QR decomposition)?
> I'm also very interested in how 'D' is calculated, since I use this
> value to show the distances of a sample to the different classes.
> Shouldn't this be the probability density function of x for the
> different classes?

Just because one postulates that x is d-dimensional doesn't
make it so. Sometimes two or more inputs are highly collinear,
and the data really lies in a lower-dimensional subspace.
Sigma is then ill conditioned (i.e., nearly singular) and, at best,
matrix inversion results in garbage. However, you don't have to cry
"error" and give up.

There are at least two fixes. First, reduce the dimensionality
of the data by projecting it into the dominant subspace. Second,
use matrix pseudoinversion. I have successfully used the latter for
more than 20 years.

QR decomposition yields more accurate results than ordinary
inversion as long as the matrix is not too badly conditioned.
The conditioning of Sigma is readily determined using
rank(Sigma) and cond(Sigma). If it is well conditioned,
the slash solution Sigma\eye(d) will work. If it is not
well conditioned but not quite singular, use QR. If it is
ill conditioned or even singular, use pseudoinversion.

Apparently classify takes the middle road.
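
A rough sketch of that decision rule in code (the threshold and the variable
names are illustrative assumptions, not values used by classify.m):

Sigma = cov(randn(50, 4));       % stand-in covariance for the sketch
d = size(Sigma, 1);
if cond(Sigma) < 1e6
    SigmaInv = Sigma \ eye(d);   % well conditioned: slash solution
elseif rank(Sigma) == d
    % not quite singular: go through a triangular factor, as classify.m
    % does (chol(Sigma) plays the role of the R from the QR of the data)
    R = chol(Sigma);
    SigmaInv = R \ (R' \ eye(d));
else
    SigmaInv = pinv(Sigma);      % ill conditioned or singular: pseudoinverse
end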

Hope this helps.

Greg

Subject: the math in classify.m

From: marquito

Date: 10 Aug, 2005 11:24:43

Message: 6 of 8

Thanks to both of you for your replies - you really helped me a lot! To
Peter I must say that this is a very nice piece of code in
classify.m. I finally understand why I had to take all those linear
algebra lessons in my studies...

Greg Heath wrote:

> use matrix pseudoinversion. I successfully used the
> latter for more than 20 years.

This is exactly what I needed, since I don't have much training data.
The number of samples is mostly close to d (the dimension of x). When
I use 'linear' discrimination that's normally not a problem, since a
pooled covariance is used. But with the option 'quadratic' the
covariance matrix was often ill conditioned. So I implemented
pseudoinversion as you recommended, and the results I got were better
than with 'linear'!

Well, I guess that nothing beats lots of training data, but since
that's impossible in my case, this really helped me out.

Subject: the math in classify.m

From: Aniket

Date: 3 Dec, 2012 22:45:07

Message: 7 of 8

Hi

This is a little old thread, but I hope it's still active!
I am also using classify.m to classify my data into two groups. To get
a better insight, I tried to understand the math involved in the
'linear' case of classify.m. However, I could not really make a whole
lot of sense out of it.

I referred to the multivariate analysis book by W.J. Krzanowski and
found that one of the criteria for classification is the closeness of
an observation to the mean of a class. Is this what is used in
classify.m, or is it different?

I understand my question is more statistics-oriented than MATLAB-based,
but I hope to find some fundamental answers.

Thanks!

Subject: the math in classify.m

From: Greg Heath

Date: 4 Dec, 2012 00:34:10

Message: 8 of 8

"Aniket" wrote in message <k9ja1j$1gg$1@newscl01ah.mathworks.com>...
> Hi
>
> This is a little old thread, but I hope it's still active!
> I am also using classify.m to classify my data into two groups. To get
> a better insight, I tried to understand the math involved in the
> 'linear' case of classify.m. However, I could not really make a whole
> lot of sense out of it.
>
> I referred to the multivariate analysis book by W.J. Krzanowski and
> found that one of the criteria for classification is the closeness of
> an observation to the mean of a class. Is this what is used in
> classify.m, or is it different?

It is more general. The measure of closeness is the log of the Gaussian
probability density, which contains the (squared) Mahalanobis distance,
plus the log of the a priori probability.
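
A minimal sketch of that score for one sample (the variable names here are
illustrative stand-ins, not the names used in classify.m):

SigmaHat = cov(randn(50, 3));   % stand-in pooled covariance estimate
mu_k = zeros(1, 3);             % stand-in group mean
prior_k = 0.5;                  % stand-in prior probability of group k
x = randn(1, 3);                % a sample, as a row vector

mahal2 = (x - mu_k) * (SigmaHat \ (x - mu_k)'); % squared Mahalanobis distance
score = log(prior_k) - 0.5 * mahal2;            % the group with max score wins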

> I understand my question is more statistics-oriented than MATLAB-based,
> but I hope to find some fundamental answers.

Hope this helps.

Greg
