Asked by Charles Henri
on 4 Jan 2013

Hi!

I have imbalanced data to classify, so the mse performance function is not suitable, and neither are mae, sae, or sse.

So, I would like to create a new performance function based on sensitivity and specificity, but I have not found any way to edit it.

The only thing I found is "template_performance", but it's obsolete in MATLAB 2012 and, anyway, I don't understand how to work with it.

So, please, could you provide me with an example or a tutorial?

Thanks in advance

Answer by Greg Heath
on 12 Jan 2013

Accepted answer

I have never had reliable results with an MLP when the training priors differed by more than a factor of 2.

If you cannot oversample the underrepresented class, then undersample the overrepresented class (with each subsample no larger than twice the size of the smaller class).

A good way to subsample the larger class is to cluster it into multiple localized subsets that are subsequently randomly sampled.
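The subsampling idea above can be sketched roughly as follows. This is plain Python rather than toolbox code, and it assumes the larger class has already been clustered by some method (k-means, for instance); the `clusters` dictionary, the function name, and the proportional quota rule are all hypothetical choices made for illustration.

```python
import random

def sample_from_clusters(clusters, target_size, seed=0):
    """Draw about target_size items, spread across clusters in proportion
    to cluster size, so every local region of the class stays represented."""
    rng = random.Random(seed)
    total = sum(len(members) for members in clusters.values())
    sample = []
    for members in clusters.values():
        # At least one item per cluster; never more than the cluster holds.
        quota = max(1, round(target_size * len(members) / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Hypothetical cluster assignment of a 100-item majority class.
clusters = {0: list(range(0, 60)), 1: list(range(60, 90)), 2: list(range(90, 100))}
minority_size = 10
# Keep the subsample at twice the size of the smaller class.
subset = sample_from_clusters(clusters, target_size=2 * minority_size)
```

Because of rounding, the result can differ from `target_size` by a few items; trim it if the factor-of-2 limit has to be strict. Repeating the draw with different seeds gives the multiple subsamples on which separate nets can be trained.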

Combine the results of multiple independently trained nets in either an ensemble (combine probability estimates) or a committee (combine classification votes).
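The difference between the two combination schemes can be shown with a small sketch for the two-class case. The function names, the 0.5 threshold, and the tie rule are assumptions made here for illustration.

```python
def ensemble_predict(prob_estimates, threshold=0.5):
    """Ensemble: average the nets' probability estimates, then threshold."""
    mean_prob = sum(prob_estimates) / len(prob_estimates)
    return int(mean_prob >= threshold)

def committee_predict(prob_estimates, threshold=0.5):
    """Committee: each net votes for a class; the majority wins (ties go to 0)."""
    votes = sum(int(p >= threshold) for p in prob_estimates)
    return int(votes > len(prob_estimates) / 2)

# Hypothetical outputs of three nets for the same input.
outputs = [0.9, 0.45, 0.4]
ensemble_label = ensemble_predict(outputs)    # mean is about 0.58, so class 1
committee_label = committee_predict(outputs)  # one vote out of three, so class 0
```

The two schemes can disagree, as here: one confident net outweighs two mildly negative ones in the ensemble but is outvoted in the committee.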

I have never had reliable results with an MLP using the noncontinuous misclassification error as a direct minimization goal.

Minimize MSE or weighted MSE for 0 or 1 targets.

Vary the MSE error weights until you can get approximately equal MSEs for both classes.
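As a minimal sketch of what is being compared, the per-class MSE and a class-weighted MSE can be computed as below. The function names and the toy data are hypothetical, and the retraining loop that actually varies the weights is left out.

```python
def class_mse(targets, outputs, label):
    """MSE restricted to the samples whose 0/1 target equals label."""
    errs = [(t - y) ** 2 for t, y in zip(targets, outputs) if t == label]
    return sum(errs) / len(errs)

def weighted_mse(targets, outputs, w_pos, w_neg):
    """MSE with one error weight per class, normalised by the total weight."""
    weights = [w_pos if t == 1 else w_neg for t in targets]
    num = sum(w * (t - y) ** 2 for w, t, y in zip(weights, targets, outputs))
    return num / sum(weights)

# Toy data: one positive sample, four negative samples.
targets = [1, 0, 0, 0, 0]
outputs = [0.5, 0.0, 0.0, 0.5, 0.0]

mse_pos = class_mse(targets, outputs, 1)   # 0.25
mse_neg = class_mse(targets, outputs, 0)   # 0.0625
# Weighting the rare class by N_neg / N_pos = 4 makes both classes count equally.
balanced = weighted_mse(targets, outputs, w_pos=4, w_neg=1)   # 0.15625
```

With inverse-frequency weights the weighted MSE equals the mean of the two class MSEs, which makes it a natural starting point before adjusting the weights further.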

Use a holdout validation set and a varying threshold from 0 to 1 in order to get your operating curve.
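The threshold sweep might look like the following sketch, which reports sensitivity and specificity at each threshold. The validation targets and scores are made-up numbers, and the code assumes both classes are present in the holdout set.

```python
def operating_curve(targets, scores, steps=10):
    """Return (threshold, sensitivity, specificity) for thresholds in [0, 1]."""
    pos = sum(targets)
    neg = len(targets) - pos
    curve = []
    for k in range(steps + 1):
        thr = k / steps
        preds = [int(s >= thr) for s in scores]
        tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
        tn = sum(p == 0 and t == 0 for p, t in zip(preds, targets))
        curve.append((thr, tp / pos, tn / neg))
    return curve

# Hypothetical holdout targets and network outputs.
val_targets = [1, 1, 0, 0, 0, 0]
val_scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
curve = operating_curve(val_targets, val_scores)
```

The operating point is then chosen from `curve` according to whatever trade-off between sensitivity and specificity the application needs.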

Hope this helps.

Thank you for formally accepting my answer.

Greg

Answer by Greg Heath
on 5 Jan 2013

Edited by Greg Heath
on 5 Jan 2013

I have written about the unbalanced classification problem many times.

Try searching comp.ai.neural-nets and the CSSM newsgroup using

heath unbalanced

Stop laughing.

The quickest solution is to duplicate vectors in the smaller classes so that all classes have equal sizes.

Then, for c classes, use columns of the c-dimensional unit matrix as targets.
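A minimal sketch of those two steps, in plain Python rather than toolbox code: the function names and the sample labels are hypothetical, and duplicates are picked at random with replacement.

```python
import random

def oversample_by_duplication(samples_by_class, seed=0):
    """Duplicate randomly chosen members of the smaller classes until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)
    largest = max(len(s) for s in samples_by_class.values())
    balanced = {}
    for label, samples in samples_by_class.items():
        extra = [rng.choice(samples) for _ in range(largest - len(samples))]
        balanced[label] = list(samples) + extra
    return balanced

def one_hot(class_index, num_classes):
    """Column class_index of the num_classes-dimensional unit matrix."""
    return [1 if i == class_index else 0 for i in range(num_classes)]

# Hypothetical three-class data set with sizes 4, 1 and 2.
data = {0: ["a1", "a2", "a3", "a4"], 1: ["b1"], 2: ["c1", "c2"]}
balanced = oversample_by_duplication(data)
target_for_class_1 = one_hot(1, len(data))   # [0, 1, 0]
```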

Hope this helps.

Thank you for formally accepting my answer.

Greg

Answer by Charles Henri
on 11 Jan 2013

Hi Greg,

First, sorry for not having answered you earlier, I had issues with my internet connection.

Thanks for your answer, but I really want to change my performance function because oversampling is not the most appropriate approach for my work.

I must use a performance function based on sensitivity and specificity.

Thanks again,

CHC

Omer
on 4 Feb 2014

I also have a similar problem:

I am using nn toolbox functions to create a neural network for classification (2 output neurons). Instead of optimizing a standard performance function, I want to use my own custom one. My performance function would be:

( fp/(fp+tn) ) + ( fn/(fn+tp) );

where

tp is the number of true positives, fn the number of false negatives, and so on. Of course, the output y of the network must be converted to 0 or 1, maybe like this:

yPred = ( y(:,1) > y(:,2) );
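Written out as an evaluation function, the measure above could look like this sketch. It is plain Python for illustration, not a toolbox performance function; the name `error_rate_sum` and the example outputs are hypothetical, and both classes are assumed to be present.

```python
def error_rate_sum(targets, outputs):
    """fp/(fp+tn) + fn/(fn+tp) for a two-output network.

    outputs holds (y1, y2) pairs; a sample is predicted positive when
    y1 > y2, mirroring yPred = ( y(:,1) > y(:,2) ).
    """
    preds = [int(y1 > y2) for y1, y2 in outputs]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, targets))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, targets))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, targets))
    return fp / (fp + tn) + fn / (fn + tp)

# Hypothetical targets (1 = positive) and two-neuron outputs.
targets = [1, 1, 0, 0, 0, 0]
outputs = [(0.8, 0.2), (0.3, 0.7), (0.6, 0.4), (0.1, 0.9), (0.2, 0.8), (0.4, 0.6)]
score = error_rate_sum(targets, outputs)   # 1/4 + 1/2 = 0.75
```

The value equals (1 - specificity) + (1 - sensitivity). Because the outputs are thresholded, it is piecewise constant in the network weights, so it gives gradient-based training nothing to follow; it is easier to use as an evaluation or model-selection measure than as a training objective.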

How can I do this using newpr or newff?

Any help is appreciated. Thanks

Answer by Greg Heath
on 12 Feb 2014

This may be of interest:

http://www.mathworks.com/matlabcentral/answers/56137-how-to-use-a-custom-transfer-function-in-neural-net-training

Greg

Answer by Greg Heath
on 13 Feb 2014

You have the mistaken impression that unbalanced data requires changing the minimization objective function.

It does not.

If you duplicate some of the underrepresented class members and then modify them slightly by adding a little noise, the imbalance problem is solved.
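The duplicate-and-perturb step can be sketched as below. The Gaussian noise, the `noise_std` value, and the function name are assumptions; in practice the noise level would be chosen relative to the spread of each input variable.

```python
import random

def jittered_copies(samples, n_copies, noise_std=0.01, seed=0):
    """Return n_copies noisy duplicates of randomly chosen samples."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        base = rng.choice(samples)
        # Add a small independent Gaussian perturbation to every feature.
        copies.append([x + rng.gauss(0.0, noise_std) for x in base])
    return copies

# Hypothetical minority class with two feature vectors; grow it to eight.
minority = [[1.0, 2.0], [1.5, 2.5]]
augmented = minority + jittered_copies(minority, n_copies=6)
```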

You can then weight the posterior probability estimates with class-conditional prior probabilities and misclassification rates to form a risk function via Bayes theory. The input is then assigned to the class that results in minimum risk.
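One common way to carry out this kind of minimum-risk assignment is sketched below. It is an assumed reading of the weighting described above, not necessarily the exact procedure used: posteriors estimated on balanced training data are rescaled to the real-world priors, then combined with a cost matrix. The function name, the priors, and the cost values are hypothetical.

```python
def min_risk_class(posteriors, train_priors, true_priors, cost):
    """Pick the class with minimum expected cost.

    posteriors: network outputs, estimated under train_priors.
    cost[i][j]: cost of deciding class i when the true class is j.
    """
    # Re-weight the posteriors from the training priors to the true priors.
    adjusted = [p * q / r for p, q, r in zip(posteriors, true_priors, train_priors)]
    total = sum(adjusted)
    adjusted = [a / total for a in adjusted]
    # Risk of deciding class i is the cost-weighted sum over true classes j.
    risks = [sum(c * a for c, a in zip(row, adjusted)) for row in cost]
    return min(range(len(risks)), key=risks.__getitem__)

# Class 0 is the common class, class 1 the rare one (assumed priors 0.9 / 0.1).
posteriors = [0.3, 0.7]       # net trained on balanced data
train_priors = [0.5, 0.5]
true_priors = [0.9, 0.1]

equal_cost = [[0, 1], [1, 0]]
miss_is_costly = [[0, 10], [1, 0]]   # missing a rare case costs 10 times more
label_equal = min_risk_class(posteriors, train_priors, true_priors, equal_cost)
label_costly = min_risk_class(posteriors, train_priors, true_priors, miss_is_costly)
```

With equal costs the prior correction pushes the decision to the common class; once missing a rare case is made ten times more expensive, the same input is assigned to the rare class.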

This is classical pattern recognition covered in any decent pattern recognition text.

I have classified the BioID data set using this technique. Search the NEWSGROUP and comp.ai.neural-nets using combinations of search words like

greg BioID unbalanced priors

Hope this helps.

Greg
