Misclassification Costs in Binary Classification

Question

0 votes

Hello everyone,

I am a student and new to machine learning. I have come across the following issue in binary classification.

I want to train a boosted tree ensemble with a specific cost matrix:

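function Mdl = trainBoostedTrees(trainData, trainLabels, costMatrix)
    % Weak learner: shallow decision tree (at most 20 splits) using all predictors
    template = templateTree('MaxNumSplits', 20, ...
        'NumVariablesToSample', 'all');
    % AdaBoostM1 (binary only), 30 boosting rounds, learning rate 0.1.
    % costMatrix(i,j) = cost of predicting class j when the true class is i;
    % rows/columns follow the class order of trainLabels (sorted by default).
    Mdl = fitcensemble(trainData, trainLabels, 'Method', 'AdaBoostM1', ...
        'NumLearningCycles', 30, 'Learners', template, 'LearnRate', 0.1, ...
        'Cost', costMatrix);
end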

Now, I want to use a cost matrix to shift the operating point down on the ROC curve and minimize the False Negative Rate.

( Kosten = Cost in German :) )

The problem is that I have to set the misclassification cost for the false-negative case so high that it seems unrealistic to me (see the last point in the ROC plot). I would appreciate it if someone could explain why this is the case and what I might be doing wrong.

Another question I have in this context is: does the magnitude of my misclassification costs depend on the number of training samples?

Additional information:

My training dataset is approximately 49,000 × 21 (observations × predictors), divided as follows:

33.3% True Labels / 66.6% False Labels

Please let me know if you need any further information or clarification. Thank you!


Answer 1

Franziska Albers on 27 Jul 2023

1 vote

Hi Lars,

you are using the "cost" argument correctly. However, the effect strongly depends on your data. Some classes are inherently hard to separate; you may need a different algorithm or better features to get better performance. You have a lot of training examples, which is good, and I would not expect the effect of the cost matrix to depend on their number.

As far as I understand, the "false" class is the majority class. Does this mean a false negative is a case where the "true" class is identified as the "false" class? Maybe you can clarify or share the confusion matrix. Also, your false negative rate already seems very low at a cost of 100. What rate are you aiming for?
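For reference, in MATLAB's convention Cost(i,j) is the cost of classifying an observation into class j when its true class is i. A minimal sketch of a matrix that penalizes false negatives (true class "true" predicted as "false"), assuming logical labels so that the default sorted class order is [false; true]; the cost values are hypothetical:

```matlab
% Assumed class order for logical labels: row/column 1 = false, row/column 2 = true
classNames = [false; true];

costFN = 100;   % hypothetical cost: true class "true", predicted "false"
costFP = 1;     % hypothetical cost: true class "false", predicted "true"

% Cost(i,j) = cost of predicting class j when the true class is i.
% Correct predictions (the diagonal) cost 0.
costMatrix = [0       costFP;    % true "false": predicted false | predicted true
              costFN  0     ];   % true "true":  predicted false | predicted true
```

Passing 'ClassNames', classNames to fitcensemble together with 'Cost', costMatrix makes the row/column order explicit instead of relying on the default sorted order.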

In general, balanced data works best for classification. If the data is not balanced, you can balance it, use the "RUSBoost" method, or apply a cost matrix. I would recommend trying RUSBoost or balancing your data, either by oversampling the minority class or by undersampling the majority class.
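Random undersampling of the majority class can be sketched in base MATLAB as follows. The toy data and variable names are hypothetical; the 'RUSBoost' method of fitcensemble performs this kind of random undersampling internally at each boosting iteration.

```matlab
% Hypothetical toy data: 9 observations, 3 "true" (minority), 6 "false" (majority)
X = (1:9)';                              % stand-in predictor matrix (one column)
y = logical([1 0 0 1 0 0 1 0 0])';       % class labels as a logical column

idxMin = find(y);                        % row indices of the minority class
idxMaj = find(~y);                       % row indices of the majority class

% Keep a random subset of majority rows, as many as there are minority rows
keepMaj = idxMaj(randperm(numel(idxMaj), numel(idxMin)));
keep = sort([idxMin; keepMaj]);          % preserve original row order

Xbal = X(keep, :);                       % balanced predictors
ybal = y(keep);                          % balanced labels
```

Undersampling discards data; with roughly 49,000 observations that is usually affordable, but oversampling the minority class (sampling its rows with replacement) is the alternative when data is scarce.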

There is also some more information on how to handle imbalanced data here: https://www.mathworks.com/help/stats/classification-with-unequal-misclassification-costs.html

2 Comments

Lars Kilian on 28 Jul 2023

Hi Franziska,

firstly, thank you very much for your detailed response!

Yes, you're right: the false class is the majority class. My data are imbalanced, but this is real-world data with a realistic class distribution. That's why I thought it made sense to train on the imbalanced data set as it is, so that the model reflects real conditions.

My goal is to tune the model so that the False Negative Rate is as low as possible (ideally even lower than the current values). I accept that the True Negative Rate and the overall accuracy might drop significantly.

My problem is that I then have to set the cost value so high that it no longer seems logical to me.

Moreover, I always thought that the cost value depends on the amount of training data. Is that right? And is there a direct relationship between the classification threshold and the cost value?

Franziska Albers on 31 Jul 2023

Hi Kilian,

To answer your question about the cost matrix:

fitcensemble uses Cost to adjust the prior class probabilities specified in Prior and then uses the adjusted prior probabilities for training. In effect, the training algorithm gives more weight to the class with the high misclassification cost. You can find more details on that here: https://www.mathworks.com/help/stats/supervised-learning-machine-learning-workflow-and-algorithms.html#mw_4cd1857b-b486-4247-b328-5fd810649696.
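For two classes with a zero-cost diagonal, the prior adjustment can be sketched as below, where $P_i$ is the prior of class $i$ and $C_{ij}$ is the cost of predicting class $j$ when the true class is $i$. This is the standard cost-proportional form; the linked page gives MATLAB's exact definition.

```latex
\tilde{P}_i \;=\; \frac{P_i\,C_{ij}}{P_1\,C_{12} + P_2\,C_{21}}, \qquad i \in \{1, 2\},\; j \neq i
```

Because the observation weights of each class are then rescaled to sum to its adjusted prior, the class whose misclassification is more expensive receives proportionally more total weight during training.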

The cost does not have to depend on the number of training examples. However, for imbalanced datasets it is often recommended to start with a cost matrix that reflects the imbalance. For example, if the majority class is 5 times larger than the minority class, a good starting point is a cost of 5 for misclassifying the minority class and a cost of 1 for misclassifying the majority class. But that is only a starting point. The one general rule to keep in mind is that costs are relative: if you multiply all costs by a factor of 5, nothing changes.
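Both points can be checked numerically with the class shares from the question (66.6% false, 33.3% true). The helper effPrior is hypothetical: it applies the standard two-class prior adjustment P_i·C(i,j), j ≠ i, assuming class order [false; true]. It is a sketch, not MATLAB's internal code.

```matlab
% Hypothetical helper: cost-adjusted priors for two classes, order [false; true].
% costFP = cost of predicting "true" for a "false" observation,
% costFN = cost of predicting "false" for a "true" observation.
effPrior = @(P, costFP, costFN) ...
    [P(1)*costFP; P(2)*costFN] ./ (P(1)*costFP + P(2)*costFN);

P = [2/3; 1/3];                  % class shares: about 66.6% false, 33.3% true

% Cost ratio equal to the imbalance ratio (2:1 here) equalizes the classes
balanced = effPrior(P, 1, 2);

% Costs are relative: scaling both costs by 5 leaves the priors unchanged
scaled = effPrior(P, 5, 10);

% The false-negative cost of 100 from the question
heavy = effPrior(P, 1, 100);
```

Here balanced and scaled are both [0.5; 0.5], while heavy puts about 98% (100/102) of the effective prior on the "true" class.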

However, in your situation the main problem is not so much the imbalanced data. Your goal is an extremely low false negative rate even at the expense of overall accuracy, and that is not a typical classification task. I would suggest trying to improve the machine learning model or (if possible) the data. I agree that a cost of more than 1000 seems odd and may make training and testing unstable. This is another point where the size of your training dataset comes into play: for a small dataset, very skewed cost matrices can lead to unstable behavior, as also mentioned on the doc page linked above.
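To put these cost values in perspective: under the textbook Bayes decision rule, a false-negative cost maps directly to a threshold on P(true|x). This assumes calibrated posterior probabilities and costs applied at prediction time, which is not exactly what fitcensemble does (it folds Cost into the priors during training), so treat it as an idealized relationship. The helper bayesThreshold is hypothetical:

```matlab
% Two-class Bayes decision rule with p = P(true | x):
%   predict "true"  if  p * costFN > (1 - p) * costFP
%   equivalently    if  p > costFP / (costFP + costFN)
bayesThreshold = @(costFP, costFN) costFP ./ (costFP + costFN);

costFN = [1 2 10 100 1000];         % hypothetical false-negative costs
thr = bayesThreshold(1, costFN);    % false-positive cost fixed at 1

for k = 1:numel(costFN)
    fprintf('cost FN = %5d  ->  threshold on P(true|x) = %.4f\n', ...
        costFN(k), thr(k));
end
```

This prints thresholds of 0.5000, 0.3333, 0.0909, 0.0099 and 0.0010. A calibrated model with a cost of 100 would already flag every observation with P(true|x) above about 1%. If costs beyond 1000 are still needed, one plausible reading is that many true-class observations receive scores near zero (boosted-tree scores are not calibrated probabilities) or that the classes overlap strongly in the current features, which fits the advice to improve the model or the data.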

Another idea: maybe you can frame your problem as anomaly detection and try models from that field, e.g. one-class support vector machines or isolation forests. See here: https://www.mathworks.com/help/stats/anomaly-detection.html


Answer 2

shreyash on 29 Apr 2024

0 votes

Create a new Optimizable KNN model template from the Models section. Click Costs from the Options section in the toolstrip. Modify the matrix such that the costs for (101,111) and (111,101) are 1.5. Click Save and Apply and train your model.
