Misclassification Costs in Binary Classification
Hello everyone,
I am a student and new to machine learning, and I have run into the following issue with binary classification.
I want to train a boosted tree ensemble with a specific cost matrix:
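function Mdl = trainBoostedTrees(trainData, trainLabels, costMatrix)
    % Weak learner: shallow decision trees (at most 20 splits) that consider all predictors at each split
    template = templateTree('MaxNumSplits', 20, ...
        'NumVariablesToSample', 'all');
    % AdaBoostM1 ensemble of 30 trees; costMatrix(i,j) is the cost of
    % predicting class j when the true class is i
    Mdl = fitcensemble(trainData, trainLabels, 'Method', 'AdaBoostM1', ...
        'NumLearningCycles', 30, 'Learners', template, 'LearnRate', 0.1, ...
        'Cost', costMatrix);
end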
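A minimal sketch of how such a cost matrix can be built and passed in, assuming the Statistics and Machine Learning Toolbox. The data, the value of fnCost, and the explicit 'ClassNames' argument are hypothetical additions for illustration; the point is the row/column convention, where rows are true classes and columns are predicted classes, ordered as in Mdl.ClassNames. The fitcensemble call is inlined (same settings as trainBoostedTrees) so the sketch runs on its own.

```matlab
% Hypothetical synthetic stand-in for the real 49,000 x 21 training set
rng(0);                                   % reproducible example
n = 2000;
X = randn(n, 21);                         % 21 predictors, as in the question
y = X(:,1) + 0.5*randn(n,1) > 0.43;       % logical labels, roughly one third true

% Cost(i,j) = cost of predicting class j when the true class is i.
% With ClassNames = [false; true], Cost(2,1) is the false-negative cost.
fnCost = 5;                               % hypothetical: an FN costs 5x an FP
costMatrix = [0 1; fnCost 0];

% Same learner settings as trainBoostedTrees, inlined for a standalone script
template = templateTree('MaxNumSplits', 20, 'NumVariablesToSample', 'all');
Mdl = fitcensemble(X, y, 'Method', 'AdaBoostM1', 'NumLearningCycles', 30, ...
    'Learners', template, 'LearnRate', 0.1, 'Cost', costMatrix, ...
    'ClassNames', [false; true]);         % fix the row/column order explicitly

yHat = predict(Mdl, X);
fnr = sum(~yHat & y) / sum(y);            % false negative rate on the training data
fprintf('Training FNR with FN cost %g: %.3f\n', fnCost, fnr);
```

Passing 'ClassNames' is optional, but it removes any doubt about which off-diagonal entry is the false-negative cost.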
Now I want to use the cost matrix to shift the operating point on the ROC curve toward a lower false negative rate (FNR).
(In my ROC plot, "Kosten" is German for "cost".) The problem is that I have to set the misclassification cost for false negatives so high that it seems unrealistic to me (see the last point on the ROC curve). I would appreciate an answer explaining why this is the case and what I might be doing wrong.
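One way to see how far the false-negative cost has to be pushed is to sweep it and measure FNR and FPR on held-out data; each cost value gives one operating point. This is a sketch under stated assumptions: the synthetic data, the 30% hold-out split, and the cost grid are all hypothetical, and the learner settings mirror trainBoostedTrees.

```matlab
% Hypothetical synthetic data, about one third positive
rng(1);
n = 3000;
X = randn(n, 21);
y = X(:,1) + 0.5*randn(n,1) > 0.43;

% Stratified hold-out split so the operating points are not measured on training data
cv = cvpartition(y, 'HoldOut', 0.3);
Xtr = X(training(cv), :);  ytr = y(training(cv));
Xte = X(test(cv), :);      yte = y(test(cv));

template = templateTree('MaxNumSplits', 20, 'NumVariablesToSample', 'all');
fnCosts = [1 2 5 10 20];                  % hypothetical grid of false-negative costs
fnr = zeros(size(fnCosts));
fpr = zeros(size(fnCosts));
for k = 1:numel(fnCosts)
    C = [0 1; fnCosts(k) 0];              % rows/columns ordered [false; true]
    Mdl = fitcensemble(Xtr, ytr, 'Method', 'AdaBoostM1', 'NumLearningCycles', 30, ...
        'Learners', template, 'LearnRate', 0.1, 'Cost', C, ...
        'ClassNames', [false; true]);
    yHat = predict(Mdl, Xte);
    fnr(k) = sum(~yHat & yte) / sum(yte);    % fraction of positives missed
    fpr(k) = sum(yHat & ~yte) / sum(~yte);   % fraction of negatives flagged
end
disp(table(fnCosts', fnr', fpr', 'VariableNames', {'FNCost', 'FNR', 'FPR'}));
```

Measuring on held-out data matters because boosted trees can fit the training set closely, so training-set error rates tend to be optimistic. If retraining for every cost is too slow, a common alternative is to train once and threshold the positive-class scores returned by predict, for example reading candidate thresholds off perfcurve.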
A related question: does the magnitude of the misclassification costs depend on the number of training samples?
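The standard Bayes decision rule gives a rough reference point for the cost magnitude. It assumes a classifier that outputs calibrated posterior probabilities p = P(true | x), with a false-positive cost of 1 and a false-negative cost of c. Boosted-tree scores are generally not calibrated, and, as far as I know, fitcensemble uses the cost matrix to reweight classes during training rather than as a post-hoc threshold, so this is only an idealization:

```latex
\begin{aligned}
\text{expected cost of predicting false} &= c\,p, \\
\text{expected cost of predicting true}  &= 1 \cdot (1 - p), \\
\text{predict true} &\iff c\,p > 1 - p \iff p > \frac{1}{1 + c}.
\end{aligned}
```

For example, c = 9 moves the decision threshold to p > 0.1, and c = 99 moves it to p > 0.01. The number of training samples does not appear in this rule; class proportions enter only through the posterior p, and only the ratio of the two costs matters.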
Additional information:
My training data set has approximately 49,000 samples with 21 predictors, split as follows:
33.3% true labels / 66.6% false labels
Please let me know if you need any further information or clarification. Thank you!
Accepted Answer
More Answers (1)
shreyash
on 29 Apr 2024
In the Classification Learner app, create a new Optimizable KNN model from the Models section. Click Costs in the Options section of the toolstrip. Modify the matrix so that the costs for (101,111) and (111,101) are 1.5. Click Save and Apply, then train your model.
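A rough command-line analogue of these app steps might look like the sketch below. It assumes numeric class labels 101 and 111 (as the cost-matrix indices in the answer suggest) and uses hypothetical synthetic data. The app's Optimizable KNN also tunes hyperparameters, which would correspond to adding 'OptimizeHyperparameters','auto' to fitcknn; that is left out here to keep the example quick.

```matlab
% Hypothetical two-class data with numeric labels 101 and 111
rng(0);
X = [randn(100, 2); randn(100, 2) + 2];
y = [101 * ones(100, 1); 111 * ones(100, 1)];

% Cost(i,j) = cost of predicting ClassNames(j) when the true class is ClassNames(i)
C = [0 1.5; 1.5 0];
knnMdl = fitcknn(X, y, 'NumNeighbors', 5, ...
    'ClassNames', [101; 111], 'Cost', C);

% Predict two new points; each label is 101 or 111
label = predict(knnMdl, [0 0; 2 2]);
disp(label);
```

Because both off-diagonal costs are 1.5, this matrix is the default 0/1 cost scaled by 1.5, so the class that minimizes expected cost, and therefore each prediction, is the same as with the default costs.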