Under-sample an imbalanced dataset (data preprocessing)

I have an imbalanced dataset with 8528 signals in total (four classes of bio-signals). The number of signals in each class is:
A: 5050, B: 2456, C: 738, D: 284 (as you can see, the classes are far from balanced).
How can I under-sample my imbalanced dataset to achieve a higher F1 score when training on it with different machine learning methods?
clear all
close all
clc
Data = importdata('REFERENCE-original.csv'); % labels of signals, from signal 1 to signal 8528
%% feature extraction
num_data = length(Data);
DATA = cell(1, num_data);     % preallocate the signal container
label = zeros(1, num_data);   % preallocate the numeric labels
for number_data = 1:num_data
    % show progress
    clc
    number_data
    num_data
    name = Data{number_data,1}(1:6);     % record name (first 6 characters)
    N_label = Data{number_data,1}(8);    % class letter (8th character)
    data = load(['D:\dataset\', name, '.mat']);
    signal = data.val;
    DATA{number_data} = signal;
    % normal=0 af=1 other=2 noise=3
    switch N_label
        case 'A'
            label(number_data) = 0;
        case 'B'
            label(number_data) = 1;
        case 'C'
            label(number_data) = 2;
        case 'D'
            label(number_data) = 3;
    end
end

Answers (1)

Sai Pavan on 17 Apr 2024 at 7:07
Hello Yasaman,
I understand that you want to achieve a higher F1 score when training on an imbalanced dataset. Class imbalance can be countered by using class weights to modify the training.
Class weights define the relative importance of each class in the training process. To prevent the network from being biased towards the more prevalent classes, we can compute class weights that are inversely proportional to the class frequencies. Please refer to the section of the following example that illustrates how to calculate class weights: https://www.mathworks.com/help/deeplearning/ug/sequence-classification-using-inverse-frequency-class-weights.html#:~:text=TTest%20%3D%20labelsImbalanced(idxTest)%3B-,Determine%20Inverse%2DFrequency%20Class%20Weights,-For%20typical%20classification
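As a rough illustration of the idea, here is a minimal sketch that computes inverse-frequency weights from the class counts given in the question. The normalization used (total observations divided by the number of classes times the class count) is one common choice and is an assumption here; the linked example may scale the weights differently.

```matlab
% Class counts taken from the question (classes A, B, C, D).
classCounts = [5050 2456 738 284];
numClasses = numel(classCounts);
numObservations = sum(classCounts);

% Inverse-frequency weights: the rarer the class, the larger its weight.
% Assumed normalization: N / (K * n_k), so the weighted counts sum to N.
classWeights = numObservations ./ (numClasses * classCounts);

disp(classWeights)
```

With this scaling, a class that holds exactly one quarter of the data would get a weight of 1, the majority class A gets a weight below 1, and the rare class D gets the largest weight.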
We can then create a custom loss function that takes predictions Y and targets T and returns the weighted cross-entropy loss for training a classification network, where classWeights holds one weight per class and numClasses is the number of classes, as shown in the code snippet below:
lossFcn = @(Y,T) crossentropy(Y,T,NormalizationFactor="all-elements",Weights=classWeights, ...
WeightsFormat="C")*numClasses;
Please refer to the following example, which uses class weights to counter the class imbalance problem: https://www.mathworks.com/help/deeplearning/ug/sequence-classification-using-inverse-frequency-class-weights.html
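If you would still like to try under-sampling as asked in the question, the following is a minimal sketch of random under-sampling, not a definitive implementation. It assumes a numeric label vector like the one built in the question's loop (0 to 3 for classes A to D); the stand-in label vector, the fixed random seed, and the names nKeep, keepIdx, and labelBalanced are illustrative assumptions.

```matlab
rng(0);  % fixed seed so the random selection is repeatable (assumption)

% Stand-in label vector with the class counts from the question
% (0 = A, 1 = B, 2 = C, 3 = D). In practice, use the label vector
% built in the loading loop instead.
label = repelem(0:3, [5050 2456 738 284]);

classes = unique(label);
counts = arrayfun(@(c) sum(label == c), classes);
nKeep = min(counts);  % size of the smallest class

keepIdx = [];
for c = classes
    idx = find(label == c);                   % all observations of class c
    pick = idx(randperm(numel(idx), nKeep));  % sample without replacement
    keepIdx = [keepIdx, pick];                %#ok<AGROW>
end
keepIdx = sort(keepIdx);

labelBalanced = label(keepIdx);
% Apply the same indices to the signals, for example:
% DATA_balanced = DATA(keepIdx);
```

Note that balancing down to the smallest class discards most of the data (4 x 284 = 1136 of the 8528 signals are kept), so it may be worth comparing against class weights. Under-sampling would normally be applied to the training split only, so that the F1 score is still measured on data with the original class distribution.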
Hope it helps!
