This submission implements the ADASYN (Adaptive Synthetic Sampling) algorithm as proposed in the following paper:
H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning", Proc. Int'l Joint Conf. on Neural Networks (IJCNN), pp. 1322-1328, 2008.
The ADASYN algorithm improves class balance by synthetically creating new examples of the minority class via linear interpolation between existing minority-class examples. This approach by itself is known as SMOTE (Synthetic Minority Oversampling TEchnique). ADASYN extends SMOTE by creating more synthetic examples in the vicinity of the boundary between the two classes than in the interior of the minority class.
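The interpolation idea shared by SMOTE and ADASYN can be sketched as follows. This is not the submission's MATLAB code but an illustrative Python/NumPy sketch; the function name and parameters are chosen here for demonstration only:

```python
import numpy as np

def smote_style_interpolate(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by linear interpolation between a
    minority example and one of its k nearest minority neighbors -- the
    core mechanism shared by SMOTE and ADASYN."""
    rng = np.random.default_rng(rng)
    n = len(minority)
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                      # pick a minority example
        # distances from example a to all minority examples
        d = np.linalg.norm(minority - minority[a], axis=1)
        neighbors = np.argsort(d)[1:k + 1]       # its k nearest neighbors (self excluded)
        b = rng.choice(neighbors)                # pick one neighbor at random
        lam = rng.random()                       # interpolation factor in [0, 1)
        synthetic[i] = minority[a] + lam * (minority[b] - minority[a])
    return synthetic
```

ADASYN differs from plain SMOTE only in how many synthetic points it requests around each minority example: examples with many majority-class neighbors get more.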
A demo script producing the title figure of this submission is provided.
Dominic Siedhoff (2021). ADASYN (improves class balance, extension of SMOTE) (https://www.mathworks.com/matlabcentral/fileexchange/50541-adasyn-improves-class-balance-extension-of-smote), MATLAB Central File Exchange. Retrieved .
Inspired by: SMOTEBoost, SMOTE (Synthetic Minority Over-Sampling Technique)
Thanks for your algorithm; there is a lot for beginners like me to learn from it. The other comments also helped me. It would be perfect if you had more detailed documentation. Thank you again!
Regards,
Wei
@Dominic Siedhoff: I managed to apply SMOTE to a multiclass problem. But with respect to my previous question, let's say the majority classes are (1.0, 0.9, 0.8, 0.7 and 0.6) and the minority classes are (0.5, 0.4, 0.3, 0.2). Currently the code can handle two classes at a time, so I used the following iteration sequence:
Iteration 1: majority class (0.6) and minority class (0.5). OUTPUT --> oversampled minority class (0.5)
Iteration 2: majority class (oversampled 0.5) and minority class (0.4). OUTPUT --> oversampled minority class (0.4)
.
.
Iteration n
Am I doing this correctly by taking the last oversampled minority class as the majority class for the next minority-class oversampling?
Regards,
Rohith
@Dominic Siedhoff: I am working on time-series data where the input vector has 3 variables, x = [x1, x2, x3], and the output is classified into several classes between 0 and 1, i.e. [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1], corresponding to different (x1, x2, x3) values. It would be helpful if you could explain how I can use the code for minority oversampling of certain classes, for example classes 0.3, 0.4, 0.5 and 0.6.
@Tamamo Nook: The original problem I wrote the code for had far more than two features. Perhaps you were misled by the 2D example code? Check which dimension of the data matrix in the 2D example is the longer one (i.e. length > 2) and store your instances (feature vectors) along that dimension. The other dimension is for features; in the example it is two features long, but you can add more.
Hope this helps,
Dominic
When I use the code on a dataset with 2 features it works, but it does not seem to work on data with more than 2 features.
It works. I was doing it wrong. Sorry for the wrong comment.
It does not work on image data
ADASYN for multiclass problems:
Hi Dylan,
you can easily extend ADASYN to multiclass problems: for a problem with k classes, simply call ADASYN k-1 times, once for each of the k-1 classes that are not the majority class, and unite all the obtained results.
Have fun,
Dominic
Only works for binary classification. Good otherwise.
The documentation is within the ADASYN.m file (at the beginning), and demo_ADASYN.m provides a visual example. Have fun.
Needs better and clear documentation and examples
You get approximately balanced classes due to the way the ADASYN algorithm works: the total number of synthetic instances to create is distributed across the minority instances, with more weight on instances in regions of low minority density. That may result in a statement like "we need to create 4.333333 new instances around instance x". In such a case, the fractional part is rounded, some counts being rounded down while others are rounded up, leading to small rounding-related errors in the total number of synthetic instances created.
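The rounding effect described above can be reproduced numerically. The density ratios below are made up for illustration (Python/NumPy, not the submission's MATLAB code), but they show how per-instance rounding can make the total fall short of the target G:

```python
import numpy as np

# Total synthetic instances needed to balance: G = N_majority - N_minority
G = 7  # e.g. 18 majority vs. 11 minority instances

# Hypothetical normalized density ratios for 5 minority instances
# (a higher ratio means more majority neighbors, hence more synthetics there)
r_hat = np.array([0.40, 0.20, 0.20, 0.13, 0.07])

g_exact = G * r_hat    # fractional per-instance counts: 2.8, 1.4, 1.4, 0.91, 0.49
g = np.round(g_exact)  # rounded per-instance counts:    3,   1,   1,   1,    0

print(int(g.sum()))  # → 6: one synthetic instance fewer than G = 7
```

This matches the question below: with 18 vs. 11 instances the target is 7 new instances, but the rounded per-instance counts can sum to 6.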
Why do I not get fully balanced classes? For example, given a binary classification problem with 18 instances in class 0 and 11 instances in class 1, I get only 6 new instances for class 1 instead of the expected 7.
Thank you for this implementation. It was very necessary to have this algorithm available in MATLAB, and it is very comprehensively written.