ADASYN (improves class balance, extension of SMOTE)

version 1.2.0.0 (5.53 KB) by Dominic Siedhoff
ADASYN algorithm to reduce class imbalance by synthesizing minority class examples

24 Downloads

Updated 23 Apr 2015


This submission implements the ADASYN (Adaptive Synthetic Sampling) algorithm as proposed in the following paper:
H. He, Y. Bai, E.A. Garcia, and S. Li, "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning", Proc. Int'l Joint Conf. Neural Networks (IJCNN), pp. 1322-1328, 2008.
The purpose of the ADASYN algorithm is to improve class balance by synthetically creating new examples from the minority class via linear interpolation between existing minority class examples. This approach by itself is known as the SMOTE method (Synthetic Minority Oversampling TEchnique). ADASYN is an extension of SMOTE, creating more examples in the vicinity of the boundary between the two classes than in the interior of the minority class.
A demo script producing the title figure of this submission is provided.
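
The procedure described above can be sketched in a few lines. The following is a minimal Python sketch, not the MATLAB code of this submission: the function name `adasyn_sketch` and the parameters `beta`, `k` and `seed` are assumptions chosen for illustration, and the steps follow the usual summary of the He et al. paper. It assumes at least two minority examples and works for any number of features.

```python
import math
import random


def adasyn_sketch(minority, majority, beta=1.0, k=5, seed=0):
    """Return synthetic minority points (illustrative sketch, hypothetical API)."""
    rng = random.Random(seed)

    # Total number of synthetic examples wanted; beta = 1 aims at full balance.
    total = (len(majority) - len(minority)) * beta

    # For each minority example: share of majority points among its k nearest
    # neighbours in the whole data set. A high share means "near the boundary".
    ratios = []
    for i, x in enumerate(minority):
        others = [(p, 0) for p in majority]
        others += [(p, 1) for j, p in enumerate(minority) if j != i]
        nearest = sorted(others, key=lambda pl: math.dist(x, pl[0]))[:k]
        ratios.append(sum(1 for _, label in nearest if label == 0) / k)

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        return []  # no minority example has a majority neighbour

    # Distribute the total proportionally, rounding each share to an integer.
    counts = [round(r / ratio_sum * total) for r in ratios]

    # SMOTE-style step: interpolate linearly towards a random minority neighbour.
    synthetic = []
    for i, (x, count) in enumerate(zip(minority, counts)):
        mates = [p for j, p in enumerate(minority) if j != i]
        mates = sorted(mates, key=lambda p: math.dist(x, p))[:k]
        for _ in range(count):
            z = rng.choice(mates)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Made-up toy data: 4 minority points, 8 majority points, 2 features each.
minority = [(0, 0), (1, 0), (0, 1), (1, 1)]
majority = [(3, 0), (3, 1), (4, 0), (4, 1), (3.5, 0.5), (2, 0.5), (5, 5), (6, 6)]
synthetic = adasyn_sketch(minority, majority, beta=1.0, k=3, seed=0)
print(len(synthetic), "synthetic points")
```

In this toy data only the minority points close to the majority cluster receive synthetic neighbours, which is the difference from plain SMOTE.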

Cite As

Dominic Siedhoff (2021). ADASYN (improves class balance, extension of SMOTE) (https://www.mathworks.com/matlabcentral/fileexchange/50541-adasyn-improves-class-balance-extension-of-smote), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (22)

Wei Han

Thanks for your algorithm; there is a lot in it worth learning for beginners like me. The other comments also helped me. It would be perfect if you had more detailed documentation. Thank you again!
Regards
Wei

David Popovic

Rohith Kamath

@Dominic Siedhoff: I managed to apply SMOTE-style oversampling to a multi-class problem. But with respect to my previous question, let's say the majority classes are (1.0, 0.9, 0.8, 0.7 and 0.6) and the minority classes are (0.5, 0.4, 0.3, 0.2). Currently the code can handle two classes at a time, so I started with the iteration sequence below:
Iteration 1: majority class (0.6) and minority class (0.5). OUTPUT --> oversampled minority class (0.5)
Iteration 2: majority class (oversampled 0.5) and minority class (0.4). OUTPUT --> oversampled minority class (0.4)
.
.
Iteration n

Am I doing it correctly by taking the last oversampled minority class as the majority class when oversampling the next minority class?

Regards,
Rohith

Rohith Kamath

@Dominic Siedhoff: I am working on time series data where the input vector has 3 variables x=[x1, x2, x3] and the output can be classified from 0 to 1 as 10 different classes, i.e. [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1], corresponding to different (x1, x2, x3) values. It would be helpful if you could explain how I can use the code for minority oversampling of certain classes, for example classes 0.3, 0.4, 0.5 and 0.6.

Kazuya

Dominic Siedhoff

@Tamamo Nook: The original problem I wrote the code for had way more than two features. Perhaps you got misled by the 2D example code? You could check which dimension is the longer one (i.e. > 2) in the 2D example code and store your different instances (feature vectors) along that dimension. The other dimension is for features; in the example it is two features long, but you can add more.

Hope to help
Dominic

Tamamo Nook

When I use the code on a dataset with 2 features, it works. But the code is not compatible with data that has more than 2 features.

JB Porret

kinblu

Mahmoud Sayed

Suvidha

It works. I was doing it wrong. Sorry for the wrong comment.

Suvidha

It does not work on image data

anjie zhang

Dominic Siedhoff

ADASYN for multiclass problems:

Hi Dylan,
you can easily extend ADASYN to multiclass problems: For a problem with k classes, simply call ADASYN k-1 times, for the k-1 classes that are not the majority, and unite all the obtained results.
Have fun,
Dominic
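
The multiclass recipe above can be sketched as follows. This is a hedged Python illustration, not part of the submission: `oversample_multiclass` and `stand_in_oversampler` are hypothetical names, and the stand-in uses plain random-pair interpolation in place of a real binary ADASYN call. It also assumes each non-majority class is paired with the majority class alone; pairing it with all remaining classes is another possible reading of the advice.

```python
import random
from collections import Counter, defaultdict


def stand_in_oversampler(minority, majority, seed=0):
    """Placeholder for a binary ADASYN call: random-pair interpolation, NOT ADASYN."""
    rng = random.Random(seed)
    needed = max(0, len(majority) - len(minority))
    synthetic = []
    for _ in range(needed):
        a, b = rng.choice(minority), rng.choice(minority)
        lam = rng.random()
        synthetic.append(tuple(p + lam * (q - p) for p, q in zip(a, b)))
    return synthetic


def oversample_multiclass(features, labels, binary_oversampler):
    """Call the binary oversampler once per non-majority class and unite the results."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    majority_label = max(by_class, key=lambda c: len(by_class[c]))

    out_features, out_labels = list(features), list(labels)
    for label, members in by_class.items():
        if label == majority_label:
            continue  # k classes lead to k - 1 calls
        synthetic = binary_oversampler(members, by_class[majority_label])
        out_features.extend(synthetic)
        out_labels.extend([label] * len(synthetic))
    return out_features, out_labels


# Made-up data: 3 features per instance, three classes of sizes 6, 3 and 2.
features = [
    (1.0, 2.0, 3.0), (1.1, 2.1, 3.1), (0.9, 1.9, 2.9),
    (1.2, 2.2, 3.2), (0.8, 1.8, 2.8), (1.0, 2.2, 3.1),
    (5.0, 5.0, 5.0), (5.5, 5.2, 4.9), (4.8, 5.1, 5.3),
    (9.0, 1.0, 0.0), (9.5, 1.5, 0.5),
]
labels = [0.6] * 6 + [0.5] * 3 + [0.4] * 2

out_features, out_labels = oversample_multiclass(features, labels, stand_in_oversampler)
print(Counter(out_labels))
```

Each call always uses the original majority class as the reference, so the output of one call is never fed into the next one.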

Dylan Brewer

Dylan Brewer

Only works for binary classification. Good otherwise.

Dominic Siedhoff

The documentation is within the ADASYN.m file (at the beginning), and demo_ADASYN.m provides a visual example. Have fun.

Santiago Ginzburg

Needs better, clearer documentation and examples.

Dominic Siedhoff

You get approximately balanced classes due to the way the ADASYN algorithm works: The total number of synthetic instances to create is distributed across a number of source minority instances in regions with low minority density. That may result in a variable saying "We need to create 4.333333 new instances around instance x". In such a case, the fractional part is rounded down, while other values are rounded up, leading to small rounding-related errors in the total number of synthetic instances created.
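
A small worked example of this rounding effect, in Python. The neighbour counts below are made up, and rounding each share to the nearest integer is an assumption about how the shares are turned into counts, not a statement about the exact code in ADASYN.m.

```python
# Hypothetical setting: 18 majority and 11 minority instances, full balancing.
total_wanted = 18 - 11  # 7 synthetic instances would balance the classes

# Made-up number of majority points among the nearest neighbours of each
# of the 11 minority instances (0 means "deep inside the minority class").
majority_neighbours = [2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 0]

# Each instance gets a share of the total proportional to its count.
weight_sum = sum(majority_neighbours)
exact = [c / weight_sum * total_wanted for c in majority_neighbours]

# Shares such as 1.4 and 0.7 must become whole numbers of new instances.
rounded = [round(g) for g in exact]

print("exact shares:  ", [f"{g:.1f}" for g in exact])
print("rounded shares:", rounded)
print("created in total:", sum(rounded))  # 6 rather than the 7 wanted
```

The exact shares add up to 7, but four shares of 1.4 lose 0.4 each while two shares of 0.7 gain only 0.3 each, so the rounded total comes out one short.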

Arnold Klein

Why do I not get fully balanced classes? For example, given a binary classification problem with 18 instances in class 0 and 11 instances in class 1, I get only 6 new instances for class 1 instead of the expected 7.

Andreas

Ismael Huertas

Thank you for this implementation. This algorithm was much needed in MATLAB, and the code is very comprehensively written.

MATLAB Release Compatibility
Created with R2011b
Compatible with any release
Platform Compatibility
Windows macOS Linux
