Rating: 4.0 (11 ratings). Downloads (last 30 days): 15. File Size: 5.38 MB. File ID: #37315. Version: 1.1

RUSBoost

by Barnan Das

26 Jun 2012

RUSBoost is a boosting-based sampling algorithm that handles class imbalance in class labeled data.


File Information
Description

This code implements RUSBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines RUS (random under-sampling) with the standard boosting procedure AdaBoost to better model the minority class by removing majority class samples. It is very similar to SMOTEBoost, another algorithm that combines boosting and data sampling, but it claims to achieve the goal with random under-sampling (RUS) of majority class examples. This results in a simpler algorithm with faster model training time.

The current implementation of RUSBoost was written independently by the author for research purposes. To let users choose among many different weak learners for boosting, it provides an interface to the Weka API. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk, and Logistic. The implementation uses 10 boosting iterations and achieves a class imbalance ratio of 35:65 (minority:majority) at each boosting iteration by removing majority class samples.
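
As a rough illustration of the under-sampling step described above, the following MATLAB sketch removes majority class samples at random until the minority class makes up about 35% of the subset. It is not taken from the submitted code: the toy labels and all variable names (labels, w, keepIdx, wSub) are hypothetical, and carrying the renormalized weights of the retained samples over to the subset is an assumption about how a boosting round could use them.

```matlab
% Sketch only: random under-sampling (RUS) of the majority class to reach a
% 35:65 minority:majority ratio. All names and the toy data are hypothetical.
rng(0);                                      % fixed seed so the sketch is repeatable
labels = [zeros(900, 1); ones(100, 1)];      % toy labels: 0 = majority, 1 = minority
w = ones(numel(labels), 1) / numel(labels);  % uniform boosting weights to start

minorityIdx = find(labels == 1);
majorityIdx = find(labels == 0);

% Number of majority samples to keep so the minority is about 35% of the subset.
targetMinorityShare = 0.35;
nKeep = round(numel(minorityIdx) * (1 - targetMinorityShare) / targetMinorityShare);
nKeep = min(nKeep, numel(majorityIdx));      % cannot keep more than exist

% Randomly choose which majority samples survive this boosting iteration.
keptMajority = majorityIdx(randperm(numel(majorityIdx), nKeep));
keepIdx = sort([minorityIdx; keptMajority]);

% Assumption: weights of the retained samples are renormalized to sum to 1.
wSub = w(keepIdx) / sum(w(keepIdx));

fprintf('kept %d minority and %d majority samples\n', numel(minorityIdx), nKeep);
```

In a full boosting loop this selection would be redrawn at every iteration, so different majority samples are seen by different weak learners.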

For more detail on the theoretical description of the algorithm, please refer to the following paper:
C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse and A. Napolitano, "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol. 40(1), January 2010.

Required Products: MATLAB
MATLAB release: MATLAB 7.12 (R2011a)
Other requirements: JDK 6 or above
Comments and Ratings (30)
14 Mar 2016 albert wang

On this website you can find new code for converting .mat, .txt and .csv files to the ARFF format: http://blog.csdn.net/jiandanjinxin/article/details/50886826

Comment only
14 Mar 2016 albert wang

It is a remarkable job! Thanks for sharing. I have found code that converts .txt, .mat and .csv files to the ARFF format.

19 Dec 2015 jacky chen

Thanks first, but it is limited by the data. I use double-valued data; for example, adding the following after line 9 in Test.m:

data(:,1:6) = data(:,1:6)+0.1;

then gives me an error:
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[5], line 11

at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)

at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)

at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)

at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)

at weka.core.Instances.<init>(Instances.java:124)

I found that MATLAB has a RUSBoost method available through the function fitensemble.
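
For readers who want to try the built-in route mentioned above, here is a minimal sketch of calling fitensemble with the 'RUSBoost' method. It assumes the Statistics and Machine Learning Toolbox is installed; the toy data and variable names are hypothetical, and the 10 learning cycles merely mirror the 10 boosting iterations stated in the description.

```matlab
% Sketch only: built-in RUSBoost through fitensemble.
% Requires the Statistics and Machine Learning Toolbox. Toy data is hypothetical.
rng(1);                                    % repeatable toy data
Xmajority = randn(900, 2);                 % majority class around the origin
Xminority = randn(100, 2) + 2;             % minority class shifted away
X = [Xmajority; Xminority];
Y = [zeros(900, 1); ones(100, 1)];         % numeric class labels 0 and 1

% Train 10 boosted decision trees with random under-sampling.
model = fitensemble(X, Y, 'RUSBoost', 10, 'Tree');

% Predict on the training data and tabulate true versus predicted labels.
predicted = predict(model, X);
confusionCounts = confusionmat(Y, predicted);
disp(confusionCounts);
```

By default this samples the classes in equal proportion; the 'RatioToSmallest' name-value option of fitensemble is the documented way to change that, which would be needed to approximate the 35:65 ratio used in this submission. The same predict call can be applied to new data after training.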

04 Dec 2015 honey AE

Is it necessary to install WEKA after downloading the algorithm?

Comment only
13 Nov 2015 honey AE  
13 Nov 2015 honey AE  
10 Apr 2015 virendra singh

RUSBoost runs well, but when I apply it to a new dataset it gives an error, even though I have done everything that is mentioned in the ARFFheader.txt file.

Comment only
04 Feb 2015 virendra singh

Index exceeds Java array dimensions

Error in RUSBoost (line 106)
if train(i,end)==pred(i)

Error in Test (line 34)
prediction = RUSBoost(train_data,test_data,'svm');

Comment only
27 Jan 2015 virendra singh

Sir, when I run Test.m the following error occurs. Please help me, it's really important.

??? Error using ==> fprintf
Invalid file identifier. Use fopen to generate a valid file identifier.

Error in ==> CSVtoARFF at 11
fprintf(farff, '@relation %s', relation);

Error in ==> RUSBoost at 33
CSVtoARFF (train, 'train', 'train');

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');
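
The "Invalid file identifier" message above is what fprintf reports when fopen has returned -1, for example because the target folder is not writable or the path does not exist. The following sketch shows a defensive way to open an output file before writing to it. It is not the code of CSVtoARFF.m; the output location and variable names are hypothetical.

```matlab
% Sketch only: check the result of fopen before writing.
% The output path is a hypothetical, known-writable location.
outputFile = fullfile(tempdir, 'train.arff');

[fid, errMsg] = fopen(outputFile, 'w');
if fid == -1
    % fopen failed: report why instead of letting fprintf fail later.
    error('Could not open %s for writing: %s', outputFile, errMsg);
end

% Close the file automatically, even if a later line throws an error.
closeFile = onCleanup(@() fclose(fid));

fprintf(fid, '@relation %s\n', 'train');

clear closeFile   % runs the cleanup, which closes the file
```

Running Test.m from a folder where MATLAB has write permission is a quick way to check whether this is the cause.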

22 Nov 2014 Gokhan Kirlik  
10 Aug 2014 ben Glampson

Hi, when running Test.m I get the message that boosting is being aborted because "Too many iterations have loss > 0.5". Is this expected after 3/4?

Also, I wondered whether, after training, there is a way to rerun the model on new data as it comes in?

Many thanks,

Ben

25 Feb 2013 lu li

Sorry for being inquisitive, but I have another question: how are the updated weights 'W' associated with the resampled dataset 'RESAMPLED'? It seems that in your code the updated weights are only associated with the original training data.

Comment only
25 Feb 2013 alex hsu

thanks a lot^^

Comment only
25 Feb 2013 lu li

thanks a lot

Comment only
24 Feb 2013 Barnan Das

Alex, you need to set the Java CLASSPATH environment variable to the downloaded RUSBoost directory.

Comment only
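
As an alternative to setting the CLASSPATH environment variable, the Weka classes can usually be added to MATLAB's dynamic Java class path from within a session. The sketch below assumes the download contains a jar named weka.jar in the RUSBoost folder and that this folder is the current directory; both are assumptions, so adjust the path to whatever the package actually ships.

```matlab
% Sketch only: make the Weka classes visible to MATLAB for this session.
% Assumptions: RUSBoost folder is the current folder, jar is named weka.jar.
rusboostDir = pwd;
wekaJar = fullfile(rusboostDir, 'weka.jar');

if exist(wekaJar, 'file') == 2
    % javaaddpath may clear workspace variables (it calls "clear java"),
    % so nothing defined above is reused after this call.
    javaaddpath(wekaJar);
end

% exist(..., 'class') returns 8 when the named class can be found.
wekaAvailable = (exist('weka.core.Instances', 'class') == 8);
if ~wekaAvailable
    warning('weka.core.Instances is not on the Java class path.');
end
```

If wekaAvailable is false, the javaObject call in RUSBoost.m fails with the "No class weka.core.Instances can be located on Java class path" error quoted in this thread.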
24 Feb 2013 Barnan Das

Lu Li, the last line of the code is correct. Although the prediction in this case is '0', I have printed out the probability of the sample being '1' to indicate how poorly it is performing. A value of 1 - Prob(1) would give you the value of Prob(0). I hope that makes sense.

Comment only
24 Feb 2013 alex hsu

Yes, I downloaded the RUSBoost source code again.

Comment only
24 Feb 2013 lu li

Are you sure you did not make any change to the Test.m file?

Comment only
24 Feb 2013 alex hsu

Thanks lu li, but I am still unable to run it.

Comment only
24 Feb 2013 lu li

I think there is a bug in RUSBoost.m
the last line
prediction(i,:) = [0 wt_one];
should be
prediction(i,:) = [0 wt_zero];

Comment only
24 Feb 2013 alex hsu

I am interested in RUSBoost, but I cannot run Test.m. Please help me, thanks.

Warning: The argument for the %s format specifier must be of type char (a string).
> In CSVtoARFF at 18
In RUSBoost at 33
In Test at 34
??? Error using ==> javaObject
No class weka.core.Instances can be located on Java class path

Error in ==> RUSBoost at 35
train = javaObject('weka.core.Instances', train_reader);

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');

Comment only
24 Feb 2013 lu li
23 Feb 2013 Barnan Das

There is no automated way of generating an ARFF file, although the WEKA GUI has one such option. If the data has hundreds of features, I think they should be generated by a piece of code.

Comment only
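
A short script along the following lines could generate the attribute declarations when there are hundreds of features of the same type. It is a sketch under assumptions: the attribute names (feature1, feature2, ...), the feature count, the class values, and the output file name are hypothetical, and it assumes ARFFheader.txt holds only the @attribute lines (the traces in this thread suggest CSVtoARFF.m writes the @relation line itself). Check README.txt for the layout the package actually expects.

```matlab
% Sketch only: write ARFF attribute declarations for many numeric features.
% Names, counts, and the output path are hypothetical.
numFeatures = 200;                 % number of numeric feature columns
classValues = [0 1];               % discrete class labels in the last column
headerFile = fullfile(tempdir, 'ARFFheader_generated.txt');

[fid, errMsg] = fopen(headerFile, 'w');
if fid == -1
    error('Could not open %s for writing: %s', headerFile, errMsg);
end

% One numeric attribute per feature column.
for k = 1:numFeatures
    fprintf(fid, '@attribute feature%d numeric\n', k);
end

% Nominal class attribute, e.g. {0,1}; drop the trailing comma from sprintf.
classList = sprintf('%d,', classValues);
classList(end) = [];
fprintf(fid, '@attribute class {%s}\n', classList);

fclose(fid);
```

Declaring real-valued columns as numeric rather than as a list of nominal values is also what avoids the "nominal value not declared in header" exception reported in other comments, since that error is raised when a data value is missing from a nominal attribute's declared set.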
23 Feb 2013 lu li

Are there any convenient ways to define the ARFFheader if the specific problem has hundreds of features of the same type?

Comment only
23 Feb 2013 lu li

But is there any way to generate the ARFFheader file automatically?

Comment only
22 Feb 2013 Barnan Das

Please ensure that your data format complies with the ARFF standard. You can specify the appropriate ARFF format in the "ARFFheader.txt" file. Please take a look at "README.txt" for more details on how to run a new dataset.

Comment only
22 Feb 2013 lu li

I think RUSBoost is an excellent idea.
I ran the script Test.m and it works fine, but if I use other data, for instance when I change several lines at the beginning of Test.m

angle=rand(200,1)*2*pi; l=rand(200,1)*40+30; blue=[sin(angle).*l cos(angle).*l];
angle=rand(200,1)*2*pi; l=rand(200,1)*40; red=[sin(angle).*l cos(angle).*l];

datafeatures=[blue;red];
dataclass(1:200)=0; dataclass(201:400)=1;
data=[datafeatures,dataclass];

the following error occurs

Error using javaObject
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[1.225178e+01], line 11

at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)

at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)

at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)

at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)

at weka.core.Instances.<init>(Instances.java:124)

Error in RUSBoost (line 35)
train = javaObject('weka.core.Instances', train_reader);

Error in Test2 (line 36)
prediction = RUSBoost(train_data,test_data,'tree');

Comment only
Updates
07 Aug 2012 1.1

Bug fixes; screenshot added.
