Code covered by the BSD License  

3.8 | 7 ratings | 18 Downloads (last 30 days) | File Size: 5.38 MB | File ID: #37315

RUSBoost

26 Jun 2012 (Updated)

RUSBoost is a boosting-based sampling algorithm that handles class imbalance in class-labeled data.

File Information
Description

This code implements RUSBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines RUS (random under-sampling) with the standard boosting procedure AdaBoost to better model the minority class by removing majority class samples. It is very similar to SMOTEBoost, another algorithm that combines boosting and data sampling, but aims to achieve the same goal with random under-sampling (RUS) of majority class examples. This results in a simpler algorithm with faster model training.

The current implementation of RUSBoost was written independently by the author for research purposes. To let users choose among many different weak learners for boosting, an interface to the Weka API is provided. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk, and Logistic. The code runs 10 boosting iterations and, at each iteration, reaches a class ratio of 35:65 (minority:majority) by removing majority class samples.
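
The under-sampling step described above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code of this submission: the function name `random_undersample` and its parameters (`minority_label`, `minority_share`, `seed`) are hypothetical, and the sketch assumes binary labels with the minority class marked as 1. It covers only the sampling step, not the boosting loop or the Weka weak learners.

```python
import random

def random_undersample(labels, minority_label=1, minority_share=0.35, seed=0):
    """Return sorted indices of a subset in which the minority class makes up
    roughly `minority_share` of the samples. Only majority samples are dropped."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Number of majority samples that yields the target ratio (e.g. 35:65).
    target = round(len(minority) * (1 - minority_share) / minority_share)
    keep = min(len(majority), target)
    # Draw the retained majority samples at random, without replacement.
    kept_majority = rng.sample(majority, keep)
    return sorted(minority + kept_majority)

# Hypothetical data: 35 minority and 465 majority samples.
labels = [1] * 35 + [0] * 465
subset = random_undersample(labels)
n_minority = sum(labels[i] for i in subset)
print(n_minority, len(subset) - n_minority)
```

In a boosting setting, a fresh subset of this kind would be drawn at every iteration (for example by varying `seed`), so that each weak learner sees a different part of the majority class.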

For more detail on the theoretical description of the algorithm, please refer to the following paper:
C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse and A. Napolitano, "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol. 40(1), January 2010.

Required Products: MATLAB
MATLAB release: MATLAB 7.12 (R2011a)
Other requirements: JDK 6 or above
Comments and Ratings (23)
04 Feb 2015 virendra singh

Index exceeds Java array dimensions

Error in RUSBoost (line 106)
if train(i,end)==pred(i)

Error in Test (line 34)
prediction = RUSBoost(train_data,test_data,'svm');

Comment only
27 Jan 2015 virendra singh

Sir, when I run Test.m the following error comes up. Please help me, it's really important.

??? Error using ==> fprintf
Invalid file identifier. Use fopen to generate a valid file identifier.

Error in ==> CSVtoARFF at 11
fprintf(farff, '@relation %s', relation);

Error in ==> RUSBoost at 33
CSVtoARFF (train, 'train', 'train');

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');

22 Nov 2014 Gokhan Kirlik  
10 Aug 2014 ben Glampson

Hi, when running test.m I get a message that boosting is being aborted because "Too many iterations have loss > 0.5". Is this expected after 3/4?

Also, after training, is there a way to rerun the model on new data as it comes in?

Many thanks,

Ben

25 Feb 2013 lu li

Sorry for being inquisitive, but I have another question: how are the updated weights 'W' associated with the resampled dataset 'RESAMPLED'? It seems that in your code the updated weights are only associated with the original training data.

Comment only
25 Feb 2013 alex hsu

thanks a lot^^

Comment only
25 Feb 2013 lu li

thanks a lot

Comment only
24 Feb 2013 Barnan Das

Alex, you need to set the Java CLASSPATH environment variable to the downloaded RUSBoost directory.

Comment only
24 Feb 2013 Barnan Das

Lu Li, the last line of the code is correct. Although the prediction in this case is '0', I print the probability of the sample being '1' to indicate how poorly it is performing. A value of 1 - Prob(1) would give you Prob(0). I hope that makes sense.

Comment only
24 Feb 2013 alex hsu

Yes, I downloaded the RUSBoost source code again.

Comment only
24 Feb 2013 lu li

Are you sure you did not make any change to the Test.m file?

Comment only
24 Feb 2013 alex hsu

Thanks lu li, but I am still unable to run it.

Comment only
24 Feb 2013 lu li

I think there is a bug in RUSBoost.m
the last line
prediction(i,:) = [0 wt_one];
should be
prediction(i,:) = [0 wt_zero];

Comment only
24 Feb 2013 alex hsu

I am interested in RUSBoost, but I cannot run Test.m. Please help me. Thanks.

Warning: The argument for the %s format specifier must be of type char (a string).
> In CSVtoARFF at 18
In RUSBoost at 33
In Test at 34
??? Error using ==> javaObject
No class weka.core.Instances can be located on Java class path

Error in ==> RUSBoost at 35
train = javaObject('weka.core.Instances', train_reader);

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');

Comment only
24 Feb 2013 lu li
24 Feb 2013 lu li
24 Feb 2013 lu li
24 Feb 2013 lu li
23 Feb 2013 Barnan Das

There is no automated way of generating the ARFF file, although the WEKA GUI has one such option. If the data has hundreds of features, I think the header entries should be generated by a piece of code.

Comment only
23 Feb 2013 lu li

Is there a convenient way to define the ARFFheader if the specific problem has hundreds of features of the same type?

Comment only
23 Feb 2013 lu li

But is there any way to generate the ARFFheader file automatically?

Comment only
22 Feb 2013 Barnan Das

Please ensure that your data format complies with the ARFF standard. You can put the appropriate ARFF header in the "ARFFheader.txt" file. Please take a look at "README.txt" for more details on how to run a new dataset.

Comment only
22 Feb 2013 lu li

I think RUSBoost is an excellent idea.
I ran the script Test.m and it works fine, but if I use other data, for instance when I change several lines at the beginning of Test.m:

angle=rand(200,1)*2*pi; l=rand(200,1)*40+30; blue=[sin(angle).*l cos(angle).*l];
angle=rand(200,1)*2*pi; l=rand(200,1)*40; red=[sin(angle).*l cos(angle).*l];

datafeatures=[blue;red];
dataclass(1:200)=0; dataclass(201:400)=1;
data=[datafeatures,dataclass];

the following error occurs:

Error using javaObject
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[1.225178e+01], line 11

at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)

at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)

at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)

at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)

at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)

at weka.core.Instances.<init>(Instances.java:124)

Error in RUSBoost (line 35)
train = javaObject('weka.core.Instances', train_reader);

Error in Test2 (line 36)
prediction = RUSBoost(train_data,test_data,'tree');

Comment only
Updates
07 Aug 2012

Bug fixes; screenshot added.
