Code covered by the BSD License  

5.0 | 5 ratings | 34 Downloads (last 30 days) | File Size: 5.38 MB | File ID: #37315
RUSBoost

26 Jun 2012 (Updated )

RUSBoost is a boosting-based sampling algorithm that handles class imbalance in class-labeled data.

File Information
Description

This code implements RUSBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines RUS (random under-sampling) with the standard boosting procedure AdaBoost to better model the minority class by removing majority class samples. It is very similar to SMOTEBoost, another algorithm that combines boosting and data sampling, but it reaches the same goal through random under-sampling (RUS) of majority class examples. This results in a simpler algorithm with faster model training time.

This implementation of RUSBoost was written independently by the author for research purposes. To let users choose among many different weak learners for boosting, it provides an interface to the Weka API. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk, and Logistic. The implementation runs 10 boosting iterations and, at each iteration, removes majority class samples until the class ratio is 35:65 (minority:majority).
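
The under-sampling step described above can be sketched as follows. This is a minimal illustration on toy labels, assuming class 1 is the minority; the variable names are illustrative and are not taken from RUSBoost.m, and the package may implement the step differently.

```matlab
% Sketch only: random under-sampling of the majority class to a 35:65
% (minority:majority) ratio. Names are illustrative, not from RUSBoost.m.
rng(0);                                   % fixed seed for repeatability
labels = [ones(50,1); zeros(950,1)];      % toy data: 1 = minority, 0 = majority

minorityIdx = find(labels == 1);
majorityIdx = find(labels == 0);

% Keep every minority sample; keep just enough majority samples so that
% the minority makes up about 35% of the reduced set.
nMajorityKeep = round(numel(minorityIdx) * 65 / 35);
nMajorityKeep = min(nMajorityKeep, numel(majorityIdx));

% Pick the majority rows to keep uniformly at random, without replacement.
shuffled = majorityIdx(randperm(numel(majorityIdx)));
keep = shuffled(1:nMajorityKeep);
resampledIdx = [minorityIdx; keep];

fprintf('kept %d of %d rows, minority share %.2f\n', ...
    numel(resampledIdx), numel(labels), mean(labels(resampledIdx) == 1));
```

In a boosting loop, a fresh random subset like `resampledIdx` would be drawn at every iteration, so different majority samples are discarded each time.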

For more detail on the theoretical description of the algorithm, please refer to the following paper:
C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse and A. Napolitano, "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol. 40(1), January 2010.

Required Products: MATLAB
MATLAB release: MATLAB 7.12 (R2011a)
Other requirements: JDK 6 or above
Comments and Ratings (20)
10 Aug 2014 ben Glampson

Hi. When running the file test.m, I get a message that boosting is being aborted because "Too many iterations have loss > 0.5". Is this expected after 3/4?

Also, after training, is there a way to rerun the model on new data as it comes in?

Many thanks,

Ben

25 Feb 2013 lu li

Sorry for being inquisitive, but I have another question: how are the updated weights 'W' associated with the resampled dataset 'RESAMPLED'? It seems that in your code the updated weights are only associated with the original training data.
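
For context on this question: the RUSBoost paper gives the under-sampled set its own weight distribution, trains the weak learner on that set, and computes the loss and weight update on the full training set. One plausible way to derive the subset weights is to take the weights of the retained rows and renormalise them. The sketch below illustrates that idea only; `W`, `resampledIdx`, and `wSub` are illustrative names, and it is not confirmed that RUSBoost.m works this way.

```matlab
% Sketch only: carrying boosting weights over to an under-sampled subset.
% W, resampledIdx and wSub are illustrative names, not identifiers
% confirmed from RUSBoost.m.
nTrain = 10;
W = ones(nTrain, 1) / nTrain;        % weights over the full training set
resampledIdx = [1; 2; 5; 9];         % rows kept after under-sampling (example)

wSub = W(resampledIdx);              % weights of the retained rows
wSub = wSub / sum(wSub);             % renormalise so they sum to 1

disp(wSub');                         % weights passed along with the subset
```

With uniform starting weights, each of the four retained rows ends up with weight 0.25; after a few boosting rounds the retained rows would generally carry unequal weights.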

25 Feb 2013 alex hsu

thanks a lot^^

25 Feb 2013 lu li

thanks a lot

24 Feb 2013 Barnan Das

Alex, you need to set the Java CLASSPATH environment variable to the downloaded RUSBoost directory.
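
As an alternative to the environment variable, the Weka classes can usually be added from inside MATLAB with `javaaddpath`, and `exist` can confirm that they are visible. The sketch below assumes the download was unpacked into a folder named `RUSBoost` under the current directory and that the Weka jar is called `weka.jar`; both names are assumptions, so substitute the actual location.

```matlab
% Sketch only: make the Weka classes visible to MATLAB's JVM.
% 'rusboostDir' and the jar name 'weka.jar' are assumptions; use the
% actual location and file name from your download.
rusboostDir = fullfile(pwd, 'RUSBoost');
wekaJar = fullfile(rusboostDir, 'weka.jar');

if exist(wekaJar, 'file')
    javaaddpath(wekaJar);            % add to the dynamic Java class path
end

% exist(..., 'class') returns 8 when the class can be found.
if exist('weka.core.Instances', 'class') == 8
    disp('Weka is on the Java class path.');
else
    disp('Weka not found; check the path to the jar.');
end
```

If the second message appears, the "No class weka.core.Instances can be located on Java class path" error reported in this thread is to be expected.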

24 Feb 2013 Barnan Das

Lu Li, the last line of the code is correct. Although the prediction in this case is '0', I print the probability of the sample being '1' to indicate how poorly it is performing. Taking 1 - Prob(1) gives you Prob(0). I hope that makes sense.

24 Feb 2013 alex hsu

Yes, I downloaded the RUSBoost source code again.

24 Feb 2013 lu li

Are you sure you did not make any change to the Test.m file?

24 Feb 2013 alex hsu

Thanks, lu li, but it still does not run.

24 Feb 2013 lu li

I think there is a bug in RUSBoost.m
the last line
prediction(i,:) = [0 wt_one];
should be
prediction(i,:) = [0 wt_zero];

24 Feb 2013 alex hsu

I am interested in RUSBoost, but I cannot run Test.m. Please help me. Thanks.

Warning: The argument for the %s format specifier must be of type char (a string).
> In CSVtoARFF at 18
  In RUSBoost at 33
  In Test at 34
??? Error using ==> javaObject
No class weka.core.Instances can be located on Java class path

Error in ==> RUSBoost at 35
train = javaObject('weka.core.Instances', train_reader);

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');

23 Feb 2013 Barnan Das

There is no automated way of generating the ARFF file, although the WEKA GUI has such an option. If the data has hundreds of features, I think the header should be generated by a piece of code.
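
A header for many features of the same type can be written with a short loop. The sketch below assumes all features are numeric and the class is binary with values 0 and 1; the relation name, attribute names, and output file name are assumptions, and README.txt should be checked for exactly what the package expects in "ARFFheader.txt".

```matlab
% Sketch only: write an ARFF header for many numeric features plus a
% binary class attribute. The relation name, attribute names, class values
% and output file name are assumptions; check README.txt for what the
% package actually expects in ARFFheader.txt.
nFeatures = 300;
fid = fopen('ARFFheader_generated.txt', 'w');
assert(fid ~= -1, 'Could not open the header file for writing.');

fprintf(fid, '@relation mydata\n\n');
for k = 1:nFeatures
    fprintf(fid, '@attribute f%d numeric\n', k);   % one line per feature
end
fprintf(fid, '@attribute class {0,1}\n\n');         % nominal class label
fprintf(fid, '@data\n');
fclose(fid);
```

The number of `@attribute` lines must equal the number of columns in the data (features plus class), and their order must match the column order.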

23 Feb 2013 lu li

Is there a convenient way to define the ARFF header if the problem has hundreds of features of the same type?

23 Feb 2013 lu li

But is there any way to generate the ARFFheader file automatically?

22 Feb 2013 Barnan Das

Please ensure that your data format complies with the ARFF standard. You can put the appropriate ARFF header in the "ARFFheader.txt" file. Please take a look at "README.txt" for more details on how to run a new dataset.

22 Feb 2013 lu li

I think RUSBoost is an excellent idea.
I ran the script Test.m and it works fine, but if I use other data, for instance when I change several lines at the beginning of Test.m:

angle=rand(200,1)*2*pi; l=rand(200,1)*40+30; blue=[sin(angle).*l cos(angle).*l];
angle=rand(200,1)*2*pi; l=rand(200,1)*40; red=[sin(angle).*l cos(angle).*l];

datafeatures=[blue;red];
dataclass(1:200)=0; dataclass(201:400)=1;
data=[datafeatures,dataclass];

the following error occurs:

Error using javaObject
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[1.225178e+01], line 11
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
    at weka.core.Instances.<init>(Instances.java:124)

Error in RUSBoost (line 35)
train = javaObject('weka.core.Instances', train_reader);

Error in Test2 (line 36)
prediction = RUSBoost(train_data,test_data,'tree');
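
Weka raises "nominal value not declared in header" when a column declared as nominal contains a value outside its declared set, so the most likely cause here is a header that still describes the original dataset rather than the new three-column one. The sketch below rebuilds the synthetic data from the comment above with the label stored explicitly as a column vector (when `dataclass` does not already exist, `dataclass(1:200)=0` creates a row vector) and checks it against a matching header. The relation and attribute names are assumptions.

```matlab
% Sketch only: two numeric features and a 0/1 class label, built as in the
% comment above but with the label stored explicitly as a column vector.
rng(0);
n = 200;
angle = rand(n,1)*2*pi;  r = rand(n,1)*40 + 30;
blue  = [sin(angle).*r, cos(angle).*r];
angle = rand(n,1)*2*pi;  r = rand(n,1)*40;
red   = [sin(angle).*r, cos(angle).*r];

datafeatures = [blue; red];
dataclass    = [zeros(n,1); ones(n,1)];   % column vector, 400-by-1
data         = [datafeatures, dataclass]; % 400-by-3

% A header that matches this layout (names are assumptions):
header = sprintf(['@relation rings\n' ...
                  '@attribute x numeric\n' ...
                  '@attribute y numeric\n' ...
                  '@attribute class {0,1}\n' ...
                  '@data\n']);

% Sanity check: one @attribute line per data column.
nAttributes = numel(strfind(header, '@attribute'));
assert(nAttributes == size(data, 2), 'Header and data disagree.');
```

Running the same column-count check against the actual contents of "ARFFheader.txt" before calling RUSBoost would catch this kind of mismatch early.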

Updates
07 Aug 2012

Bug fixes; screenshot added.
