File Exchange

RUSBoost

version 1.1 (5.38 MB) by

RUSBoost is a boosting-based sampling algorithm that handles class imbalance in class labeled data.

19 Downloads

This code implements RUSBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines RUS (random under-sampling) with the standard boosting procedure AdaBoost to better model the minority class by removing majority class samples. It is very similar to SMOTEBoost, another algorithm that combines boosting and data sampling, but claims to achieve the goal with random under-sampling (RUS) of majority class examples. This results in a simpler algorithm with faster model training time.

The current implementation of RUSBoost was written independently by the author for research purposes. To let users choose among different weak learners for boosting, it provides an interface to the Weka API. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk and Logistic. It uses 10 boosting iterations and achieves a class imbalance ratio of 35:65 (minority:majority) at each boosting iteration by removing majority class samples.
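
The under-sampling step described above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions, not the author's code: the toy labels, the variable names and the convention that 1 marks the minority class are all hypothetical. It shows how many majority samples must be kept to reach a 35:65 ratio and how they can be drawn at random without replacement.

```matlab
% Sketch of one random under-sampling (RUS) step on hypothetical toy labels.
% Assumption: binary labels, 1 = minority class, 0 = majority class.
labels = [ones(35,1); zeros(365,1)];   % 35 minority, 365 majority samples
targetMinorityShare = 0.35;            % 35:65 (minority:majority)

minorityIdx = find(labels == 1);
majorityIdx = find(labels == 0);

% Majority samples to keep so that the minority forms 35% of the subset.
nMajorityKeep = round(numel(minorityIdx) * ...
    (1 - targetMinorityShare) / targetMinorityShare);
nMajorityKeep = min(nMajorityKeep, numel(majorityIdx));

% Draw the majority samples at random, without replacement.
shuffled = majorityIdx(randperm(numel(majorityIdx)));
keepIdx  = [minorityIdx; shuffled(1:nMajorityKeep)];

resampledLabels = labels(keepIdx);
minorityShare   = mean(resampledLabels == 1);   % share of minority samples kept
```

In a boosting loop, a step like this would be repeated at every iteration, so each weak learner sees a different random subset of the majority class.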

For more detail on the theoretical description of the algorithm, please refer to the following paper:
C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse and A. Napolitano, "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol. 40(1), January 2010.

Comments and Ratings (31)

albert wang

On this website you can find new code for converting .mat, .txt and .csv files to the ARFF format: http://blog.csdn.net/jiandanjinxin/article/details/50886826

albert wang

This is a remarkable job! Thanks for sharing. I have found code that converts the .txt, .mat and .csv formats to the ARFF format.

jacky chen

Thanks first of all, but the code is limited in the data it accepts. I use double-valued data. For example, if I add the following after line 9 in Test.m:

data(:,1:6) = data(:,1:6)+0.1;

then it gives me an error:
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[5], line 11
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
    at weka.core.Instances.<init>(Instances.java:124)

I found that MATLAB itself provides the RUSBoost method through the function fitensemble.

honey AE

Is it necessary to install WEKA after downloading the algorithm?

honey AE

RUSBoost runs well, but when I apply it to a new dataset it gives an error, even though I have done everything mentioned in the ARFFheader.txt file.

Index exceeds Java array dimensions

Error in RUSBoost (line 106)
    if train(i,end)==pred(i)

Error in Test (line 34)
prediction = RUSBoost(train_data,test_data,'svm');

Sir, when I run Test.m the following error occurs. Please help me, it is really important.

??? Error using ==> fprintf
Invalid file identifier. Use fopen to generate a valid file identifier.

Error in ==> CSVtoARFF at 11
fprintf(farff, '@relation %s', relation);

Error in ==> RUSBoost at 33
CSVtoARFF (train, 'train', 'train');

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');
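
The "Invalid file identifier" message means that fopen returned -1, which typically happens when the target folder is not writable or the path does not exist. The following is a minimal sketch of a guarded file open. It is not the actual code of CSVtoARFF: the output location in tempdir and the variable names are assumptions made for illustration.

```matlab
% Sketch: check the result of fopen before writing an ARFF file.
% Hypothetical output path; the real CSVtoARFF may build its path differently.
outFile = fullfile(tempdir, 'train.arff');

[fid, msg] = fopen(outFile, 'w');
if fid == -1
    % fopen failed; msg holds the system-dependent reason.
    error('Could not open %s for writing: %s', outFile, msg);
end

fprintf(fid, '@relation %s\n', 'train');   % first line of an ARFF file
fclose(fid);
```

If the error appears with the original code, running MATLAB from a folder where you have write permission is a reasonable first thing to try.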
 

Gokhan Kirlik

ben Glampson

Hi. When running Test.m I get the message that boosting is being aborted because "Too many iterations have loss > 0.5". Is this expected after 3/4?

Also, after training, is there a way to re-run the model on new data as it comes in?

Many thanks,

Ben

lu li

Sorry for being inquisitive, but I have another question: how are the updated weights 'W' associated with the resampled dataset 'RESAMPLED'? It seems that in your code the updated weights are only associated with the original training data.

alex hsu

thanks a lot^^

lu li

thanks a lot

Barnan Das

Alex, you need to set the Java CLASSPATH environment variable to the downloaded RUSBoost directory.
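
As an alternative to the environment variable, the Weka classes can also be added to the class path from inside MATLAB with javaaddpath. This is a hedged sketch only: the folder name RUSBoost and the jar name weka.jar are assumptions, so check what your downloaded copy actually contains.

```matlab
% Sketch: add a Weka jar to MATLAB's dynamic Java class path.
rusboostDir = fullfile(pwd, 'RUSBoost');          % hypothetical download folder
wekaJar     = fullfile(rusboostDir, 'weka.jar');  % assumed jar name

added = false;
if exist(wekaJar, 'file') == 2
    javaaddpath(wekaJar);   % makes weka.core.Instances and friends visible
    added = true;
else
    fprintf('Weka jar not found at %s\n', wekaJar);
end
```

Calling javaclasspath afterwards lists the current entries, which helps confirm that the jar was picked up.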

Barnan Das

Lu Li, the last line of the code is correct. Although the prediction in this case is '0', I have printed out the probability of the sample being '1' to indicate how poorly it is performing. A value of 1 - Prob(1) would give you the value of Prob(0). I hope that makes sense.

alex hsu

Yes, I downloaded the RUSBoost source code again.

lu li

Are you sure you did not make any change to the Test.m file?

alex hsu

Thanks lu li, but I am still unable to run it.

lu li

I think there is a bug in RUSBoost.m
the last line
 prediction(i,:) = [0 wt_one];
should be
 prediction(i,:) = [0 wt_zero];

alex hsu

I am interested in RUSBoost, but I cannot run Test.m. Please help me. Thanks.

Warning: The argument for the %s format specifier must be of type char (a string).
> In CSVtoARFF at 18
  In RUSBoost at 33
  In Test at 34
??? Error using ==> javaObject
No class weka.core.Instances can be located on Java class path

Error in ==> RUSBoost at 35
train = javaObject('weka.core.Instances', train_reader);

Error in ==> Test at 34
prediction = RUSBoost(train_data,test_data,'tree');

Barnan Das

There is no automated way of generating the ARFF file, although the WEKA GUI has one such option. If the data has hundreds of features, I think the header should be generated by a piece of code.
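
Generating the attribute declarations by code could look like the sketch below. It is an illustration under assumptions, not part of the package: the feature names f1, f2, ..., the class values and the number of features are hypothetical, and the exact layout expected in "ARFFheader.txt" should be checked against "README.txt". Declaring features as numeric (a standard ARFF attribute type) is also the usual way to avoid "nominal value not declared in header" errors for real-valued data.

```matlab
% Sketch: build ARFF attribute lines for many numeric features.
numFeatures = 300;            % hypothetical number of feature columns
classValues = {'0', '1'};     % hypothetical nominal class labels

attrLines = cell(numFeatures + 1, 1);
for k = 1:numFeatures
    % One numeric attribute per feature column, named f1, f2, ...
    attrLines{k} = sprintf('@attribute f%d numeric', k);
end

% Nominal class attribute listing every allowed label, e.g. {0,1}.
classList = sprintf('%s,', classValues{:});
classList(end) = [];          % drop the trailing comma
attrLines{numFeatures + 1} = sprintf('@attribute class {%s}', classList);

% Join the lines into one block of text, one declaration per line.
header = sprintf('%s\n', attrLines{:});
```

The resulting text in header can then be written to a file with fopen, fprintf and fclose.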

lu li

Are there any convenient ways to define the ARFF header if the specific problem has hundreds of features of the same type?

lu li

But is there any way to generate the ARFFheader file automatically?

Barnan Das

Please ensure that your data format complies with the ARFF standard. You can put the appropriate ARFF declarations in the "ARFFheader.txt" file. Please take a look at "README.txt" for more details on how to run a new dataset.

lu li

I think RUSBoost is an excellent idea.
I ran the script Test.m and it works fine, but if I use other data, for instance when I change several lines at the beginning of Test.m:

angle=rand(200,1)*2*pi; l=rand(200,1)*40+30; blue=[sin(angle).*l cos(angle).*l];
angle=rand(200,1)*2*pi; l=rand(200,1)*40; red=[sin(angle).*l cos(angle).*l];

datafeatures=[blue;red];
dataclass(1:200)=0; dataclass(201:400)=1;
data=[datafeatures,dataclass];

the following error occurs:

Error using javaObject
Java exception occurred:
java.io.IOException: nominal value not declared in header, read Token[1.225178e+01], line 11
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
    at weka.core.Instances.<init>(Instances.java:124)

Error in RUSBoost (line 35)
train = javaObject('weka.core.Instances', train_reader);

Error in Test2 (line 36)
prediction = RUSBoost(train_data,test_data,'tree');

Updates

1.1

Bug fixes; screenshot added.

MATLAB Release
MATLAB 7.12 (R2011a)
