Discretization algorithms: Class-Attribute Contingency Coefficient

version 1.2 (1.8 MB) by

To discretize continuous data: CACC is a promising discretization scheme proposed in 2008.

Rating: 3.33333 (4 ratings), 0 downloads

Discretization algorithms play an important role in data mining and knowledge discovery. They produce a concise summary of continuous attributes, which helps experts understand the data more easily, and they also make learning more accurate and faster.
This is an implementation of the CACC algorithm described in [1].
To get started, open "ControlCenter.m". It contains a simple example that uses a yeast dataset, along with explanatory comments.
If you run into any problem, just let me know and I will help you as soon as possible.

[1] Cheng-Jung Tsai, Chien-I Lee, Wei-Pang Yang. A discretization algorithm based on Class-Attribute Contingency Coefficient. Information Sciences 178(3): 714-731 (2008).
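The description above stays at a high level. The following rough Python sketch illustrates the idea for a single feature. It is not the author's MATLAB code and not a transcription of the paper's pseudocode. Candidate cut points are the midpoints between adjacent distinct values, and each round greedily adds the cut that maximizes CACC. The names `quanta_matrix`, `cacc` and `cacc_discretize` are hypothetical. Two details are assumptions:

- The y' normalization copies the `log2(k+1)` used in the MATLAB snippet quoted in the comments below.
- The stopping rule is a CAIM-style rule: keep accepting cuts while the score improves, or while there are fewer intervals than classes.

```python
import math
from bisect import bisect_right


def quanta_matrix(values, labels, cuts, classes):
    """Rows = classes, columns = intervals delimited by the sorted cut points."""
    row_of = {c: i for i, c in enumerate(classes)}
    q = [[0] * (len(cuts) + 1) for _ in classes]
    for v, c in zip(values, labels):
        q[row_of[c]][bisect_right(cuts, v)] += 1
    return q


def cacc(q):
    """CACC = sqrt(y' / (y' + M)); y' normalization is an assumption (see lead-in)."""
    M = sum(map(sum, q))
    rows = [sum(r) for r in q]
    cols = [sum(c) for c in zip(*q)]
    s = sum(
        q[i][r] ** 2 / (rows[i] * cols[r])
        for i in range(len(rows))
        for r in range(len(cols))
        if rows[i] > 0 and cols[r] > 0
    )
    y = max(M * (s - 1), 0.0)  # guard against tiny negative rounding error
    y_prime = y / math.log2(len(cols) + 1)  # assumed: mirrors the posted log2(k+1)
    return math.sqrt(y_prime / (y_prime + M))


def cacc_discretize(values, labels):
    """Greedy top-down search: repeatedly add the cut point with the best CACC."""
    classes = sorted(set(labels))
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # midpoints
    cuts, global_cacc = [], 0.0

    def score(c):
        return cacc(quanta_matrix(values, labels, sorted(cuts + [c]), classes))

    while candidates:
        best = max(candidates, key=score)
        best_score = score(best)
        # Assumed CAIM-style rule: accept if the score improves, or while
        # there are still fewer intervals than classes.
        if best_score > global_cacc or len(cuts) + 1 < len(classes):
            cuts = sorted(cuts + [best])
            global_cacc = best_score
            candidates.remove(best)
        else:
            break
    return cuts


values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(cacc_discretize(values, labels))  # [6.5]
```

On this toy feature, the single cut at 6.5 separates the two classes perfectly. Any further cut lowers the score, so the search stops there.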

Comments and Ratings (21)

Julio Zaragoza

I closed my Karin Zachinelly account. My CACC implementation files are in Julio Zaragoza's account now.

Please, if you find any bugs in my implementation, let me know.

This implementation is incomplete and actually incorrect. For starters, the CACC is never computed: the code obtains y' and takes that as the CACC, which is wrong. Moreover, the algorithm is not implemented in full as described in the paper.

That is why people who try this code with the data from the paper obtain different results.

Please check my (hopefully correct) implementation and let me know about any bugs.

FIR

Hi, I have used this algorithm on my dataset as described in my paper, but the CACC value and the cut-off points do not match those in my paper. Please help.

My dataset is:

age = [3; 56; 15; 17; 21; 35; 45; 46; 51; 56; 57; 66; 70; 71]

FIR

I have a dataset with 5 columns. Say the first column holds 60 random numbers between 1 and 100. I want to set cut-off points for this dataset. Is that possible with this algorithm?

By cut-off points I mean intervals such as:
0-10
10-30
30-50
50-80
80-100

Guangdi Li

You can use the returned variable DiscretizationSet to discretize new continuous data. DiscretizationSet is a matrix with K rows and F columns. Each column corresponds to one feature, in the same order as the features in your input data, and holds the cut-off points for that feature. Use the cut-offs in the corresponding column to discretize your new data.
Hope this is clear :)
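The Python sketch below shows the idea of reusing learned cut-offs on new values. It assumes the cut points of one feature have already been read out of that feature's DiscretizationSet column. The padding or boundary layout of that matrix is not documented here, so this is an assumption. The function names are hypothetical. Two further conventions are assumed: a value equal to a cut point goes to the interval above it, and values outside the training range fall into the first or last interval. The usage line treats FIR's example intervals as if they were learned cut-offs; they are not algorithm output.

```python
from bisect import bisect_right


def apply_cuts(new_values, cuts):
    """Map each continuous value to a 1-based interval index.

    Hypothetical helper: `cuts` holds only the interior cut points of one
    feature (assumed to come from that feature's DiscretizationSet column).
    """
    cuts = sorted(cuts)
    # bisect_right counts how many cut points are <= v
    return [bisect_right(cuts, v) + 1 for v in new_values]


def apply_cuts_matrix(rows, cuts_per_feature):
    """Discretize a row-major matrix column by column (one cut list per feature)."""
    return [
        [apply_cuts([v], cuts)[0] for v, cuts in zip(row, cuts_per_feature)]
        for row in rows
    ]


# FIR's intervals 0-10, 10-30, 30-50, 50-80, 80-100 expressed as cut points
print(apply_cuts([3, 15, 56, 95], [10, 30, 50, 80]))  # [1, 2, 4, 5]
```

Because only the sorted cut points are needed, the same lookup works for values a user enters after the classifier has been built.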

Hi, I have used this algorithm for classification. I used it to convert continuous values into discrete ones and then built a classifier. Once the classifier is built, the user enters continuous values to get an output. Can I convert the user-entered continuous values into discrete values based on the previously learned intervals?

Edek

Hi, I would like to know whether there is a solution to my discretization problem. The problem is this:
>> A = xlsread('example2.xls');
>> [ DiscretData,DiscretizationSet1 ] = CACC_Discretization( A, 3 )
??? Undefined function or method 'CACC_Discretization' for input arguments of type 'double'.

So... is it possible to use it with arguments of type double?
I can send you my file so that you can see my mistake.
Thank you so much, and sorry for taking up your time with such an easy question.

wahyu powh

Hello,
I have a question.
I still do not understand how CACC is calculated.

In Tsai's paper (Tsai, C.J., Lee, C.I., Yang, W.P., 2008, A Discretization Algorithm Based on Class-Attribute Contingency Coefficient, Information Sciences), CACC is calculated as
cacc = (y'/(y'+M))^0.5
y' = M[..... -1]/log(n)

whereas your code computes:
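% Accumulate the sum over classes p and intervals q of q_pq^2 / (M_p+ * M_+q)
for p = 1:C
    for q = 1:k
        if RowQuantaMatrix( p ) > 0 && ColumnQuantaMatrix( q ) > 0
            CACCValue = CACCValue + ( QuantaMatrix( p,q ) )^2/( RowQuantaMatrix( p )*ColumnQuantaMatrix( q ) ) ;
        end
    end
end
% Normalize; this is the value the code returns as the CACC
CACCValue = M*( CACCValue-1 )/log2(k+1) ;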

Why is this different from Tsai's paper?

Your final CACC is computed as M*(CACCValue-1)/log2(k+1),
so your final CACC equals y' in Tsai's paper.

I also tried the dataset from Tsai's paper (the Age table, with two attributes: Age and the target class).
When I compare the results of your code with the paper, the CACC values and the cutting points differ.
Could you explain, please?

Thanks
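For reference, the bracket left elided in the comment above can be written out from the double loop in the posted MATLAB snippet. The notation is introduced here for illustration:

- q_{ir} is the number of samples of class i in interval r.
- M_{i+} and M_{+r} are the row and column totals.
- S is the number of classes, n the number of intervals, and M the number of samples.

The definition of y' follows the comment's quotation of the paper.

```latex
y = M\left[\sum_{i=1}^{S}\sum_{r=1}^{n}\frac{q_{ir}^{2}}{M_{i+}\,M_{+r}} - 1\right],
\qquad
y' = \frac{y}{\log n},
\qquad
\mathrm{CACC} = \sqrt{\frac{y'}{y' + M}}

% Worked example, using the posted code's normalization log2(n+1) instead of log n:
% q = [[2, 0], [0, 2]], so M = 4 and every row and column total is 2
\sum_{i,r}\frac{q_{ir}^{2}}{M_{i+}M_{+r}} = \frac{4}{4} + \frac{4}{4} = 2,
\quad y = 4\,(2 - 1) = 4,
\quad y' = \frac{4}{\log_2 3} \approx 2.524,
\quad \mathrm{CACC} = \sqrt{\frac{2.524}{6.524}} \approx 0.622
```

Since CACC = sqrt(y'/(y'+M)) always stays below 1 while y' does not, returning y' as the CACC puts the result on a different scale. This is consistent with the mismatches the commenters describe.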

zapp

Ehm, disregard my previous message. I see how the class affiliation is coded. I usually work with one-vector multiclass coding. Thanks.

zapp

Hi there,
can the class variable be something other than binary?
Thanks for the code.

Yoann

Hello,
can this discretization scheme work with a dataset containing both negative and positive values?

Thank you for your help, and I apologize for such a silly question. Thank you once again.

Guangdi Li

For the iris dataset, create a matrix like [attribute1, attribute2, attribute3, attribute4, ClassVariable], then use the command:

[discrete,discretizationset]= CACC_Discretization(originaldata,1)

In MATLAB, you can do it like this:

 load fisheriris                        % meas: 150x4 features, species: cell array of labels
 N = size( meas,1 );
 originaldata = [ meas,zeros(N,1) ];    % append a class column (setosa stays 0)
 for p = 1:N
     if strcmp(species{p},'versicolor')
         originaldata(p,5)=1;
     elseif strcmp(species{p},'virginica')
         originaldata(p,5)=2;
     end
 end

 [discrete,discretizationset]= CACC_Discretization(originaldata,1)
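The same preparation step can be sketched in Python for datasets with any number of classes. String labels become integer codes in a final column. The helper name `append_class_codes` is hypothetical. Assigning codes in sorted label order is a choice made here; for iris it reproduces the coding above (setosa = 0, versicolor = 1, virginica = 2). The feature rows in the usage line are toy values.

```python
def append_class_codes(features, labels):
    """Append an integer class column to each feature row.

    Codes are 0, 1, 2, ... in sorted label order (a choice, not a requirement
    of CACC), so any number of classes is supported, not only binary ones.
    """
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [list(row) + [codes[label]] for row, label in zip(features, labels)]


rows = [[5.1, 3.5], [7.0, 3.2], [6.3, 3.3]]
species = ["setosa", "versicolor", "virginica"]
print(append_class_codes(rows, species))
# [[5.1, 3.5, 0], [7.0, 3.2, 1], [6.3, 3.3, 2]]
```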
 

Hi,
can we apply this algorithm to a classification dataset? I have applied it to the iris dataset (4 attributes + 1 class attribute with 3 classes), but it only converts 2 of the continuous attributes to discrete values, while the other two remain unchanged.
I invoked the CACC_Discretization function as
[discrete,discretizationset]= CACC_Discretization(originaldata,3)
where originaldata is my iris dataset and 3 is the number of classes into which the data is classified.
I think I am not using the second input argument correctly.
Please help me out.

Adrian__

Thank you very much for your help.
It is working perfectly now.

Adrian__

Hello,

I am most appreciative of your help.
I just hope that the database I sent you does not violate any of the algorithm's requirements.

If that turns out to be the problem, I apologize in advance for wasting your time.

Guangdi Li

Of course, you are welcome to send me the dataset so that I can check what the problem is.

Adrian__

Hello,

Thank you very much for this code; I found it very useful when working with my dataset.

However, I must confess that I got the same error message Khalid mentioned when I tried to discretize a subset of my original database.

Strangely, for some subsets no error is returned, while for others the run is stopped by the aforementioned error.

Could you please let me know what is causing the problem?

Should you need a sample of the database I was talking about, I will email it to you as soon as you agree.

Guangdi Li

What kind of dataset do you have? Can you give me a simple example for testing?

Khalid

Hi,

I got this error message when I used it on my dataset:

??? Attempted to access B(0); index must be a positive integer or logical.

Error in ==> CACC_Discretization at 81
             D( k ) = B( Local );

?!

Updates

1.2

Improved the code.

1.1

Improved it.

MATLAB Release
MATLAB 7.6 (R2008a)
