File Size: 1.8 MB | File ID: #24343

Discretization algorithms: Class-Attribute Contingency Coefficient

04 Jun 2009 (Updated )

CACC is a promising scheme, proposed in 2008, for discretizing continuous data.


File Information
Description

     Discretization algorithms play an important role in data mining and knowledge discovery. They not only produce a concise summary of continuous attributes, which helps experts understand the data more easily, but also make learning faster and more accurate.
     This implementation of the CACC algorithm is based on paper [1].
     To get started with the code, open "ControlCenter.m" first; it contains a simple example along with a yeast dataset. An explanation is included in that file as well.
     If you run into any problem, just let me know and I will help as soon as possible.

[1] Cheng-Jung Tsai, Chien-I Lee, Wei-Pang Yang: A discretization algorithm based on Class-Attribute Contingency Coefficient. Inf. Sci. 178(3): 714-731 (2008)

MATLAB release MATLAB 7.6 (R2008a)
Comments and Ratings (21)
15 May 2013 Julio Zaragoza

I closed my Karin Zachinelly account. My CACC implementation files are in Julio Zaragoza's account now.

Please, if you find any bugs in my implementation, let me know.

20 Feb 2013 karin Zachinelly

This implementation is incomplete and actually incorrect. For starters, the CACC is not computed: the code obtains y' and takes that as the CACC, which is wrong. The algorithm is also not implemented in full as described in the paper.

That is why people who try this code with the data from the paper obtain different results.

Please check my (hopefully correct) implementation and let me know about any bugs.

20 Feb 2013 karin Zachinelly  
05 Jan 2012 FIR

Hi, I have used this algorithm on the dataset from my paper, but the CACC value and cut-off points I get do not match those in the paper. Please help.

my dataset is

age=[3 ;56 ;15 ;17 ;21 ;35 ;45 ;46 ;51 ;56 ;57;66 ;70 ;71 ]

20 Dec 2011 FIR

I have a dataset of 5 columns; say the 1st column has 60 random numbers between 1 and 100. Now I want to set cut-off points for this dataset. Is that possible with this algorithm?

By cut-off points I mean, for example:
0-10
10-30
30-50
50-80
80-100

05 Sep 2011 Guangdi Li

You can use the returned variable DiscretizationSet to discretize new continuous data. DiscretizationSet is a matrix with K rows and F columns. Each column represents one feature, in the same order as your input features, and stores the cutoffs to use for that feature in your new discretization.
Hope it is clear :)
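
As a minimal sketch of the idea in this reply, the snippet below maps new continuous values onto intervals defined by a sorted list of cut points for one feature. The names `cuts`, `newValues`, and `labels` are hypothetical and the cut points are toy values. It assumes the list holds interior cut points only, so any boundary or padding entries in a DiscretizationSet column would need to be stripped first. It also assumes a value equal to a cut point falls into the upper interval.

```matlab
% Toy interior cut points for one feature (hypothetical values, sorted ascending).
cuts = [10 30 50 80];

% New continuous values to discretize with the same cut points.
newValues = [3; 10; 29.5; 64; 95];

% Interval index = 1 + number of cut points the value has reached.
% A value equal to a cut point goes to the upper interval (an assumption).
labels = zeros(size(newValues));
for p = 1:numel(newValues)
    labels(p) = 1 + sum(newValues(p) >= cuts);
end

disp([newValues, labels]);
```

For several features, the same loop would be repeated per column, using the cut points taken from the matching column of the cutoff matrix.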

03 Sep 2011 Prachitee Shekhawat

Hi, I have used this algorithm for classification: I convert the continuous values into discrete ones and build a classifier from them. After the classifier is generated, the user enters a continuous value to get the output. So I want to know: can I convert the user-entered continuous value to a discrete value, based on the previously obtained discrete intervals?

07 Apr 2011 Edek

Hi, I want to know if there is a solution for my discretization problem. The problem is this:
>> A = xlsread('example2.xls');
>> [ DiscretData,DiscretizationSet1 ] = CACC_Discretization( A, 3 )
??? Undefined function or method 'CACC_Discretization' for input arguments of type 'double'.

So, is it possible to make this work with arguments of type double?
I can send you my file so that you can see my mistake.
Thank you so much, and sorry for taking up your time with this easy question.
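
MATLAB usually reports "Undefined function or method ... for input arguments of type 'double'" when it cannot find a function with that name on its search path, so the argument type is probably not the cause here. The following is a minimal sketch of how one might check for this and add the folder. The folder name `CACC` is a hypothetical placeholder for wherever the submission was unzipped.

```matlab
% Hypothetical location; replace with the folder where the submission was unzipped.
caccFolder = fullfile(pwd, 'CACC');

% exist(..., 'file') returns 2 when a file with that name is visible to MATLAB.
if exist('CACC_Discretization', 'file') ~= 2
    if exist(caccFolder, 'dir') == 7
        addpath(caccFolder);   % make the .m files in that folder callable
    else
        disp('CACC folder not found; adjust caccFolder.');
    end
end

% Shows the full path of the function if MATLAB can now find it.
which CACC_Discretization
```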

21 Mar 2011 wahyu powh

Hello,
I have a question.
I still do not understand how the CACC is calculated.

In the paper by Tsai et al.
(Tsai, C.J., Lee, C.I., Yang, W.P., 2008, A Discretization Algorithm Based on Class-Attribute Contingency Coefficient, Science Direct)
the CACC is calculated as
cacc = (y'/(y'+M))^0.5
y' = M[..... -1]/log(n)

whereas your code is
for p = 1:C
    for q = 1:k
        if RowQuantaMatrix( p ) > 0 && ColumnQuantaMatrix( q ) > 0
            CACCValue = CACCValue + ( QuantaMatrix( p,q ) )^2/( RowQuantaMatrix( p )*ColumnQuantaMatrix( q )) ;
        end
    end
end
CACCValue = M*( CACCValue-1 )/log2(k+1) ;

Why is this different from the paper?

Your final value is cacc = M*(CACCValue-1)/log2(k+1),
so your final cacc is y' in the paper.

I also tried the dataset from the paper (the age table: 2 attributes, Age and target class).
Comparing your code with the paper, the results are different (CACC value and cut points).
Please explain.

Thanks
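
To make the difference raised in this comment concrete, here is a minimal sketch that carries the computation one step further, from y' to cacc = sqrt(y'/(y'+M)), as in the formula quoted above. The quanta matrix `Q` holds toy counts and is not data from the paper. The variable names are hypothetical, and the natural logarithm is an assumption, since the quoted formula does not state a base and the posted code uses log2(k+1).

```matlab
% Toy quanta matrix: rows are classes, columns are intervals (not from the paper).
Q = [5 0; 0 5];

M = sum(Q(:));            % total number of samples
n = size(Q, 2);           % number of intervals; n = 1 would make log(n) zero
rowTotals = sum(Q, 2);    % samples per class (column vector)
colTotals = sum(Q, 1);    % samples per interval (row vector)

% Double sum of q^2 / (rowTotal * colTotal), skipping empty rows and columns
% in the same way as the guard in the posted loop.
denom = rowTotals * colTotals;   % C-by-n matrix of products
valid = denom > 0;
s = sum(Q(valid).^2 ./ denom(valid));

yPrime = M * (s - 1) / log(n);          % the quantity the posted code returns
cacc = sqrt(yPrime / (yPrime + M));     % the final step from the quoted formula

fprintf('y'' = %.4f, cacc = %.4f\n', yPrime, cacc);
```

Because yPrime and M are both non-negative here, cacc stays between 0 and 1, whereas y' on its own is not bounded in that way.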

20 Mar 2011 zapp

Ehm, disregard my previous message. I see now how the class affiliation is coded. I usually work with a single-vector multiclass coding. Thanks.

20 Mar 2011 zapp

Hi there,
can the class variable be of a type other than binary?
Thanks for the code.

15 Mar 2011 Yoann

Hello,
could this discretization scheme work with a dataset containing both negative and positive values?

27 Feb 2011 Prachitee Shekhawat

Thank you for your help. I apologize for such a silly question. Thank you once again.

17 Feb 2011 Guangdi Li

For the iris dataset, create a matrix like [attribute1,attribute2,attribute3,attribute4,ClassVariable], then use the command:

[discrete,discretizationset]= CACC_Discretization(originaldata,1)

In MATLAB, you can do it like this:

load fisheriris                       % provides meas (150x4) and species (150x1 cell)
N = size( meas,1 );
originaldata = [ meas,zeros(N,1) ];   % fifth column holds the class label; setosa stays 0
for p = 1:N
    if strcmp(species{p},'versicolor')
        originaldata(p,5)=1;
    elseif strcmp(species{p},'virginica')
        originaldata(p,5)=2;
    end
end

[discrete,discretizationset]= CACC_Discretization(originaldata,1)

17 Feb 2011 Prachitee Shekhawat

Hi,
can we apply this algorithm to a classification dataset? I have applied it to the iris dataset (4 attributes + 1 class attribute with 3 classes), but it only converts 2 continuous attributes to discrete while the other two remain the same.
I invoked the CACC_Discretization function as
[discrete,discretizationset]= CACC_Discretization(originaldata,3)
Here originaldata is my iris dataset and 3 is the number of classes into which the data is classified.
I think I am not getting the second input variable right.
Please help me out.

01 Feb 2011 Adrian__

Thank you very much for your help.
Now it is working perfectly.

31 Jan 2011 Adrian__

Hello,

I am most appreciative for your help.
I just hope that the database I sent you does not violate any of the algorithm's requirements.

Should this be the problem, I apologize in advance for wasting your time.

28 Jan 2011 Guangdi Li

Of course, you are welcome to send me the dataset to check what's the problem.

27 Jan 2011 Adrian__

Hello,

Thank you very much for this code; I found it very useful when working with my dataset.

However, I must confess that I got the same error message Khalid was talking about when I tried to discretize a subset of my original database.

Strangely, no error is returned for some subsets, while for others the run is stopped by the aforementioned error.

Could you please let me know what is causing the problem?

Should you need a sample of the database I was talking about, I will email it to you as soon as you agree.

01 Apr 2010 Guangdi Li

What kind of dataset do you have? Can you give me a simple example for testing?

31 Mar 2010 Khalid

Hi,

I got this error message when I use it on my dataset:

??? Attempted to access B(0); index must be a positive integer or logical.

Error in ==> CACC_Discretization at 81
D( k ) = B( Local );

?!

Updates
04 Jul 2009

Improve it

31 Jan 2011

improve the code
