File Size: 1.8 MB | File ID: #24343

Discretization algorithms: Class-Attribute Contingency Coefficient

by

Guangdi Li

 

04 Jun 2009

A MATLAB implementation of CACC, a promising scheme proposed in 2008 for discretizing continuous data.


File Information
Description

     Discretization algorithms play an important role in data mining and knowledge discovery. They not only produce a concise summary of continuous attributes that helps experts understand the data more easily, but also make learning more accurate and faster.
     This is an implementation of the CACC algorithm described in [1].
     To get started, open "ControlCenter.m" first: it contains a simple example along with a yeast database, and explanations are included in that file as well.
    If you run into any problems, just let me know and I will help you as soon as possible.

[1] Cheng-Jung Tsai, Chien-I Lee, Wei-Pang Yang: A discretization algorithm based on Class-Attribute Contingency Coefficient. Inf. Sci. 178(3): 714-731 (2008)
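For readers who want to see the overall shape of the method before opening ControlCenter.m, here is a minimal Python sketch (not the MATLAB code in this file) of a greedy top-down CACC discretizer for a single attribute. It builds the class-by-interval quanta matrix q, where q_ir counts class-i samples in interval r, M_i+ and M_+r are its row and column totals and M is the number of samples, scores it with cacc = sqrt(y'/(y'+M)) where y' = M(sum of q_ir^2/(M_i+ M_+r) - 1)/log(n) for n intervals, and keeps adding the cut that raises cacc the most. Several details are assumptions rather than taken from [1]: candidate cuts are midpoints between adjacent distinct values, the natural log is used, class labels are integers 0..S-1, and the loop stops as soon as no single extra cut raises cacc. All function names are hypothetical.

```python
import math
from bisect import bisect_left


def quanta_matrix(values, labels, cuts, n_classes):
    """Count samples per (class, interval); intervals are (-inf, c1], (c1, c2], ..."""
    q = [[0] * (len(cuts) + 1) for _ in range(n_classes)]
    for v, c in zip(values, labels):
        q[c][bisect_left(cuts, v)] += 1
    return q


def cacc(q):
    """cacc = sqrt(y' / (y' + M)) for a class-by-interval quanta matrix q."""
    M = sum(sum(row) for row in q)
    n = len(q[0])  # number of intervals; must be >= 2 so that log(n) > 0
    row_tot = [sum(row) for row in q]                       # M_i+
    col_tot = [sum(row[r] for row in q) for r in range(n)]  # M_+r
    s = 0.0
    for i, row in enumerate(q):
        for r, x in enumerate(row):
            if row_tot[i] > 0 and col_tot[r] > 0:
                s += x * x / (row_tot[i] * col_tot[r])
    # Natural log is an assumption; clamp tiny negative round-off to zero.
    y_prime = max(M * (s - 1) / math.log(n), 0.0)
    return math.sqrt(y_prime / (y_prime + M))


def cacc_discretize(values, labels, n_classes):
    """Greedy top-down search; returns the sorted list of chosen cut points."""
    distinct = sorted(set(values))
    # Assumed candidate set: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts, global_cacc = [], 0.0
    while True:
        best_cut, best_score = None, global_cacc
        for c in candidates:
            if c in cuts:
                continue
            trial = sorted(cuts + [c])
            score = cacc(quanta_matrix(values, labels, trial, n_classes))
            if score > best_score:
                best_cut, best_score = c, score
        if best_cut is None:  # assumed stopping rule: no extra cut improves cacc
            return cuts
        cuts = sorted(cuts + [best_cut])
        global_cacc = best_score
```

For example, `cacc_discretize([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], 2)` returns `[6.5]`: that single cut already separates the two classes perfectly, and any further cut leaves the sum unchanged while increasing log(n), so cacc would drop. Because only the sorted order of the values matters, negative and positive values are handled the same way.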

MATLAB release MATLAB 7.6 (R2008a)
Comments and Ratings (21)
15 May 2013 Julio Zaragoza


I closed my Karin Zachinelly account. My CACC implementation files are in Julio Zaragoza's account now.

Please, if you find any bugs in my implementation, let me know.

20 Feb 2013 karin Zachinelly

This implementation is incomplete and actually incorrect. For starters, the CACC value is never computed: the code obtains y' and takes that as the CACC, which is wrong. In addition, the algorithm is not fully implemented as described in the paper.

That is why people who try this code with the data from the paper obtain different results.

Please check my (hopefully correct) implementation and let me know about any bugs.

05 Jan 2012 FIR


Hi, I have applied this algorithm to the dataset given in my paper, but the CACC value and cut-off points do not match those in the paper. Please help.

My dataset is:

age = [3; 56; 15; 17; 21; 35; 45; 46; 51; 56; 57; 66; 70; 71]

20 Dec 2011 FIR


I have a dataset with 5 columns; say the first column holds 60 random numbers between 1 and 100. Now I want to set cut-off points for this dataset. Is that possible with this algorithm?

By cut-off points I mean, for example:
0-10
10-30
30-50
50-80
80-100

05 Sep 2011 Guangdi Li


You can use the returned variable DiscretizationSet to discretize new continuous data. DiscretizationSet is a matrix with K rows and F columns, where each column represents one feature, in the same order as your input feature data. The cut-off points for each feature are saved in the corresponding column, so you can use them to discretize new values.
Hope it is clear :)
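For Prachitee Shekhawat's question about discretizing user-entered values, here is a minimal Python sketch of applying stored cut-off points to a new sample. It is not part of the MATLAB package: `discretize_row` is a hypothetical helper, and each feature's cut-offs are assumed to arrive as a sorted list of interior cut points. How DiscretizationSet pads columns that have fewer cut-offs, and whether it also stores the minimum and maximum as outer bounds, is not stated here, so strip any padding or outer bounds first. Intervals are numbered from 0, which is also an assumption.

```python
from bisect import bisect_left


def discretize_row(row, cut_sets):
    """Map one new sample to interval indices, one per feature.

    row      -- continuous feature values, in the same order as the training columns
    cut_sets -- cut_sets[f] is the sorted list of cut-off points for feature f
    A value v gets index i when cuts[i-1] < v <= cuts[i] (0-based i).
    """
    if len(row) != len(cut_sets):
        raise ValueError("row length must match the number of features")
    # bisect_left counts how many cut-offs lie strictly below v.
    return [bisect_left(cuts, v) for v, cuts in zip(row, cut_sets)]


print(discretize_row([7.0, -1.2], [[2.5, 6.5], [0.0]]))
```

With cut-offs [2.5, 6.5] for the first feature and [0.0] for the second, the new sample [7.0, -1.2] maps to intervals [2, 0]. A value exactly equal to a cut-off falls into the lower interval, and values outside the training range land in the first or last interval, so unseen extremes never cause an error.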

03 Sep 2011 Prachitee Shekhawat

Hi, I have used this algorithm for classification: I used it to convert continuous values into discrete ones and then built a classifier. After the classifier is built, the user enters a continuous value to get the output. Can I convert the user-entered continuous value into a discrete value based on the previously learned discrete intervals?

07 Apr 2011 Edek


Hi, I would like to know whether there is a solution to my discretization problem. The problem is this:
>> A = xlsread('example2.xls');
>> [ DiscretData,DiscretizationSet1 ] = CACC_Discretization( A, 3 )
??? Undefined function or method 'CACC_Discretization' for input arguments of type 'double'.

So... is it possible for it to work with arguments of type double?
I can send you my file so that you can see my mistake.
Thank you so much, and sorry for taking your time with such an easy question.

21 Mar 2011 wahyu powh

Hello,
I have a question.
I still do not understand how cacc is calculated.

In the paper by Tsai et al. (Tsai, C.J., Lee, C.I., Yang, W.P., 2008, A Discretization Algorithm Based on Class-Attribute Contingency Coefficient, Science Direct), cacc is calculated as
cacc = (y'/(y'+M))^0.5
y' = M[..... -1]/log(n)

And your code:
for p = 1:C
    for q = 1:k
        if RowQuantaMatrix( p ) > 0 && ColumnQuantaMatrix( q ) > 0
            CACCValue = CACCValue + ( QuantaMatrix( p,q ) )^2/( RowQuantaMatrix( p )*ColumnQuantaMatrix( q )) ;
        end
    end
end
CACCValue = M*( CACCValue-1 )/log2(k+1) ;

Why is this different from the paper by Tsai et al.?

Your code calculates the final cacc as M*(CACCValue-1)/log2(k+1),
so your final cacc is actually y' in the paper by Tsai et al.

I also tried the dataset from the paper by Tsai et al. (the Age table: 2 attributes, Age and target class).
Comparing the results, the cacc values and cut points from your code differ from those in the paper.
Please explain...

Thx
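The difference pointed out above can be checked with a minimal Python sketch (a hypothetical helper, not the package's code) that computes both quantities from a class-by-interval quanta matrix: y' = M(sum of q_ir^2/(M_i+ M_+r) - 1)/log(k) for k intervals, and cacc = sqrt(y'/(y'+M)). The quoted MATLAB code stops at the first quantity and divides by log2(k+1) instead; passing that as `log_of_intervals` reproduces its normalization. Using the natural log by default is an assumption.

```python
import math


def y_prime_and_cacc(q, log_of_intervals=math.log):
    """Return (y', cacc) for a quanta matrix q[class][interval].

    log_of_intervals maps the number of intervals k to the normalizing term;
    the default natural log is an assumption.
    """
    M = sum(sum(row) for row in q)
    k = len(q[0])
    denom = log_of_intervals(k)
    if denom <= 0:
        raise ValueError("normalizing term must be positive (log(k) needs k >= 2)")
    row_tot = [sum(row) for row in q]                       # M_i+
    col_tot = [sum(row[r] for row in q) for r in range(k)]  # M_+r
    s = 0.0
    for i, row in enumerate(q):
        for r, x in enumerate(row):
            if row_tot[i] > 0 and col_tot[r] > 0:
                s += x * x / (row_tot[i] * col_tot[r])
    y_prime = max(M * (s - 1) / denom, 0.0)    # where the quoted code stops
    cacc = math.sqrt(y_prime / (y_prime + M))  # the extra step from the paper
    return y_prime, cacc


q = [[3, 0], [0, 3]]  # toy matrix: 2 classes, 2 intervals, perfectly separated
print(y_prime_and_cacc(q))
print(y_prime_and_cacc(q, lambda k: math.log2(k + 1)))
```

For this toy matrix y' is about 8.66 while cacc is about 0.77, so the two are clearly not interchangeable: cacc always stays in [0, 1), whereas y' grows with M.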

20 Mar 2011 zapp


Ehm, disregard my previous message. I see now how class membership is coded. I usually work with one-vector multiclass coding. Thx

20 Mar 2011 zapp


Hi there,
can the class variable be something other than binary?
Thanks for the code.

15 Mar 2011 Yoann


Hello,
could this discretization scheme work with a dataset containing both negative and positive values?

27 Feb 2011 Prachitee Shekhawat

Thank you for your help. I apologize for such a silly question. Thank you once again.

17 Feb 2011 Guangdi Li


For the iris dataset, create a matrix like [attribute1,attribute2,attribute3,attribute4,ClassVariable], then use the command:

[discrete,discretizationset]= CACC_Discretization(originaldata,1)

In MATLAB, you can do it like this:

load fisheriris                          % provides meas (150x4 numeric) and species (150x1 cell)
N = size( meas,1 );
originaldata = [ meas,zeros(N,1) ];      % 5th column = class code; setosa stays 0
for p = 1:N
    if isequal(species{p},'versicolor')
        originaldata(p,5) = 1;
    elseif isequal(species{p},'virginica')
        originaldata(p,5) = 2;
    end
end

[discrete,discretizationset]= CACC_Discretization(originaldata,1)
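The same input layout (feature columns followed by one integer class column, with setosa = 0, versicolor = 1 and virginica = 2 as in the loop above) can be prepared outside MATLAB with a short Python sketch like the one below. `build_cacc_input` is a hypothetical helper, the feature rows in the example are made-up placeholders rather than iris measurements, and an unknown label deliberately raises a KeyError.

```python
def build_cacc_input(features, labels, class_names):
    """Append an integer class code to each feature row.

    class_names[i] is coded as i, so the order of class_names fixes the codes.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    code = {name: i for i, name in enumerate(class_names)}
    # An unknown label raises KeyError instead of silently getting a code.
    return [list(row) + [code[label]] for row, label in zip(features, labels)]


# Placeholder two-feature rows, only to show the resulting layout.
rows = build_cacc_input([[1.0, 2.0], [3.0, 4.0]],
                        ["versicolor", "setosa"],
                        ["setosa", "versicolor", "virginica"])
print(rows)
```

Here `rows` becomes `[[1.0, 2.0, 1], [3.0, 4.0, 0]]`: the class code is always the last column, matching the 5th column built in the MATLAB loop.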

17 Feb 2011 Prachitee Shekhawat

Hi,
can we apply this algorithm to a classification dataset? I have applied it to the iris dataset (4 attributes + 1 class attribute with 3 classes), but it only converts 2 continuous attributes to discrete ones while the other two remain the same.
I invoked the CACC_Discretization function as
[discrete,discretizationset]= CACC_Discretization(originaldata,3)
where originaldata is my iris dataset and 3 is the number of classes into which the data is classified.
I think I am not understanding the second input variable.
Please help me out.

01 Feb 2011 Adrian__

Thank you very much for your help.
It is working perfectly now.

31 Jan 2011 Adrian__

Hello,

I really appreciate your help.
I just hope that the database I sent you does not violate any of the algorithm's requirements.

If that is the problem, I apologize in advance for wasting your time.

28 Jan 2011 Guangdi Li


Of course, you are welcome to send me the dataset to check what's the problem.

27 Jan 2011 Adrian__

Hello,

Thank you very much for this code; I found it very useful when working with my dataset.

However, I must confess that I got the same error message Khalid was talking about when I tried to discretize a subset of my original database.

Strangely, some subsets are discretized without any error, while for others the run is stopped by the aforementioned error.

Could you please let me know what is causing the problem?

Should you need a sample of the database I was talking about, I will email it to you as soon as you agree.

01 Apr 2010 Guangdi Li


What kind of dataset do you have? Can you give me a simple example for testing?

31 Mar 2010 Khalid


Hi,

I got this error message when I use it on my dataset:

??? Attempted to access B(0); index must be a positive integer or logical.

Error in ==> CACC_Discretization at 81
D( k ) = B( Local );

?!

Updates
04 Jul 2009

Improved it

31 Jan 2011

Improved the code
