Code covered by the BSD License

5.0 | 2 ratings | 14 Downloads (last 30 days) | File Size: 6.27 KB | File ID: #41740

# Discretization methods: Class-Attribute Contingency Coefficient (CACC - MATLAB)

14 May 2013 (Updated )

Correct implementation of the CACC discretization method. http://cs.adelaide.edu.au/~jzaragoza

File Information
Description

This is the correct MATLAB implementation of the discretization method appearing in the paper "A Discretization Algorithm Based on Class-Attribute Contingency Coefficient" by Tsai et al., 2008.

If you have tried other implementations and did not get the results reported in the paper, it is because those implementations are WRONG and, in some cases, INCOMPLETE.

I tested my code with the data provided in the paper and all of my discretization ranges, CACC values and discretized data are the same as in the paper.

The file 'main.m' contains an example that uses the CACC function to discretize some of the data used in the paper.

If you find any bugs in my code please report them so that I can fix them.

Bug #1 squashed! Thanks to Rahul for his comments about Line #156

Required Products: MATLAB
MATLAB release: MATLAB 7.14 (R2012a)
31 Dec 2013

That is what I thought. Also, your version seems to be O(M^2), where M is the number of distinct values, because of the nested loops that add the inner boundaries. I'm not sure how the paper achieves O(m log m).
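
As a point of reference for the complexity question, the one step whose cost is clearly O(m log m) in the number of samples is the sort needed to build the candidate boundaries. The sketch below assumes that candidates are the midpoints between adjacent distinct values, as in CAIM-style methods. The data and variable names are hypothetical and not taken from the submitted file, and the sketch says nothing about the cost of the greedy selection loop that follows.

```matlab
% Hypothetical attribute values for one continuous feature.
x = [3.2 1.5 3.2 4.8 1.5 2.0];

% unique() sorts internally, so v holds the distinct values in ascending order.
v = unique(x);

% Midpoints between adjacent distinct values: one candidate cut per gap.
mid = (v(1:end-1) + v(2:end)) / 2;

% Candidate boundary list including the minimum and maximum (assumed layout).
candidates = [v(1), mid, v(end)];
```

With M distinct values this yields M - 1 inner candidates, so any loop that re-evaluates every remaining candidate for each accepted boundary would grow roughly quadratically in M.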

31 Dec 2013

You would need to develop a C/C++ version of the code; otherwise it will take a long time.

30 Dec 2013

This works great on smaller datasets, but have you tried it on larger ones? I'm trying to discretize gene expression data with 1.5 million samples and 20000 unique classes.

30 Dec 2013

Yeah, it is -1 (n = number of cutting points - 1).
Thanks a lot for your comment, Rahul.

30 Dec 2013

Not sure, but in line 181:
yprime = M*(y-1)/log(length(discscheme));

should it not be
yprime = M*(y-1)/log(length(discscheme)-1);

since you want the number of intervals?
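
For readers following this exchange, here is a minimal MATLAB sketch of how the interval count enters the CACC value. It assumes that `discscheme` lists every boundary including the minimum and maximum, that `y` is the double sum over the quanta matrix as defined by Tsai et al., and that `quanta` is a toy class-by-interval count matrix invented for illustration. None of the data comes from the paper or the submitted file.

```matlab
% Hypothetical toy data: rows are classes, columns are intervals.
quanta = [4 1; 1 4];            % quanta(i,r): samples of class i in interval r
discscheme = [0 5 10];          % assumed boundary list, endpoints included

M = sum(quanta(:));             % total number of samples
rowTot = sum(quanta, 2);        % samples per class (column vector)
colTot = sum(quanta, 1);        % samples per interval (row vector)

% Double sum of q_ir^2 / (M_i+ * M_+r); rowTot*colTot is the outer product.
y = sum(sum(quanta.^2 ./ (rowTot * colTot)));

% Three boundaries delimit two intervals, hence the -1.
n = length(discscheme) - 1;

yprime = M * (y - 1) / log(n);
cacc = sqrt(yprime / (yprime + M));
```

Note that with a single interval `log(n)` is zero, so the expression is undefined and a caller would presumably need to treat that case separately.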

28 Jun 2013
15 May 2013

Improved description

16 May 2013