Code covered by the BSD License  

Decision Trees and Predictive Models with cross-validation and ROC analysis plot

File Size: 966 KB | File ID: #26326

09 Jan 2010 (Updated )

This code implements a classification tree and plots the ROC curves for each target class


File Information
Description

Decision tree learning is a common method in data mining. Most commercial packages offer sophisticated tree classification algorithms, but they are very expensive.
This MATLAB code uses the 'classregtree' function, which implements the CART algorithm and uses the Gini index to determine the best split for each node.
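As a rough illustration of the Gini criterion mentioned above, here is a minimal sketch in base MATLAB that scores one candidate split. The data, the threshold, and the helper name `gini` are hypothetical, and this is not how classregtree is implemented internally; it only shows the quantity being compared.

```matlab
% Gini impurity of a label vector: 1 - sum_k p_k^2,
% where p_k is the fraction of observations in class k.
% "gini" is a hypothetical helper name used only for this sketch.
gini = @(labels) 1 - sum((arrayfun(@(c) sum(labels == c), unique(labels)) ...
                          / numel(labels)).^2);

% Hypothetical toy data: one covariate x and a numeric outcome y.
x = [1; 2; 3; 4; 5; 6; 7; 8];
y = [0; 0; 0; 1; 1; 1; 1; 2];

threshold = 3.5;                 % candidate split: x < 3.5 goes left
left  = y(x <  threshold);
right = y(x >= threshold);

giniParent = gini(y);
% Impurity after the split: children weighted by their share of the data.
giniSplit  = numel(left)  / numel(y) * gini(left) + ...
             numel(right) / numel(y) * gini(right);
decrease   = giniParent - giniSplit;   % larger decrease = better split

fprintf('parent %.5f, after split %.5f, decrease %.5f\n', ...
        giniParent, giniSplit, decrease);
```

A tree builder in the CART style repeats this comparison over many candidate thresholds and variables and keeps the split with the largest decrease.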
The main function is named Tree. It imports data directly from an Excel or CSV file, using the first row as variable names (this row is required). The first column is the outcome group, and it must be numeric.
To start the classification tree, type at the MATLAB command prompt: Tree('filename.xls') or Tree('filename.csv'). Make sure that your Excel file contains a first row with variable names and has the outcome group in the first column.
It can also import directly from a MATLAB file (.mat extension). Create a file with these three variables: X (matrix of covariate values), y (outcome values), and textdata (a cell array containing the text names of the outcome and covariates). For an example, type [X, y, textdata] = ExcelImport('example.xls') or [X, y, textdata] = ExcelImport('yourfile.xls') and inspect the output.
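A minimal sketch of building such a .mat file by hand follows. The toy values, the file name mydata.mat, and the layout of textdata (a row of names with the outcome first) are assumptions; compare against the output of ExcelImport on example.xls before relying on it.

```matlab
% Hypothetical toy data: 6 observations, 2 covariates.
X = [5.1 0; 4.9 1; 6.3 0; 5.8 1; 7.1 0; 6.5 1];   % covariate values
y = [0; 0; 1; 1; 2; 2];                           % numeric outcome classes
% Assumed layout: outcome name first, then one name per column of X.
textdata = {'outcome', 'length', 'flag'};

% Save exactly the three variables the description asks for.
matFile = fullfile(tempdir, 'mydata.mat');        % hypothetical file name
save(matFile, 'X', 'y', 'textdata');

% With Tree.m on the path, the analysis would then be started with:
% Tree(matFile);
```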
There are two important requirements:
1) Outcome classes must be numeric, with values from 0 to n.
2) Outcome classes must not contain NaN (the code exits if they do).
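The two requirements above can be checked before calling Tree. The sketch below is not the check that Tree itself performs; `checkOutcome` is a hypothetical helper, and it assumes that "from 0 to n" means the consecutive integers 0, 1, ..., n with every class present.

```matlab
% Hypothetical pre-check for the outcome vector:
%   numeric, no NaN, and classes equal to 0, 1, ..., n (assumed reading).
checkOutcome = @(y) isnumeric(y) && ~any(isnan(y(:))) && ...
                    isequal(unique(y(:))', 0:max(y(:)));

yGood    = [0; 1; 2; 1; 0; 2];   % classes 0..2, no NaN
yWithNaN = [0; 1; NaN; 2];       % contains NaN
yShifted = [1; 2; 3; 1];         % classes start at 1 instead of 0

okGood    = checkOutcome(yGood);
okWithNaN = checkOutcome(yWithNaN);
okShifted = checkOutcome(yShifted);

fprintf('good: %d, with NaN: %d, shifted: %d\n', okGood, okWithNaN, okShifted);
```

If your classes start at 1, subtracting 1 from the outcome column before saving the file is one way to satisfy the first requirement.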
At this point, a first GUI lets you select the variables to include in the analysis, so you don't need to modify your original data file. A second GUI then asks which variables are categorical: select one or more if necessary.
Then the Tree function:
1) Calculates the relative importance of the features.
2) Draws the classification tree.
3) Performs a cross-validation to find the best pruning level.
4) Plots the cost of pruning.
5) Plots ROC curves for each target class (output class) and displays the AUC.
6) Estimates the classification rate (accuracy) with 10-fold cross-validation and with leave-one-out cross-validation.
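To make step 5 concrete, here is a minimal base-MATLAB sketch of a one-vs-rest ROC curve and its AUC for a single target class. The labels and scores are hypothetical, the scores stand in for whatever class probability the tree assigns, and the code assumes there are no tied scores. It is an illustration of the idea, not the code used inside Tree.

```matlab
% Hypothetical data: true classes and an assumed score for class 1
% (for example, the class-1 probability of the leaf each case falls in).
y      = [1; 1; 0; 1; 0; 2; 0; 2];
score1 = [0.9; 0.8; 0.7; 0.6; 0.4; 0.3; 0.2; 0.1];

% One-vs-rest: class 1 is "positive", every other class is "negative".
isPos = (y == 1);

% Sweep the decision threshold from the highest score to the lowest.
[sortedScore, order] = sort(score1, 'descend');
isPos = isPos(order);

% Each step down the sorted list adds one true or one false positive.
tpr = [0; cumsum(isPos)  / sum(isPos)];    % true positive rate
fpr = [0; cumsum(~isPos) / sum(~isPos)];   % false positive rate

% Area under the ROC curve by the trapezoidal rule.
auc = trapz(fpr, tpr);
fprintf('AUC for class 1 vs rest: %.4f\n', auc);

% plot(fpr, tpr, '-o'); xlabel('False positive rate'); ylabel('True positive rate');
```

Repeating this with each class in turn as the positive class gives one curve and one AUC per target class.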

Some important notes:
1) Pay attention to how you save your data file. MATLAB's Excel import function does not handle every Excel file type well. On Mac OS 10.6.2 with MATLAB 2009a, for example, you must save the file with Excel 95 compatibility.
2) The Excel import function sometimes makes mistakes. If it does, check your file for numbers stored as strings or for blank columns on the right. In that case I advise you to select the outcome and covariates to analyze with the mouse, copy them into a new file (with MS Excel copy and paste), and use that file.
3) Handle data files with missing values with care. The MATLAB classregtree function doesn't use surrogate splits.
4) This code runs only with MATLAB 2009a (or 2009b). Earlier versions support classification trees, but the functions are quite different.

An example file (example.xls) is included in the zip. In MATLAB, type Tree('example.xls') to start.

Please send me your opinion.

Required Products: Statistics Toolbox
MATLAB release: MATLAB 7.8 (R2009a)
Other requirements: For reasons that I haven't completely understood, these routines don't work with some versions of MATLAB.
Comments and Ratings (3)
19 Apr 2014 Jung Hoon

UMM

03 Jun 2012 Waleed Ahmed

I am not able to run it. Can you tell me what the issue is?

22 Oct 2010 Walaa Gouda

There is a problem: I have 28 features, but your program classifies according to feature 1 only. I don't know why.

Updates
12 Jan 2010

I've changed some comments and screenshot.

07 Nov 2010

Added Import directly from matlab file
Added check for NaN in outcome variable
Added check for consistency in outcome variable
Added some comments in files.

08 Nov 2010

Added Import from Matlab File.
Added some comments in file.
Added consistency check for outcome variable.

09 Aug 2012

The overall classification rate is now estimated by:
1) 10-fold cross-validation
2) Leave-one-out cross-validation (LOOCV)
which give more realistic results.
