Code covered by the BSD License  

Decision Trees and Predictive Models with cross-validation and ROC analysis plot

76 Downloads (last 30 days) File Size: 973.51 KB File ID: #26326

by Andrea Padoan

 

09 Jan 2010 (Updated 08 Nov 2010)

This code implements a classification tree and plots the ROC curves for each target class

File Information
Description

Decision tree learning is a common method in data mining. Most commercial packages offer sophisticated tree classification algorithms, but they are very expensive.
This MATLAB code uses the 'classregtree' function, which uses the Gini index to determine the best split at each node.
The main function is named Tree. It imports data directly from an Excel or CSV file, using the first row as variable names (required). The first column is the outcome group and must be numeric.
To start the classification tree, type at the MATLAB command prompt: Tree('filename.xls') or Tree('filename.csv'). Make sure that the file has variable names in the first row and the outcome group in the first column.
It can also import directly from a MATLAB file (.mat extension). Create a file with these three variables: X (matrix of covariate values), y (outcome values), and textdata (a cell array containing the names of the outcome and the covariates). For an example, type [X, y, textdata] = ExcelImport('example.xls') or [X, y, textdata] = ExcelImport('yourfile.xls') and inspect the output.
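A minimal sketch of preparing such a .mat file. The data, the file name mydata.mat, and the variable names are made up for illustration, and the layout of textdata (outcome name first, then one name per covariate column) is an assumption based on the column order described above; compare it with the output of ExcelImport('example.xls') before relying on it.

```matlab
% Hypothetical toy data: 6 observations, 2 covariates.
X = [5.1 3.5; 4.9 3.0; 6.2 2.9; 5.9 3.2; 6.7 3.1; 5.0 3.4];

% Outcome classes: numeric, coded from 0 upward.
y = [0; 0; 1; 1; 1; 0];

% Assumed layout: outcome name first, then one name per column of X.
textdata = {'Outcome', 'Length', 'Width'};

% Store exactly the three variables the description asks for.
save('mydata.mat', 'X', 'y', 'textdata');

% Tree('mydata.mat')   % then start the analysis (needs the File Exchange code)
```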
There are two important requirements:
1) Outcome classes must be numeric, with values from 0 to n.
2) Outcome classes must not contain NaN (the code exits in this case).
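The two requirements can be checked before calling Tree. The sketch below is not the author's own check; the outcome vector is hypothetical, and reading "from 0 to n" as "integer codes 0, 1, ..., n with no gaps" is an assumption.

```matlab
% Hypothetical outcome vector; replace with your own y.
y = [0; 1; 2; 1; 0; 2];

% Requirement 2: no NaN in the outcome.
hasNaN = any(isnan(y));

% Requirement 1: whole, non-negative class codes.
isIntCoded = all(y == round(y)) && all(y >= 0);
classes = unique(y);

% Assumed reading of "from 0 to n": codes are 0, 1, ..., n with no gaps.
isContiguous = isIntCoded && isequal(classes(:)', 0:max(y));

if hasNaN
    error('Outcome contains NaN; remove those rows first.');
elseif ~isContiguous
    error('Outcome classes should be coded 0, 1, ..., n.');
end
```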
At this point, a first GUI lets you select the variables to include in the analysis, so you don’t need to modify your original data file. A second GUI then asks which variables are categorical: select one or more if necessary.
Then the Tree function:
1) Calculates the relative importance of the features.
2) Draws the classification tree.
3) Performs cross-validation to find the best pruning level.
4) Plots the cost at each pruning level.
5) Plots ROC curves for each target class (output class) and displays the AUC.
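The five steps above can be approximated with Statistics Toolbox calls. This is a rough sketch, not the code inside Tree: it uses the Fisher iris data that ships with the toolbox instead of example.xls, the short variable names are made up, and it assumes a release that still provides classregtree and perfcurve (R2009a or a later release that has not removed classregtree). The ROC curves here are computed on the training data, so the AUC values are optimistic.

```matlab
% Sketch only: toolbox sample data, not the author's example file.
load fisheriris                         % meas (150x4), species (cell array)
X = meas;
[spNames, dummy, idx] = unique(species);
y = idx - 1;                            % numeric outcome coded 0, 1, 2
names = {'SL', 'SW', 'PL', 'PW'};       % hypothetical short names

% Classification tree; 'gdi' is Gini's diversity index.
t = classregtree(X, y, 'method', 'classification', ...
                 'names', names, 'splitcriterion', 'gdi');

imp = varimportance(t);                 % 1) importance of each predictor
view(t);                                % 2) draw the tree

% 3) Cross-validation over the pruning sequence, then prune.
[cost, secost, ntnodes, bestlevel] = test(t, 'crossvalidate', X, y);
tPruned = prune(t, 'level', bestlevel);

% 4) Cost against tree size.
figure; plot(ntnodes, cost, 'b-x');
xlabel('Number of terminal nodes'); ylabel('Cross-validated cost');

% 5) One-versus-rest ROC curve and AUC for each class.
[yfit, node] = eval(tPruned, X);
P = classprob(tPruned, node);           % rows: observations, columns: classes
cn = classname(tPruned);                % class names as strings
figure; hold on;
for k = 1:numel(cn)
    posClass = str2double(cn{k});
    [fpr, tpr, thr, auc] = perfcurve(y, P(:, k), posClass);
    plot(fpr, tpr);
    fprintf('Class %s: AUC = %.3f\n', cn{k}, auc);
end
xlabel('False positive rate'); ylabel('True positive rate');
```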
There are some important notes:
1) Take care when saving your data file. MATLAB's Excel import function does not recognize every Excel file type. On Mac OS X 10.6.2 with MATLAB R2009a, for example, you must save the file in Excel 95 compatible format.
2) Sometimes the Excel import function makes mistakes. In that case, check your file for numbers stored as text or for blank columns on the right. I advise selecting the outcome and the covariates to analyze with the mouse, copying them into a new file (with MS Excel copy and paste), and using that file.
3) Handle data files with missing values with care. The MATLAB classregtree function does not use surrogate splits.
4) This code runs only with MATLAB R2009a (or R2009b). Previous versions support classification trees, but the functions are quite different.
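For note 3, one simple option is to drop incomplete rows before the analysis. This is an illustration with made-up data, not something the Tree function does for you. It discards information, and it can remove a class entirely, so the outcome coding should be checked again afterwards.

```matlab
% Hypothetical data with missing entries (NaN).
X = [1.0 2.0; NaN 3.0; 4.0 5.0; 6.0 NaN];
y = [0; 1; 1; 0];

% Keep only rows where every covariate and the outcome are present.
keep = ~any(isnan(X), 2) & ~isnan(y);
Xc = X(keep, :);
yc = y(keep);

fprintf('Kept %d of %d rows\n', sum(keep), numel(keep));
```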

An example file (example.xls) is included in the zip. In MATLAB, type Tree('example.xls') to start.

Please send me your opinion.

Required Products: Statistics Toolbox
MATLAB release: MATLAB 7.8 (R2009a)
Comments and Ratings (2)
22 Oct 2010 Walaa Gouda

There is a problem: I have 28 features, but your program classifies using feature 1 only. I don't know why.

03 Jun 2012 Waleed Ahmed

I am not able to run it. Can you tell me what the issue is?

Updates
12 Jan 2010

I've changed some comments and screenshot.

07 Nov 2010

Added Import directly from matlab file
Added check for NaN in outcome variable
Added check for consistency in outcome variable
Added some comments in files.

08 Nov 2010

Added Import from Matlab File.
Added some comments in file.
Added consistency check for outcome variable.

Tag Activity for this File
Tag | Applied By | Date/Time
data mining | Andrea Padoan | 11 Jan 2010 10:17:14
statistics | Andrea Padoan | 11 Jan 2010 10:17:14
decision tree | Andrea Padoan | 11 Jan 2010 10:17:14
classification tree | Andrea Padoan | 11 Jan 2010 10:17:14
gini algorithm | Andrea Padoan | 11 Jan 2010 10:17:14
crossvalidation | Andrea Padoan | 11 Jan 2010 10:17:14
statistic | Andrea Padoan | 11 Jan 2010 10:17:14
roc | Andrea Padoan | 11 Jan 2010 10:17:14
classification tree | Sandip | 10 Mar 2010 11:04:00
statistics | Sandip | 10 Mar 2010 11:04:03
crossvalidation | Sandip | 10 Mar 2010 11:04:05
classification tree | Demetrios Eliades | 22 Mar 2010 06:12:41
tree induction | Will Dwinnell | 23 Mar 2010 11:05:37
auc | Will Dwinnell | 23 Mar 2010 11:05:39
machine learning | Will Dwinnell | 23 Mar 2010 11:05:53
pattern recognition | Will Dwinnell | 23 Mar 2010 11:06:01
decision tree | Mike Liu | 21 May 2010 01:27:37
classification | Mike Liu | 21 May 2010 01:28:21
classification | Jeevanand S | 22 May 2011 17:35:36
