Code covered by the BSD License  

Rating: 4.0 (19 ratings) | 219 downloads (last 30 days) | File size: 4.35 KB | File ID: #19950

ROC curve

by Giuseppe Cardillo

 

16 May 2008 (Updated 16 Nov 2011)

compute a ROC curve


File Information
Description

ROC - Receiver Operating Characteristics.
ROC graphs are a useful technique for organizing classifiers and visualizing their performance, and they are commonly used in medical decision making.
The function computes and plots both the classical ROC curve and the mirrored ROC curve (which, in my opinion, is more useful).
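For readers new to the method, here is a minimal sketch of how ROC points can be computed: each unique test value is tried as a threshold, and sensitivity and specificity are counted with logical masks. It assumes a hypothetical n-by-2 matrix `x` (test value; 1 = diseased, 0 = healthy) and the convention that values at or above the threshold are called positive. It is an illustration, not the code of roc.m or partest.

```matlab
% Hypothetical toy data: column 1 = test value, column 2 = 1 (diseased) or 0 (healthy)
x = [1 0; 2 0; 3 1; 4 0; 5 1; 6 1];
values = x(:,1);
isPos  = x(:,2) == 1;

thr  = unique(values);          % every unique value is a candidate threshold
sens = zeros(numel(thr), 1);    % preallocated outputs
spec = zeros(numel(thr), 1);
for k = 1:numel(thr)
    called = values >= thr(k);  % assumed convention: value >= threshold means "positive"
    tp = sum(called & isPos);    fn = sum(~called & isPos);
    tn = sum(~called & ~isPos);  fp = sum(called & ~isPos);
    sens(k) = tp / (tp + fn);
    spec(k) = tn / (tn + fp);
end
disp([thr sens spec])           % columns: threshold, sensitivity, specificity
```

With `>=`, the lowest threshold yields the point sensitivity = 1, specificity = 0; the opposite corner of the plot (0,0) still has to be added explicitly before integrating.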

ROC requires another function of mine, partest. If it is not present on the computer, ROC will download it from FEX (the MATLAB File Exchange).

You can find more information on:
http://www.advancedmcode.org/roc-curve.html

My profile on XING http://www.xing.com/go/invita/13675097
My profile on LinkedIn http://it.linkedin.com/in/giuseppecardillo

MATLAB release: MATLAB 7.6 (R2008a)
Comments and Ratings (37)
04 Jun 2008 Brahim HAMADICHAREF

Good stuff.

24 Jun 2008 Truc Phan  
13 Jul 2008 Diego García Bascuñán

Good function

11 Aug 2008 Günther Eibl

easy to use!

20 Nov 2008 Pawel  
25 Jun 2009 Phong Vo

Thank you very much!

16 Jul 2009 cabrego

Nice function, but I think it may have a bug: I am getting different results for significantly overlapping distributions when I compare with MedCalc and online calculators.

email me for more details: cpabrego@gmail.com

24 Jul 2009 Michael

Agree with cabrego, this algorithm does not work correctly. Depending on the input data, it generates ROC curves with specificity and sensitivity backward. I believe this is because elements that fall below a cutoff value (I in the code) are called "true positives" when they should be "false positives". The convention is that higher values of a test are abnormal (positive).

I confirmed that other software (online ROC calculator, ROCR in R, STATA) does not behave this way with the same input data and all others produce correct results.

Use at your own risk.
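The direction problem described above can be avoided by fixing a convention before sweeping thresholds. Below is a minimal sketch that assumes, as Michael describes, that higher values should mean positive, and flips the sign of the scores when the positive group has the lower mean (similar in spirit to the mirroring mentioned in the 10 Dec 2008 update). The AUC here is computed as the Mann-Whitney probability that a positive case scores above a negative one (ties count one half). This is a swapped-in technique, not the trapezoidal integration roc.m uses.

```matlab
% Hypothetical data in which diseased subjects have LOWER test values
values = [10; 12; 11; 20; 22; 21];
isPos  = logical([1; 1; 1; 0; 0; 0]);

% Enforce the "higher value = positive" convention
flipped = mean(values(isPos)) < mean(values(~isPos));
if flipped
    values = -values;   % mirror the scores instead of swapping the labels
end

% AUC as P(positive score > negative score), ties counted as 1/2
[P, N] = meshgrid(values(isPos), values(~isPos));   % all positive/negative pairs
auc = mean(P(:) > N(:)) + 0.5 * mean(P(:) == N(:));
fprintf('flipped = %d, AUC = %.2f\n', flipped, auc);
```

Without the flip, the same data would give AUC = 0, that is, a curve lying entirely below the diagonal.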

29 Jul 2009 cabrego

I tested the new release and it is agreeing with other codes now. Michael, you may also wish to verify that the new version is working correctly.

I also think adding the cut-off points as an additional x or y axis would be very useful for understanding the trade-off between sensitivity and specificity.

27 Oct 2009 Jens Kaftan

Hi Giuseppe.
I had a look at the new release today and I think it is still not perfectly correct. I validated the scripts using the example data from Hanley and McNeil's 1982 paper, "The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve", which seems to be the basis for the calculations anyway (such as the approximation of Q_1 and Q_2). In my opinion, the problem is that when integrating over the ROC curve with the trapezoidal rule to compute the AUC, the data point (sensitivity=1, specificity=0) is not considered. Consequently, the AUC value (and all AUC-dependent measures) differs slightly from the example in that article, and the difference becomes more severe for non-continuous tests with only a few cut-off points.

Best,
Jens.
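The effect Jens describes is easy to reproduce. The sketch below uses hypothetical (FPR, TPR) pairs from a test with only six cut-off points, adds the corner points (0,0) and (1,1) explicitly, and integrates with trapz. Dropping the (1,1) point, which corresponds to sensitivity = 1 and specificity = 0, changes the area substantially when there are this few points.

```matlab
% (FPR, TPR) pairs from the six thresholds of a hypothetical toy test
fpr = [1; 2/3; 1/3; 1/3; 0; 0];
tpr = [1; 1;   1;   2/3; 2/3; 1/3];

% Add both corner points, then sort the curve from (0,0) to (1,1);
% unique(...,'rows') also drops the duplicate (1,1) row
pts = unique([0 0; fpr tpr; 1 1], 'rows');
auc_full = trapz(pts(:,1), pts(:,2));

% Same curve with the (1,1) endpoint missing
pts_cut = pts(1:end-1, :);
auc_cut = trapz(pts_cut(:,1), pts_cut(:,2));

fprintf('AUC with endpoints: %.4f, without (1,1): %.4f\n', auc_full, auc_cut);
% prints: AUC with endpoints: 0.8889, without (1,1): 0.5556
```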

08 Apr 2010 Neel  
08 Apr 2010 Neel

Giuseppe, I find your code useful. I was keen to calculate the Equal Error Rate (EER) based on your code. Do you have any suggestions as to how I can do this easily? Any help would be appreciated.
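One simple way to approximate the Equal Error Rate, sketched under the assumption that you already have sensitivity and specificity for each threshold (the values below are hypothetical): the EER is the error rate at the operating point where the false positive rate (1 - specificity) equals the false negative rate (1 - sensitivity). This version just picks the threshold where the two rates are closest and averages them; interpolating between neighbouring thresholds would be more precise. It uses `[gap, k] = min(...)` rather than `[~, k]` so that it also runs on older releases.

```matlab
% Hypothetical thresholds with their sensitivity and specificity
thr  = (1:6)';
sens = [1; 1; 1; 2/3; 2/3; 1/3];
spec = [0; 1/3; 2/3; 2/3; 1; 1];

fpr = 1 - spec;    % false positive rate
fnr = 1 - sens;    % false negative rate

% Threshold where the two error rates are closest
[gap, k] = min(abs(fpr - fnr));
eer = (fpr(k) + fnr(k)) / 2;   % simple approximation, no interpolation
fprintf('EER ~ %.4f at threshold %g (gap %.4f)\n', eer, thr(k), gap);
```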

29 Jul 2010 Segun Oshin  
29 Jul 2010 Segun Oshin

Hi, the code is very good. However, I encounter an error on line 186, where the cut-off point is set:

??? Attempted to access labels(7397); index out of bounds because numel(labels)=7396.

Error in ==> roc at 186
        co=labels(J); %Set the cut-off point

Is there a way to fix this?
Kind regards!

03 Nov 2010 Benjamin

Giuseppe,

First off, great code, really. I was wondering if you used a specific citable method to calculate the standard error for the AUC, which is then used for the CI?

Second, and more trivial, have you thought about implementing this as a GUI or stand alone? My next side project is to make one for my boss to use (without Matlab).

Thanks again for the great code.

11 Nov 2010 Benjamin

I think there may be an issue in the code, but I could be wrong. When you create xroc and yroc using
xroc=flipud([1; 1-a(:,2); 0]), the two additional rows are not also added to labels. For instance, with your example data this yields 72 paired points for the ROC curve (rows in xroc or yroc) but only 70 thresholds (rows in labels). This causes problems when reporting the threshold value, since it is determined by a row reference back to labels (in the example, by your math the threshold should be 151, not 150, using 'labels' and 'a').

If this doesn't make sense, or I am wrong, please let me know. It's really not a big deal with large datasets with many points on the curve, but it becomes an issue with smaller sets where points are farther apart.
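The off-by-one effect described here (and plausibly the out-of-bounds error Segun Oshin reported) arises when an index into curve vectors padded with extra end points is used to read from an unpadded threshold vector. A minimal sketch of the general pitfall with hypothetical values, not a copy of roc.m: the cut-off is chosen as the point closest to the ideal corner (0,1), and the padded vectors have one extra point at each end.

```matlab
% Hypothetical thresholds and their (sensitivity, specificity) pairs
labels = [10; 20; 30; 40; 50];
sens   = [1.0; 0.9; 0.8; 0.5; 0.2];
spec   = [0.1; 0.5; 0.7; 0.9; 1.0];

% Curve padded with one corner point at each end: numel(labels)+2 rows
xroc = [1; 1 - spec; 0];
yroc = [1; sens; 0];

% Euclidean distance of every curve point from the ideal corner (0,1)
d = sqrt(xroc.^2 + (1 - yroc).^2);

% Pitfall: this index refers to the PADDED vectors
[dmin, Jpad] = min(d);
co_wrong = labels(Jpad);   % one threshold too far (out of bounds if Jpad = numel(d))

% Safer: search the interior only, so J indexes labels directly
[dmin, J] = min(d(2:end-1));
co = labels(J);
fprintf('wrong: %g, right: %g\n', co_wrong, co);
```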

11 Nov 2010 Benjamin

Also, when hbar>ubar, I think some values in the standard error calculation should be changed. Otherwise, you can get two different standard error values for the same area under the curve, depending on whether the healthy average is higher than the disease average.

Sorry to keep bugging you here, but this is the best way I can see to make suggestions. As you can tell, I have been digging into this lately.

12 Nov 2010 Giuseppe Cardillo

I'll try to answer the questions by Benjamin.
1) The SE of the area is calculated using the equation from Hanley JA, McNeil BJ. Radiology 1982;143:29-36.
2) I have no plans to implement a GUI. In any case, this function is under the GPL license, so you can modify and redistribute it freely, provided you cite it correctly.
3) I took into account that the xroc and yroc arrays have 2 more points than the labels array. If you look deeper in the code (line 138):
table=[labels'; yroc(2:end-1)'; 1-xroc(2:end-1)';]';
As you can see, the displayed xroc and yroc points go from 2 to end-1 (so the points (0,0) and (1,1) are excluded). Anyway, using the demo dataset the cut-off point is 152 (the one closest to the green line).
4) The standard error of the area is a function of the area and of the points used to draw the ROC curve: if you have two ROC curves, the first with 10 points and the second with 100 points, the first will have a greater SE than the second. hbar and ubar are used to correctly compute the true and false positives and negatives; their values don't influence the SE computation.
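For reference, the Hanley and McNeil (1982) standard error cited above can be written as SE = sqrt((A(1-A) + (nA-1)(Q1-A^2) + (nN-1)(Q2-A^2)) / (nA nN)), with Q1 = A/(2-A) and Q2 = 2A^2/(1+A), where A is the AUC and nA and nN are the numbers of abnormal and normal cases. The sketch below evaluates it for a hypothetical AUC and hypothetical group sizes. Because Q1 is paired with the abnormal count and Q2 with the normal count, swapping which group is called abnormal changes the SE whenever the group sizes differ, even for the same A; this may contribute to the difference Benjamin reports when inverting 0's and 1's.

```matlab
A  = 0.85;   % hypothetical area under the curve
nA = 40;     % hypothetical number of abnormal (diseased) cases
nN = 60;     % hypothetical number of normal cases

% Hanley-McNeil standard error of the AUC:
% Q1 = A/(2-A) is weighted by (nA-1); Q2 = 2A^2/(1+A) is weighted by (nN-1)
hmSE = @(A, nA, nN) sqrt((A*(1-A) ...
    + (nA-1)*(A/(2-A) - A^2) ...
    + (nN-1)*(2*A^2/(1+A) - A^2)) / (nA*nN));

se1 = hmSE(A, nA, nN);
se2 = hmSE(A, nN, nA);   % same A, abnormal/normal group sizes swapped
fprintf('SE = %.5f; with group sizes swapped: %.5f\n', se1, se2);
```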

12 Nov 2010 Benjamin

Thanks for answering all of my questions, I really do appreciate it. I still have an issue with your answer for number 3.

First, I was using an inverted data set when I stated the answer should be 151 not 150 (previous post). Second, using the download available on this page right now, running roc(x) gives a cutoff of 153. As you state, the correct answer is in fact 152. Therefore, I am not sure if you changed something and didn't update, since the cutoff value is still using the row of the minimum distance from xroc,yroc and grabbing that rows value from 'labels', hence the wrong answer of 153. (lines 184-186)

To confirm this, run the download from here and see what cutoff you get. Maybe it is something on my end?

As for the standard error calculation (#4), I was playing around and found that if I inverted the 1's and 0's before running, I would get a different Serror for the AUC, which I assumed should be the same regardless of whether they were inverted. The Serror of the sample data is 0.02713, and if I invert the observations, it becomes 0.0364. This is probably trivial.

Thanks again for your excellent responses.

12 Nov 2010 Giuseppe Cardillo

Perhaps you are right: I uploaded the file in July to fix the bug reported by Segun Oshin, and clearly something went wrong with that upload. I have just re-uploaded the file.
If you are using my roc dataset, you will see that 0's and 1's are not in the same proportion. If you invert 0's and 1's, the curve is slightly different, and so the SE is quite different.

12 Nov 2010 Benjamin

Thanks, that fixed it and I now understand the difference with SE.

12 Mar 2011 Reza

hey there, nice program!

However, you use each element of the data as a threshold and calculate the FPR, TPR, etc. for each one. For a large vector this means a very long running time and a huge number of points on the curve, which makes the program impractical. To avoid this, I suggest you let the user choose the number of thresholds.

15 Mar 2011 Giuseppe Cardillo

Dear Jay, thank you for your comment. I don't entirely agree with you. If you look at the code:
1) all vectors are preallocated;
2) true and false positives and negatives are computed using logical indexing.
So the computations are very fast.
Anyway, I implemented your suggestion, and now you can choose whether to use all unique values or 3<=N<all unique values as thresholds. I have just uploaded the file.

15 Mar 2011 Reza

Thanks for the update.
P.S. I work with images of 2000x2000 pixels, so... ;)

15 Mar 2011 Benjamin

I was also going to suggest adding a varargin to delineate a step size, e.g.:
% Add an optional second input, here called step, that sets the distance between thresholds.
% x is the n-by-2 [value, label] input matrix.
if nargin < 2 || isempty(step)
    % default: every unique test value is a threshold (unique returns them sorted)
    labels = unique(x(:,1));
elseif isscalar(step)
    % a fixed step size is requested
    labels = (min(x(:,1)):step:max(x(:,1)))';
else
    % a vector of thresholds was supplied directly
    labels = sort(step(:));
end
% later in Giuseppe's code, just use labels as the thresholds

Also, Giuseppe, I implemented your standard error and Pythagoras calculations in my code, which generated data that will probably be used in an upcoming paper. Do you mind being acknowledged, or are there any actual articles to cite? Your call. And lastly, I have a GUI that is pretty beta, but works.

15 Mar 2011 Giuseppe Cardillo

Dear Benjamin, I think that Pythagoras doesn't care whether you acknowledge him or not :-).
For the standard error I used an equation described in: Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.

Please cite me only if you use my whole function: if you took only pieces of code, you can decide whether to cite me or not.

Lastly, I prefer to use quantiles rather than a fixed step size because real data usually are not equally spaced.
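A minimal sketch of choosing N thresholds at empirical quantiles rather than at a fixed step. It assumes that taking evenly spaced ranks of the sorted unique values is an acceptable quantile definition; MATLAB's quantile function interpolates, so its values may differ slightly. The data and variable names are hypothetical. With N thresholds the ROC loop runs N times instead of once per unique value, which is what matters for inputs such as the 2000x2000 images mentioned above.

```matlab
% Hypothetical, unevenly spaced test values (dense near 0, sparse near 100)
values = ((1:2000)' / 2000).^3 * 100;
N = 10;                                  % requested number of thresholds (3 <= N < numel(u))

u   = unique(values);                    % sorted unique values
idx = round(linspace(1, numel(u), N));   % evenly spaced ranks = empirical quantiles
thr = unique(u(idx));                    % thresholds (fewer than N only if ranks collide)

% For comparison: a fixed step spends most thresholds where there is little data
fixedThr = linspace(min(values), max(values), N)';
disp([thr fixedThr])
```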

08 Jun 2011 Ali Ali

Thanks a lot

12 Nov 2011 Fariba Yousefi

Hey guys, I couldn't run the program, any help plz?

13 Nov 2011 Giuseppe Cardillo

If you give us some more information, we could try to help you...

14 Nov 2011 Fariba Yousefi

The data is in rocdata.mat, right?
So I type load rocdata.mat, but I need to put it into x. How can I do that?

15 Nov 2011 Giuseppe Cardillo

After typing 'load rocdata' you already have the matrix x in your workspace. If you type x, you will see all the data in the matrix. Now type roc(x).

16 Nov 2011 Fariba Yousefi

This error happens,
load rocdata
roc(x)

??? Error using ==> cell
Size vector must be a row vector with real elements.

Error in ==> roc at 44
args=cell(varargin);

16 Nov 2011 Giuseppe Cardillo

This error shows that something doesn't work with rocdata, so x is not in your workspace. I have now changed and uploaded a new version of roc: if you call roc without arguments, it runs the demo by itself. If you don't want to wait for the FEX update, simply change default.values in the code as follows:

default.values = {[165 1;140 1;154 1;139 1;134 1;154 1;120 1;133 1;150 1;...
146 1;140 1;114 1;128 1;131 1;116 1;128 1;122 1;129 1;145 1;117 1;140 1;...
149 1;116 1;147 1;125 1;149 1;129 1;157 1;144 1;123 1;107 1;129 1;152 1;...
164 1;134 1;120 1;148 1;151 1;149 1;138 1;159 1;169 1;137 1;151 1;141 1;...
145 1;135 1;135 1;153 1;125 1;159 1;148 1;142 1;130 1;111 1;140 1;136 1;...
142 1;139 1;137 1;187 1;154 1;151 1;149 1;148 1;157 1;159 1;143 1;124 1;...
141 1;114 1;136 1;110 1;129 1;145 1;132 1;125 1;149 1;146 1;138 1;151 1;...
147 1;154 1;147 1;158 1;156 1;156 1;128 1;151 1;138 1;193 1;131 1;127 1;...
129 1;120 1;159 1;147 1;159 1;156 1;143 1;149 1;160 1;126 1;136 1;150 1;...
136 1;151 1;140 1;145 1;140 1;134 1;140 1;138 1;144 1;140 1;140 1;159 0;...
136 0;149 0;156 0;191 0;169 0;194 0;182 0;163 0;152 0;145 0;176 0;122 0;...
141 0;172 0;162 0;165 0;184 0;239 0;178 0;178 0;164 0;185 0;154 0;164 0;...
140 0;207 0;214 0;165 0;183 0;218 0;142 0;161 0;168 0;181 0;162 0;166 0;...
150 0;205 0;163 0;166 0;176 0;],0,0.05,1};

16 Nov 2011 Fariba Yousefi

I appreciate your help, it worked :)

16 Nov 2011 Fariba Yousefi  
04 Dec 2011 K

Hi Mr. Cardillo, I'd like to run the demo by typing "roc", but this error appears:

>> roc
??? Error: File: roc.m Line: 225 Column: 11
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Any idea how to fix this? My MATLAB version is 7.6.0.

Thanks in advance.

05 Dec 2011 Giuseppe Cardillo

This problem is caused by a newer MATLAB syntax (using ~ to ignore an output) that your version does not support. Simply do this:
1) edit roc
2) change [~,J]=min(d); into [S,J]=min(d);
3) save and exit

Updates
12 Nov 2008

Changes in help section

19 Nov 2008

Test on significance of AUC added

03 Dec 2008

Input error handling added

10 Dec 2008

If mean(healthy)>mean(unhealthy), the function mirrors the curve to obtain the correct ROC curve.

10 Feb 2009

Corrected a mistake in the z-test computation

18 Feb 2009

Changes to make it compatible with uroccomp function

27 Jul 2009

Bug correction

29 Jul 2009

New plot output

30 Jul 2009

Following cabrego's comment, the function now outputs the table of cut-off points, sensitivity and specificity.

02 Sep 2009

Improved compatibility with uroccomp

02 Sep 2009

In my previous submission I forgot to add the demo...

03 Sep 2009

advancedmcode link added in description section

18 Sep 2009

correction in ROC performance bounds

06 Nov 2009

I modified the files according to Jens Kaftan's suggestion

24 Nov 2009

bug fixing in area computation after adding the points (0,0) and (1,1) as previously suggested

23 Dec 2009

Changes in description

03 Mar 2010

The function is more thoroughly commented

09 Mar 2010

ROC requires another function of mine: partest. If it is not present on the computer, ROC will download it from FEX

08 Apr 2010

Another small bug correction to include the points (0,0) and (1,1)

12 Jul 2010

Trapz correction

12 Nov 2010

Bug fix in cut-off selection

12 Nov 2010

Previously I uploaded an old version of roc.m. This is the latest version.

15 Mar 2011

Added the option to use either all unique values or 3<=N<all unique values as thresholds

16 Nov 2011

Running roc without arguments now runs a demo

Tag Activity for this File
Tag Applied By Date/Time
statistics Giuseppe Cardillo 22 Oct 2008 10:01:55
probability Giuseppe Cardillo 22 Oct 2008 10:01:55
roc Giuseppe Cardillo 22 Oct 2008 10:01:55
receiver operating characteristics Giuseppe Cardillo 22 Oct 2008 10:01:55
statistics Cristina McIntire 12 Nov 2008 14:31:44
probability Cristina McIntire 12 Nov 2008 14:31:44
curve Cristina McIntire 12 Dec 2008 15:40:31
roc mohamad 22 Sep 2010 10:20:31
receiver operating characteristics Renata Kirkwood 30 Jan 2011 19:04:40
roc Yuneza 14 Jun 2011 12:16:59
statistics Yuneza 15 Jun 2011 07:46:57
curve alqueda 08 Sep 2011 12:22:34
receiver operating characteristics pink 12 Oct 2011 05:59:23
roc Roman 21 Oct 2011 11:38:52

Contact us at files@mathworks.com