Code covered by the BSD License

5.0 | 4 ratings | 214 Downloads (last 30 days) | File Size: 2.11 KB | File ID: #34492

# R-square: The coefficient of determination

07 Jan 2012 (Updated )

RSQUARE is a simple routine for computing R-square (coefficient of determination).

File Information
Description

Compute coefficient of determination of data fit model and RMSE

[r2 rmse] = rsquare(y,f)
[r2 rmse] = rsquare(y,f,c)

RSQUARE computes the coefficient of determination (R-square) value from
actual data Y and model data F. The code uses a general version of
R-square, based on comparing the variability of the estimation errors
with the variability of the original values. RSQUARE also outputs the
root mean squared error (RMSE) for the user's convenience.

Note: RSQUARE ignores comparisons involving NaN values.

INPUTS
Y : Actual data
F : Model fit

OPTION
C : Constant term in model
R-square may be a questionable measure of fit when no
constant term is included in the model.
[DEFAULT] TRUE : Use traditional R-square computation
FALSE : Uses alternate R-square computation for model
without constant term [R2 = 1 - NORM(Y-F)/NORM(Y)]

OUTPUT
R2 : Coefficient of determination
RMSE : Root mean squared error

EXAMPLE
x = 0:0.1:10;
y = 2.*x + 1 + randn(size(x));
p = polyfit(x,y,1);
f = polyval(p,x);
[r2 rmse] = rsquare(y,f);
figure; plot(x,y,'b-');
hold on; plot(x,f,'r-');
title(strcat(['R2 = ' num2str(r2) '; RMSE = ' num2str(rmse)]))

Jered R Wells
11/17/11
jered [dot] wells [at] duke [dot] edu

v1.2 (02/14/2012)

Thanks to John D'Errico for useful comments and insight, which have helped
to improve this code. His code POLYFITN was consulted in the inclusion of
the C-option (REF. File ID: #34765).
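
The quantities described in the help text above can be sketched in a few lines. This is an illustrative reconstruction, not the File Exchange implementation: the data values are hypothetical, and the no-constant variant is written with sums of squares (squared norms), which is an assumed reading of the `NORM(Y-F)/NORM(Y)` expression in the help.

```matlab
% Sketch only: not the author's rsquare.m.
y = [1.1 1.9 3.2 3.9 NaN 6.1];     % actual data (hypothetical values)
f = [1.0 2.0 3.0 4.0 5.0 6.0];     % model output (hypothetical values)

% Ignore every comparison in which either value is NaN.
keep = ~(isnan(y) | isnan(f));
y = y(keep);
f = f(keep);

sse = sum((y - f).^2);             % variability of the estimation errors
sst = sum((y - mean(y)).^2);       % variability of y about its mean
ss0 = sum(y.^2);                   % variability of y about zero

r2_const   = 1 - sse/sst;          % traditional R-square (constant term in model)
r2_noconst = 1 - sse/ss0;          % assumed form for a model without constant term
rmse       = sqrt(mean((y - f).^2));

fprintf('R2 = %.4f, R2 (no constant) = %.4f, RMSE = %.4f\n', ...
        r2_const, r2_noconst, rmse);
```

Both R-square forms compare the error sum of squares with a measure of the spread in the original values; they differ only in whether that spread is measured about the mean or about zero.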

Required Products: MATLAB
MATLAB release: MATLAB 7.7 (R2008b)
10 May 2013

Oliver: Sorry for the late reply. In the case of linear regression (when an intercept or constant term is included in the model), my RSQUARE function and the square of MATLAB's CORR will produce the same result. However, RSQUARE provides the option of computing the R-square statistic using an alternate method which prevents negative R-square values which can occur when no constant term is included in the fit model. Please reference John D'Errico's comment below for more details.

Also, for convenience, my function produces the RMSE of the fit to the data.
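
The equivalence mentioned in this reply can be checked numerically. The sketch below computes R-square directly from its definition rather than calling RSQUARE, and uses `corrcoef` from base MATLAB in place of `corr` (which needs the Statistics Toolbox); both substitutions are choices made for this illustration.

```matlab
% Sketch: for a least-squares straight line with an intercept,
% R-square equals the squared correlation between x and y.
x = 0:0.1:10;
y = 2.*x + 1 + randn(size(x));
p = polyfit(x, y, 1);              % linear fit, intercept included
f = polyval(p, x);

r2 = 1 - sum((y - f).^2) / sum((y - mean(y)).^2);

R    = corrcoef(x, y);             % 2-by-2 correlation matrix
rho2 = R(1,2)^2;                   % squared correlation coefficient

fprintf('R2 = %.6f, corr^2 = %.6f\n', r2, rho2);
```

The two numbers agree to rounding error for this kind of fit. The agreement is not guaranteed once the constant term is dropped from the model.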

27 Aug 2012

Hi,

Is there a difference between this function and the square of the MATLAB function corr(x,y)?

I ask because
x = 0:0.1:10;
y = 2.*x + 1 + randn(size(x));
p = polyfit(x,y,1);
f = polyval(p,x);
[r2 rmse] = rsquare(y,f);

and corr(x',y')^2

give the same result for r2.

23 Aug 2012

Perfect! Does exactly what it says. Time saved!

24 May 2012
27 Apr 2012

does what it says

24 Apr 2012
14 Feb 2012

John, I've taken your good advice and hopefully made some satisfactory changes. Was not aware of LOOKFOR or H1 lines before reading this, so thanks for the heads up. Will modify future files to include H1 lines. Also referenced your file POLYFITN for your handling of models without constant terms. Good catch (reference to your code provided in my documentation). Also included a few extras in the code that will hopefully benefit users. If you see any other areas for improvement, I'm glad to make relevant mods. I'm always supportive of maintaining good code on the File Exchange.

14 Feb 2012

While the help for this is not bad in general, certainly not compared to much of what we get here, it has one serious flaw. It lacks an H1 line. But what is that, and why is it important?

An H1 line is the VERY first comment line, which in this code has been left completely blank. An H1 line should be a SINGLE line of description, not carried onto a second line. It should have a set of useful keywords in it that one might use to search for that code. After all, how is someone who downloads this file supposed to remember that it is called rsquare, as opposed to rsquared, or rsq, or R2, or r_square, etc.? Yes, I said remember, because they won't be using this tool every day. They might want to call it once a month or maybe even once a week. And how is someone who is not the author supposed to remember that next month, or next year? The trick is to use lookfor. Lookfor is a keyword search tool in MATLAB that searches through all of those H1 lines, in all functions on your search path, returning those which give hits.

In fact, the author supplied something which is not too bad as an H1 line, but they put it down in the comments, where lookfor will never bother to look.
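
As an illustration of the convention, here is a minimal function file with an H1 line. The function name `rsq_sketch` and the wording of the H1 line are hypothetical and are not taken from the submitted file.

```matlab
function r2 = rsq_sketch(y, f)
%RSQ_SKETCH R-square (R2, coefficient of determination) of a model fit
%   R2 = RSQ_SKETCH(Y, F) compares actual data Y with model values F.
%   The first comment line above is the H1 line that LOOKFOR searches.

% Hypothetical body: the traditional R-square, for illustration only.
r2 = 1 - sum((y(:) - f(:)).^2) / sum((y(:) - mean(y(:))).^2);
end
```

Once a file like this is saved as `rsq_sketch.m` on the search path, typing `lookfor determination` or `lookfor R-square` at the prompt should list it, because those keywords appear in its H1 line.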

As far as the code itself goes, it's fairly simple, a one-liner, written in a way that is vectorized (good to see) and is robust (also good to see) against whether the user supplies column or row vectors. Of course, if the user supplies an array, it turns that into a column vector too. Is that intended?

There is no error checking, which is a flaw. What if the inputs are not the same size? It will generate a generic MATLAB error message that might be harder to recognize, rather than an easy-to-understand message stating that the two inputs MUST have the same number of elements.
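
A check of the kind described here might look like the following sketch. The error identifier `rsq:sizeMismatch` and the message wording are made up for this example and do not come from the submitted code.

```matlab
% Sketch: fail early with a readable message when the inputs disagree.
y = [1 2 3 4];                     % hypothetical actual data
f = [1 2 3];                       % hypothetical model values (one too few)

try
    if numel(y) ~= numel(f)
        error('rsq:sizeMismatch', ...
              'Y and F must have the same number of elements (got %d and %d).', ...
              numel(y), numel(f));
    end
catch err
    disp(err.message)              % show the message instead of aborting the demo
end
```

Inside a real function the `try`/`catch` wrapper would be dropped, so that the call stops with this message instead of a generic dimension-mismatch error.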

Good code is efficient and all that, but it should not stop there. It should be friendly code that helps the user. It should follow conventions where they exist to help you to find it again later. After all, you may have downloaded hundreds of functions from the File Exchange. Good code should be friendly in the sense that it exits gracefully when a user passes in the wrong input, exiting with an easy-to-read error message.

Next, note that this tool generates the basic R-squared value, not an adjusted R-squared of any sort. Also remember that R-squared is an iffy measure of fit when you have NO constant term in the model. In fact, it is not uncommon to find a negative value for the R-squared parameter in models which lack a constant term. Of course, this always seems to cause concern for people, who simply do not understand why something that they think is squared is coming out negative.
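
A small constructed case shows how the negative value arises. The data are hypothetical, and the uncentered alternative shown last is one common definition for models without a constant term, not necessarily the one used in the submitted code.

```matlab
% Sketch: a line forced through the origin, fitted to data with a large offset.
x = 1:10;
y = 10 - x;                        % hypothetical data: intercept 10, slope -1
b = x(:) \ y(:);                   % least-squares slope for the model y = b*x
f = b .* x;

sse     = sum((y - f).^2);
r2_trad = 1 - sse / sum((y - mean(y)).^2);   % traditional form: negative here
r2_unc  = 1 - sse / sum(y.^2);               % uncentered form: stays in [0, 1]

fprintf('traditional R2 = %.4f, uncentered R2 = %.4f\n', r2_trad, r2_unc);
```

The traditional value goes negative because the through-origin line fits worse than a horizontal line at the mean of y, which is the baseline that the traditional formula compares against.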

I'm pointing this last fact out because the author has uploaded several other files on the same day, which all apparently fit nonlinear models with NO constant term! If the author is using this coefficient to judge the quality of fit for such a model, then there may indeed be an issue. Of course, it is never a good idea to fall into the mono-numerosis trap anyway. Never use a single number as the final judge of the quality of fit for a model. In fact, you should use your eyes, your brain. Plot the result. If it fits well enough to satisfy you, then the fit is good - be happy. If you see more lack of fit than you could tolerate for your problem, then it is inadequate.

I won't put a numerical rating on this file, because I see the author has already made mods to it to improve it once. That suggests the author cares enough to improve the code, once he sees the reason for making those changes. Since the changes I've described are (while important) not really massive ones, I'll assume the author will fix the problems I've described.

13 Feb 2012

Cleaned up description

13 Feb 2012

Edited help file and added example

14 Feb 2012