Why RMSE obtained by fitlm in matlab does not match with RMSE calculated in EXCEL ؟

Hi every one . I've used the mdl = fitlm(x,y) function to fit a linear regression model to my Dataset. I also calculate the RMSE in Excel by Known Formula . the fitlm function in matlab return the exact value of R-squared calculated in excel and the exact Coefficients of Trendline. but the Value of RMSE in matlab and excel does not match. i was made a wide search but I'm still in trouble with that . any idea ? thanks for help . with best regards .

3 Comments

Insufficient information. Post your code.
Greg
I would specifically suggest posting both a *.m file and a *.xls file that replicate the simplest example you can provide that exhibits the problem.
Hi
Thanks for help
this is the RMSE code i was written in matlab
and i calculate the RMSE value in Excel in the same way of this code shown below
function R = RMSE (data , estimate)
R = sqrt(sum((data(:) - estimate (:)).^2) / numel(data));
end
and this is the fitlm code
for kk=1:size(x,2)
lm=fitlm(x,y,'linear');
Rsq_TR=lm.Rsquared.Ordinary;
RMSE_TR=lm.RMSE;
end
and this is some of my sample data set
-0.29039678 -0.212428117
0.149798734 0.268486256
0.851351352 0.563358782
-0.694364578 -0.652148947
-0.326624496 -0.468058466
-0.019838988 0.209298441
-0.765669926 -0.828415796
-0.133410006 0.003000269
-0.122484186 -0.073538329
-0.471535366 -0.376903371
0.181426106 0.365548334
0.186889016 0.37947765
-0.260782058 -0.024155069
-0.215353652 -0.371639751
-0.062967222 0.092607096
-0.461759632 -0.56834877
0.074468086 0.154155435
0.624496838 0.529777368
-0.197814836 -0.096543508
-0.247268546 -0.136442585
0.106382978 0.28668492
-0.791259344 -0.718599416
-0.532489936 -0.572352826
-0.006325474 -0.010056861
0.48706153 0.485766679
-0.631397354 -0.619362709
-0.815698678 -0.823629139
0.272857964 0.354840024
0.080506038 0.220713202
the left column is the data values(x) and the right column is the estimate (y) values

Sign in to comment.

 Accepted Answer

Your known formula is not always the formula that one might use. In fact, there is a subtly different alternative.
You divided by the number of data points there. In fact, a rational formula for RMSE has one divide by the number of data, less the number of parameters estimated. So by the number of degrees of freedom. A simple test of this fact is often the easiest thing to do, then one can verify my thesis.
x = randn(100,1);
y = randn(100,1);
lm = fitlm(x,y,'linear')
lm =
Linear regression model:
y ~ 1 + x1
Estimated Coefficients:
Estimate SE tStat pValue
__________________ __________________ __________________ _________________
(Intercept) -0.060640930787764 0.0951675587117722 -0.637201706218221 0.52547928403413
x1 0.0370287221087163 0.0935517335328456 0.395810111799967 0.693105501934868
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 0.946
R-squared: 0.0016, Adjusted R-Squared -0.00859
F-statistic vs. constant model: 0.157, p-value = 0.693
lm.RMSE
ans =
0.946014427051301
sqrt(sum((y - lm.predict(x)).^2/100))
ans =
0.936506503055594
sqrt(sum((y - lm.predict(x)).^2/98))
ans =
0.946014427051301
As you can see, dividing by the degrees of freedom is what fitlm must be doing.

3 Comments

Hi Thanks again
please explain that what is the Number of observations & Error degrees of freedom ?
and why we must divide by the number of data, less the number of parameters estimated ??
and what about the 2 other points that we are not consider and not include in dividing in this example ??
and what kind of RMSE is most correct than the other ? RMSE obtained by fitlm function or obtained by known formula in top of this page ?
with best regards
Hello again
Thank you for your helps.
I must say this before but I thought that it is not important.
This Data set shown below is My Data set for function approximation with Neural Network in MATLAB. The left column is the Target vector and the right column is the model output vector.
-0.29039678 -0.212428117
0.149798734 0.268486256
0.851351352 0.563358782
-0.694364578 -0.652148947
-0.326624496 -0.468058466
-0.019838988 0.209298441
-0.765669926 -0.828415796
-0.133410006 0.003000269
-0.122484186 -0.073538329
-0.471535366 -0.376903371
0.181426106 0.365548334
0.186889016 0.37947765
-0.260782058 -0.024155069
-0.215353652 -0.371639751
-0.062967222 0.092607096
-0.461759632 -0.56834877
0.074468086 0.154155435
0.624496838 0.529777368
-0.197814836 -0.096543508
-0.247268546 -0.136442585
0.106382978 0.28668492
-0.791259344 -0.718599416
-0.532489936 -0.572352826
-0.006325474 -0.010056861
0.48706153 0.485766679
-0.631397354 -0.619362709
-0.815698678 -0.823629139
0.272857964 0.354840024
0.080506038 0.220713202
and I want to show the capability of my model by calculating the RMSE value between the Target & Output vectors in MATLAB .
my question is that what kind of RMSE should I use ?
The RMSE obtained by fitlm function or RMSE obtained by the formula shown below?
function R = RMSE (data , estimate)
R = sqrt(sum((data(:) - estimate (:)).^2) / numel(data));
end
and if I use the RMSE obtained by fitlm function, Is the RMSE value true ??
because the vector created by
lm.predict(x)
and my Output vector are different.
with best regards
You are asking which one is correct.
Well, they all are correct. They are just different measures of the same model. You are free to choose any one you want. HOWEVER, if the differences are significant then you should be able to explain why.
Since I are un injuneer and not a statistician, I will refer you to Google and Wikipedia re the search words
tutorial degrees-of-freedom
Hope this helps.
Greg

Sign in to comment.

More Answers (0)

Categories

Find more on Deep Learning Toolbox in Help Center and File Exchange

Asked:

on 27 May 2016

Commented:

on 2 Jun 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!