Get the equation for a linear regression model

What code do I use to get the software to display the actual equation instead of "[Linear formula with 18 terms in 7 predictors]"?

11 Comments

Here is an example:
load carsmall % Load the carsmall data set
X = [Weight,Horsepower,Acceleration];
mdl = fitlm(X,MPG); % Fit a linear regression model
vars = cat(1, {'1'}, mdl.PredictorNames); % term names: '1' for the intercept, then the predictors
coeffs = string(mdl.Coefficients.Estimate); % coefficient estimates as strings
equation = strcat('y=', strjoin(strcat('(', strcat(strcat(coeffs, '*'), vars), ')'), ' + ')) % output
equation = "y=(47.9768*1) + (-0.00654156*x1) + (-0.0429433*x2) + (-0.0115827*x3)"
dpb
dpb on 8 Oct 2025
Edited: dpb on 8 Oct 2025
That could get pretty messy with 18 terms in 7 predictor variables...
@Heather, can you attach a .mat file of the model to poke at? There's nothing even remotely close to that complexity in examples Mathworks provides.
ADDENDUM:
Stray thought --
@Walter Roberson, I don't have the Symbolic TB; is there any way if @Heather did she could use it to generate a usable representation?
I am more of a beginner in understanding MATLAB. I have attached my model; we have a limited set of data points, but it would be helpful to have someone take a look at it.
load("October 2025 Testing.mat")
tbl
tbl = 21×8 table
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference    FipagoFive
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________    __________
    22.9              421.9                  47.6              7.41               86                  78                 -8                        59
    24.7              415.9                  80                6.68               91                  53                 -38                       77
    52.2              424.5                  102.7             7.22               107                 106                -1                        49
    30.5              388                    41.6              7.89               106                 102                -4                        57
    15.7              424                    61.4              6.99               83                  0                  -83                       49
    8.6               425.9                  34.7              7.95               85                  61                 -24                       55
    17.2              345.8                  64.4              8.23               103                 99                 -4                        64
    25.5              419.5                  32.9              8.6                91                  85                 -6                        58
    39.6              344.7                  40                6.85               110                 103                -7                        56
    48.3              363.2                  103.5             8.16               101                 95                 -6                        63
    14.4              403.7                  67.8              8.04               91                  79                 -12                       56
    41.8              380.6                  28.7              7.74               103                 92                 -11                       66
    15.9              416.2                  99.2              4.97               106                 99                 -7                        56
    15.7              397.4                  77.7              6.64               103                 100                -3                        50
    10.5              409.1                  40.8              6.97               97                  90                 -7                        63
    25.1              406.3                  39.4              7.21               107                 100                -7                        61
mdl
mdl =
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]
Estimated Coefficients:
                                                  Estimate     SE           tStat      pValue
                                                  _________    _________    _______    _________
    (Intercept)                                   -1621.4      221.63       -7.3157    0.0052752
    GurleyPorosity                                -0.40375     0.10334      -3.9069    0.029784
    SheffieldSmoothness                           3.1353       0.33324      9.4084     0.0025441
    CobbAbsorbency                                -6.4152      0.84387      -7.6021    0.0047234
    MoistureContent                               225.48       25.385       8.8824     0.0030089
    ContactAngleZero                              6.0657       1.1731       5.1706     0.014036
    ContactAngleTwo                               0            0            NaN        NaN
    ContactAngleDifference                        -26.253      8.8001       -2.9833    0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539    0.017992     -4.4765    0.020781
    SheffieldSmoothness:MoistureContent           -0.38717     0.038452     -10.069    0.002086
    SheffieldSmoothness:ContactAngleDifference    0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                0.30543      0.032778     9.318      0.0026169
    CobbAbsorbency:ContactAngleTwo                0.03633      0.0057831    6.2821     0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361    0.012034     -6.0132    0.0092153
    MoistureContent:ContactAngleTwo               -0.90509     0.13664      -6.6239    0.007008
    MoistureContent:ContactAngleDifference        0.80417      0.28096      2.8623     0.064458
    ContactAngleZero:ContactAngleDifference       0.17905      0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167    0.004475     -4.5067    0.020409
Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993, Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156
mdl = fitlm(tbl,'FipagoFive~1+GurleyPorosity+SheffieldSmoothness+CobbAbsorbency+MoistureContent+ContactAngleZero+ContactAngleTwo+ContactAngleDifference+GurleyPorosity:ContactAngleDifference+SheffieldSmoothness:MoistureContent+SheffieldSmoothness:ContactAngleDifference+CobbAbsorbency:MoistureContent+CobbAbsorbency:ContactAngleTwo+CobbAbsorbency:ContactAngleDifference+MoistureContent:ContactAngleTwo+MoistureContent:ContactAngleDifference+ContactAngleZero:ContactAngleDifference+ContactAngleTwo:ContactAngleDifference')
Warning: Regression design matrix is rank deficient to within machine precision.
mdl =
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]
Estimated Coefficients:
                                                  Estimate     SE           tStat      pValue
                                                  _________    _________    _______    _________
    (Intercept)                                   -1621.4      221.63       -7.3157    0.0052752
    GurleyPorosity                                -0.40375     0.10334      -3.9069    0.029784
    SheffieldSmoothness                           3.1353       0.33324      9.4084     0.0025441
    CobbAbsorbency                                -6.4152      0.84387      -7.6021    0.0047234
    MoistureContent                               225.48       25.385       8.8824     0.0030089
    ContactAngleZero                              32.319       9.5681       3.3777     0.043164
    ContactAngleTwo                               -26.253      8.8001       -2.9833    0.058444
    ContactAngleDifference                        0            0            NaN        NaN
    SheffieldSmoothness:MoistureContent           -0.38717     0.038452     -10.069    0.002086
    CobbAbsorbency:MoistureContent                0.30543      0.032778     9.318      0.0026169
    CobbAbsorbency:ContactAngleTwo                0.03633      0.0057831    6.2821     0.0081452
    MoistureContent:ContactAngleTwo               -0.90509     0.13664      -6.6239    0.007008
    GurleyPorosity:ContactAngleDifference         -0.080539    0.017992     -4.4765    0.020781
    SheffieldSmoothness:ContactAngleDifference    0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:ContactAngleDifference         -0.072361    0.012034     -6.0132    0.0092153
    MoistureContent:ContactAngleDifference        0.80417      0.28096      2.8623     0.064458
    ContactAngleZero:ContactAngleDifference       0.17905      0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167    0.004475     -4.5067    0.020409
Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993, Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156
The actual linear model in your case is
Linear Model ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x1*x7 + x2*x4 + x2*x7 + x3*x4 + x3*x6 + x3*x7 + x4*x6 + x4*x7 + x5*x7 + x6*x7
where x1,...,x7 are the 7 predictor variables
(x1 = GurleyPorosity, x2 = SheffieldSmoothness, x3 = CobbAbsorbency, x4 = MoistureContent , x5 = ContactAngleZero , x6 = ContactAngleTwo, x7 = ContactAngleDifference)
This means that "fitlm" tries to find 18 coefficients c1,c2,...,c18 such that the dependent variable FipagoFive is approximated by
FipagoFive = c1 + c2*x1 + c3*x2 + c4*x3 + c5*x4 + c6*x5 + c7*x6 + c8*x7 + c9*x1*x7 + c10*x2*x4 + c11*x2*x7 + c12*x3*x4 + c13*x3*x6 + c14*x3*x7 + c15*x4*x6 + c16*x4*x7 + c17*x5*x7 + c18*x6*x7
I doubt this is sensible, because you have only 21 measurements for 18 unknowns.
load 'October 2025 Testing'
whos
  Name              Size            Bytes  Class          Attributes
  FivesFipago      21x8              4249  table
  ans               1x48               96  char
  mdl               1x1             65826  LinearModel
  tbl              21x8              4249  table
head(FivesFipago,2)
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference    FipagoFive
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________    __________
    22.9              421.9                  47.6              7.41               86                  78                 -8                        59
    24.7              415.9                  80                6.68               91                  53                 -38                       77
vnames=FivesFipago.Properties.VariableNames;
mdl
mdl =
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]
Estimated Coefficients:
                                                  Estimate     SE           tStat      pValue
                                                  _________    _________    _______    _________
    (Intercept)                                   -1621.4      221.63       -7.3157    0.0052752
    GurleyPorosity                                -0.40375     0.10334      -3.9069    0.029784
    SheffieldSmoothness                           3.1353       0.33324      9.4084     0.0025441
    CobbAbsorbency                                -6.4152      0.84387      -7.6021    0.0047234
    MoistureContent                               225.48       25.385       8.8824     0.0030089
    ContactAngleZero                              6.0657       1.1731       5.1706     0.014036
    ContactAngleTwo                               0            0            NaN        NaN
    ContactAngleDifference                        -26.253      8.8001       -2.9833    0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539    0.017992     -4.4765    0.020781
    SheffieldSmoothness:MoistureContent           -0.38717     0.038452     -10.069    0.002086
    SheffieldSmoothness:ContactAngleDifference    0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                0.30543      0.032778     9.318      0.0026169
    CobbAbsorbency:ContactAngleTwo                0.03633      0.0057831    6.2821     0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361    0.012034     -6.0132    0.0092153
    MoistureContent:ContactAngleTwo               -0.90509     0.13664      -6.6239    0.007008
    MoistureContent:ContactAngleDifference        0.80417      0.28096      2.8623     0.064458
    ContactAngleZero:ContactAngleDifference       0.17905      0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167    0.004475     -4.5067    0.020409
Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993, Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156
mdl.Formula
ans =
FipagoFive ~ 1 + GurleyPorosity*ContactAngleDifference + SheffieldSmoothness*MoistureContent + SheffieldSmoothness*ContactAngleDifference + CobbAbsorbency*MoistureContent + CobbAbsorbency*ContactAngleTwo + CobbAbsorbency*ContactAngleDifference + MoistureContent*ContactAngleTwo + MoistureContent*ContactAngleDifference + ContactAngleZero*ContactAngleDifference + ContactAngleTwo*ContactAngleDifference
whos ans
  Name      Size            Bytes  Class                          Attributes
  ans       1x1              3202  classreg.regr.LinearFormula
terms=strip(split(string(mdl.Formula),{'~','+'}))
terms = 12×1 string array
    "FipagoFive"
    "1"
    "GurleyPorosity*ContactAngleDifference"
    "SheffieldSmoothness*MoistureContent"
    "SheffieldSmoothness*ContactAngleDifference"
    "CobbAbsorbency*MoistureContent"
    "CobbAbsorbency*ContactAngleTwo"
    "CobbAbsorbency*ContactAngleDifference"
    "MoistureContent*ContactAngleTwo"
    "MoistureContent*ContactAngleDifference"
    "ContactAngleZero*ContactAngleDifference"
    "ContactAngleTwo*ContactAngleDifference"
This seems inconsistent: the model summary reports 18 terms, but the formula returns only 11, including the intercept; only the interaction terms are listed in the formula.
I don't follow that at the moment. It may be a bug; I can't say for sure yet, but it certainly looks suspicious.
Of course, with only 21 data points the model is severely overfitted, even though the t-statistics are significant or close to it for all coefficients except the one that is completely non-estimable. That one is probably linearly dependent on another variable; you'll have to investigate that.
Let's see what
mdl.VariableInfo
ans = 8×4 table
                              Class         Range                    InModel    IsCategorical
                              __________    _____________________    _______    _____________
    GurleyPorosity            {'double'}    {[ 8.6000 52.2000]}      true       false
    SheffieldSmoothness       {'double'}    {[344.7000 425.9000]}    true       false
    CobbAbsorbency            {'double'}    {[ 28.7000 153.8000]}    true       false
    MoistureContent           {'double'}    {[ 4.9700 9.1100]}       true       false
    ContactAngleZero          {'double'}    {[ 83 110]}              true       false
    ContactAngleTwo           {'double'}    {[ 0 106]}               true       false
    ContactAngleDifference    {'double'}    {[ -83 -1]}              true       false
    FipagoFive                {'double'}    {[ 49 77]}               false      false
shows. OK, it just lists whether the variable is in the model regardless.
coeff=mdl.Coefficients
coeff = 18×4 table
                                                  Estimate     SE           tStat      pValue
                                                  _________    _________    _______    _________
    (Intercept)                                   -1621.4      221.63       -7.3157    0.0052752
    GurleyPorosity                                -0.40375     0.10334      -3.9069    0.029784
    SheffieldSmoothness                           3.1353       0.33324      9.4084     0.0025441
    CobbAbsorbency                                -6.4152      0.84387      -7.6021    0.0047234
    MoistureContent                               225.48       25.385       8.8824     0.0030089
    ContactAngleZero                              6.0657       1.1731       5.1706     0.014036
    ContactAngleTwo                               0            0            NaN        NaN
    ContactAngleDifference                        -26.253      8.8001       -2.9833    0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539    0.017992     -4.4765    0.020781
    SheffieldSmoothness:MoistureContent           -0.38717     0.038452     -10.069    0.002086
    SheffieldSmoothness:ContactAngleDifference    0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                0.30543      0.032778     9.318      0.0026169
    CobbAbsorbency:ContactAngleTwo                0.03633      0.0057831    6.2821     0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361    0.012034     -6.0132    0.0092153
    MoistureContent:ContactAngleTwo               -0.90509     0.13664      -6.6239    0.007008
    MoistureContent:ContactAngleDifference        0.80417      0.28096      2.8623     0.064458
is the summary table output, which indicates the model should include the main-effect terms as well.
Vars1=mdl.Variables(1,1:7) % first row of the input data variables
Vars1 = 1×7 table
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________
    22.9              421.9                  47.6              7.41               86                  78                 -8
[predict(mdl,Vars1{1,:}) mdl.Fitted(1)] % see if we can match its expected output
ans = 1×2
60.1392 60.1392
So, let's see what we get if we use only the coefficients that match the elements in the formula...
terms=terms(2:end);
terms=strrep(terms,'*',':');
terms(1)=coeff.Properties.RowNames(1); % fixup to match intercept
%coeff.Properties.RowNames
%terms
[ia,ib]=ismember(coeff.Properties.RowNames,terms); % find where terms are in formula
Then we need to locate the variables in each cross-product term and evaluate them, adding up the term values to see whether they match the prediction:
vars=split(terms(2:end),':'); % get the variables in each cross term
vnames=vnames.';
%v=[arrayfun(@(v)find(matches(vnames,v)),vars(:,1)) arrayfun(@(v)find(matches(vnames,v)),vars(:,2))] %
V=coeff.Estimate(1); % intercept
for i=1:height(vars)
x=FivesFipago.(vars(i,1))(1)*FivesFipago.(vars(i,2))(1);
V=V+coeff.Estimate(i+8)*x;
end
V
V = -3.3505e+03
As expected, this doesn't match the actual model at all. Double-check my work, but it definitely looks like the formula being returned is incomplete in this instance, in that it doesn't include all the terms.
Perhaps the issue depends on the size of the model, and with fewer terms it works correctly, so it hasn't been caught before.
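For comparison, here is a minimal sketch of the same hand evaluation that sums over every row of the coefficient table rather than only the terms listed in mdl.Formula. It assumes each coefficient name is either a single variable or variables joined by ':' (no categorical predictors or powers). The carsmall sample data and the formula are stand-ins for the attached model, not the model itself.

```matlab
% Stand-in model (assumption): carsmall data with one interaction term
load carsmall
tbl = rmmissing(table(Weight, Horsepower, Acceleration, MPG));
mdl = fitlm(tbl, 'MPG ~ Weight*Horsepower + Acceleration');

names = string(mdl.CoefficientNames);   % one name per row of mdl.Coefficients
est   = mdl.Coefficients.Estimate;
row   = tbl(1,:);                       % observation to evaluate by hand

yhat = est(1);                          % the intercept is the first coefficient in this model
for k = 2:numel(names)
    factors = split(names(k), ":");     % "A:B" -> ["A";"B"], "A" -> "A"
    x = 1;
    for f = factors.'
        x = x * row.(f);                % product of the predictor values in this term
    end
    yhat = yhat + est(k)*x;
end

[yhat, predict(mdl, row)]               % the two values should agree
```

If the two numbers agree, the coefficient table (not the Formula string) is the complete description of the fitted terms.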
load 'October 2025 Testing'
%head(FivesFipago,2)
scatter(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)
xlabel('Angle2'), ylabel('Response')
%corr(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)
The response isn't linearly correlated with ContactAngleTwo, so I'm not sure why that term didn't behave well; it would take some more digging.
R=corrcoef(FivesFipago{:,:})
R = 8×8
    1.0000   -0.3230    0.1308    0.1541    0.4912    0.3922    0.2939    0.0425
   -0.3230    1.0000   -0.1190   -0.2158   -0.5411   -0.4033   -0.2864   -0.3442
    0.1308   -0.1190    1.0000    0.0104    0.0529    0.0466    0.0373    0.1689
    0.1541   -0.2158    0.0104    1.0000   -0.2884    0.0430    0.1818    0.0953
    0.4912   -0.5411    0.0529   -0.2884    1.0000    0.7696    0.5608    0.0049
    0.3922   -0.4033    0.0466    0.0430    0.7696    1.0000    0.9602    0.0089
    0.2939   -0.2864    0.0373    0.1818    0.5608    0.9602    1.0000    0.0094
    0.0425   -0.3442    0.1689    0.0953    0.0049    0.0089    0.0094    1.0000
R(logical(eye(size(R))))=nan; % so can search for off diagonal elements only
[Rmx,imax]=max(abs(R),[],'all','omitnan'); [r,c]=ind2sub(size(R),imax);
Rmx
Rmx = 0.9602
figure
vnames=FivesFipago.Properties.VariableNames;
p=fitlm(FivesFipago,[vnames{r} '~' vnames{c}])
p =
Linear regression model:
    ContactAngleDifference ~ 1 + ContactAngleTwo
Estimated Coefficients:
                       Estimate    SE          tStat      pValue
                       ________    ________    _______    __________
    (Intercept)        -75.214     4.3607      -17.248    4.6098e-13
    ContactAngleTwo    0.7405      0.049388    14.993     5.5462e-12
Number of observations: 21, Error degrees of freedom: 19
Root Mean Squared Error: 5.21
R-squared: 0.922, Adjusted R-Squared: 0.918
F-statistic vs. constant model: 225, p-value = 5.55e-12
hL=plot(p); delete(hL(3)) % unclutter confidence limits
xlim(xlim+[-5 0]) % move 0 off axis for visibility
%scatter(FivesFipago.(vnames{r}),FivesFipago.(vnames{c}))
%xlabel(vnames(r)), ylabel(vnames(c))
And there's the issue: ContactAngleTwo and ContactAngleDifference are very highly correlated; use one or the other, but not both.
The easiest way to see that one of ContactAngleDifference, ContactAngleTwo or ContactAngleZero has to be removed from the list of "independent" variables is the relation
ContactAngleDifference = ContactAngleTwo - ContactAngleZero
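That exact linear dependence can also be confirmed numerically. A small sketch, using the ContactAngleZero and ContactAngleTwo values from the first six rows of the table posted earlier in the thread (the short variable names are only for illustration): if one column is a combination of the others, the rank of the design matrix is smaller than its number of columns.

```matlab
% First six rows of the posted table
Zero = [86; 91; 107; 106; 83; 85];     % ContactAngleZero
Two  = [78; 53; 106; 102;  0; 61];     % ContactAngleTwo
Diff = Two - Zero;                     % reproduces ContactAngleDifference

X = [ones(6,1) Zero Two Diff];         % intercept plus the three angle columns
r = rank(X)                            % smaller than size(X,2): one column is redundant
```

This is the same condition that triggers the "rank deficient" warning from fitlm and the zero coefficient with NaN statistics in the summary.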
Yeah, @Torsten, that is so if/when the definition is provided. The approach above works to discover any pair of highly correlated variables even if all one has are the data. That's why I approached it empirically for @Heather (as well as to illustrate using variables to address table variables, a powerful and neat facility generally not appreciated by neophytes).
I'm still curious about the Formula property not returning the full model terms, but haven't yet had time to probe more deeply before submitting a bug report.
It seems that if second-order or interaction terms are present in the linear model, the main effects are not included in the formula for the regression model.
The example
Terms Matrix for Matrix Input
under
shows this (illogical) behaviour, but without commenting on it or giving a justification.
Well, it is logical in a sense; what is not well documented is what the mdl.Formula property returns. It is the Wilkinson representation of the model, in which "x1*x2" implies the interaction term plus all lower-order terms. There's a difference between specifying "x1*x2" and "x1:x2" that is easy to miss: the latter says to include only the specific interaction term.
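A small illustration of that difference, as a sketch using the carsmall sample data rather than the model in question: fitting the same two predictors with '*' and with ':' and listing the coefficient names shows which terms each specification expands to.

```matlab
load carsmall
tbl = table(Weight, Horsepower, MPG);

mStar  = fitlm(tbl, 'MPG ~ Weight*Horsepower');   % main effects plus interaction
mColon = fitlm(tbl, 'MPG ~ Weight:Horsepower');   % interaction term only

mStar.CoefficientNames    % (Intercept), Weight, Horsepower, Weight:Horsepower
mColon.CoefficientNames   % (Intercept), Weight:Horsepower
```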
I haven't yet investigated fully whether the returned formula property is always in Wilkinson notation regardless of how the input model is specified or whether it just reflects the user input specification.
I'll try to do some more spelunking and see; at the least, the doc needs to be explicit about what the user should expect. Certainly for @Heather's question here, it isn't what she's looking for; one would have to build that expression from the table of coefficients, similarly to your earlier examples. If so, it seems rude for that to be the only way to generate the full representation of the model, for presentation purposes, for example.
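One way to build the full expression from the coefficient table, as a hedged sketch: it pairs mdl.CoefficientNames with the estimates, so main effects and interactions all appear. The carsmall data and the formula here are stand-ins, the %.6g precision is an arbitrary choice, and dropping exactly-zero estimates is an assumption about how to treat terms that fitlm could not estimate.

```matlab
% Stand-in model (assumption): carsmall data with one interaction term
load carsmall
tbl = table(Weight, Horsepower, Acceleration, MPG);
mdl = fitlm(tbl, 'MPG ~ Weight*Horsepower + Acceleration');

names = string(mdl.CoefficientNames(:));   % one name per coefficient row
est   = mdl.Coefficients.Estimate;
keep  = est ~= 0;                          % skip terms zeroed out by rank deficiency

% "(estimate)*Var1*Var2" for each retained term
terms = "(" + compose("%.6g", est(keep)) + ")*" + replace(names(keep), ":", "*");
terms = replace(terms, "*(Intercept)", "");            % the intercept has no variable
equation = string(mdl.ResponseName) + " = " + strjoin(terms.', " + ")
```

Applied to a model fitted from a table, the same lines should produce an equation in the original variable names, since CoefficientNames uses them.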


Answers (0)

Asked:

on 7 Oct 2025

Commented:

dpb
on 9 Oct 2025
