Get the equation for a linear regression model
What code do I use to get the software to display the actual equation instead of "[Linear formula with 18 terms in 7 predictors]"?

11 Comments
Walter Roberson
on 7 Oct 2025
You could try
mdl.Formula
Here is an example:
load carsmall % Load the carsmall data set
X = [Weight,Horsepower,Acceleration];
mdl = fitlm(X,MPG); % Fit a linear regression model
vars = cat(1, {'1'}, mdl.PredictorNames); % term names, with '1' standing in for the intercept
coeffs = string(mdl.Coefficients.Estimate); % coefficient estimates as strings
equation = strcat('y=', strjoin(strcat('(', strcat(strcat(coeffs, '*'), vars), ')'), ' + ')) % output
That could get pretty messy with 18 terms in 7 predictor variables...
@Heather, can you attach a .mat file of the model to poke at? There's nothing even remotely close to that complexity in the examples MathWorks provides.
ADDENDUM:
Stray thought --
@Walter Roberson, I don't have the Symbolic TB; if @Heather has it, is there any way she could use it to generate a usable representation?
The actual linear model in your case is
Linear Model ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x1*x7 + x2*x4 + x2*x7 + x3*x4 + x3*x6 + x3*x7 + x4*x6 + x4*x7 + x5*x7 + x6*x7
where x1,...,x7 are the 7 predictor variables
(x1 = GurleyPorosity, x2 = SheffieldSmoothness, x3 = CobbAbsorbency, x4 = MoistureContent , x5 = ContactAngleZero , x6 = ContactAngleTwo, x7 = ContactAngleDifference)
This means that "fitlm" tries to find 18 coefficients c1,c2,...,c18 such that the dependent variable FipagoFive is approximated by
FipagoFive = c1 + c2*x1 + c3*x2 + c4*x3 + c5*x4 + c6*x5 + c7*x6 + c8*x7 + c9*x1*x7 + c10*x2*x4 + c11*x2*x7 + c12*x3*x4 + c13*x3*x6 + c14*x3*x7 + c15*x4*x6 + c16*x4*x7 + c17*x5*x7 + c18*x6*x7
I doubt this is sensible, because you only have 21 measurements for 18 unknowns.
load 'October 2025 Testing'
whos
head(FivesFipago,2)
vnames=FivesFipago.Properties.VariableNames;
mdl
mdl.Formula
whos ans
terms=strip(split(string(mdl.Formula),{'~','+'}))
This seems inconsistent: the model summary reports 18 terms, but the formula returns only 11 including the intercept, and only the interaction terms are listed.
I don't follow that at the moment. It may be a bug; I can't say for sure yet, but it certainly looks suspicious.
Of course, with only 21 data points, it's severely overfitted although the t-values are significant for all except the one that appears to be completely non-estimable; probably owing to a linear correlation with another; you'll have to investigate that.
Let's see what
mdl.VariableInfo
shows. OK, it just lists whether the variable is in the model regardless.
coeff=mdl.Coefficients
is the summary table output which indicates the model should have them in there as well.
Vars1=mdl.Variables(1,1:7) % first row of the input data variables
[predict(mdl,Vars1{1,:}) mdl.Fitted(1)] % see if we can match its expected output
So let's see what we get if we use only the coefficients that match the elements in the formula...
terms=terms(2:end);
terms=strrep(terms,'*',':');
terms(1)=coeff.Properties.RowNames(1); % fixup to match intercept
%coeff.Properties.RowNames
%terms
[ia,ib]=ismember(coeff.Properties.RowNames,terms); % find where terms are in formula
Then we need to find the variables in each cross-product term and evaluate them, adding up the term values to see if they match the prediction:
vars=split(terms(2:end),':'); % get the variables in each cross term
vnames=vnames.';
%v=[arrayfun(@(v)find(matches(vnames,v)),vars(:,1)) arrayfun(@(v)find(matches(vnames,v)),vars(:,2))] %
V=coeff.Estimate(1); % intercept
for i=1:height(vars)
x=FivesFipago.(vars(i,1))(1)*FivesFipago.(vars(i,2))(1);
V=V+coeff.Estimate(i+8)*x;
end
V
As expected, this doesn't match the actual model at all...double-check my work, but it definitely looks like the formula being returned is incomplete in this instance in that it doesn't include all the terms.
Perhaps the issue depends on the size of the model: with fewer terms it may work correctly, which would explain why it hasn't been caught before.
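For comparison, here is a minimal sketch of the same kind of hand check that uses every row of the coefficient table rather than only the terms parsed from the formula string. It runs on the shipped carsmall data instead of @Heather's file, and the model specification, `tbl`, `row`, and `yhat` are illustrative choices, not anything from the original model. It assumes every coefficient name is either '(Intercept)' or a ':'-separated product of numeric table variables (no powers, no categorical predictors).

```matlab
load carsmall
tbl = table(Weight, Horsepower, Acceleration, MPG);
mdl = fitlm(tbl, 'MPG ~ Weight*Horsepower + Acceleration'); % illustrative model

row   = 1;                     % observation to check by hand
names = mdl.CoefficientNames;  % one name per estimated coefficient
yhat  = 0;
for k = 1:numel(names)
    term = mdl.Coefficients.Estimate(k);
    if ~strcmp(names{k}, '(Intercept)')
        factors = split(string(names{k}), ':');  % e.g. "Weight:Horsepower"
        for f = factors.'                        % multiply in each factor
            term = term * tbl.(f)(row);
        end
    end
    yhat = yhat + term;
end
[yhat mdl.Fitted(row)]         % the two values should agree
```

If the two numbers agree, the coefficient table (not the Formula string) is the complete description of the fitted model.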
load 'October 2025 Testing'
%head(FivesFipago,2)
scatter(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)
xlabel('Angle2'), ylabel('Response')
%corr(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)
The response isn't linearly correlated with angle2 so not sure why it didn't behave well...would take some more digging.
R=corrcoef(FivesFipago{:,:})
R(logical(eye(size(R))))=nan; % so can search for off diagonal elements only
[Rmx,imax]=max(abs(R),[],'all','omitnan'); [r,c]=ind2sub(size(R),imax);
Rmx
figure
vnames=FivesFipago.Properties.VariableNames;
p=fitlm(FivesFipago,[vnames{r} '~' vnames{c}])
hL=plot(p); delete(hL(3)) % unclutter confidence limits
xlim(xlim+[-5 0]) % move 0 off axis for visibility
%scatter(FivesFipago.(vnames{r}),FivesFipago.(vnames{c}))
%xlabel(vnames(r)), ylabel(vnames(c))
And there's the issue: ContactAngleTwo and ContactAngleDifference are very highly correlated; use one or the other, but not both.
Torsten
on 9 Oct 2025
The easiest way to see that one of ContactAngleDifference, ContactAngleTwo or ContactAngleZero has to be removed from the list of "independent" variables is the relation
ContactAngleDifference = ContactAngleTwo - ContactAngleZero
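A minimal sketch of why that relation matters, using synthetic numbers rather than @Heather's data (the variables `zero`, `two`, and `X` are made up for illustration): when one column is an exact linear combination of two others, the three columns together only span two dimensions, so one coefficient cannot be estimated.

```matlab
rng(0)                          % reproducible synthetic data
zero = 60 + 10*rand(21,1);      % stand-in for ContactAngleZero
two  = 50 + 10*rand(21,1);      % stand-in for ContactAngleTwo
X = [zero, two, two - zero];    % third column = difference of the first two
rank(X)                         % 2, not 3: the columns are linearly dependent
```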
dpb
on 9 Oct 2025
Yeah, @Torsten, that is so if/when the definition is provided. The approach above discovers any pair of highly correlated variables even when all one has is the data. That's why I approached it empirically for @Heather (as well as to illustrate using variables to address table variables, a powerful and neat facility that neophytes generally don't appreciate).
I'm still curious about the Formula property not returning the full model terms, but haven't yet had time to probe more deeply before submitting a bug report.
Torsten
on 9 Oct 2025
It seems that if second-order or interaction terms are present in the linear model, the main effects are not included in the formula for the regression model.
The example "Terms Matrix for Matrix Input" in the documentation shows this (illogical) behaviour, but without commenting on it or giving a justification.
dpb
on 9 Oct 2025
Well, it is logical in a sense; it just isn't well documented what the mdl.Formula property returns. It is the Wilkinson representation of the model, in which "x1*x2" implies the interaction term plus all lower-order terms. There's a difference between specifying "x1*x2" and "x1:x2" which is easy to miss: the latter says to include only the specific interaction term.
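A short sketch of that difference on the shipped carsmall data (the table `tbl` and the two model names are illustrative):

```matlab
load carsmall
tbl = table(Weight, Horsepower, MPG);

mdlStar  = fitlm(tbl, 'MPG ~ Weight*Horsepower'); % main effects + interaction
mdlColon = fitlm(tbl, 'MPG ~ Weight:Horsepower'); % interaction term only

disp(mdlStar.CoefficientNames)   % intercept, Weight, Horsepower, Weight:Horsepower
disp(mdlColon.CoefficientNames)  % intercept and Weight:Horsepower
```

The coefficient names always use ":" for the individual interaction terms, which is why they do not line up one-to-one with a formula written using "*".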
I haven't yet investigated fully whether the returned formula property is always in Wilkinson notation regardless of how the input model is specified or whether it just reflects the user input specification.
I'll try to do some more spelunking and see; at the least, the doc needs to be explicit about what the user should expect. It certainly isn't what @Heather's question is after; one would have to build that expression from the table of coefficients, much as in your earlier examples. If that is the only way to generate the full representation of the model, for presentation purposes for example, that seems rude.
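One way to build the fully expanded equation from the coefficient table could look like the sketch below. It uses carsmall and an illustrative model rather than @Heather's data; `names`, `est`, `terms`, and `equation` are made-up variable names, and the "%+.4g" number format is an arbitrary choice. Because it reads mdl.CoefficientNames instead of mdl.PredictorNames, interaction terms are included.

```matlab
load carsmall
tbl = table(Weight, Horsepower, Acceleration, MPG);
mdl = fitlm(tbl, 'MPG ~ Weight*Horsepower + Acceleration'); % illustrative model

names = string(mdl.CoefficientNames(:));   % one name per coefficient
est   = mdl.Coefficients.Estimate;         % matching estimates

terms  = compose("%+.4g", est);            % signed coefficient text
isIcpt = names == "(Intercept)";
% append the variable part, writing interactions as products
terms(~isIcpt) = terms(~isIcpt) + "*" + strrep(names(~isIcpt), ":", "*");

equation = string(mdl.ResponseName) + " = " + strjoin(terms.', " ")
```

With 18 terms the resulting string will be long, but it lists every estimated coefficient exactly once.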
