Get the equation for a linear regression model

Question

0 votes

What code do I use to get the software to display the actual equation instead of "[Linear formula with 18 terms in 7 predictors]"?

11 Comments

Heather on 8 Oct 2025

Edited: Torsten on 8 Oct 2025

October 2025 Testing.mat

I am still a beginner with MATLAB. I have attached my model; we have a limited set of data points, but it would be helpful to have someone take a look at it.

load("October 2025 Testing.mat")
tbl
tbl = 21×8 table
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference    FipagoFive
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________    __________

         22.9                421.9                47.6              7.41                 86                  78                    -8                  59    
         24.7                415.9                  80              6.68                 91                  53                   -38                  77    
         52.2                424.5               102.7              7.22                107                 106                    -1                  49    
         30.5                  388                41.6              7.89                106                 102                    -4                  57    
         15.7                  424                61.4              6.99                 83                   0                   -83                  49    
          8.6                425.9                34.7              7.95                 85                  61                   -24                  55    
         17.2                345.8                64.4              8.23                103                  99                    -4                  64    
         25.5                419.5                32.9               8.6                 91                  85                    -6                  58    
         39.6                344.7                  40              6.85                110                 103                    -7                  56    
         48.3                363.2               103.5              8.16                101                  95                    -6                  63    
         14.4                403.7                67.8              8.04                 91                  79                   -12                  56    
         41.8                380.6                28.7              7.74                103                  92                   -11                  66    
         15.9                416.2                99.2              4.97                106                  99                    -7                  56    
         15.7                397.4                77.7              6.64                103                 100                    -3                  50    
         10.5                409.1                40.8              6.97                 97                  90                    -7                  63    
         25.1                406.3                39.4              7.21                107                 100                    -7                  61    
mdl
mdl = 
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]

Estimated Coefficients:
                                                  Estimate        SE         tStat      pValue  
                                                  _________    _________    _______    _________

    (Intercept)                                     -1621.4       221.63    -7.3157    0.0052752
    GurleyPorosity                                 -0.40375      0.10334    -3.9069     0.029784
    SheffieldSmoothness                              3.1353      0.33324     9.4084    0.0025441
    CobbAbsorbency                                  -6.4152      0.84387    -7.6021    0.0047234
    MoistureContent                                  225.48       25.385     8.8824    0.0030089
    ContactAngleZero                                 6.0657       1.1731     5.1706     0.014036
    ContactAngleTwo                                       0            0        NaN          NaN
    ContactAngleDifference                          -26.253       8.8001    -2.9833     0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539     0.017992    -4.4765     0.020781
    SheffieldSmoothness:MoistureContent            -0.38717     0.038452    -10.069     0.002086
    SheffieldSmoothness:ContactAngleDifference     0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                  0.30543     0.032778      9.318    0.0026169
    CobbAbsorbency:ContactAngleTwo                  0.03633    0.0057831     6.2821    0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361     0.012034    -6.0132    0.0092153
    MoistureContent:ContactAngleTwo                -0.90509      0.13664    -6.6239     0.007008
    MoistureContent:ContactAngleDifference          0.80417      0.28096     2.8623     0.064458
    ContactAngleZero:ContactAngleDifference         0.17905     0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167     0.004475    -4.5067     0.020409


Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993,  Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156
mdl = fitlm(tbl,'FipagoFive~1+GurleyPorosity+SheffieldSmoothness+CobbAbsorbency+MoistureContent+ContactAngleZero+ContactAngleTwo+ContactAngleDifference+GurleyPorosity:ContactAngleDifference+SheffieldSmoothness:MoistureContent+SheffieldSmoothness:ContactAngleDifference+CobbAbsorbency:MoistureContent+CobbAbsorbency:ContactAngleTwo+CobbAbsorbency:ContactAngleDifference+MoistureContent:ContactAngleTwo+MoistureContent:ContactAngleDifference+ContactAngleZero:ContactAngleDifference+ContactAngleTwo:ContactAngleDifference')
Warning: Regression design matrix is rank deficient to within machine precision.
mdl = 
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]

Estimated Coefficients:
                                                  Estimate        SE         tStat      pValue  
                                                  _________    _________    _______    _________

    (Intercept)                                     -1621.4       221.63    -7.3157    0.0052752
    GurleyPorosity                                 -0.40375      0.10334    -3.9069     0.029784
    SheffieldSmoothness                              3.1353      0.33324     9.4084    0.0025441
    CobbAbsorbency                                  -6.4152      0.84387    -7.6021    0.0047234
    MoistureContent                                  225.48       25.385     8.8824    0.0030089
    ContactAngleZero                                 32.319       9.5681     3.3777     0.043164
    ContactAngleTwo                                 -26.253       8.8001    -2.9833     0.058444
    ContactAngleDifference                                0            0        NaN          NaN
    SheffieldSmoothness:MoistureContent            -0.38717     0.038452    -10.069     0.002086
    CobbAbsorbency:MoistureContent                  0.30543     0.032778      9.318    0.0026169
    CobbAbsorbency:ContactAngleTwo                  0.03633    0.0057831     6.2821    0.0081452
    MoistureContent:ContactAngleTwo                -0.90509      0.13664    -6.6239     0.007008
    GurleyPorosity:ContactAngleDifference         -0.080539     0.017992    -4.4765     0.020781
    SheffieldSmoothness:ContactAngleDifference     0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:ContactAngleDifference         -0.072361     0.012034    -6.0132    0.0092153
    MoistureContent:ContactAngleDifference          0.80417      0.28096     2.8623     0.064458
    ContactAngleZero:ContactAngleDifference         0.17905     0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167     0.004475    -4.5067     0.020409


Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993,  Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156

dpb on 8 Oct 2025

Edited: dpb on 8 Oct 2025

October 2025 Testing.mat

load 'October 2025 Testing'
whos
  Name              Size            Bytes  Class          Attributes

  FivesFipago      21x8              4249  table                    
  ans               1x48               96  char                     
  mdl               1x1             65826  LinearModel              
  tbl              21x8              4249  table                    
head(FivesFipago,2)
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference    FipagoFive
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________    __________

         22.9                421.9                47.6              7.41                 86                 78                     -8                  59    
         24.7                415.9                  80              6.68                 91                 53                    -38                  77    
vnames=FivesFipago.Properties.VariableNames;
mdl
mdl = 
Linear regression model:
    FipagoFive ~ [Linear formula with 18 terms in 7 predictors]

Estimated Coefficients:
                                                  Estimate        SE         tStat      pValue  
                                                  _________    _________    _______    _________

    (Intercept)                                     -1621.4       221.63    -7.3157    0.0052752
    GurleyPorosity                                 -0.40375      0.10334    -3.9069     0.029784
    SheffieldSmoothness                              3.1353      0.33324     9.4084    0.0025441
    CobbAbsorbency                                  -6.4152      0.84387    -7.6021    0.0047234
    MoistureContent                                  225.48       25.385     8.8824    0.0030089
    ContactAngleZero                                 6.0657       1.1731     5.1706     0.014036
    ContactAngleTwo                                       0            0        NaN          NaN
    ContactAngleDifference                          -26.253       8.8001    -2.9833     0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539     0.017992    -4.4765     0.020781
    SheffieldSmoothness:MoistureContent            -0.38717     0.038452    -10.069     0.002086
    SheffieldSmoothness:ContactAngleDifference     0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                  0.30543     0.032778      9.318    0.0026169
    CobbAbsorbency:ContactAngleTwo                  0.03633    0.0057831     6.2821    0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361     0.012034    -6.0132    0.0092153
    MoistureContent:ContactAngleTwo                -0.90509      0.13664    -6.6239     0.007008
    MoistureContent:ContactAngleDifference          0.80417      0.28096     2.8623     0.064458
    ContactAngleZero:ContactAngleDifference         0.17905     0.052756     3.3939     0.042651
    ContactAngleTwo:ContactAngleDifference        -0.020167     0.004475    -4.5067     0.020409


Number of observations: 21, Error degrees of freedom: 4
Root Mean Squared Error: 1.32
R-squared: 0.993,  Adjusted R-Squared: 0.967
F-statistic vs. constant model: 37.1, p-value = 0.00156
mdl.Formula
ans = 
FipagoFive ~ 1 + GurleyPorosity*ContactAngleDifference + SheffieldSmoothness*MoistureContent + SheffieldSmoothness*ContactAngleDifference + CobbAbsorbency*MoistureContent
                + CobbAbsorbency*ContactAngleTwo + CobbAbsorbency*ContactAngleDifference + MoistureContent*ContactAngleTwo + MoistureContent*ContactAngleDifference
                + ContactAngleZero*ContactAngleDifference + ContactAngleTwo*ContactAngleDifference
whos ans
  Name      Size            Bytes  Class                          Attributes

  ans       1x1              3202  classreg.regr.LinearFormula              
terms=strip(split(string(mdl.Formula),{'~','+'}))
terms = 12×1 string array
    "FipagoFive"
    "1"
    "GurleyPorosity*ContactAngleDifference"
    "SheffieldSmoothness*MoistureContent"
    "SheffieldSmoothness*ContactAngleDifference"
    "CobbAbsorbency*MoistureContent"
    "CobbAbsorbency*ContactAngleTwo"
    "CobbAbsorbency*ContactAngleDifference"
    "MoistureContent*ContactAngleTwo"
    "MoistureContent*ContactAngleDifference"
    "ContactAngleZero*ContactAngleDifference"
    "ContactAngleTwo*ContactAngleDifference"

This seems inconsistent: the model summary reports 18 terms, but the formula lists only 11 (the intercept plus 10 interaction terms); none of the main effects appear in it.

I don't understand that yet. It may be a bug, I can't say for sure, but it certainly looks suspicious.

Of course, with only 21 data points and 18 terms, the model is severely overfitted, even though most of the coefficients have p-values below or close to 0.05. The exception is the one term that is not estimable at all (zero estimate, NaN t-statistic), probably because it is linearly dependent on other predictors; you'll have to investigate that.

Let's see what

mdl.VariableInfo
ans = 8×4 table
                                Class               Range            InModel    IsCategorical
                              __________    _____________________    _______    _____________

    GurleyPorosity            {'double'}    {[   8.6000 52.2000]}     true          false    
    SheffieldSmoothness       {'double'}    {[344.7000 425.9000]}     true          false    
    CobbAbsorbency            {'double'}    {[ 28.7000 153.8000]}     true          false    
    MoistureContent           {'double'}    {[    4.9700 9.1100]}     true          false    
    ContactAngleZero          {'double'}    {[           83 110]}     true          false    
    ContactAngleTwo           {'double'}    {[            0 106]}     true          false    
    ContactAngleDifference    {'double'}    {[           -83 -1]}     true          false    
    FipagoFive                {'double'}    {[            49 77]}     false         false    

shows. OK, that just reports whether each variable is used in the model at all, not which terms it appears in.

coeff=mdl.Coefficients
coeff = 18×4 table
                                                  Estimate        SE         tStat      pValue  
                                                  _________    _________    _______    _________

    (Intercept)                                     -1621.4       221.63    -7.3157    0.0052752
    GurleyPorosity                                 -0.40375      0.10334    -3.9069     0.029784
    SheffieldSmoothness                              3.1353      0.33324     9.4084    0.0025441
    CobbAbsorbency                                  -6.4152      0.84387    -7.6021    0.0047234
    MoistureContent                                  225.48       25.385     8.8824    0.0030089
    ContactAngleZero                                 6.0657       1.1731     5.1706     0.014036
    ContactAngleTwo                                       0            0        NaN          NaN
    ContactAngleDifference                          -26.253       8.8001    -2.9833     0.058444
    GurleyPorosity:ContactAngleDifference         -0.080539     0.017992    -4.4765     0.020781
    SheffieldSmoothness:MoistureContent            -0.38717     0.038452    -10.069     0.002086
    SheffieldSmoothness:ContactAngleDifference     0.036217     0.011543     3.1375     0.051764
    CobbAbsorbency:MoistureContent                  0.30543     0.032778      9.318    0.0026169
    CobbAbsorbency:ContactAngleTwo                  0.03633    0.0057831     6.2821    0.0081452
    CobbAbsorbency:ContactAngleDifference         -0.072361     0.012034    -6.0132    0.0092153
    MoistureContent:ContactAngleTwo                -0.90509      0.13664    -6.6239     0.007008
    MoistureContent:ContactAngleDifference          0.80417      0.28096     2.8623     0.064458

is the same table as in the model summary, which indicates that the main-effect terms should be part of the model as well.

Vars1=mdl.Variables(1,1:7)  % first row of the input data variables
Vars1 = 1×7 table
    GurleyPorosity    SheffieldSmoothness    CobbAbsorbency    MoistureContent    ContactAngleZero    ContactAngleTwo    ContactAngleDifference
    ______________    ___________________    ______________    _______________    ________________    _______________    ______________________

         22.9                421.9                47.6              7.41                 86                 78                     -8          
[predict(mdl,Vars1{1,:}) mdl.Fitted(1)] % see if we can match its expected output
ans = 1×2
   60.1392   60.1392

So let's see what we get if we use only the coefficients that match the elements in the formula...

terms=terms(2:end);
terms=strrep(terms,'*',':');
terms(1)=coeff.Properties.RowNames(1);      % fixup to match intercept
%coeff.Properties.RowNames
%terms
[ia,ib]=ismember(coeff.Properties.RowNames,terms); % find where terms are in formula

Then we need to locate the variables in each cross-product term, evaluate the products, and add up the term values to see whether they match the prediction:

vars=split(terms(2:end),':');                % get the variables in each cross term
vnames=vnames.';
%v=[arrayfun(@(v)find(matches(vnames,v)),vars(:,1)) arrayfun(@(v)find(matches(vnames,v)),vars(:,2))]   %
V=coeff.Estimate(1);        % intercept
for i=1:height(vars)
  x=FivesFipago.(vars(i,1))(1)*FivesFipago.(vars(i,2))(1);
  V=V+coeff.Estimate(i+8)*x;
end
V
V = -3.3505e+03

As expected, this doesn't match the model's prediction at all. Double-check my work, but it looks as though the formula being returned is incomplete here, in that it doesn't list all the terms.

Possibly the issue depends on the size of the model, so that it works correctly with fewer terms and hasn't been caught before.
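For comparison, here is a minimal sketch of the same hand calculation using every row of the coefficient table, main effects included. It runs on small synthetic stand-in data (x1, x2 and y are made up, not the poster's table) and assumes every coefficient name is either '(Intercept)' or a ':'-separated product of numeric variables; categorical or power terms would need extra handling.

```matlab
% Synthetic stand-in data (hypothetical, not the poster's table).
% Requires Statistics and Machine Learning Toolbox for fitlm.
x1 = (1:10).';
x2 = [2 7 1 8 2 8 1 8 2 8].';
y  = 3 + 2*x1 - x2 + 0.5*x1.*x2 + 0.1*sin(1:10).';
T  = table(x1, x2, y);
m  = fitlm(T, 'y ~ x1*x2');           % expands to 1 + x1 + x2 + x1:x2

names = m.CoefficientNames;           % one name per row of m.Coefficients
beta  = m.Coefficients.Estimate;
row   = 1;                            % observation to evaluate by hand
yhat  = 0;
for k = 1:numel(names)
    if strcmp(names{k}, '(Intercept)')
        xk = 1;                       % the intercept multiplies a constant 1
    else
        vars = strsplit(names{k}, ':');               % 'x1:x2' -> {'x1','x2'}
        xk   = prod(cellfun(@(v) T.(v)(row), vars));  % product of the factor values
    end
    yhat = yhat + beta(k)*xk;         % accumulate coefficient * term value
end
[yhat, m.Fitted(row)]                 % the two numbers should agree
```

Looping over the coefficient names rather than over the pieces of the formula string is what keeps the main effects in the sum.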

dpb on 8 Oct 2025

Edited: dpb on 9 Oct 2025

October 2025 Testing.mat

load 'October 2025 Testing'

%head(FivesFipago,2)

scatter(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)

xlabel('Angle2'), ylabel('Response')

%corr(FivesFipago.ContactAngleTwo,FivesFipago.FipagoFive)

The response isn't linearly correlated with ContactAngleTwo, so I'm not sure why that term didn't behave well; it would take some more digging.

R=corrcoef(FivesFipago{:,:})

R = 8×8

    1.0000   -0.3230    0.1308    0.1541    0.4912    0.3922    0.2939    0.0425
   -0.3230    1.0000   -0.1190   -0.2158   -0.5411   -0.4033   -0.2864   -0.3442
    0.1308   -0.1190    1.0000    0.0104    0.0529    0.0466    0.0373    0.1689
    0.1541   -0.2158    0.0104    1.0000   -0.2884    0.0430    0.1818    0.0953
    0.4912   -0.5411    0.0529   -0.2884    1.0000    0.7696    0.5608    0.0049
    0.3922   -0.4033    0.0466    0.0430    0.7696    1.0000    0.9602    0.0089
    0.2939   -0.2864    0.0373    0.1818    0.5608    0.9602    1.0000    0.0094
    0.0425   -0.3442    0.1689    0.0953    0.0049    0.0089    0.0094    1.0000

R(logical(eye(size(R))))=nan; % so can search for off diagonal elements only

[Rmx,imax]=max(abs(R),[],'all','omitnan'); [r,c]=ind2sub(size(R),imax);

Rmx

Rmx = 0.9602

figure

vnames=FivesFipago.Properties.VariableNames;

p=fitlm(FivesFipago,[vnames{r} '~' vnames{c}])

p = 
Linear regression model:
    ContactAngleDifference ~ 1 + ContactAngleTwo

Estimated Coefficients:
                       Estimate       SE        tStat       pValue  
                       ________    ________    _______    __________

    (Intercept)        -75.214      4.3607    -17.248    4.6098e-13
    ContactAngleTwo     0.7405    0.049388     14.993    5.5462e-12


Number of observations: 21, Error degrees of freedom: 19
Root Mean Squared Error: 5.21
R-squared: 0.922,  Adjusted R-Squared: 0.918
F-statistic vs. constant model: 225, p-value = 5.55e-12

hL=plot(p); delete(hL(3)) % unclutter confidence limits

xlim(xlim+[-5 0]) % move 0 off axis for visibility

%scatter(FivesFipago.(vnames{r}),FivesFipago.(vnames{c}))

%xlabel(vnames(r)), ylabel(vnames(c))

And there's the issue: ContactAngleTwo and ContactAngleDifference are very highly correlated (r = 0.96); use one or the other, but not both.
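Beyond the high pairwise correlation, in the rows displayed earlier in the thread ContactAngleDifference appears to equal ContactAngleTwo minus ContactAngleZero exactly. An exact dependence like that would account for the rank-deficiency warning. A minimal sketch of how to check, using only the first six displayed rows typed in by hand (the variable names Zero, Two and Diff are shorthand introduced here); the full table should be checked the same way:

```matlab
% First six rows of the three angle columns, copied from the table posted above
Zero = [86; 91; 107; 106;  83;  85];   % ContactAngleZero
Two  = [78; 53; 106; 102;   0;  61];   % ContactAngleTwo
Diff = [-8; -38; -1;  -4; -83; -24];   % ContactAngleDifference

maxDev = max(abs(Diff - (Two - Zero)));    % 0 when Diff is exactly Two - Zero
X = [ones(numel(Zero),1) Zero Two Diff];   % intercept plus the three angle main effects
r = rank(X);                               % fewer than size(X,2) means a redundant column
fprintf('max deviation = %g, rank = %d of %d columns\n', maxDev, r, size(X,2));
```

If the rank comes out one less than the number of columns, one of the three angle variables carries no independent information, and it can be dropped from the model without changing the fit.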

Torsten on 9 Oct 2025

@dpb

It seems that if second-order or interaction terms are present in the linear model, the main effects are not included in the formula for the regression model.

The example "Terms Matrix for Matrix Input" under https://uk.mathworks.com/help/stats/fitlm.html shows this (illogical) behaviour, but without commenting on it or giving a justification.

dpb on 9 Oct 2025

Well, it is logical in a sense; what the mdl.Formula property returns just isn't well documented. It is the Wilkinson representation of the model, in which "x1*x2" implies the interaction term plus all lower-order terms. There is a difference between specifying "x1*x2" and "x1:x2" that is easy to miss: the latter says to include only the specific interaction term.
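A small sketch of that difference, on synthetic stand-in data (x1, x2 and y are made up for illustration): the same two predictors are fitted once with '*' and once with ':', and the coefficient names show which terms each formula expands to.

```matlab
% Synthetic stand-in data (hypothetical); requires Statistics and Machine Learning Toolbox
x1 = (1:10).';
x2 = [2 7 1 8 2 8 1 8 2 8].';
y  = 3 + 2*x1 - x2 + 0.5*x1.*x2 + 0.1*sin(1:10).';
T  = table(x1, x2, y);

mStar  = fitlm(T, 'y ~ x1*x2');   % '*' : interaction plus all lower-order terms
mColon = fitlm(T, 'y ~ x1:x2');   % ':' : the interaction term only (intercept is still added)

disp(mStar.CoefficientNames)      % (Intercept), x1, x2, x1:x2
disp(mColon.CoefficientNames)     % (Intercept), x1:x2
```

So a formula displayed as "A*B" stands for three coefficients besides the intercept, which is consistent with an 18-row coefficient table sitting behind a formula that shows only 11 pieces.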

I haven't yet investigated fully whether the returned formula property is always in Wilkinson notation regardless of how the input model is specified or whether it just reflects the user input specification.

I'll try to do some more spelunking and see; at the least, the doc needs to be explicit about what the user should expect. For @Heather's question here, it certainly isn't what she's looking for; one would have to build that expression from the table of coefficients, much as in your earlier examples. If that really is the only way to generate the full representation of the model, for presentation purposes for example, that seems rude.
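One way to build that expression from the coefficient table, as a minimal sketch rather than a built-in feature: it concatenates each estimate with its coefficient name into one string. The data are synthetic stand-ins, the 5-significant-digit format is an arbitrary choice, and terms with a zero or NaN estimate (such as one dropped for rank deficiency) are skipped.

```matlab
% Synthetic stand-in data (hypothetical); requires Statistics and Machine Learning Toolbox
x1 = (1:10).';
x2 = [2 7 1 8 2 8 1 8 2 8].';
y  = 3 + 2*x1 - x2 + 0.5*x1.*x2 + 0.1*sin(1:10).';
T  = table(x1, x2, y);
m  = fitlm(T, 'y ~ x1*x2');

names = m.CoefficientNames;
beta  = m.Coefficients.Estimate;
eqn   = [m.ResponseName ' ='];                 % left-hand side, e.g. 'y ='
for k = 1:numel(names)
    if beta(k) == 0 || isnan(beta(k))
        continue                               % skip non-estimable (dropped) terms
    end
    if strcmp(names{k}, '(Intercept)')
        eqn = sprintf('%s %+.5g', eqn, beta(k));
    else
        term = strrep(names{k}, ':', '*');     % write interactions as products
        eqn  = sprintf('%s %+.5g*%s', eqn, beta(k), term);
    end
end
disp(eqn)
```

With the poster's model, replacing m with mdl should give the full 18-term expression, minus any term whose estimate is zero.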
