Improve Model Fit with Weights
This example shows how to fit a polynomial model to data using both the linear least-squares method and the weighted least-squares method for comparison.
Generate sample data from different normal distributions by using the
rng("default") % For reproducibility rnorm = ; idx = ; for k=1:20 r = k*randn([20,1]) + (1/20)*(k^3); rnorm = [rnorm;r]; idx = [idx;ones(20,1).*k]; end
The dependent variable
rnorm contains sample data from 20 normal distributions. The independent variable
idx contains integers indicating whether two elements in
rnorm are sampled from the same normal distribution.
Fit a third-degree polynomial model to
rnorm. Return information about the coefficient estimates and the algorithm used to fit the model.
[p3fit,~,fitinfo] = fit(idx,rnorm,"poly3")
p3fit = Linear model Poly3: p3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4 Coefficients (with 95% confidence bounds): p1 = 0.05156 (0.0438, 0.05932) p2 = -0.03993 (-0.2875, 0.2076) p3 = 0.1418 (-2.124, 2.408) p4 = 0.0462 (-5.585, 5.678)
fitinfo = struct with fields: numobs: 400 numparam: 4 residuals: [400×1 double] Jacobian: [400×4 double] exitflag: 1 algorithm: 'QR factorization and solve' iterations: 1
p3fit contains the estimates for the model coefficients with 95% confidence intervals. The default fitting method for fitting a polynomial model is linear least squares.
fitinfo contains information about the fitting algorithm used to fit the coefficients to the data. The error in the data can be estimated by the residuals stored in
Plot the residuals using a stem plot.
stem(idx,fitinfo.residuals) xlabel("idx") ylabel("residuals")
The plot of the residuals shows that their variance increases as the values in
idx increase. The nonconstant variances across the different values of
idx indicate that the weighted least-squares fitting method is more appropriate for calculating the model coefficients.
Create a vector of zeros for later storage of the weights.
W = zeros(length(rnorm),1);
The weights you supply transform the residual variances so that they are constant for different values of
idx. Define the weight for each element in
rnorm as the reciprocal of the residual variance for the corresponding value in
idx. Then fit the model with the weights.
for k=1:20 rnorm_idx = rnorm(idx==k); recipvar = 1/var(rnorm_idx); w = (idx==k).*recipvar; W = W+w; end [wp3fit,~,wfitinfo] = fit(idx,rnorm,"poly3","Weights",W)
wp3fit = Linear model Poly3: wp3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4 Coefficients (with 95% confidence bounds): p1 = 0.04894 (0.04419, 0.0537) p2 = 0.03601 (-0.08744, 0.1595) p3 = -0.4262 (-1.253, 0.4009) p4 = 0.9836 (-0.1959, 2.163)
wfitinfo = struct with fields: numobs: 400 numparam: 4 residuals: [400×1 double] Jacobian: [400×4 double] exitflag: 1 algorithm: 'QR factorization and solve' iterations: 1
wfitinfo contain the results of the weighted least-squares fitting.
rnorm in the same plot.
plot(p3fit,idx,rnorm) hold on plot(wp3fit,"g") xlabel("idx") ylabel("rnorm") legend(["rnorm","linear least-squares fit","weighted least-squares fit"]) hold off
The plot shows
wp3fit closely tracking
You can determine whether
wp3fit is a better fit than
p3fit by plotting the residuals.
stem(idx,wfitinfo.residuals) xlabel("idx") ylabel("residuals")
The output shows that the
wp3fit residuals are smaller than the
p3fit residuals. The variances of the
wp3fit residuals are also more similar for different values of
idx than the variances of the