RegressionLinear class

Linear regression model for high-dimensional data

Description

RegressionLinear is a trained linear model object for regression; the linear model is a support vector machine regression (SVM) or linear regression model. fitrlinear fits a RegressionLinear model by minimizing the objective function using techniques that reduce computation time for high-dimensional data sets (e.g., stochastic gradient descent). The regression loss plus the regularization term compose the objective function.

Unlike other regression models, and for economical memory usage, RegressionLinear model objects do not store the training data. However, they do store, for example, the estimated linear model coefficients (Beta), the estimated bias term (Bias), and the regularization strength (Lambda).

You can use trained RegressionLinear models to predict responses for new data. For details, see predict.

Construction

Create a RegressionLinear object by using fitrlinear.

Properties

Linear Regression Properties

Epsilon: Half of the width of the epsilon-insensitive band, specified as a nonnegative scalar.

If Learner is not 'svm', then Epsilon is an empty array ([]).

Data Types: single | double

Lambda: Regularization term strength, specified as a nonnegative scalar or vector of nonnegative values.

Data Types: double | single

Learner: Linear regression model type, specified as 'leastsquares' or 'svm'.

In this table, f(x)=xβ+b.

  • β is a vector of p coefficients.

  • x is an observation from p predictor variables.

  • b is the scalar bias.

| Value | Algorithm | Loss function | FittedLoss Value |
| --- | --- | --- | --- |
| 'leastsquares' | Linear regression via ordinary least squares | Mean squared error (MSE): $\ell[y,f(x)] = \frac{1}{2}[y - f(x)]^2$ | 'mse' |
| 'svm' | Support vector machine regression | Epsilon-insensitive: $\ell[y,f(x)] = \max[0, \lvert y - f(x)\rvert - \varepsilon]$ | 'epsiloninsensitive' |
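
As a rough illustration of the two loss functions in the table, the following sketch evaluates both on a few made-up responses and fitted values. The variable names and numbers (y, fx, epsilon) are hypothetical and chosen only for illustration; they do not come from a fitted model.

```matlab
% Hypothetical responses and fitted values f(x), for illustration only.
y  = [1.0; 2.0; 3.0];
fx = [1.1; 1.5; 3.0];
epsilon = 0.2;   % assumed half-width of the epsilon-insensitive band

% Mean squared error loss per observation: (1/2)*[y - f(x)]^2
mseLoss = 0.5*(y - fx).^2;

% Epsilon-insensitive loss per observation: max[0, |y - f(x)| - epsilon]
epsLoss = max(0, abs(y - fx) - epsilon);

disp([mseLoss epsLoss])
```

Residuals smaller than epsilon in magnitude contribute zero epsilon-insensitive loss, so only the second observation is penalized in this sketch.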

Beta: Linear coefficient estimates, specified as a numeric vector with length equal to the number of predictors.

Data Types: double

Bias: Estimated bias term or model intercept, specified as a numeric scalar.

Data Types: double

FittedLoss: Loss function used to fit the model, specified as 'epsiloninsensitive' or 'mse'.

| Value | Algorithm | Loss function | Learner Value |
| --- | --- | --- | --- |
| 'epsiloninsensitive' | Support vector machine regression | Epsilon-insensitive: $\ell[y,f(x)] = \max[0, \lvert y - f(x)\rvert - \varepsilon]$ | 'svm' |
| 'mse' | Linear regression via ordinary least squares | Mean squared error (MSE): $\ell[y,f(x)] = \frac{1}{2}[y - f(x)]^2$ | 'leastsquares' |

Regularization: Complexity penalty type, specified as 'lasso (L1)' or 'ridge (L2)'.

The software composes the objective function for minimization from the sum of the average loss function (see FittedLoss) and a regularization value from this table.

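| Value | Description |
| --- | --- |
| 'lasso (L1)' | Lasso (L1) penalty: $\lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$ |
| 'ridge (L2)' | Ridge (L2) penalty: $\frac{\lambda}{2} \sum_{j=1}^{p} \beta_j^2$ |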

λ specifies the regularization term strength (see Lambda).

The software excludes the bias term (b) from the regularization penalty.
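
To make the composition of the objective function concrete, here is a minimal sketch that adds the average loss to each penalty by hand. All values (X, y, beta, b, lambda) are hypothetical, and the sketch uses the MSE loss from the FittedLoss table; it illustrates the formulas only and is not how fitrlinear is implemented internally.

```matlab
% Hypothetical data and coefficients, for illustration only.
X = [1 0; 0 1; 1 1];   % 3 observations, p = 2 predictors
y = [1; 2; 3];
beta = [1; 2];         % assumed coefficient vector
b = 0.1;               % assumed bias (not penalized)
lambda = 0.1;          % assumed regularization strength

fx = X*beta + b;                     % f(x) = x*beta + b
avgLoss = mean(0.5*(y - fx).^2);     % average MSE loss

lassoPenalty = lambda*sum(abs(beta));       % lambda * sum |beta_j|
ridgePenalty = (lambda/2)*sum(beta.^2);     % (lambda/2) * sum beta_j^2

objLasso = avgLoss + lassoPenalty;
objRidge = avgLoss + ridgePenalty;
fprintf('lasso objective: %.4f, ridge objective: %.4f\n',objLasso,objRidge)
```

Note that b enters the fitted values but neither penalty, consistent with the bias term being excluded from regularization.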

Other Regression Properties

ModelParameters: Parameters used for training the RegressionLinear model, specified as a structure.

Access fields of ModelParameters using dot notation. For example, access the relative tolerance on the linear coefficients and the bias term by using Mdl.ModelParameters.BetaTolerance.

Data Types: struct
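
The dot-notation access described above might look like the following sketch. The small simulated data set is an assumption made only so that the example is self-contained.

```matlab
rng(1) % For reproducibility
% Small hypothetical data set, used only to obtain a trained model.
X = sprandn(1e3,50,0.1);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(1e3,1);
Mdl = fitrlinear(X,Y);

% Relative tolerance on the linear coefficients and the bias term
tol = Mdl.ModelParameters.BetaTolerance
```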

PredictorNames: Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

ExpandedPredictorNames: Expanded predictor names, specified as a cell array of character vectors.

Because a RegressionLinear model does not support categorical predictors, ExpandedPredictorNames and PredictorNames are equal.

Data Types: cell

ResponseName: Response variable name, specified as a character vector.

Data Types: char

ResponseTransform: Response transformation function, specified as 'none' or a function handle. ResponseTransform describes how the software transforms raw response values.

For a MATLAB® function, or a function that you define, enter its function handle. For example, you can enter Mdl.ResponseTransform = @function, where function accepts a numeric vector of the original responses and returns a numeric vector of the same size containing the transformed responses.

Data Types: char | function_handle
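
As a hedged illustration of assigning a function handle, the sketch below sets ResponseTransform to @exp on a model trained on hypothetical data and compares predictions before and after. It assumes that predict applies the transformation to the raw fitted values.

```matlab
rng(1) % For reproducibility
% Small hypothetical data set, used only to obtain a trained model.
X = sprandn(1e3,50,0.1);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(1e3,1);
Mdl = fitrlinear(X,Y);

yRaw = predict(Mdl,X(1:5,:));      % predictions with ResponseTransform = 'none'

Mdl.ResponseTransform = @exp;      % accepts and returns a numeric vector
yExp = predict(Mdl,X(1:5,:));      % predictions after the transformation

disp([yRaw yExp])
```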

Methods

| Method | Description |
| --- | --- |
| loss | Regression loss for linear regression models |
| predict | Predict response of linear regression model |
| selectModels | Select fitted regularized linear regression models |
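
Because Lambda can be a vector, a single RegressionLinear object can hold several regularized fits. The following sketch, on hypothetical data with arbitrarily chosen regularization strengths, trains one model per strength and then uses selectModels to keep one of them.

```matlab
rng(1) % For reproducibility
% Small hypothetical data set and regularization strengths.
X = sprandn(1e3,50,0.1);
Y = X(:,1) + 2*X(:,2) + 0.3*randn(1e3,1);
Lambda = [1e-4 1e-2 1];

Mdl = fitrlinear(X,Y,'Lambda',Lambda);
size(Mdl.Beta)                 % one column of coefficients per Lambda value

MdlSub = selectModels(Mdl,2);  % keep only the fit for the second strength
MdlSub.Lambda
```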

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Train a linear regression model using SVM, dual SGD, and ridge regularization.

Simulate 10000 observations from this model:

  $y = x_{100} + 2x_{200} + e$

  • $X = \{x_1,\ldots,x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

  • e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Train a linear regression model. By default, fitrlinear uses support vector machines with a ridge penalty, and optimizes using dual SGD for SVM. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary.

[Mdl,FitInfo] = fitrlinear(X,Y)
Mdl = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0056
               Lambda: 1.0000e-04
              Learner: 'svm'


  Properties, Methods

FitInfo = struct with fields:
                    Lambda: 1.0000e-04
                 Objective: 0.2725
                 PassLimit: 10
                 NumPasses: 10
                BatchLimit: []
             NumIterations: 100000
              GradientNorm: NaN
         GradientTolerance: 0
      RelativeChangeInBeta: 0.4907
             BetaTolerance: 1.0000e-04
             DeltaGradient: 1.5816
    DeltaGradientTolerance: 0.1000
           TerminationCode: 0
         TerminationStatus: {'Iteration limit exceeded.'}
                     Alpha: [10000x1 double]
                   History: []
                   FitTime: 0.1760
                    Solver: {'dual'}

Mdl is a RegressionLinear model. You can pass Mdl and the training data to loss to inspect the in-sample mean squared error, or pass Mdl and new data to loss to estimate the out-of-sample error. You can also pass Mdl and new predictor data to predict to predict responses for new observations.

FitInfo is a structure array containing, among other things, the termination status (TerminationStatus) and how long the solver took to fit the model to the data (FitTime). It is good practice to use FitInfo to determine whether optimization-termination measurements are satisfactory. In this case, fitrlinear reached the maximum number of iterations. Because training is fast, you can retrain the model with a larger number of passes through the data, or try another solver, such as LBFGS.
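
One way to follow up on the iteration-limit message is sketched below, using the 'PassLimit' and 'Solver' name-value arguments of fitrlinear. The pass limit of 50 is an arbitrary illustrative choice, and the variable names MdlMore and MdlLBFGS are hypothetical; whether either refit terminates satisfactorily depends on the data.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

% Option 1: allow more passes through the data (50 is an arbitrary choice)
[MdlMore,FitInfoMore] = fitrlinear(X,Y,'PassLimit',50);

% Option 2: switch to the LBFGS solver
[MdlLBFGS,FitInfoLBFGS] = fitrlinear(X,Y,'Solver','lbfgs');

% Compare how each optimization terminated
FitInfoMore.TerminationStatus
FitInfoLBFGS.TerminationStatus
```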

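Predict responses for a hold-out sample using a trained linear regression model.

Simulate 10000 observations from this model:

  $y = x_{100} + 2x_{200} + e$

  • $X = \{x_1,\ldots,x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

  • e is random normal error with mean 0 and standard deviation 0.3.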

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Hold out 5% of the data.

rng(1); % For reproducibility
cvp = cvpartition(n,'Holdout',0.05)
cvp = 
Hold-out cross validation partition
   NumObservations: 10000
       NumTestSets: 1
         TrainSize: 9500
          TestSize: 500

cvp is a CVPartition object that defines the random partition of the n observations into training and test sets.

Train a linear regression model using the training set. For faster training time, orient the predictor data matrix so that the observations are in columns.

idxTrain = training(cvp); % Extract training set indices
X = X';
Mdl = fitrlinear(X(:,idxTrain),Y(idxTrain),'ObservationsIn','columns');

Predict responses and estimate the mean squared error (MSE) for the hold-out sample.

idxTest = test(cvp); % Extract test set indices
yHat = predict(Mdl,X(:,idxTest),'ObservationsIn','columns');
L = loss(Mdl,X(:,idxTest),Y(idxTest),'ObservationsIn','columns')
L = 0.1851

The hold-out sample MSE is 0.1851.
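
As a sanity check, the hold-out MSE can also be computed directly from the predicted responses. The sketch below repeats the workflow with observations in rows and assumes that the default loss function of loss is the unweighted mean of squared residuals, without the factor of 1/2 used during fitting; manualMSE is a hypothetical variable name.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
X = sprandn(n,d,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

cvp = cvpartition(n,'Holdout',0.05);
idxTrain = training(cvp); % Training set indices
idxTest = test(cvp);      % Test set indices

Mdl = fitrlinear(X(idxTrain,:),Y(idxTrain));
yHat = predict(Mdl,X(idxTest,:));

L = loss(Mdl,X(idxTest,:),Y(idxTest));     % MSE reported by loss
manualMSE = mean((Y(idxTest) - yHat).^2);  % MSE computed by hand (assumed equivalent)

fprintf('loss: %.4f, manual: %.4f\n',L,manualMSE)
```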

Introduced in R2016a