`templateLinear`

Linear classification learner template

`templateLinear` creates a template suitable for fitting a linear classification model to high-dimensional data for multiclass problems.

The template specifies the binary learner model, regularization type and strength, and solver, among other things. After creating the template, train the model by passing the template and data to `fitcecoc`.

`t = templateLinear()` returns a linear classification learner template.

If you specify a default template, then the software uses default values for all input arguments during training.

`t = templateLinear(Name,Value)` returns a template with additional options specified by one or more name-value pair arguments. For example, you can specify to implement logistic regression, specify the regularization type or strength, or specify the solver to use for objective-function minimization.

If you display `t` in the Command Window, then all options appear empty (`[]`) except options that you specify using name-value pair arguments. During training, the software uses default values for empty options.
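As a short sketch of both syntaxes (the variables `X` and `Y` stand in for your own predictor matrix and class labels):

```matlab
% Default template: all options display as empty ([]), so training
% uses default values for every one of them.
t = templateLinear()

% Template with options: only the specified options appear filled in.
t = templateLinear('Learner','logistic','Regularization','ridge')

% Train a multiclass ECOC model using the template.
% Mdl = fitcecoc(X,Y,'Learners',t);
```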

It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify `'ObservationsIn','columns'`. As a result, you can experience a significant reduction in optimization execution time.

For better optimization accuracy if the predictor data is high-dimensional and `Regularization` is `'ridge'`, set `Solver` to any of these combinations:

- `'sgd'`
- `'asgd'`
- `'dual'` if `Learner` is `'svm'`
- `{'sgd','lbfgs'}`
- `{'asgd','lbfgs'}`
- `{'dual','lbfgs'}` if `Learner` is `'svm'`

Other combinations can result in poor optimization accuracy.
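For example, one of the recommended combinations for a ridge-penalized SVM learner, sketched here with a hypothetical p-by-n predictor matrix `X` (observations as columns) and labels `Y`:

```matlab
% {'dual','lbfgs'} runs the dual SGD solver first, then refines the
% solution with LBFGS; valid because Learner is 'svm'.
t = templateLinear('Learner','svm','Regularization','ridge', ...
    'Solver',{'dual','lbfgs'});

% Orient observations as columns for faster optimization.
% Mdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');
```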

For better optimization accuracy if the predictor data is moderate- through low-dimensional and `Regularization` is `'ridge'`, set `Solver` to `'bfgs'`.

If `Regularization` is `'lasso'`, set `Solver` to any of these combinations:

- `'sgd'`
- `'asgd'`
- `'sparsa'`
- `{'sgd','sparsa'}`
- `{'asgd','sparsa'}`
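For instance, a lasso-penalized template using one of the listed combinations:

```matlab
% {'sgd','sparsa'} runs SGD first for a quick initial solution,
% then refines it with the SpaRSA solver.
t = templateLinear('Regularization','lasso','Solver',{'sgd','sparsa'});
```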

When choosing between SGD and ASGD, consider that:

- SGD takes less time per iteration, but requires more iterations to converge.
- ASGD requires fewer iterations to converge, but takes more time per iteration.

If the predictor data has few observations but many predictor variables, then:

- Specify `'PostFitBias',true`.
- For SGD or ASGD solvers, set `PassLimit` to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
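A template following both recommendations might look like this (the solver choice is illustrative):

```matlab
% Few observations, many predictors: refit the bias term after
% optimization and allow several passes through the data.
t = templateLinear('Solver','sgd','PostFitBias',true,'PassLimit',5);
```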

For SGD and ASGD solvers, `BatchSize` affects the rate of convergence.

- If `BatchSize` is too small, then the software achieves the minimum in many iterations, but computes the gradient per iteration quickly.
- If `BatchSize` is too large, then the software achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
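To experiment with this trade-off, set `BatchSize` explicitly (the value 20 here is only a starting point, not a recommendation):

```matlab
% Each SGD iteration computes the gradient on a mini-batch of
% 20 observations.
t = templateLinear('Solver','sgd','BatchSize',20);
```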

A large learning rate (see `LearnRate`) speeds up convergence to the minimum, but can lead to divergence (that is, overstepping the minimum). A small learning rate ensures convergence to the minimum, but can lead to slow termination.

If `Regularization` is `'lasso'`, then experiment with various values of `TruncationPeriod`. For example, set `TruncationPeriod` to `1`, `10`, and then `100`.
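One way to run this experiment is to build one template per candidate value and compare the resulting models, for example by cross-validation loss (the sweep below is a sketch):

```matlab
% Create one lasso template per truncation period. TruncationPeriod
% applies to the SGD and ASGD solvers.
periods = [1 10 100];
templates = cell(size(periods));
for k = 1:numel(periods)
    templates{k} = templateLinear('Regularization','lasso', ...
        'Solver','sgd','TruncationPeriod',periods(k));
end
```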

For efficiency, the software does not standardize predictor data. To standardize the predictor data `X`, enter

```matlab
X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));
```

The code requires that you orient the predictors and observations as the rows and columns of `X`, respectively. Also, for memory-usage economy, the code replaces the original predictor data with the standardized data.
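In MATLAB R2016b and later, implicit expansion makes the `bsxfun` calls unnecessary; the following line is an equivalent standardization under the same orientation assumption:

```matlab
% Center and scale each predictor (row) across observations (dim 2).
X = (X - mean(X,2)) ./ std(X,0,2);
```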