Linear classification learner template

`templateLinear` creates a template suitable for fitting a linear classification model to high-dimensional data for multiclass problems.

The template specifies the binary learner model, regularization type and strength, and solver, among other things. After creating the template, train the model by passing the template and data to `fitcecoc`.

`t = templateLinear()` returns a linear classification learner template.

If you specify a default template, then the software uses default values for all input arguments during training.

`t = templateLinear(Name,Value)` returns a template with additional options specified by one or more name-value pair arguments. For example, you can specify to implement logistic regression, specify the regularization type or strength, or specify the solver to use for objective-function minimization.

If you display `t` in the Command Window, then all options appear empty (`[]`) except options that you specify using name-value pair arguments. During training, the software uses default values for empty options.
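A minimal sketch of the workflow, assuming the Statistics and Machine Learning Toolbox is installed: create a template, then pass it to `fitcecoc`. Fisher's iris data is low-dimensional and serves only as an illustration.

```matlab
% Create a linear learner template for logistic regression with
% lasso regularization; unspecified options keep their defaults.
t = templateLinear('Learner','logistic','Regularization','lasso');

% Displaying t shows unspecified options as empty ([]).
t

% Train a multiclass ECOC model using the template.
load fisheriris
Mdl = fitcecoc(meas,species,'Learners',t);
```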

It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify `'ObservationsIn','columns'`. As a result, you can experience a significant reduction in optimization-execution time.

For better optimization accuracy if the predictor data is high-dimensional and `Regularization` is `'ridge'`, set any of these combinations for `Solver`:

- `'sgd'`
- `'asgd'`
- `'dual'` if `Learner` is `'svm'`
- `{'sgd','lbfgs'}`
- `{'asgd','lbfgs'}`
- `{'dual','lbfgs'}` if `Learner` is `'svm'`

Other combinations can result in poor optimization accuracy.
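One way to request a recommended combination, sketched with an SGD pass followed by LBFGS refinement:

```matlab
% Ridge regularization on high-dimensional data: run SGD first,
% then refine with LBFGS (one of the recommended combinations).
t = templateLinear('Regularization','ridge','Solver',{'sgd','lbfgs'});
```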

For better optimization accuracy if the predictor data is moderate- through low-dimensional and `Regularization` is `'ridge'`, set `Solver` to `'bfgs'`.

If `Regularization` is `'lasso'`, set any of these combinations for `Solver`:

- `'sgd'`
- `'asgd'`
- `'sparsa'`
- `{'sgd','sparsa'}`
- `{'asgd','sparsa'}`
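For lasso regularization, the same pattern applies; a sketch of two of the listed choices:

```matlab
% Lasso regularization solved with SpaRSA alone...
t1 = templateLinear('Regularization','lasso','Solver','sparsa');

% ...or with a stochastic pass followed by SpaRSA refinement.
t2 = templateLinear('Regularization','lasso','Solver',{'sgd','sparsa'});
```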

When choosing between SGD and ASGD, consider that:

- SGD takes less time per iteration, but requires more iterations to converge.
- ASGD requires fewer iterations to converge, but takes more time per iteration.

If the predictor data has few observations but many predictor variables, then:

- Specify `'PostFitBias',true`.
- For SGD or ASGD solvers, set `PassLimit` to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
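A sketch of both settings together; the pass limit of 5 is only an illustrative value:

```matlab
% Few observations, many predictors: fit the bias after optimization
% and allow several passes through the data for the SGD solver.
t = templateLinear('Solver','sgd','PostFitBias',true,'PassLimit',5);
```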

For SGD and ASGD solvers, `BatchSize` affects the rate of convergence.

- If `BatchSize` is too small, then the software achieves the minimum in many iterations, but computes the gradient per iteration quickly.
- If `BatchSize` is too large, then the software achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
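Tuning the mini-batch size is a single name-value argument; the value 50 below is illustrative, not a recommendation:

```matlab
% Larger mini-batches mean fewer but slower iterations per pass.
t = templateLinear('Solver','sgd','BatchSize',50);
```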

Large learning rates (see `LearnRate`) speed up convergence to the minimum, but can lead to divergence (that is, over-stepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination.

If `Regularization` is `'lasso'`, then experiment with various values of `TruncationPeriod`. For example, set `TruncationPeriod` to `1`, `10`, and then `100`.

For efficiency, the software does not standardize predictor data. To standardize the predictor data (`X`), enter

```matlab
X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));
```

The code requires that you orient the predictors and observations as the rows and columns of `X`, respectively. Also, for memory-usage economy, the code replaces the original predictor data with the standardized data.
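Putting the orientation and standardization tips together, a sketch using the iris data purely for illustration:

```matlab
load fisheriris
X = meas';                    % orient observations as columns
% Standardize in place (rows = predictors, columns = observations).
X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));

t = templateLinear();
Mdl = fitcecoc(X,species,'Learners',t,'ObservationsIn','columns');
```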
