# resubEdge

Find classification edge for support vector machine (SVM) classifier by resubstitution

## Syntax

``e = resubEdge(SVMModel)``

## Description


`e = resubEdge(SVMModel)` returns the resubstitution classification edge (`e`) for the support vector machine (SVM) classifier `SVMModel`, using the training data stored in `SVMModel.X` and the corresponding class labels stored in `SVMModel.Y`. The classification edge is a scalar value that represents the weighted mean of the classification margins.

## Examples


Load the `ionosphere` data set.

`load ionosphere`

Train an SVM classifier. Standardize the data and specify that `'g'` is the positive class.

`SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});`

`SVMModel` is a trained `ClassificationSVM` classifier.

Estimate the resubstitution edge. This is the mean of the training sample margins.

`e = resubEdge(SVMModel)`
```e = 5.0998 ```
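As a quick sanity check of the statement that the edge is the mean of the training sample margins, the sketch below recomputes it from `resubMargin`. It assumes the default observation weights and empirical priors used in this example, in which case the weighted mean reduces to a plain mean. `m` and `eManual` are illustrative variable names, not part of the original example.

```matlab
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

e = resubEdge(SVMModel);        % resubstitution edge
m = resubMargin(SVMModel);      % one margin per training observation
eManual = mean(m);              % plain mean; assumes equal observation weights

abs(e - eManual)                % expected to be at floating-point level
```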

Perform feature selection by comparing training sample edges from multiple models. Based solely on this comparison, the classifier with the highest edge is the best classifier.

Load the `ionosphere` data set. Define two data sets:

• `fullX` contains all predictors (except the removed column of 0s).

• `partX` contains the last 21 predictors (columns `end-20` through `end`).

```
load ionosphere
fullX = X;
partX = X(:,end-20:end);
```

Train SVM classifiers for each predictor set.

```
FullSVMModel = fitcsvm(fullX,Y);
PartSVMModel = fitcsvm(partX,Y);
```

Estimate the training sample edge for each classifier.

`fullEdge = resubEdge(FullSVMModel)`
```fullEdge = 3.3652 ```
`partEdge = resubEdge(PartSVMModel)`
```partEdge = 2.0470 ```

The edge for the classifier trained on the complete data set is greater, suggesting that the classifier trained with all the predictors has a better in-sample fit.

## Input Arguments


`SVMModel`: Full, trained SVM classifier, specified as a `ClassificationSVM` model trained with `fitcsvm`.

## More About

### Classification Edge

The edge is the weighted mean of the classification margins.

The weights are the prior class probabilities. If you supply weights, then the software normalizes them to sum to the prior probabilities in the respective classes. The software uses the renormalized weights to compute the weighted mean.
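The weighted mean described above can be sketched with the observation weights that the trained model stores in its `W` property. This is an illustration under the assumption that `W` holds the weights after normalization to the class priors, not a description of the internal implementation. The random `obsWeights` vector is hypothetical, and `w`, `m`, and `eWeighted` are illustrative names.

```matlab
load ionosphere
n = numel(Y);
rng(1)                                   % reproducible illustrative weights
obsWeights = 0.5 + rand(n,1);            % hypothetical user-supplied weights
SVMModel = fitcsvm(X,Y,'Weights',obsWeights);

w = SVMModel.W;                          % weights as stored after normalization
m = resubMargin(SVMModel);               % per-observation margins
eWeighted = sum(w.*m)/sum(w);            % weighted mean of the margins

[eWeighted, resubEdge(SVMModel)]         % the two values should agree closely
```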

One way to choose among multiple classifiers, for example, to perform feature selection, is to choose the classifier that yields the highest edge.

### Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

`$m=2yf\left(x\right).$`

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).
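The factor of 2 follows from defining the margin as a difference of scores, given that the negative-class score is the negative of the positive-class score f(x). A short derivation, treating the two possible values of y separately:

```latex
\begin{aligned}
y = 1:&\quad m = f(x) - \bigl(-f(x)\bigr) = 2f(x),\\
y = -1:&\quad m = -f(x) - f(x) = -2f(x),
\end{aligned}
\qquad\text{so in both cases } m = 2yf(x).
```

The commonly used definition m = yf(x) therefore differs only by a constant factor, which does not change how classifiers rank against each other.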

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

### Classification Score

The SVM classification score for classifying observation x is the signed distance from x to the decision boundary ranging from -∞ to +∞. A positive score for a class indicates that x is predicted to be in that class. A negative score indicates otherwise.

The positive class classification score $f\left(x\right)$ is the trained SVM classification function. $f\left(x\right)$ is also the numerical, predicted response for x, or the score for predicting x into the positive class.

`$f\left(x\right)=\sum _{j=1}^{n}{\alpha }_{j}{y}_{j}G\left({x}_{j},x\right)+b,$`

where $\left({\alpha }_{1},...,{\alpha }_{n},b\right)$ are the estimated SVM parameters, $G\left({x}_{j},x\right)$ is the dot product in the predictor space between x and the support vectors, and the sum includes the training set observations. The negative class classification score for x, or the score for predicting x into the negative class, is –f(x).
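To see the relationship between the two class scores in practice, this sketch inspects the score matrix returned by `resubPredict`. It assumes the column order follows `SVMModel.ClassNames`, so with `{'b','g'}` the second column holds the positive-class score. `fNeg` and `fPos` are illustrative names.

```matlab
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

[label,score] = resubPredict(SVMModel);  % one score column per class
fNeg = score(:,1);                       % score for 'b' (negative class)
fPos = score(:,2);                       % score for 'g' (positive class), f(x)

max(abs(fNeg + fPos))                    % expected to be 0, since fNeg = -fPos

% A positive f(x) should correspond to a predicted label of 'g'
% (barring a score of exactly zero).
agree = isequal(strcmp(label,'g'), fPos > 0)
```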

If $G\left({x}_{j},x\right)={x}_{j}\prime x$ (the linear kernel), then the score function reduces to

`$f\left(x\right)=\left(x/s\right)\prime \beta +b.$`

s is the kernel scale and β is the vector of fitted linear coefficients.
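The linear-kernel formula can be checked against the scores that the model reports. The sketch below trains without standardization so that the raw predictors can be used directly (with `'Standardize',true` the predictors would first have to be centered and scaled using the model's `Mu` and `Sigma`). `Mdl`, `s`, and `fManual` are illustrative names.

```matlab
load ionosphere
Mdl = fitcsvm(X,Y);                      % linear kernel by default, no standardization

s = Mdl.KernelParameters.Scale;          % kernel scale
fManual = (X/s)*Mdl.Beta + Mdl.Bias;     % (x/s)'*beta + b for every row of X

[~,score] = resubPredict(Mdl);           % second column: positive-class score
max(abs(fManual - score(:,2)))           % expected to be near zero
```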

For more details, see Understanding Support Vector Machines.

## Algorithms

For binary classification, the software defines the margin for observation j, ${m}_{j}$, as

`${m}_{j}=2{y}_{j}f\left({x}_{j}\right),$`

where ${y}_{j}\in \left\{-1,1\right\}$, and $f\left({x}_{j}\right)$ is the predicted score of observation j for the positive class. However, ${m}_{j}={y}_{j}f\left({x}_{j}\right)$ is commonly used to define the margin.
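A minimal sketch of this definition, assuming `'g'` is the positive class and that the second score column is the positive-class score, rebuilds the margins by hand and compares them with `resubMargin`. `f`, `y`, and `mManual` are illustrative names.

```matlab
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

[~,score] = resubPredict(SVMModel);
f = score(:,2);                          % positive-class score for each observation
y = 2*strcmp(SVMModel.Y,'g') - 1;        % +1 for 'g', -1 for 'b'

mManual = 2*y.*f;                        % margin as twice the label times the score
max(abs(mManual - resubMargin(SVMModel)))  % expected to be near zero
```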

## References

[1] Cristianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.