Validate quality of credit scorecard model

`Stats = validatemodel(sc)`

`Stats = validatemodel(sc,data)`

```
[Stats,T] = validatemodel(sc,Name,Value)
```

```
[Stats,T,hf] = validatemodel(sc,Name,Value)
```

`[Stats,T] = validatemodel(sc,Name,Value)` validates the quality of the `creditscorecard` model using the optional name-value pair arguments, and returns the `Stats` and `T` outputs.

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

`load CreditCardData`

`sc = creditscorecard(data, 'IDVar','CustID')`

sc = creditscorecard with properties:

| Property | Value |
| --- | --- |
| GoodLabel | `0` |
| ResponseVar | `'status'` |
| WeightsVar | `''` |
| VarNames | `{1x11 cell}` |
| NumericPredictors | `{1x6 cell}` |
| CategoricalPredictors | `{'ResStatus' 'EmpStatus' 'OtherCC'}` |
| IDVar | `'CustID'` |
| PredictorVars | `{1x9 cell}` |
| Data | `[1200x11 table]` |

Perform automatic binning using the default options. By default, `autobinning` uses the `Monotone` algorithm.

`sc = autobinning(sc);`

Fit the model.

`sc = fitmodel(sc);`

1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model: status ~ [Linear formula with 8 terms in 7 predictors]

Distribution = Binomial

Estimated Coefficients:

|  | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.70239 | 0.064001 | 10.975 | 5.0538e-28 |
| CustAge | 0.60833 | 0.24932 | 2.44 | 0.014687 |
| ResStatus | 1.377 | 0.65272 | 2.1097 | 0.034888 |
| EmpStatus | 0.88565 | 0.293 | 3.0227 | 0.0025055 |
| CustIncome | 0.70164 | 0.21844 | 3.2121 | 0.0013179 |
| TmWBank | 1.1074 | 0.23271 | 4.7589 | 1.9464e-06 |
| OtherCC | 1.0883 | 0.52912 | 2.0569 | 0.039696 |
| AMBalance | 1.045 | 0.32214 | 3.2439 | 0.0011792 |

1200 observations, 1192 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Format the unscaled points.

`sc = formatpoints(sc, 'PointsOddsAndPDO',[500,2,50]);`

Score the data.

`scores = score(sc);`

Validate the credit scorecard model by generating the CAP, ROC, and KS plots.

`[Stats,T] = validatemodel(sc,'Plot',{'CAP','ROC','KS'});`

`disp(Stats)`

| Measure | Value |
| --- | --- |
| 'Accuracy Ratio' | 0.32258 |
| 'Area under ROC curve' | 0.66129 |
| 'KS statistic' | 0.2246 |
| 'KS score' | 499.62 |

`disp(T(1:15,:))`

| Scores | ProbDefault | TrueBads | FalseBads | TrueGoods | FalseGoods | Sensitivity | FalseAlarm | PctObs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 369.54 | 0.75313 | 0 | 1 | 802 | 397 | 0 | 0.0012453 | 0.00083333 |
| 378.19 | 0.73016 | 1 | 1 | 802 | 396 | 0.0025189 | 0.0012453 | 0.0016667 |
| 380.28 | 0.72444 | 2 | 1 | 802 | 395 | 0.0050378 | 0.0012453 | 0.0025 |
| 391.49 | 0.69234 | 3 | 1 | 802 | 394 | 0.0075567 | 0.0012453 | 0.0033333 |
| 395.57 | 0.68017 | 4 | 1 | 802 | 393 | 0.010076 | 0.0012453 | 0.0041667 |
| 396.14 | 0.67846 | 4 | 2 | 801 | 393 | 0.010076 | 0.0024907 | 0.005 |
| 396.45 | 0.67752 | 5 | 2 | 801 | 392 | 0.012594 | 0.0024907 | 0.0058333 |
| 398.61 | 0.67094 | 6 | 2 | 801 | 391 | 0.015113 | 0.0024907 | 0.0066667 |
| 398.68 | 0.67072 | 7 | 2 | 801 | 390 | 0.017632 | 0.0024907 | 0.0075 |
| 401.33 | 0.66255 | 8 | 2 | 801 | 389 | 0.020151 | 0.0024907 | 0.0083333 |
| 402.66 | 0.65842 | 8 | 3 | 800 | 389 | 0.020151 | 0.003736 | 0.0091667 |
| 404.25 | 0.65346 | 9 | 3 | 800 | 388 | 0.02267 | 0.003736 | 0.01 |
| 404.73 | 0.65193 | 9 | 4 | 799 | 388 | 0.02267 | 0.0049813 | 0.010833 |
| 405.53 | 0.64941 | 11 | 4 | 799 | 386 | 0.027708 | 0.0049813 | 0.0125 |
| 405.7 | 0.64887 | 11 | 5 | 798 | 386 | 0.027708 | 0.0062267 | 0.013333 |

Use the `CreditCardData.mat` file to load the data (`dataWeights`) that contains a column (`RowWeights`) for the weights (using a dataset from Refaat 2011).

```
load CreditCardData
```

Create a `creditscorecard` object using the optional name-value pair argument for `'WeightsVar'`.

`sc = creditscorecard(dataWeights,'IDVar','CustID','WeightsVar','RowWeights')`

sc = creditscorecard with properties:

| Property | Value |
| --- | --- |
| GoodLabel | `0` |
| ResponseVar | `'status'` |
| WeightsVar | `'RowWeights'` |
| VarNames | `{1x12 cell}` |
| NumericPredictors | `{1x6 cell}` |
| CategoricalPredictors | `{'ResStatus' 'EmpStatus' 'OtherCC'}` |
| IDVar | `'CustID'` |
| PredictorVars | `{1x9 cell}` |
| Data | `[1200x12 table]` |

Perform automatic binning.

`sc = autobinning(sc)`

sc = creditscorecard with properties:

| Property | Value |
| --- | --- |
| GoodLabel | `0` |
| ResponseVar | `'status'` |
| WeightsVar | `'RowWeights'` |
| VarNames | `{1x12 cell}` |
| NumericPredictors | `{1x6 cell}` |
| CategoricalPredictors | `{'ResStatus' 'EmpStatus' 'OtherCC'}` |
| IDVar | `'CustID'` |
| PredictorVars | `{1x9 cell}` |
| Data | `[1200x12 table]` |

Fit the model.

`sc = fitmodel(sc);`

1. Adding CustIncome, Deviance = 764.3187, Chi2Stat = 15.81927, PValue = 6.968927e-05
2. Adding TmWBank, Deviance = 751.0215, Chi2Stat = 13.29726, PValue = 0.0002657942
3. Adding AMBalance, Deviance = 743.7581, Chi2Stat = 7.263384, PValue = 0.007037455

Generalized linear regression model: logit(status) ~ 1 + CustIncome + TmWBank + AMBalance

Distribution = Binomial

Estimated Coefficients:

|  | Estimate | SE | tStat | pValue |
| --- | --- | --- | --- | --- |
| (Intercept) | 0.70642 | 0.088702 | 7.964 | 1.6653e-15 |
| CustIncome | 1.0268 | 0.25758 | 3.9862 | 6.7132e-05 |
| TmWBank | 1.0973 | 0.31294 | 3.5063 | 0.0004543 |
| AMBalance | 1.0039 | 0.37576 | 2.6717 | 0.0075464 |

1200 observations, 1196 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 36.4, p-value = 6.22e-08

Format the unscaled points.

```
sc = formatpoints(sc, 'PointsOddsAndPDO',[500,2,50]);
```

Score the data.

`scores = score(sc);`

Validate the credit scorecard model by generating the CAP, ROC, and KS plots. When the optional name-value pair argument `'WeightsVar'` is used to specify observation (sample) weights, the `T` table uses statistics, sums, and cumulative sums that are weighted counts.

`[Stats,T] = validatemodel(sc,'Plot',{'CAP','ROC','KS'});`

`Stats`

`T(1:10,:)`

Stats = 4x2 table

| Measure | Value |
| --- | --- |
| 'Accuracy Ratio' | 0.28972 |
| 'Area under ROC curve' | 0.64486 |
| 'KS statistic' | 0.23215 |
| 'KS score' | 505.41 |

ans = 10x9 table

| Scores | ProbDefault | TrueBads | FalseBads | TrueGoods | FalseGoods | Sensitivity | FalseAlarm | PctObs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 401.34 | 0.66253 | 1.0788 | 0 | 411.95 | 201.95 | 0.0053135 | 0 | 0.0017542 |
| 407.59 | 0.64289 | 4.8363 | 1.2768 | 410.67 | 198.19 | 0.023821 | 0.0030995 | 0.0099405 |
| 413.79 | 0.62292 | 6.9469 | 4.6942 | 407.25 | 196.08 | 0.034216 | 0.011395 | 0.018929 |
| 420.04 | 0.60236 | 18.459 | 9.3899 | 402.56 | 184.57 | 0.090918 | 0.022794 | 0.045285 |
| 437.27 | 0.544 | 18.459 | 10.514 | 401.43 | 184.57 | 0.090918 | 0.025523 | 0.047113 |
| 442.83 | 0.52481 | 18.973 | 12.794 | 399.15 | 184.06 | 0.093448 | 0.031057 | 0.051655 |
| 446.19 | 0.51319 | 22.396 | 14.15 | 397.8 | 180.64 | 0.11031 | 0.034349 | 0.059426 |
| 449.08 | 0.50317 | 24.325 | 14.405 | 397.54 | 178.71 | 0.11981 | 0.034968 | 0.062978 |
| 449.73 | 0.50095 | 28.246 | 18.049 | 393.9 | 174.78 | 0.13912 | 0.043813 | 0.075279 |
| 452.44 | 0.49153 | 31.511 | 23.565 | 388.38 | 171.52 | 0.1552 | 0.057204 | 0.089557 |

`sc`: Credit scorecard model

`creditscorecard` object

Credit scorecard model, specified as a `creditscorecard` object. To create this object, use `creditscorecard`.

`data`: Validation data

table

(Optional) Validation data, specified as a MATLAB® table, where each row corresponds to an individual observation. The `data` must contain columns for each of the predictors in the credit scorecard model. The columns of data can be any one of the following data types:

- Numeric
- Logical
- Cell array of character vectors
- Character array
- Categorical
- String
- String array

In addition, the table must contain a binary response variable.

When observation weights are defined using the optional `WeightsVar` name-value pair argument when creating a `creditscorecard` object, the weights stored in the `WeightsVar` column are used when validating the model on the training data. If a different validation data set is provided using the optional `data` input, observation weights for the validation data must be included in a column whose name matches `WeightsVar`; otherwise, unit weights are used for the validation data. For more information, see Using validatemodel with Weights.

**Data Types:** `table`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

```
Stats = validatemodel(sc,data,'AnalysisLevel','Deciles','Plot','CAP')
```

`'AnalysisLevel'`: Type of analysis level

`'Scores'` (default) | character vector with values `'Deciles'`, `'Scores'`

Type of analysis level, specified as a character vector with one of the following values:

- `'Scores'`: Returns the statistics (`Stats`) at the observation level. Scores are sorted from riskiest to safest, and duplicates are removed.
- `'Deciles'`: Returns the statistics (`Stats`) at the decile level. Scores are sorted from riskiest to safest and binned with their corresponding statistics into 10 deciles (10%, 20%, ..., 100%).

**Data Types:** `char`

`'Plot'`: Type of plot

`'None'` (default) | character vector with values `'None'`, `'CAP'`, `'ROC'`, `'KS'` | cell array of character vectors with values `'None'`, `'CAP'`, `'ROC'`, `'KS'`

Type of plot, specified as a character vector, or a cell array of character vectors, with the following values:

- `'None'`: No plot is displayed.
- `'CAP'`: Cumulative Accuracy Profile. Plots the fraction of borrowers up to score “s” versus the fraction of defaulters up to score “s” (`'PctObs'` versus `'Sensitivity'` columns of the optional output argument `T`). For more details, see Cumulative Accuracy Profile (CAP).
- `'ROC'`: Receiver Operating Characteristic. Plots the fraction of non-defaulters up to score “s” versus the fraction of defaulters up to score “s” (`'FalseAlarm'` versus `'Sensitivity'` columns of the optional output argument `T`). For more details, see Receiver Operating Characteristic (ROC).
- `'KS'`: Kolmogorov-Smirnov. Plots each score “s” versus the fraction of defaulters up to score “s,” and also versus the fraction of non-defaulters up to score “s” (`'Scores'` versus both `'Sensitivity'` and `'FalseAlarm'` columns of the optional output argument `T`). For more details, see Kolmogorov-Smirnov statistic (KS).

### Tip

For the Kolmogorov-Smirnov statistic option, you can enter `'KS'` or `'K-S'`.

**Data Types:** `char` | `cell`

`Stats`: Validation measures

table

Validation measures, returned as a `4`-by-`2` table. The first column, `'Measure'`, contains the names of the following measures:

- Accuracy ratio (AR)
- Area under the ROC curve (AUROC)
- The KS statistic
- KS score

The second column, `'Value'`, contains the values corresponding to these measures.

`T`: Validation statistics data

table

Validation statistics data, returned as an `N`-by-`9` table, sorted by score from riskiest to safest. When `AnalysisLevel` is set to `'Deciles'`, `N` is equal to `10`. Otherwise, `N` is equal to the total number of unique scores, that is, scores without duplicates.

The table `T` contains the following nine columns, in this order:

- `'Scores'`: Scores sorted from riskiest to safest. The data in each row corresponds to all observations up to, and including, the score in that row.
- `'ProbDefault'`: Probability of default for observations in this row. For deciles, the average probability of default for all observations in the given decile is reported.
- `'TrueBads'`: Cumulative number of “bads” up to, and including, the corresponding score.
- `'FalseBads'`: Cumulative number of “goods” up to, and including, the corresponding score.
- `'TrueGoods'`: Cumulative number of “goods” above the corresponding score.
- `'FalseGoods'`: Cumulative number of “bads” above the corresponding score.
- `'Sensitivity'`: Fraction of defaulters (or the cumulative number of “bads” divided by the total number of “bads”). This is the distribution of “bads” up to and including the corresponding score.
- `'FalseAlarm'`: Fraction of non-defaulters (or the cumulative number of “goods” divided by the total number of “goods”). This is the distribution of “goods” up to and including the corresponding score.
- `'PctObs'`: Fraction of borrowers, or the cumulative number of observations up to and including the corresponding score, divided by the total number of observations.
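The column definitions above can be illustrated with a minimal sketch in base MATLAB. The vectors `scores` and `isBad` are hypothetical toy data (not taken from `CreditCardData`), lower scores are assumed to be riskier, and the code is not the internal implementation of `validatemodel`; it only reproduces the cumulative counts for the `'Scores'` analysis level.

```matlab
% Hypothetical toy data: lower scores are riskier; isBad = 1 marks a defaulter.
scores = [400; 410; 410; 420; 430; 440];
isBad  = [1;   1;   0;   0;   1;   0];

% Unique scores, riskiest (lowest) first; idx maps each row to its score level.
[uScores, ~, idx] = unique(scores);

% Per-level counts of bads and goods.
badsAt  = accumarray(idx(:), isBad);
goodsAt = accumarray(idx(:), 1 - isBad);

TrueBads   = cumsum(badsAt);             % bads at or below the score
FalseBads  = cumsum(goodsAt);            % goods at or below the score
TrueGoods  = sum(goodsAt) - FalseBads;   % goods above the score
FalseGoods = sum(badsAt)  - TrueBads;    % bads above the score

Sensitivity = TrueBads  / sum(badsAt);
FalseAlarm  = FalseBads / sum(goodsAt);
PctObs      = (TrueBads + FalseBads) / numel(scores);

disp([uScores TrueBads FalseBads TrueGoods FalseGoods]);
```

For this toy data, `TrueBads` is `[1;2;2;3;3]` and `FalseBads` is `[0;1;2;2;3]`. The duplicated score 410 contributes one row, which matches the statement that duplicates are removed.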

When creating the `creditscorecard` object with `creditscorecard`, if the optional name-value pair argument `WeightsVar` was used to specify observation (sample) weights, then the `T` table uses statistics, sums, and cumulative sums that are weighted counts.

`hf`: Handle to the plotted measures

figure handle

Figure handle to plotted measures, returned as a figure handle or array of handles. When `Plot` is set to `'None'`, `hf` is an empty array.

### Cumulative Accuracy Profile (CAP)

CAP is generally a concave curve and is also known as the Gini curve, Power curve, or Lorenz curve.

The scores of given observations are sorted from riskiest to safest. For a given fraction `M` (0% to 100%) of the total borrowers, the height of the CAP curve is the fraction of defaulters whose scores are less than or equal to the maximum score of the fraction `M`, also known as “Sensitivity.”

The area under the CAP curve, known as the AUCAP, is then compared to that of the perfect or “ideal” model, leading to the definition of a summary index known as the accuracy ratio (*AR*) or the Gini coefficient:

$$AR=\frac{A_R}{A_P}$$

where $A_R$ is the area between the CAP curve and the diagonal, and $A_P$ is the area between the perfect model and the diagonal.
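As a rough illustration of the accuracy ratio, the sketch below computes the AUCAP with the trapezoidal rule and measures both areas from the diagonal, so that the numerator is the area between the CAP curve and the diagonal and the denominator is the area between the perfect model and the diagonal. The CAP points and the default fraction `pBad` are hypothetical toy values, and the trapezoidal rule is an assumption of this sketch, not a statement about how `validatemodel` integrates the curve.

```matlab
% Hypothetical CAP points: fraction of borrowers (x) vs. fraction of defaulters (y).
PctObs      = [1/6; 3/6; 4/6; 5/6; 1];
Sensitivity = [1/3; 2/3; 2/3; 1;   1];
pBad        = 0.5;   % assumed fraction of defaulters in the sample

% Area under the CAP curve, with the curve starting at the origin.
AUCAP = trapz([0; PctObs], [0; Sensitivity]);

% Areas measured from the diagonal (random model), whose area is 0.5.
A_R = AUCAP - 0.5;            % scoring model
A_P = (1 - pBad/2) - 0.5;     % perfect model: reaches 1 at x = pBad, then stays flat

AR = A_R / A_P;
fprintf('AUCAP = %.4f, AR = %.4f\n', AUCAP, AR);
```

With these toy points the AUCAP is 11/18 and the accuracy ratio is 4/9.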

### Receiver Operating Characteristic (ROC)

To find the receiver operating characteristic (ROC) curve, the proportion of defaulters up to a given score “s,” or “Sensitivity,” is computed. This proportion is known as the true positive rate (TPR). Additionally, the proportion of non-defaulters up to score “s,” or “False Alarm Rate,” is also computed. This proportion is also known as the false positive rate (FPR). The ROC curve is the plot of the “Sensitivity” vs. the “False Alarm Rate.” Computing the ROC curve is similar to computing the equivalent of a confusion matrix at each score level.

Similar to the CAP, the ROC has a summary statistic known as the area under the ROC curve (AUROC). The closer to unity, the better the scoring model. The accuracy ratio (*AR*) is related to the area under the curve by the following formula:

$$AR=2(AUROC)-1$$
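The relation between *AR* and AUROC can be checked numerically. The sketch below integrates a hypothetical toy ROC curve with the trapezoidal rule (an assumption of this sketch) and then applies the formula to the AUROC values reported in the `Stats` tables of the two examples on this page.

```matlab
% Hypothetical ROC points: false alarm rate (x) vs. sensitivity (y).
FalseAlarm  = [0;   1/3; 2/3; 2/3; 1];
Sensitivity = [1/3; 2/3; 2/3; 1;   1];

% Area under the ROC curve, with the curve starting at the origin.
AUROC = trapz([0; FalseAlarm], [0; Sensitivity]);
AR    = 2*AUROC - 1;
fprintf('AUROC = %.4f, AR = %.4f\n', AUROC, AR);

% Values reported in the Stats tables of the two examples on this page.
reported = [0.66129 0.32258;    % unweighted example: AUROC, AR
            0.64486 0.28972];   % weighted example:   AUROC, AR
maxGap = max(abs(2*reported(:,1) - 1 - reported(:,2)));
fprintf('Largest gap between 2*AUROC-1 and reported AR: %.1e\n', maxGap);
```

Both reported pairs satisfy the formula to the displayed precision, for example 2(0.66129) - 1 = 0.32258.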

### Kolmogorov-Smirnov statistic (KS)

The Kolmogorov-Smirnov (KS) plot, also known as the fish-eye graph, is a common tool used to measure the predictive power of scorecards.

The KS plot shows the distribution of defaulters and the distribution of non-defaulters on the same plot. For the distribution of defaulters, each score “s” is plotted versus the proportion of defaulters up to “s,” or “Sensitivity.” For the distribution of non-defaulters, each score “s” is plotted versus the proportion of non-defaulters up to “s,” or “False Alarm.” The statistic of interest is called the KS statistic and is the maximum difference between these two distributions (“Sensitivity” minus “False Alarm”). The score at which this maximum is attained is also of interest.
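A minimal sketch of the KS statistic and KS score follows, using hypothetical cumulative counts rather than output from `validatemodel`. It takes the maximum of “Sensitivity” minus “False Alarm” over the unique scores and reports the score where that maximum occurs.

```matlab
% Hypothetical cumulative counts by score (riskiest first).
Scores    = [400; 410; 420; 430; 440];
TrueBads  = [2; 3; 3; 4; 4];   % cumulative bads up to each score
FalseBads = [0; 1; 3; 4; 6];   % cumulative goods up to each score

Sensitivity = TrueBads  / TrueBads(end);
FalseAlarm  = FalseBads / FalseBads(end);

% KS statistic: largest gap between the two cumulative distributions.
[ksStat, iMax] = max(Sensitivity - FalseAlarm);
ksScore = Scores(iMax);
fprintf('KS statistic = %.4f at score %g\n', ksStat, ksScore);
```

Here the largest gap is 0.75 - 1/6 = 7/12, attained at score 410.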

### Using `validatemodel` with Weights

Model validation statistics incorporate observation weights when these are provided by the user.

Without weights, the validation statistics are based on how many good and bad observations fall below a particular score. On the other hand, when observation weights are provided, the weight (not the count) is accumulated for the good and the bad observations that fall below a particular score.
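The difference between counting and accumulating weights can be sketched in base MATLAB as follows. The vectors `scores`, `isBad`, and `w` are hypothetical toy data, and the code illustrates the idea only; it is not the internal implementation of `validatemodel`.

```matlab
% Hypothetical toy data with observation weights.
scores = [400; 410; 410; 420];
isBad  = [1;   0;   1;   0];
w      = [0.5; 2;   1.5; 1];

[uScores, ~, idx] = unique(scores);   % one row per unique score

% Accumulate weights (not counts) per score level, then cumulate.
TrueBads  = cumsum(accumarray(idx(:), w .* isBad));
FalseBads = cumsum(accumarray(idx(:), w .* (1 - isBad)));

Sensitivity = TrueBads  / TrueBads(end);
FalseAlarm  = FalseBads / FalseBads(end);
PctObs      = (TrueBads + FalseBads) / sum(w);

disp([uScores TrueBads FalseBads Sensitivity FalseAlarm PctObs]);
```

Setting `w` to a vector of ones recovers the unweighted counts. The same pattern is visible in the weighted example above, where the first row of `T` has `TrueBads` equal to 1.0788 rather than an integer count.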

When observation weights are defined using the optional `WeightsVar` name-value pair argument when creating a `creditscorecard` object, the weights stored in the `WeightsVar` column are used when validating the model on the training data. When a different validation data set is provided using the optional `data` input, observation weights for the validation data must be included in a column whose name matches `WeightsVar`; otherwise, unit weights are used for the validation data set.

Not only the validation statistics, but also the credit scorecard scores themselves depend on the observation weights of the training data. For more information, see Using fitmodel with Weights and Credit Scorecard Modeling Using Observation Weights.

[1] *“Basel Committee on Banking Supervision: Studies on the Validation of Internal Rating Systems.”* Working Paper No. 14, February 2005.

[2] Refaat, M. *Credit Risk Scorecards: Development and Implementation Using SAS.* lulu.com, 2011.

[3] Loeffler, G. and Posch, P. N. *Credit Risk Modeling Using Excel and VBA.* Wiley Finance, 2007.

`bindata` | `bininfo` | `creditscorecard` | `displaypoints` | `fitmodel` | `formatpoints` | `modifybins` | `modifypredictor` | `plotbins` | `predictorinfo` | `probdefault` | `score` | `setmodel` | `table`
