Perform automatic binning of given predictors

`sc = autobinning(sc)`

`sc = autobinning(sc,PredictorNames)`

`sc = autobinning(___,Name,Value)`

`sc = autobinning(sc)` performs automatic binning of all predictors.

Automatic binning finds binning maps or rules to bin numeric data and to group categories of categorical data. The binning rules are stored in the `creditscorecard` object. To apply the binning rules to the `creditscorecard` object data, or to a new dataset, use `bindata`.

`sc = autobinning(sc,PredictorNames)` performs automatic binning of the predictors given in `PredictorNames`.

`sc = autobinning(___,Name,Value)` performs automatic binning of the predictors given in `PredictorNames` using optional name-value pair arguments. See the name-value argument `Algorithm` for a description of the supported binning algorithms.

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the data (using a dataset from Refaat 2011).

    load CreditCardData
    sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning using the default options. By default, `autobinning` bins all predictors and uses the `Monotone` algorithm.

`sc = autobinning(sc);`

Use `bininfo` to display the binned data for the predictor `CustIncome`.

`bi = bininfo(sc, 'CustIncome')`

`bi=`*8x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,29000)' | 53 | 58 | 0.91379 | -0.79457 | 0.06364 |
| '[29000,33000)' | 74 | 49 | 1.5102 | -0.29217 | 0.0091366 |
| '[33000,35000)' | 68 | 36 | 1.8889 | -0.06843 | 0.00041042 |
| '[35000,40000)' | 193 | 98 | 1.9694 | -0.026696 | 0.00017359 |
| '[40000,42000)' | 68 | 34 | 2 | -0.011271 | 1.0819e-05 |
| '[42000,47000)' | 164 | 66 | 2.4848 | 0.20579 | 0.0078175 |
| '[47000,Inf]' | 183 | 56 | 3.2679 | 0.47972 | 0.041657 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.12285 |
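The `Odds`, `WOE`, and `InfoValue` columns can be reproduced from the `Good` and `Bad` counts alone. The following base-MATLAB sketch assumes the standard definitions (WOE as the natural logarithm of the ratio between a bin's share of all "Good" and its share of all "Bad" observations, and information value as the difference of those shares times the WOE). The variable names are illustrative and are not part of the `creditscorecard` API.

```matlab
% Good and Bad counts per bin, copied from the bininfo table for CustIncome.
good = [53 74 68 193 68 164 183];
bad  = [58 49 36  98 34  66  56];

pGood = good / sum(good);   % share of all "Good" observations in each bin
pBad  = bad  / sum(bad);    % share of all "Bad" observations in each bin

odds      = good ./ bad;            % Odds column
woe       = log(pGood ./ pBad);     % WOE column (assumed definition, natural log)
infoValue = (pGood - pBad) .* woe;  % InfoValue column, one value per bin
totalIV   = sum(infoValue);         % InfoValue reported in the 'Totals' row

disp(table(good', bad', odds', woe', infoValue', ...
    'VariableNames', {'Good','Bad','Odds','WOE','InfoValue'}))
```

Under these assumptions the first bin gives a WOE of about -0.7946 and the per-bin information values add up to about 0.1228, in line with the table.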

Use `plotbins` to display the histogram and WOE curve for the predictor `CustIncome`.

`plotbins(sc,'CustIncome')`

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

```
load CreditCardData
sc = creditscorecard(data);
```

Perform automatic binning for the predictor `CustIncome` using the default options. By default, `autobinning` uses the `Monotone` algorithm.

`sc = autobinning(sc,'CustIncome');`

Use `bininfo` to display the binned data.

`bi = bininfo(sc, 'CustIncome')`

`bi=`*8x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,29000)' | 53 | 58 | 0.91379 | -0.79457 | 0.06364 |
| '[29000,33000)' | 74 | 49 | 1.5102 | -0.29217 | 0.0091366 |
| '[33000,35000)' | 68 | 36 | 1.8889 | -0.06843 | 0.00041042 |
| '[35000,40000)' | 193 | 98 | 1.9694 | -0.026696 | 0.00017359 |
| '[40000,42000)' | 68 | 34 | 2 | -0.011271 | 1.0819e-05 |
| '[42000,47000)' | 164 | 66 | 2.4848 | 0.20579 | 0.0078175 |
| '[47000,Inf]' | 183 | 56 | 3.2679 | 0.47972 | 0.041657 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.12285 |

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

```
load CreditCardData
sc = creditscorecard(data);
```

Perform automatic binning for the predictor `CustIncome` using the `Monotone` algorithm with the initial number of bins set to 20. This example explicitly sets both the `Algorithm` and the `AlgorithmOptions` name-value arguments.

    AlgoOptions = {'InitialNumBins',20};
    sc = autobinning(sc,'CustIncome','Algorithm','Monotone','AlgorithmOptions',...
        AlgoOptions);

Use `bininfo` to display the binned data. Here, the cut points, which delimit the bins, are also displayed.

`[bi,cp] = bininfo(sc,'CustIncome')`

`bi=`*11x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,19000)' | 2 | 3 | 0.66667 | -1.1099 | 0.0056227 |
| '[19000,29000)' | 51 | 55 | 0.92727 | -0.77993 | 0.058516 |
| '[29000,31000)' | 29 | 26 | 1.1154 | -0.59522 | 0.017486 |
| '[31000,34000)' | 80 | 42 | 1.9048 | -0.060061 | 0.0003704 |
| '[34000,35000)' | 33 | 17 | 1.9412 | -0.041124 | 7.095e-05 |
| '[35000,40000)' | 193 | 98 | 1.9694 | -0.026696 | 0.00017359 |
| '[40000,42000)' | 68 | 34 | 2 | -0.011271 | 1.0819e-05 |
| '[42000,43000)' | 39 | 16 | 2.4375 | 0.18655 | 0.001542 |
| '[43000,47000)' | 125 | 50 | 2.5 | 0.21187 | 0.0062972 |
| '[47000,Inf]' | 183 | 56 | 3.2679 | 0.47972 | 0.041657 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.13175 |

```
cp =
19000
29000
31000
34000
35000
40000
42000
43000
47000
```
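To make the role of the cut points concrete, the following base-MATLAB sketch maps a few income values to bin numbers using the `cp` vector shown above. It is only an illustration: the sample incomes are hypothetical, and the base-MATLAB `discretize` function stands in for the toolbox logic. To apply the stored binning rules to real data, use `bindata`.

```matlab
% Cut points returned by bininfo for CustIncome (copied from the output above).
cp = [19000 29000 31000 34000 35000 40000 42000 43000 47000];

% Bin edges: the first bin starts at -Inf and the last bin ends at Inf.
edges = [-Inf cp Inf];

% Hypothetical income values, chosen to sit on or next to cut points.
income = [18000 19000 46999 47000];

% discretize uses left-closed bins [edge(k), edge(k+1)), matching the
% '[19000,29000)' style labels in the bininfo table.
binIndex = discretize(income, edges);
disp(binIndex)
```

Each cut point is the left edge of the bin that follows it, so a value equal to a cut point (19000 or 47000 here) belongs to the higher bin.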

This example shows how to use the `autobinning` default `Monotone` algorithm and the `AlgorithmOptions` name-value pair arguments associated with the `Monotone` algorithm. The `AlgorithmOptions` for the `Monotone` algorithm are three name-value pair parameters: `'InitialNumBins'`, `'Trend'`, and `'SortCategories'`. `'InitialNumBins'` and `'Trend'` are applicable for numeric predictors, and `'Trend'` and `'SortCategories'` are applicable for categorical predictors.

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the data (using a dataset from Refaat 2011).

    load CreditCardData
    sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for the numeric predictor `CustIncome` using the `Monotone` algorithm with the initial number of bins set to 20. This example explicitly sets both the `Algorithm` argument and the `AlgorithmOptions` name-value arguments for `'InitialNumBins'` and `'Trend'`.

    AlgoOptions = {'InitialNumBins',20,'Trend','Increasing'};
    sc = autobinning(sc,'CustIncome','Algorithm','Monotone',...
        'AlgorithmOptions',AlgoOptions);

Use `bininfo` to display the binned data.

`bi = bininfo(sc,'CustIncome')`

`bi=`*11x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,19000)' | 2 | 3 | 0.66667 | -1.1099 | 0.0056227 |
| '[19000,29000)' | 51 | 55 | 0.92727 | -0.77993 | 0.058516 |
| '[29000,31000)' | 29 | 26 | 1.1154 | -0.59522 | 0.017486 |
| '[31000,34000)' | 80 | 42 | 1.9048 | -0.060061 | 0.0003704 |
| '[34000,35000)' | 33 | 17 | 1.9412 | -0.041124 | 7.095e-05 |
| '[35000,40000)' | 193 | 98 | 1.9694 | -0.026696 | 0.00017359 |
| '[40000,42000)' | 68 | 34 | 2 | -0.011271 | 1.0819e-05 |
| '[42000,43000)' | 39 | 16 | 2.4375 | 0.18655 | 0.001542 |
| '[43000,47000)' | 125 | 50 | 2.5 | 0.21187 | 0.0062972 |
| '[47000,Inf]' | 183 | 56 | 3.2679 | 0.47972 | 0.041657 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.13175 |

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

    load CreditCardData
    sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for the predictors `CustIncome` and `CustAge` using the default `Monotone` algorithm with `AlgorithmOptions` for `InitialNumBins` and `Trend`.

    AlgoOptions = {'InitialNumBins',20,'Trend','Increasing'};
    sc = autobinning(sc,{'CustAge','CustIncome'},'Algorithm','Monotone',...
        'AlgorithmOptions',AlgoOptions);

Use `bininfo` to display the binned data.

`bi1 = bininfo(sc, 'CustIncome')`

`bi1=`*11x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,19000)' | 2 | 3 | 0.66667 | -1.1099 | 0.0056227 |
| '[19000,29000)' | 51 | 55 | 0.92727 | -0.77993 | 0.058516 |
| '[29000,31000)' | 29 | 26 | 1.1154 | -0.59522 | 0.017486 |
| '[31000,34000)' | 80 | 42 | 1.9048 | -0.060061 | 0.0003704 |
| '[34000,35000)' | 33 | 17 | 1.9412 | -0.041124 | 7.095e-05 |
| '[35000,40000)' | 193 | 98 | 1.9694 | -0.026696 | 0.00017359 |
| '[40000,42000)' | 68 | 34 | 2 | -0.011271 | 1.0819e-05 |
| '[42000,43000)' | 39 | 16 | 2.4375 | 0.18655 | 0.001542 |
| '[43000,47000)' | 125 | 50 | 2.5 | 0.21187 | 0.0062972 |
| '[47000,Inf]' | 183 | 56 | 3.2679 | 0.47972 | 0.041657 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.13175 |

`bi2 = bininfo(sc, 'CustAge')`

`bi2=`*8x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| '[-Inf,35)' | 93 | 76 | 1.2237 | -0.50255 | 0.038003 |
| '[35,40)' | 114 | 71 | 1.6056 | -0.2309 | 0.0085141 |
| '[40,42)' | 52 | 30 | 1.7333 | -0.15437 | 0.0016687 |
| '[42,44)' | 58 | 32 | 1.8125 | -0.10971 | 0.00091888 |
| '[44,47)' | 97 | 51 | 1.902 | -0.061533 | 0.00047174 |
| '[47,62)' | 333 | 130 | 2.5615 | 0.23619 | 0.020605 |
| '[62,Inf]' | 56 | 7 | 8 | 1.375 | 0.071647 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.14183 |

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

```
load CreditCardData
sc = creditscorecard(data);
```

Perform automatic binning for the categorical predictor `ResStatus` using the default options. By default, `autobinning` uses the `Monotone` algorithm.

`sc = autobinning(sc,'ResStatus');`

Use `bininfo` to display the binned data.

`bi = bininfo(sc, 'ResStatus')`

`bi=`*4x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| 'Tenant' | 307 | 167 | 1.8383 | -0.095564 | 0.0036638 |
| 'Home Owner' | 365 | 177 | 2.0621 | 0.019329 | 0.0001682 |
| 'Other' | 131 | 53 | 2.4717 | 0.20049 | 0.0059418 |
| 'Totals' | 803 | 397 | 2.0227 | NaN | 0.0097738 |

This example shows how to modify the data (for this example only) to illustrate binning categorical predictors using the `Monotone` algorithm.

Load the `CreditCardData.mat` file to get the `data` (using a dataset from Refaat 2011).

`load CreditCardData`

Add two new categories and update the response variable.

    newdata = data;
    rng('default'); % for reproducibility
    Predictor = 'ResStatus';
    Status = newdata.status;
    NumObs = length(newdata.(Predictor));
    Ind1 = randi(NumObs,100,1);
    Ind2 = randi(NumObs,100,1);
    newdata.(Predictor)(Ind1) = 'Subtenant';
    newdata.(Predictor)(Ind2) = 'CoOwner';
    Status(Ind1) = randi(2,100,1)-1;
    Status(Ind2) = randi(2,100,1)-1;
    newdata.status = Status;

Create a new `creditscorecard` object using `newdata` and display the bin information for a later comparison.

    scnew = creditscorecard(newdata,'IDVar','CustID');
    [bi,cg] = bininfo(scnew,Predictor)

`bi=`*6x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| 'Home Owner' | 308 | 154 | 2 | 0.092373 | 0.0032392 |
| 'Tenant' | 264 | 136 | 1.9412 | 0.06252 | 0.0012907 |
| 'Other' | 109 | 49 | 2.2245 | 0.19875 | 0.0050386 |
| 'Subtenant' | 42 | 42 | 1 | -0.60077 | 0.026813 |
| 'CoOwner' | 52 | 44 | 1.1818 | -0.43372 | 0.015802 |
| 'Totals' | 775 | 425 | 1.8235 | NaN | 0.052183 |

`cg=`*5x2 table*

| Category | BinNumber |
|---|---|
| 'Home Owner' | 1 |
| 'Tenant' | 2 |
| 'Other' | 3 |
| 'Subtenant' | 4 |
| 'CoOwner' | 5 |

`plotbins(scnew,Predictor)`

Perform automatic binning for the categorical `Predictor` using the default `Monotone` algorithm with the `AlgorithmOptions` name-value pair arguments for `'SortCategories'` and `'Trend'`.

    AlgoOptions = {'SortCategories','Goods','Trend','Increasing'};
    scnew = autobinning(scnew,Predictor,'Algorithm','Monotone',...
        'AlgorithmOptions',AlgoOptions);

Use `bininfo` to display the bin information. The second output, `cg`, captures the bin membership, which is the bin number that each category belongs to.

`[bi,cg] = bininfo(scnew,Predictor)`

`bi=`*4x6 table*

| Bin | Good | Bad | Odds | WOE | InfoValue |
|---|---|---|---|---|---|
| 'Group1' | 42 | 42 | 1 | -0.60077 | 0.026813 |
| 'Group2' | 52 | 44 | 1.1818 | -0.43372 | 0.015802 |
| 'Group3' | 681 | 339 | 2.0088 | 0.096788 | 0.0078459 |
| 'Totals' | 775 | 425 | 1.8235 | NaN | 0.05046 |

`cg=`*5x2 table*

| Category | BinNumber |
|---|---|
| 'Subtenant' | 1 |
| 'CoOwner' | 2 |
| 'Other' | 3 |
| 'Tenant' | 3 |
| 'Home Owner' | 3 |

Plot the bins and compare them with the histogram plotted before binning.

`plotbins(scnew,Predictor)`
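The examples above all use the default `Monotone` algorithm. As a complementary sketch, the unsupervised algorithms can be selected through the same `Algorithm` and `AlgorithmOptions` name-value arguments. The code assumes the same `CreditCardData.mat` data set and toolbox as the other examples. The choice of five and two bins is arbitrary, and no output is shown because it has not been taken from the original page.

```matlab
% Assumes the toolbox that provides creditscorecard, as in the examples above.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

% Numeric predictor: five bins of equal width.
sc = autobinning(sc,'CustIncome','Algorithm','EqualWidth', ...
    'AlgorithmOptions',{'NumBins',5});

% Categorical predictor: two bins of roughly equal frequency,
% with the categories sorted by odds before binning.
sc = autobinning(sc,'ResStatus','Algorithm','EqualFrequency', ...
    'AlgorithmOptions',{'NumBins',2,'SortCategories','Odds'});

% Inspect the resulting cut points and category groupings.
[biIncome,cpIncome] = bininfo(sc,'CustIncome');
[biRes,cgRes] = bininfo(sc,'ResStatus');
disp(cpIncome)
disp(cgRes)
```

Because each call names a single predictor, the two predictors can use different algorithms. Within one call, the same algorithm applies to every predictor listed in `PredictorNames`.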

`sc`: Credit scorecard model

`creditscorecard` object

Credit scorecard model, specified as a `creditscorecard` object. Use `creditscorecard` to create a `creditscorecard` object.

`PredictorNames`: Predictor or predictors names to automatically bin

character vector | cell array of character vectors

Predictor or predictors names to automatically bin, specified as a character vector or a cell array of character vectors containing the name of the predictor or predictors. `PredictorNames` are case-sensitive. When no `PredictorNames` are defined, all predictors in the `PredictorVars` property of the `creditscorecard` object are binned.

**Data Types:** `char` | `cell`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

```
sc = autobinning(sc,'Algorithm','EqualFrequency')
```

`'Algorithm'`: Algorithm selection

`'Monotone'` (default) | character vector with values `'Monotone'`, `'EqualFrequency'`, `'EqualWidth'`

Algorithm selection, specified using a character vector indicating which algorithm to use. The same algorithm is used for all predictors in `PredictorNames`. Possible values are:

- `'Monotone'`: (default) Monotone Adjacent Pooling Algorithm (MAPA), also known as Maximum Likelihood Monotone Coarse Classifier (MLMCC). Supervised optimal binning algorithm that aims to find bins with a monotone Weight-Of-Evidence (WOE) trend. This algorithm assumes that only neighboring attributes can be grouped. Thus, for categorical predictors, categories are sorted before applying the algorithm (see the `'SortCategories'` option for `AlgorithmOptions`). For more information, see Monotone.
- `'EqualFrequency'`: Unsupervised algorithm that divides the data into a predetermined number of bins that contain approximately the same number of observations. This algorithm is also known as “equal height” or “equal depth.” For categorical predictors, categories are sorted before applying the algorithm (see the `'SortCategories'` option for `AlgorithmOptions`). For more information, see Equal Frequency.
- `'EqualWidth'`: Unsupervised algorithm that divides the range of values in the domain of the predictor variable into a predetermined number of bins of “equal width.” For numeric data, the width is measured as the distance between bin edges. For categorical data, width is measured as the number of categories within a bin. For categorical predictors, categories are sorted before applying the algorithm (see the `'SortCategories'` option for `AlgorithmOptions`). For more information, see Equal Width.

**Data Types:** `char`

`'AlgorithmOptions'`: Algorithm options for selected `Algorithm`

`{'InitialNumBins',10,'Trend','Auto','SortCategories','Odds'}` for `Monotone` (default) | cell array of `'OptionName'` and value pairs for `Algorithm` options

Algorithm options for the selected `Algorithm`, specified using a cell array. Possible values are:

- For the `Monotone` algorithm:
  - `{'InitialNumBins',n}`: Initial number (*n*) of bins (default is 10). `'InitialNumBins'` must be an integer > `2`. Used for numeric predictors only.
  - `{'Trend','TrendOption'}`: Determines whether the Weight-Of-Evidence (WOE) monotonic trend is expected to be increasing or decreasing. The values for `'TrendOption'` are:
    - `'Auto'`: (default) Automatically determines if the WOE trend is increasing or decreasing.
    - `'Increasing'`: Look for an increasing WOE trend.
    - `'Decreasing'`: Look for a decreasing WOE trend.

    The value of the optional input parameter `'Trend'` does not necessarily reflect that of the resulting WOE curve. The parameter `'Trend'` tells the algorithm to “look for” an increasing or decreasing trend, but the outcome may not show the desired trend. For example, the algorithm cannot find a decreasing trend when the data actually has an increasing WOE trend. For more information on the `'Trend'` option, see Monotone.
  - `{'SortCategories','SortOption'}`: Used for categorical predictors only. Used to determine how the predictor categories are sorted as a preprocessing step before applying the algorithm. The values of `'SortOption'` are:
    - `'Odds'`: (default) The categories are sorted by order of increasing values of odds, defined as the ratio of “Good” to “Bad” observations, for the given category.
    - `'Goods'`: The categories are sorted by order of increasing values of “Good.”
    - `'Bads'`: The categories are sorted by order of increasing values of “Bad.”
    - `'Totals'`: The categories are sorted by order of increasing values of total number of observations (“Good” plus “Bad”).
    - `'None'`: No sorting is applied. The existing order of the categories is unchanged before applying the algorithm. (The existing order of the categories can be seen in the category grouping optional output from `bininfo`.)

    For more information, see Sort Categories.

- For the `EqualFrequency` algorithm:
  - `{'NumBins',n}`: Specifies the desired number (*n*) of bins. The default is `{'NumBins',5}` and the number of bins must be a positive number.
  - `{'SortCategories','SortOption'}`: Used for categorical predictors only. Used to determine how the predictor categories are sorted as a preprocessing step before applying the algorithm. The values of `'SortOption'` are:
    - `'Odds'`: (default) The categories are sorted by order of increasing values of odds, defined as the ratio of “Good” to “Bad” observations, for the given category.
    - `'Goods'`: The categories are sorted by order of increasing values of “Good.”
    - `'Bads'`: The categories are sorted by order of increasing values of “Bad.”
    - `'Totals'`: The categories are sorted by order of increasing values of total number of observations (“Good” plus “Bad”).
    - `'None'`: No sorting is applied. The existing order of the categories is unchanged before applying the algorithm. (The existing order of the categories can be seen in the category grouping optional output from `bininfo`.)

    For more information, see Sort Categories.

- For the `EqualWidth` algorithm:
  - `{'NumBins',n}`: Specifies the desired number (*n*) of bins. The default is `{'NumBins',5}` and the number of bins must be a positive number.
  - `{'SortCategories','SortOption'}`: Used for categorical predictors only. Used to determine how the predictor categories are sorted as a preprocessing step before applying the algorithm. The values of `'SortOption'` are:
    - `'Odds'`: (default) The categories are sorted by order of increasing values of odds, defined as the ratio of “Good” to “Bad” observations, for the given category.
    - `'Goods'`: The categories are sorted by order of increasing values of “Good.”
    - `'Bads'`: The categories are sorted by order of increasing values of “Bad.”
    - `'Totals'`: The categories are sorted by order of increasing values of total number of observations (“Good” plus “Bad”).
    - `'None'`: No sorting is applied. The existing order of the categories is unchanged before applying the algorithm. (The existing order of the categories can be seen in the category grouping optional output from `bininfo`.)

    For more information, see Sort Categories.

**Example:**

```
sc = autobinning(sc,'CustAge','Algorithm','Monotone','AlgorithmOptions',{'Trend','Increasing'})
```

**Data Types:** `cell`

`'Display'`: Indicator to display information on status of the binning process at command line

`'Off'` (default) | character vector with values `'On'`, `'Off'`

Indicator to display the information on status of the binning process at command line, specified using a character vector with a value of `'On'` or `'Off'`.

**Data Types:** `char`

`sc`: Credit scorecard model

`creditscorecard` object

Credit scorecard model, returned as an updated `creditscorecard` object containing the automatically determined binning maps or rules (cut points or category groupings) for one or more predictors. For more information on using the `creditscorecard` object, see `creditscorecard`.

If you have previously used the `modifybins` function to manually modify bins, these changes are lost when running `autobinning` because all the data is automatically binned based on internal autobinning rules.

The `'Monotone'` algorithm is an implementation of the Monotone Adjacent Pooling Algorithm (MAPA), also known as Maximum Likelihood Monotone Coarse Classifier (MLMCC); see Anderson or Thomas in the References.

**Preprocessing**

During the preprocessing phase, numeric predictors are preprocessed by applying equal frequency binning, with the number of bins determined by the `'InitialNumBins'` parameter (the default is 10 bins). Categorical predictors are preprocessed by sorting the categories according to the `'SortCategories'` criterion (the default is to sort by odds in increasing order). Sorting is not applied to ordinal predictors. See the Sort Categories definition or the description of the `AlgorithmOptions` option for `'SortCategories'` for more information.

**Main Algorithm**

The following example illustrates how the `'Monotone'` algorithm arrives at cut points for numeric data.

| Bin | Good | Bad | Iteration1 | Iteration2 | Iteration3 | Iteration4 |
|---|---|---|---|---|---|---|
| | 127 | 107 | 0.543 | | | |
| | 194 | 90 | 0.620 | 0.683 | | |
| | 135 | 78 | 0.624 | 0.662 | | |
| | 164 | 66 | 0.645 | 0.678 | 0.713 | |
| | 183 | 56 | 0.669 | 0.700 | 0.740 | 0.766 |

Initially, the numeric data is preprocessed with an equal frequency binning. In this example, for simplicity, only the five initial bins are used. The first column indicates the equal frequency bin ranges, and the second and third columns have the “Good” and “Bad” counts per bin. (The number of observations is 1,200, so a perfect equal frequency binning would result in five bins with 240 observations each. In this case, the observations per bin do not match 240 exactly. This is a common situation when the data has repeated values.)

Monotone finds break points based on the cumulative proportion of “Good” observations. In the `'Iteration1'` column, the first value (0.543) is the number of “Good” observations in the first bin (127), divided by the total number of observations in the bin (127+107). The second value (0.620) is the number of “Good” observations in bins 1 and 2, divided by the total number of observations in bins 1 and 2, and so forth. The first cut point is set where the minimum of this cumulative ratio is found, which is in the first bin in this example. This is the end of iteration 1.

Starting from the second bin (the first bin after the location of the minimum value in the previous iteration), cumulative proportions of “Good” observations are computed again. The second cut point is set where the minimum of this cumulative ratio is found. In this case, it happens to be in bin number 3, therefore bins 2 and 3 are merged.

The algorithm proceeds the same way for two more iterations. In this particular example, in the end it only merges bins 2 and 3. The final binning has four bins with cut points at 33,000, 42,000, and 47,000.
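The iterations described above can be sketched in a few lines of base MATLAB. This is a minimal illustration of the pooling loop for an increasing trend, using the `Good` and `Bad` counts from the table. It is not the toolbox implementation, and its tie handling (taking the first minimum) is an assumption.

```matlab
% Good and Bad counts of the five preprocessed bins (from the table above).
good = [127 194 135 164 183];
bad  = [107  90  78  66  56];

start = 1;               % first bin considered in the current iteration
lastBinOfGroup = [];     % index of the last original bin in each final bin
while start <= numel(good)
    % Cumulative proportion of "Good" observations, starting at 'start'.
    cumGood  = cumsum(good(start:end));
    cumTotal = cumsum(good(start:end) + bad(start:end));
    ratio    = cumGood ./ cumTotal;

    % Increasing trend: cut where the cumulative ratio is smallest.
    [~, idx] = min(ratio);
    stop = start + idx - 1;
    lastBinOfGroup(end+1) = stop; %#ok<AGROW>

    start = stop + 1;    % next iteration starts right after the cut
end
disp(lastBinOfGroup)
```

The loop ends with groups closing at original bins 1, 3, 4, and 5, so only bins 2 and 3 are merged, as in the description above.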

For categorical data, the only difference is that the preprocessing step consists of reordering the categories. Consider the following categorical data:

| Bin | Good | Bad | Odds |
|---|---|---|---|
| 'Home Owner' | 365 | 177 | 2.062 |
| 'Tenant' | 307 | 167 | 1.838 |
| 'Other' | 131 | 53 | 2.472 |

The preprocessing step, by default, sorts the categories by `'Odds'`. (See the Sort Categories definition or the description of the `AlgorithmOptions` option for `'SortCategories'` for more information.) Then, it applies the same steps described above, shown in the following table:

| Bin | Good | Bad | Odds | Iteration1 | Iteration2 | Iteration3 |
|---|---|---|---|---|---|---|
| 'Tenant' | 307 | 167 | 1.838 | 0.648 | | |
| 'Home Owner' | 365 | 177 | 2.062 | 0.661 | 0.673 | |
| 'Other' | 131 | 53 | 2.472 | 0.669 | 0.683 | 0.712 |

In this case, the Monotone algorithm would not merge any categories. The only difference, compared with the data before the application of the algorithm, is that the categories are now sorted by `'Odds'`.

In both the numeric and categorical examples above, the implicit `'Trend'` choice is `'Increasing'`. (See the description of the `AlgorithmOptions` option for the `'Monotone'` `'Trend'` option.) If you set the trend to `'Decreasing'`, the algorithm looks for the maximum (instead of the minimum) cumulative ratios to determine the cut points. In that case, at iteration 1, the maximum would be in the last bin, which would imply that all bins should be merged into a single bin. Binning into a single bin is a total loss of information and has no practical use. Therefore, when the chosen trend leads to a single bin, the Monotone implementation rejects it, and the algorithm returns the bins found after the preprocessing step. This state is the initial equal frequency binning for numeric data and the sorted categories for categorical data. The implementation of the Monotone algorithm by default uses a heuristic to identify the trend (`'Auto'` option for `'Trend'`).
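The effect of the trend choice on the first iteration can be checked directly on the numeric example. This short base-MATLAB sketch reuses the five `Good` and `Bad` counts from the table and only illustrates the min-versus-max rule described above. It does not reproduce the toolbox's `'Auto'` heuristic.

```matlab
% Good and Bad counts of the five preprocessed bins (numeric example above).
good = [127 194 135 164 183];
bad  = [107  90  78  66  56];

% Cumulative proportion of "Good" observations over bins 1..k.
ratio = cumsum(good) ./ cumsum(good + bad);

[~, firstCutIncreasing] = min(ratio);  % 'Increasing': cut at the minimum
[~, firstCutDecreasing] = max(ratio);  % 'Decreasing': cut at the maximum

fprintf('Increasing trend: first cut after bin %d\n', firstCutIncreasing);
fprintf('Decreasing trend: first cut after bin %d\n', firstCutDecreasing);
```

With `'Decreasing'`, the first cut falls after the last bin, so all five bins would be pooled into one. This is the single-bin outcome that the implementation rejects.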

`EqualFrequency` is an unsupervised algorithm that divides the data into a predetermined number of bins that contain approximately the same number of observations. It is defined as follows.

Let $v_1, v_2, \ldots, v_N$ be the sorted list of different values or categories observed in the data. Let $f_i$ be the frequency of $v_i$. Let $F_k = f_1 + \cdots + f_k$ be the cumulative sum of frequencies up to the $k$th sorted value. Then $F_N$ is the same as the total number of observations.

Define $\mathrm{AvgFreq} = F_N / \mathrm{NumBins}$, which is the ideal average frequency per bin after binning. The $n$th cut point index is the index $k$ such that the distance $|F_k - n \cdot \mathrm{AvgFreq}|$ is minimized.

This rule attempts to match the cumulative frequency up to the *n*th bin. If a single value contains too many observations, equal frequency bins are not possible, and the above rule yields fewer than *NumBins* total bins. In that case, the algorithm determines *NumBins* bins by breaking up bins, in the order in which the bins were constructed.
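The cut point rule above can be sketched in base MATLAB as follows. The frequency vector is hypothetical and chosen so that equal-frequency bins exist exactly. Treating each cut point index as the last sorted value of its bin is an assumption made for this illustration, and the fallback that breaks up bins is not covered.

```matlab
% Hypothetical frequencies f(i) of the N sorted distinct values v(1..N).
f = [2 3 5 4 6 4 6];
NumBins = 3;

F = cumsum(f);               % cumulative frequencies F(k)
AvgFreq = F(end) / NumBins;  % ideal number of observations per bin

% n-th cut point index: the k that minimizes abs(F(k) - n*AvgFreq).
cutIdx = zeros(1, NumBins - 1);
for n = 1:NumBins - 1
    [~, cutIdx(n)] = min(abs(F - n * AvgFreq));
end

% Observations per bin if each cut index closes a bin (sketch assumption).
binSizes = diff([0 F(cutIdx) F(end)]);
disp(cutIdx)
disp(binSizes)
```

For these frequencies, the cut indices are 3 and 5, and each of the three bins holds 10 of the 30 observations.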

The preprocessing of categorical predictors consists of sorting the categories according to the `'SortCategories'` criterion (the default is to sort by odds in increasing order). Sorting is not applied to ordinal predictors. See the Sort Categories definition or the description of the `AlgorithmOptions` option for `'SortCategories'` for more information.

`EqualWidth` is an unsupervised algorithm that divides the range of values in the domain of the predictor variable into a predetermined number of bins of “equal width.” For numeric data, the width is measured as the distance between bin edges. For categorical data, width is measured as the number of categories within a bin.

The `EqualWidth` option is defined as follows.

For numeric data, if `MinValue` and `MaxValue` are the minimum and maximum data values, then

Width = (MaxValue - MinValue)/NumBins

and the `CutPoints` are set to `MinValue + Width`, `MinValue + 2*Width`, ..., `MaxValue - Width`. If `MinValue` or `MaxValue` has not been specified using the `modifybins` function, the `EqualWidth` option sets `MinValue` and `MaxValue` to the minimum and maximum values observed in the data.

For categorical data, if there are *NumCats* original categories, then

Width = NumCats / NumBins,

The preprocessing of categorical predictors consists of sorting the categories according to the `'SortCategories'` criterion (the default is to sort by odds in increasing order). Sorting is not applied to ordinal predictors. See the Sort Categories definition or the description of the `AlgorithmOptions` option for `'SortCategories'` for more information.
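For the numeric case, the equal-width rule reduces to simple arithmetic. The following base-MATLAB sketch uses hypothetical minimum and maximum values, not values from `CreditCardData`, to show how the width and cut points follow from the definition.

```matlab
% Hypothetical data range and number of bins (illustrative values only).
MinValue = 20000;
MaxValue = 60000;
NumBins  = 5;

% Width of each bin, as defined for numeric data.
Width = (MaxValue - MinValue) / NumBins;

% Cut points: MinValue + Width, MinValue + 2*Width, ..., MaxValue - Width.
CutPoints = MinValue + Width * (1:NumBins - 1);
disp(CutPoints)
```

There are always `NumBins - 1` cut points, and the last one equals `MaxValue - Width`, as stated in the definition.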

As a preprocessing step for categorical data, `'Monotone'`, `'EqualFrequency'`, and `'EqualWidth'` support the `'SortCategories'` input. This serves the purpose of reordering the categories before applying the main algorithm. The default sorting criterion is to sort by `'Odds'`. For example, suppose that the data originally looks like this:

| Bin | Good | Bad | Odds |
|---|---|---|---|
| `'Home Owner'` | 365 | 177 | 2.062 |
| `'Tenant'` | 307 | 167 | 1.838 |
| `'Other'` | 131 | 53 | 2.472 |

After the preprocessing step, the rows would be sorted by `'Odds'` and the table looks like this:

| Bin | Good | Bad | Odds |
|---|---|---|---|
| `'Tenant'` | 307 | 167 | 1.838 |
| `'Home Owner'` | 365 | 177 | 2.062 |
| `'Other'` | 131 | 53 | 2.472 |

The three algorithms only merge adjacent bins, so the initial order of the categories makes a difference for the final binning. The `'None'` option for `'SortCategories'` would leave the original table unchanged. For a description of the sorting criteria supported, see the description of the `AlgorithmOptions` option for `'SortCategories'`.
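The default `'Odds'` sort shown in the two tables above can be reproduced with a few lines of base MATLAB. This is only a sketch of the reordering step. The variable names are illustrative and the toolbox performs this step internally.

```matlab
% Categories with their Good and Bad counts, in the original order.
categoriesIn = {'Home Owner','Tenant','Other'};
good = [365 307 131];
bad  = [177 167  53];

% Odds are the ratio of "Good" to "Bad" observations per category.
odds = good ./ bad;

% Sort the categories by increasing odds (the default 'Odds' criterion).
[sortedOdds, order] = sort(odds, 'ascend');
sortedCategories = categoriesIn(order);

disp(sortedCategories)
disp(sortedOdds)
```

Replacing `odds` in the `sort` call with `good`, `bad`, or `good + bad` gives the order corresponding to the `'Goods'`, `'Bads'`, and `'Totals'` criteria.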

Upon the construction of a scorecard, the initial order of the categories, before any algorithm or any binning modifications are applied, is the order shown in the first output of `bininfo`. If the bins have been modified (either manually with `modifybins` or automatically with `autobinning`), use the optional output (`cg`, the category grouping) from `bininfo` to get the current order of the categories.

The `'SortCategories'` option has no effect on categorical predictors for which the `'Ordinal'` parameter is set to true (see the `'Ordinal'` input parameter in MATLAB® categorical arrays for `categorical`). Ordinal data has a natural order, which is honored in the preprocessing step of the algorithms by leaving the order of the categories unchanged. Only categorical predictors whose `'Ordinal'` parameter is false (default option) are subject to reordering of categories according to the `'SortCategories'` criterion.

**`autobinning` with Weights**

When observation weights are defined using the optional `WeightsVar` argument when creating a `creditscorecard` object, instead of counting the rows that are good or bad in each bin, the `autobinning` function accumulates the weight of the rows that are good or bad in each bin.

The “frequencies” reported are no longer the basic “count” of rows, but the “cumulative weight” of the rows that are good or bad and fall in a particular bin. Once these “weighted frequencies” are known, all other relevant statistics (`Good`, `Bad`, `Odds`, `WOE`, and `InfoValue`) are computed with the usual formulas. For more information, see Credit Scorecard Modeling Using Observation Weights.
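The difference between counting rows and accumulating weights can be sketched in base MATLAB with `accumarray`. The bin assignments, responses, and weights below are hypothetical. The sketch only illustrates the idea of weighted frequencies and is not the toolbox code.

```matlab
% Hypothetical observations: bin number, good/bad flag, and weight per row.
binIdx = [1 1 2 2 2]';
isGood = logical([1 0 1 1 0]');
w      = [0.5 2 1 1.5 1]';

% Unweighted frequencies: number of good and bad rows per bin.
goodCount = accumarray(binIdx, double(isGood));
badCount  = accumarray(binIdx, double(~isGood));

% Weighted frequencies: sum of the weights of good and bad rows per bin.
goodWeight = accumarray(binIdx, w .* isGood);
badWeight  = accumarray(binIdx, w .* ~isGood);

disp(table(goodCount, badCount, goodWeight, badWeight))
```

Statistics such as odds then use `goodWeight ./ badWeight` in place of `goodCount ./ badCount`.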

[1] Anderson, R. *The Credit Scoring Toolkit.* Oxford University Press, 2007.

[2] Refaat, M. *Data Preparation for Data Mining Using SAS.* Morgan Kaufmann, 2006.

[3] Refaat, M. *Credit Risk Scorecards: Development and Implementation Using SAS.* lulu.com, 2011.

[4] Thomas, L., et al. *Credit Scoring and Its Applications.* Society for Industrial and Applied Mathematics, 2002.

`bindata` | `bininfo` | `creditscorecard` | `displaypoints` | `fitmodel` | `formatpoints` | `modifybins` | `modifypredictor` | `plotbins` | `predictorinfo` | `probdefault` | `score` | `setmodel` | `validatemodel`
