bininfo

Return predictor’s bin information

Syntax

bi = bininfo(sc,PredictorName)
bi = bininfo(___,Name,Value)
[bi,bm] = bininfo(sc,PredictorName,Name,Value)
[bi,bm,mv] = bininfo(sc,PredictorName,Name,Value)

Description

bi = bininfo(sc,PredictorName) returns bin-level information, such as the frequencies of “Good” and “Bad” observations and bin statistics, for the predictor specified in PredictorName.

bi = bininfo(___,Name,Value) adds optional name-value arguments.

[bi,bm] = bininfo(sc,PredictorName,Name,Value) adds optional name-value arguments. bininfo also returns the binning map (bm), or bin rules, in the form of a vector of cut points for numeric predictors or a table of category groupings for categorical predictors.

[bi,bm,mv] = bininfo(sc,PredictorName,Name,Value) returns bin-level information, such as the frequencies of “Good” and “Bad” observations and bin statistics, for the predictor specified in PredictorName, using optional name-value pair arguments. bininfo also returns the binning map or bin rules in the form of a vector of cut points for numeric predictors, or a table of category groupings for categorical predictors. In addition, the optional output mv is a numeric array containing the minimum and maximum values, as set (or defined) by the user. For categorical predictors, mv is an empty array.

Examples

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

Display bin information for the categorical predictor ResStatus.

bi = bininfo(sc,'ResStatus')
bi=4x6 table
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738
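The Odds, WOE, and InfoValue columns can be reproduced from the Good and Bad counts alone, using the formulas in Statistics for a Credit Scorecard. The following is a minimal sketch in Python, used here only as neutral notation for the arithmetic; it is not the toolbox's implementation, and the helper name bin_statistics is hypothetical. The counts are copied from the table above.

```python
import math

# Good/Bad counts per ResStatus bin, copied from the bininfo output above.
counts = {
    "Home Owner": (365, 177),
    "Tenant": (307, 167),
    "Other": (131, 53),
}


def bin_statistics(counts):
    """Return ({bin: (odds, woe, info_value)}, total odds, total information value)."""
    n_good = sum(good for good, _ in counts.values())
    n_bad = sum(bad for _, bad in counts.values())
    stats = {}
    for name, (good, bad) in counts.items():
        p_good = good / n_good  # pGood(i): share of all "Good" observations in this bin
        p_bad = bad / n_bad     # pBad(i): share of all "Bad" observations in this bin
        woe = math.log(p_good / p_bad)
        stats[name] = (good / bad, woe, (p_good - p_bad) * woe)
    total_iv = sum(iv for _, _, iv in stats.values())
    return stats, n_good / n_bad, total_iv


stats, total_odds, total_iv = bin_statistics(counts)
for name, (odds, woe, iv) in stats.items():
    print(f"{name:12s} {odds:8.4f} {woe:10.6f} {iv:10.7f}")
print(f"{'Totals':12s} {total_odds:8.4f} {'NaN':>10s} {total_iv:10.7f}")
```

The printed values agree with the Odds, WOE, and InfoValue columns above to the displayed precision. The Totals row has no WOE because WOE compares a bin against the whole sample.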

Load the data from the CreditCardData.mat file. The dataWeights table contains a column, RowWeights, with the observation weights (using a dataset from Refaat 2011).

load CreditCardData

Create a creditscorecard object using the optional name-value pair argument for 'WeightsVar'.

sc = creditscorecard(dataWeights,'WeightsVar','RowWeights')
sc = 

  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: 'RowWeights'
                 VarNames: {1x12 cell}
        NumericPredictors: {1x7 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: ''
            PredictorVars: {1x10 cell}
                     Data: [1200x12 table]

Display bin information for the numerical predictor 'CustIncome'. When the optional name-value pair argument 'WeightsVar' is used to specify observation (sample) weights, the bi table contains weighted counts.

bi = bininfo(sc,'CustIncome');
bi(1:10,:)
ans =

  10x6 table

      Bin       Good        Bad       Odds        WOE       InfoValue 
    _______    _______    _______    _______    ________    __________

    '18000'    0.94515      1.496    0.63179     -1.1667     0.0059198
    '19000'    0.47588    0.80569    0.59065     -1.2341     0.0034716
    '20000'     2.1671     1.4636     1.4806    -0.31509    0.00061392
    '21000'     3.2522    0.88064      3.693     0.59889     0.0021303
    '22000'     1.5438     1.2714     1.2142    -0.51346     0.0012913
    '23000'      1.787     2.7529    0.64913     -1.1397      0.010509
    '24000'     3.4111     2.2538     1.5135    -0.29311    0.00082663
    '25000'     2.2333     6.1383    0.36383     -1.7186      0.042642
    '26000'     2.1246     4.4754    0.47474     -1.4525      0.024526
    '27000'     3.1058      3.528    0.88032    -0.83501     0.0082144

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

Display customized bin information for the categorical predictor ResStatus, showing only the WOE statistic.

bi = bininfo(sc,'ResStatus','Statistics','WOE')
bi=4x4 table
        Bin         Good    Bad       WOE   
    ____________    ____    ___    _________

    'Home Owner'    365     177     0.019329
    'Tenant'        307     167    -0.095564
    'Other'         131      53      0.20049
    'Totals'        803     397          NaN

Display customized bin information for the categorical predictor ResStatus, showing only the Odds and WOE statistics, without the Totals row.

bi = bininfo(sc,'ResStatus','Statistics',{'Odds','WOE'},'Totals','Off')
bi=3x5 table
        Bin         Good    Bad     Odds        WOE   
    ____________    ____    ___    ______    _________

    'Home Owner'    365     177    2.0621     0.019329
    'Tenant'        307     167    1.8383    -0.095564
    'Other'         131      53    2.4717      0.20049

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

The binning map or rules for categorical data are summarized in a "category grouping" table, returned as an optional output. By default, each category is placed in a separate bin. Here is the information for the predictor ResStatus.

[bi,cg] = bininfo(sc,'ResStatus')
bi=4x6 table
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         3        

To group categories Tenant and Other, modify the category grouping table cg so that the bin number for Other is the same as the bin number for Tenant. Then use the modifybins function to update the scorecard.

cg.BinNumber(3) = 2;
sc = modifybins(sc,'ResStatus','CatGrouping',cg);

Display the updated bin information. Note that the bin labels have been updated and that the bin membership information is contained in the category grouping cg.

[bi,cg] = bininfo(sc,'ResStatus')
bi=3x6 table
      Bin       Good    Bad     Odds        WOE       InfoValue 
    ________    ____    ___    ______    _________    __________

    'Group1'    365     177    2.0621     0.019329     0.0001682
    'Group2'    438     220    1.9909    -0.015827    0.00013772
    'Totals'    803     397    2.0227          NaN    0.00030592

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         2        
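Grouping categories amounts to adding their Good and Bad counts and then recomputing the statistics on the pooled counts. As a check of the arithmetic, this Python sketch (hypothetical helpers group_counts and woe_and_iv; counts and grouping taken from the ResStatus tables above) reproduces the Group2 row: 307 + 131 = 438 Goods and 167 + 53 = 220 Bads.

```python
import math

# Category grouping after modifybins: category -> bin number (cg above).
grouping = {"Home Owner": 1, "Tenant": 2, "Other": 2}
# Per-category Good/Bad counts for ResStatus, from the ungrouped table.
category_counts = {"Home Owner": (365, 177), "Tenant": (307, 167), "Other": (131, 53)}


def group_counts(category_counts, grouping):
    """Sum the Good/Bad counts of all categories that share a bin number."""
    merged = {}
    for category, (good, bad) in category_counts.items():
        g, b = merged.get(grouping[category], (0, 0))
        merged[grouping[category]] = (g + good, b + bad)
    return dict(sorted(merged.items()))


def woe_and_iv(bin_counts):
    """Return {bin: (WOE, InfoValue contribution)} for pooled counts."""
    n_good = sum(g for g, _ in bin_counts.values())
    n_bad = sum(b for _, b in bin_counts.values())
    out = {}
    for k, (good, bad) in bin_counts.items():
        p_good, p_bad = good / n_good, bad / n_bad
        woe = math.log(p_good / p_bad)
        out[k] = (woe, (p_good - p_bad) * woe)
    return out


merged = group_counts(category_counts, grouping)
stats = woe_and_iv(merged)
for k, (good, bad) in merged.items():
    print(f"Group{k}: Good={good} Bad={bad} WOE={stats[k][0]:.6f} IV={stats[k][1]:.8f}")
```

Because the pooled bin sits closer to the overall Good/Bad mix, its WOE is near zero and the total Information Value drops from 0.0097738 to 0.00030592.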

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

The predictor CustIncome is numeric. By default, each value of the predictor is placed in a separate bin.

bi = bininfo(sc,'CustIncome')
bi=46x6 table
      Bin      Good    Bad     Odds         WOE       InfoValue 
    _______    ____    ___    _______    _________    __________

    '18000'     2       3     0.66667      -1.1099     0.0056227
    '19000'     1       2         0.5      -1.3976     0.0053002
    '20000'     4       2           2    -0.011271    6.3641e-07
    '21000'     6       3           2    -0.011271    9.5462e-07
    '22000'     4       2           2    -0.011271    6.3641e-07
    '23000'     4       4           1     -0.70442     0.0035885
    '24000'     5       5           1     -0.70442     0.0044856
    '25000'     4       9     0.44444      -1.5153      0.026805
    '26000'     4      11     0.36364       -1.716      0.038999
    '27000'     6       6           1     -0.70442     0.0053827
    '28000'    13      11      1.1818     -0.53736     0.0061896
    '29000'    11      10         1.1     -0.60911     0.0069988
    '30000'    18      16       1.125     -0.58664      0.010493
    '31000'    24       8           3      0.39419     0.0038382
    '32000'    21      15         1.4     -0.36795     0.0042797
    '33000'    35      19      1.8421    -0.093509    0.00039951

Reduce the number of bins using the autobinning function (the modifybins function can also be used).

sc = autobinning(sc,'CustIncome');

Display the updated bin information. The binning map or rules for numeric data are summarized as "cut points," returned as an optional output (cp).

[bi,cp] = bininfo(sc,'CustIncome')
bi=8x6 table
          Bin          Good    Bad     Odds         WOE       InfoValue 
    _______________    ____    ___    _______    _________    __________

    '[-Inf,29000)'      53      58    0.91379     -0.79457       0.06364
    '[29000,33000)'     74      49     1.5102     -0.29217     0.0091366
    '[33000,35000)'     68      36     1.8889     -0.06843    0.00041042
    '[35000,40000)'    193      98     1.9694    -0.026696    0.00017359
    '[40000,42000)'     68      34          2    -0.011271    1.0819e-05
    '[42000,47000)'    164      66     2.4848      0.20579     0.0078175
    '[47000,Inf]'      183      56     3.2679      0.47972      0.041657
    'Totals'           803     397     2.0227          NaN       0.12285

cp = 

       29000
       33000
       35000
       40000
       42000
       47000
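The cut points define half-open intervals, as the bin labels show: a value equal to a cut point falls in the bin that starts at that cut point, and the last bin is closed at Inf. A small Python sketch of that mapping, assuming the labeling convention visible in the table above (bin_index and bin_labels are hypothetical helpers, not toolbox functions):

```python
import bisect

# Cut points for CustIncome after autobinning (cp above).
cut_points = [29000, 33000, 35000, 40000, 42000, 47000]


def bin_index(value, cut_points):
    """0-based bin index for bins [-Inf,c1), [c1,c2), ..., [ck,Inf]."""
    # bisect_right sends a value equal to a cut point to the bin that starts there.
    return bisect.bisect_right(cut_points, value)


def bin_labels(cut_points, lower="-Inf", upper="Inf"):
    """Build bin labels in the same style as the Bin column of bininfo."""
    edges = [lower] + [str(c) for c in cut_points] + [upper]
    labels = [f"[{a},{b})" for a, b in zip(edges[:-1], edges[1:])]
    labels[-1] = labels[-1][:-1] + "]"  # last bin is closed at Inf
    return labels


labels = bin_labels(cut_points)
for income in (18000, 28999, 29000, 41000, 47000, 90000):
    print(income, labels[bin_index(income, cut_points)])
```

Six cut points give seven bins, matching the seven non-Totals rows of the table.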

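Manually remove the second cut point (the boundary between the second and third bins) to merge bins two and three. Use the modifybins function to update the scorecard. The 'MinValue' argument also sets the minimum value of the binning range to 0, so the first bin starts at 0 instead of -Inf.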

cp(2) = [];
sc = modifybins(sc,'CustIncome','CutPoints',cp,'MinValue',0);

Display the updated bin information. The third output, mv, contains the minimum and maximum values of the binning range.

[bi,cp,mv] = bininfo(sc,'CustIncome')
bi=7x6 table
          Bin          Good    Bad     Odds         WOE       InfoValue 
    _______________    ____    ___    _______    _________    __________

    '[0,29000)'         53      58    0.91379     -0.79457       0.06364
    '[29000,35000)'    142      85     1.6706     -0.19124     0.0071274
    '[35000,40000)'    193      98     1.9694    -0.026696    0.00017359
    '[40000,42000)'     68      34          2    -0.011271    1.0819e-05
    '[42000,47000)'    164      66     2.4848      0.20579     0.0078175
    '[47000,Inf]'      183      56     3.2679      0.47972      0.041657
    'Totals'           803     397     2.0227          NaN       0.12043

cp = 

       29000
       35000
       40000
       42000
       47000

mv = 

     0   Inf
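Removing a cut point pools the counts of the two adjacent bins. As a check, this Python sketch pools bins two and three of the 8x6 table above and recomputes the merged bin's statistics, which match the '[29000,35000)' row of the updated table. The helper remove_cut_point is hypothetical.

```python
import math

# Good/Bad counts per bin before removing the second cut point (8x6 table above).
good = [53, 74, 68, 193, 68, 164, 183]
bad = [58, 49, 36, 98, 34, 66, 56]


def remove_cut_point(counts, k):
    """Removing cut point k (0-based) merges bin k with bin k+1."""
    return counts[:k] + [counts[k] + counts[k + 1]] + counts[k + 2:]


good2, bad2 = remove_cut_point(good, 1), remove_cut_point(bad, 1)
n_good, n_bad = sum(good2), sum(bad2)  # totals are unchanged by merging
p_good, p_bad = good2[1] / n_good, bad2[1] / n_bad
woe = math.log(p_good / p_bad)
print(good2[1], bad2[1], round(good2[1] / bad2[1], 4), round(woe, 5), round((p_good - p_bad) * woe, 7))
```

Only the merged bin's statistics change; every other row keeps its counts, which is why the remaining rows of the updated table are identical to the previous ones.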

Note: Avoid bins with frequencies of zero, because they lead to infinite or undefined (NaN) statistics. Use the modifybins or autobinning functions to modify bins.

Input Arguments

sc
Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

PredictorName
Predictor name, specified as a character vector containing the name of the predictor. PredictorName is case-sensitive.

Data Types: char

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: bi = bininfo(sc, PredictorName,'Statistics','WOE','Totals','On')

'Statistics'
List of statistics to include in the bin information, specified as a character vector or a cell array of character vectors. By default, bininfo reports Odds, WOE, and InfoValue, as in the first example. For more information, see Statistics for a Credit Scorecard. Possible values are:

  • 'Odds': Odds information, the ratio of “Goods” to “Bads.”

  • 'WOE': Weight of Evidence. The WOE statistic measures the deviation between the distributions of “Goods” and “Bads.”

  • 'InfoValue': Information value. Closely tied to the WOE, this statistic measures how strong the deviation is between the distributions of “Goods” and “Bads,” and is used to determine how strong a predictor is for use in the fitting model. Bins with only “Good” or only “Bad” observations lead to an infinite Information Value; consider modifying the bins in those cases by using modifybins or autobinning.

  • 'Entropy': Entropy is a measure of the unpredictability contained in the bins. The more the numbers of “Goods” and “Bads” differ within the bins, the lower the entropy.

Note

Avoid having bins with frequencies of zero because they lead to infinite or undefined (NaN) statistics. Use modifybins or autobinning to modify bins.

Data Types: char | cell

'Totals'
Indicator to include a row of totals at the bottom of the information table, specified as a character vector with value 'On' or 'Off'. By default, the totals row is included, as in the first example.

Data Types: char

Output Arguments

bi
Bin information, returned as a table. The bin information table contains one row per bin and, unless 'Totals' is set to 'Off', a row of totals. The columns contain bin descriptions, frequencies of “Good” and “Bad,” and bin statistics. Avoid having bins with frequencies of zero because they lead to infinite or undefined (NaN) statistics. Use modifybins or autobinning to modify bins.

Note

If the optional name-value pair argument WeightsVar was used to specify observation (sample) weights when creating the creditscorecard object with creditscorecard, then the bi table contains weighted counts.

bm
Binning map or rules, returned as a vector of cut points for numeric predictors, or a table of category groupings for categorical predictors. For more information, see modifybins.

mv
Binning minimum and maximum values (as set or defined by the user), returned as a numeric array. For categorical predictors, mv is an empty array.

More About

Statistics for a Credit Scorecard

Weight of Evidence (WOE) measures the difference between the distributions of “Goods” and “Bads” within a bin.

Suppose the predictor's data takes on M possible values b1, ..., bM. For binned data, M is a small number. The response takes on two values, “Good” and “Bad.” The frequency table of the data is given by:

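$$
\begin{array}{l|ccc}
 & \text{Good} & \text{Bad} & \text{Total} \\ \hline
b_1 & n_{11} & n_{12} & n_1 \\
b_2 & n_{21} & n_{22} & n_2 \\
\vdots & \vdots & \vdots & \vdots \\
b_M & n_{M1} & n_{M2} & n_M \\ \hline
\text{Total} & n_{\text{Good}} & n_{\text{Bad}} & n_{\text{Total}}
\end{array}
$$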

The Weight of Evidence (WOE) is defined for each data value bi as

$$\mathrm{WOE}(i) = \log\left(\frac{n_{i1}/n_{\text{Good}}}{n_{i2}/n_{\text{Bad}}}\right).$$

If you define

$$p_{\text{Good}}(i) = \frac{n_{i1}}{n_{\text{Good}}}, \qquad p_{\text{Bad}}(i) = \frac{n_{i2}}{n_{\text{Bad}}}$$

then pGood(i) is the proportion of “Good” observations that take on the value bi, and similarly for pBad(i). In other words, pGood(i) gives the distribution of good observations over the M observed values of the predictor, and similarly for pBad(i). With this, an equivalent formula for the WOE is

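$$\mathrm{WOE}(i) = \log\left(\frac{p_{\text{Good}}(i)}{p_{\text{Bad}}(i)}\right).$$

Using the same frequency table, the odds for row i are defined as

$$\mathrm{Odds}(i) = \frac{n_{i1}}{n_{i2}},$$

and the odds for the sample are defined as

$$\mathrm{OddsTotal} = \frac{n_{\text{Good}}}{n_{\text{Bad}}}.$$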
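Rearranging the first WOE formula shows how the WOE and Odds columns relate: each bin's WOE is the log of its odds relative to the sample odds. This short derivation follows directly from the definitions above; it is a consequence of them, not an additional formula taken from the toolbox.

```latex
\mathrm{WOE}(i)
  = \log\left(\frac{n_{i1}/n_{\text{Good}}}{n_{i2}/n_{\text{Bad}}}\right)
  = \log\left(\frac{n_{i1}/n_{i2}}{n_{\text{Good}}/n_{\text{Bad}}}\right)
  = \log\left(\frac{\mathrm{Odds}(i)}{\mathrm{OddsTotal}}\right)
```

So a bin has positive WOE exactly when its Odds exceed the Totals odds. For example, for the Home Owner bin of ResStatus, log(2.0621/2.0227) ≈ 0.0193, consistent with the WOE value 0.019329 reported in the examples (the small gap comes from the rounded odds).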

For each row i, you can also compute its contribution to the total Information Value, given by

$$\mathrm{InfoValue}(i) = \left(p_{\text{Good}}(i) - p_{\text{Bad}}(i)\right)\mathrm{WOE}(i),$$

and the total Information Value is the sum of all the InfoValue(i) terms. (The returned total is a nansum, which discards the contributions of rows with no observations at all.)
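Rows with a zero frequency make the logarithm in WOE(i) infinite or undefined, which is why the total uses nansum. The following Python sketch illustrates that behavior under stated assumptions: an empty row contributes NaN and is skipped, while a row with only “Good” observations makes the total infinite. The helpers woe_iv_rows and nansum are hypothetical stand-ins, and the exact handling inside bininfo may differ.

```python
import math


def woe_iv_rows(rows):
    """Per-row (WOE, InfoValue) contributions, using inf/NaN for degenerate rows.

    rows: list of (n_i1, n_i2) tuples, i.e. (Good count, Bad count) per row.
    """
    n_good = sum(g for g, _ in rows)
    n_bad = sum(b for _, b in rows)
    out = []
    for good, bad in rows:
        p_good, p_bad = good / n_good, bad / n_bad
        if p_good == 0 and p_bad == 0:
            woe = math.nan           # log(0/0): row with no observations
        elif p_bad == 0:
            woe = math.inf           # only "Good" observations
        elif p_good == 0:
            woe = -math.inf          # only "Bad" observations
        else:
            woe = math.log(p_good / p_bad)
        out.append((woe, (p_good - p_bad) * woe))
    return out


def nansum(values):
    """Sum that skips NaN terms (assumed to mirror the nansum described above)."""
    return sum(v for v in values if not math.isnan(v))


# ResStatus counts plus a hypothetical empty third row.
rows = [(365, 177), (307, 167), (0, 0), (131, 53)]
contributions = woe_iv_rows(rows)
print(nansum(iv for _, iv in contributions))
```

The empty row does not change the totals nGood and nBad, so the nansum equals the 0.0097738 reported for ResStatus.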

Likewise, for each row i, you can compute its contribution to the total Entropy, given by

$$\mathrm{Entropy}(i) = -\frac{1}{\log 2}\left(\frac{n_{i1}}{n_i}\log\frac{n_{i1}}{n_i} + \frac{n_{i2}}{n_i}\log\frac{n_{i2}}{n_i}\right),$$

and the total Entropy is the weighted sum of the row entropies,

$$\mathrm{Entropy} = \sum_{i=1}^{M} \frac{n_i}{n_{\text{Total}}}\,\mathrm{Entropy}(i).$$
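The two entropy formulas can be sketched in Python as follows. The helpers row_entropy and total_entropy are hypothetical, and the treatment of 0·log(0) as 0 is an assumption taken from the usual information-theory convention; the toolbox may instead report NaN for such bins, as the zero-frequency note on this page warns.

```python
import math


def row_entropy(good, bad):
    """Entropy(i) in bits for one row; NaN if the row has no observations."""
    n = good + bad
    if n == 0:
        return math.nan
    h = 0.0
    for k in (good, bad):
        if k > 0:  # assumption: treat 0*log(0) as 0
            h -= (k / n) * math.log(k / n)
    return h / math.log(2)  # natural log divided by log(2) gives bits


def total_entropy(rows):
    """Weighted sum of row entropies, with weights n_i / nTotal."""
    n_total = sum(g + b for g, b in rows)
    return sum((g + b) / n_total * row_entropy(g, b) for g, b in rows)


rows = [(365, 177), (307, 167), (131, 53)]  # ResStatus counts from the examples
print([round(row_entropy(g, b), 4) for g, b in rows], round(total_entropy(rows), 4))
```

A row with equal numbers of Goods and Bads has entropy 1 (maximally unpredictable), and a row with only one class has entropy 0, which is the sense in which entropy falls as the counts within a bin diverge.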

Using bininfo with Weights

When observation weights are specified with the optional WeightsVar argument of creditscorecard, the bininfo function does not count the rows that are good or bad in each bin; instead, it accumulates the weight of those rows.

The “frequencies” reported are therefore no longer plain row counts, but the cumulative weight of the rows that are good or bad and fall in a particular bin. Once these weighted frequencies (the Good and Bad columns) are known, all other relevant statistics (Odds, WOE, and InfoValue) are computed with the usual formulas. For more information, see Credit Scorecard Modeling Using Observation Weights.
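The accumulation step can be sketched in Python as follows. The data is a hypothetical toy example, not CreditCardData, and weighted_frequencies is an illustrative helper rather than a toolbox function; with all weights equal to 1, it reduces to ordinary counting.

```python
from collections import defaultdict


def weighted_frequencies(bins, is_good, weights):
    """Accumulate observation weights (instead of row counts) per bin and class.

    bins: bin label of each row; is_good: True for "Good" rows; weights: row weights.
    Returns {bin: [Good weight, Bad weight]}.
    """
    freq = defaultdict(lambda: [0.0, 0.0])
    for b, good, w in zip(bins, is_good, weights):
        freq[b][0 if good else 1] += w
    return dict(freq)


# Hypothetical toy data (not from CreditCardData).
bins = ["A", "A", "B", "B", "B"]
is_good = [True, False, True, True, False]
weights = [0.5, 1.5, 2.0, 0.25, 1.0]

print(weighted_frequencies(bins, is_good, weights))
print(weighted_frequencies(bins, is_good, [1.0] * len(bins)))  # unit weights: plain counts
```

The resulting Good and Bad weights can then be passed through the same Odds, WOE, and InfoValue formulas as ordinary counts.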

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Introduced in R2014b
