modifybins

Modify predictor’s bins

Syntax

sc = modifybins(sc,PredictorName,Name,Value)

Description

sc = modifybins(sc,PredictorName,Name,Value) manually modifies the bins of the numeric or categorical predictor PredictorName in the creditscorecard object sc, as specified by optional name-value pair arguments. For numeric predictors, you can specify the minimum value, maximum value, and cut points. For categorical predictors, you can specify category groupings. For both types of predictors, you can specify bin labels.

Examples

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

The predictor CustIncome is numeric. By default, each value of a predictor is placed in a separate bin.

bi = bininfo(sc,'CustIncome')
bi=46x6 table
      Bin      Good    Bad     Odds         WOE       InfoValue 
    _______    ____    ___    _______    _________    __________

    '18000'     2       3     0.66667      -1.1099     0.0056227
    '19000'     1       2         0.5      -1.3976     0.0053002
    '20000'     4       2           2    -0.011271    6.3641e-07
    '21000'     6       3           2    -0.011271    9.5462e-07
    '22000'     4       2           2    -0.011271    6.3641e-07
    '23000'     4       4           1     -0.70442     0.0035885
    '24000'     5       5           1     -0.70442     0.0044856
    '25000'     4       9     0.44444      -1.5153      0.026805
    '26000'     4      11     0.36364       -1.716      0.038999
    '27000'     6       6           1     -0.70442     0.0053827
    '28000'    13      11      1.1818     -0.53736     0.0061896
    '29000'    11      10         1.1     -0.60911     0.0069988
    '30000'    18      16       1.125     -0.58664      0.010493
    '31000'    24       8           3      0.39419     0.0038382
    '32000'    21      15         1.4     -0.36795     0.0042797
    '33000'    35      19      1.8421    -0.093509    0.00039951
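The Odds, WOE, and InfoValue columns follow from the Good and Bad counts. The following base-MATLAB sketch (it does not call the creditscorecard API) assumes the commonly used definitions Odds = Good/Bad, WOE = log((Good/TotalGood)/(Bad/TotalBad)), and InfoValue = (Good/TotalGood - Bad/TotalBad)*WOE, with TotalGood = 803 and TotalBad = 397 taken from the Totals rows of the later tables. Applied to the first three rows above, these definitions reproduce the displayed values.

```matlab
% Good/Bad counts for the first three CustIncome bins shown above
good = [2 1 4];
bad  = [3 2 2];

% Portfolio totals (from the 'Totals' rows of the bininfo tables)
totalGood = 803;
totalBad  = 397;

% Assumed definitions of the bininfo statistics
odds      = good ./ bad;
pGood     = good / totalGood;      % share of all goods that falls in each bin
pBad      = bad  / totalBad;       % share of all bads that falls in each bin
woe       = log(pGood ./ pBad);    % weight of evidence
infoValue = (pGood - pBad) .* woe; % per-bin information value

disp([odds; woe; infoValue])
```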

Use modifybins to set a minimum value of 0 and cut points every 10000 from 20000 to 60000. Display the updated bin information, including the cut points.

sc = modifybins(sc,'CustIncome','MinValue',0,'CutPoints',20000:10000:60000);
[bi,cp] = bininfo(sc,'CustIncome')
bi=7x6 table
          Bin          Good    Bad     Odds         WOE       InfoValue
    _______________    ____    ___    _______    _________    _________

    '[0,20000)'          3       5        0.6      -1.2152     0.010765
    '[20000,30000)'     61      63    0.96825     -0.73668     0.060942
    '[30000,40000)'    324     173     1.8728    -0.076967    0.0024846
    '[40000,50000)'    304     123     2.4715      0.20042     0.013781
    '[50000,60000)'    103      32     3.2188      0.46457     0.022144
    '[60000,Inf]'        8       1          8        1.375     0.010235
    'Totals'           803     397     2.0227          NaN      0.12035

cp = 

       20000
       30000
       40000
       50000
       60000

The first and last bins contain very few points. To merge the first bin into the second one, remove the first cut point. Similarly, to merge the last bin into the second-to-last one, remove the last cut point. Then use modifybins to update the scorecard, and display updated bin information.

cp(1)=[];
cp(end)=[];
sc = modifybins(sc,'CustIncome','CutPoints',cp);
bi = bininfo(sc,'CustIncome')
bi=5x6 table
          Bin          Good    Bad     Odds         WOE       InfoValue
    _______________    ____    ___    _______    _________    _________

    '[0,30000)'         64      68    0.94118     -0.76504     0.070065
    '[30000,40000)'    324     173     1.8728    -0.076967    0.0024846
    '[40000,50000)'    304     123     2.4715      0.20042     0.013781
    '[50000,Inf]'      111      33     3.3636       0.5086     0.028028
    'Totals'           803     397     2.0227          NaN      0.11436
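Merging bins does not change any observation's Good or Bad status, so the counts of a merged bin are the sums of the counts of the bins it replaces. This base-MATLAB sketch reproduces the [0,30000) and [50000,Inf] rows from the counts in the 7x6 table; newBin is a hypothetical old-to-new bin map used only for illustration, not a creditscorecard property.

```matlab
% Counts for the six CustIncome bins before merging (from the 7x6 table)
good = [3 61 324 304 103 8];
bad  = [5 63 173 123  32 1];
totalGood = sum(good);   % 803
totalBad  = sum(bad);    % 397

% Hypothetical map from old bin index to new bin index:
% removing cut point 20000 merges bins 1 and 2, removing 60000 merges bins 5 and 6
newBin = [1 1 2 3 4 4];

% Sum the counts of all old bins that map to the same new bin
mergedGood = accumarray(newBin(:), good(:))';
mergedBad  = accumarray(newBin(:), bad(:))';

% Recompute the weight of evidence for the merged bins
mergedWoe = log((mergedGood / totalGood) ./ (mergedBad / totalBad));
```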

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

The binning map or rules for categorical data are summarized in a "category grouping" table, returned as an optional output of bininfo. By default, each category is placed in a separate bin. Here is the information for the predictor ResStatus.

[bi,cg] = bininfo(sc,'ResStatus')
bi=4x6 table
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         3        

To group categories 'Tenant' and 'Other', modify the category grouping table cg, so the bin number for 'Other' is the same as the bin number for 'Tenant'. Then use modifybins to update the scorecard.

cg.BinNumber(3) = 2;
sc = modifybins(sc,'ResStatus','CatGrouping',cg);

Display the updated bin information. Note that the bin labels have been updated and that the bin membership information is contained in the category grouping table cg.

[bi,cg] = bininfo(sc,'ResStatus')
bi=3x6 table
      Bin       Good    Bad     Odds        WOE       InfoValue 
    ________    ____    ___    ______    _________    __________

    'Group1'    365     177    2.0621     0.019329     0.0001682
    'Group2'    438     220    1.9909    -0.015827    0.00013772
    'Totals'    803     397    2.0227          NaN    0.00030592

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         2        
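The same aggregation applies to categorical predictors: categories that share a BinNumber pool their Good and Bad counts. The sketch below uses plain arrays in place of the cg table (an assumption made so the example runs without the toolbox) and reproduces the Group1 and Group2 rows.

```matlab
% Per-category counts for ResStatus, in the order of the default grouping
cats = {'Home Owner', 'Tenant', 'Other'};
good = [365 307 131];
bad  = [177 167  53];

% BinNumber column after setting cg.BinNumber(3) = 2
binNumber = [1 2 2];

% Sum the counts of all categories that share a bin number
binGood = accumarray(binNumber(:), good(:))';
binBad  = accumarray(binNumber(:), bad(:))';

% List the members of each bin
for b = 1:max(binNumber)
    fprintf('Group%d: %s\n', b, strjoin(cats(binNumber == b), ', '));
end
```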

Create a creditscorecard object (using a dataset from Refaat 2011).

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);

For the numeric predictor CustAge, use the modifybins function to set the following cut points, along with a minimum value of 0 and a maximum value of 75:

cp = [25 37 49 65];
sc = modifybins(sc,'CustAge','CutPoints',cp,'MinValue',0,'MaxValue',75);
bininfo(sc,'CustAge')
ans=6x6 table
       Bin       Good    Bad     Odds        WOE       InfoValue
    _________    ____    ___    ______    _________    _________

    '[0,25)'       9       8     1.125     -0.58664    0.0052464
    '[25,37)'    125      92    1.3587     -0.39789     0.030268
    '[37,49)'    340     183    1.8579    -0.084959    0.0031898
    '[49,65)'    298     108    2.7593      0.31054     0.030765
    '[65,75]'     31       6    5.1667      0.93781     0.022031
    'Totals'     803     397    2.0227          NaN       0.0915

Use the modifybins function to merge the 2nd and 3rd bins by removing the cut point that separates them (37).

sc = modifybins(sc,'CustAge','CutPoints',cp([1 3 4]));
bininfo(sc,'CustAge')
ans=5x6 table
       Bin       Good    Bad     Odds       WOE       InfoValue
    _________    ____    ___    ______    ________    _________

    '[0,25)'       9       8     1.125    -0.58664    0.0052464
    '[25,49)'    465     275    1.6909    -0.17915     0.020355
    '[49,65)'    298     108    2.7593     0.31054     0.030765
    '[65,75]'     31       6    5.1667     0.93781     0.022031
    'Totals'     803     397    2.0227         NaN     0.078397
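Because bins k and k+1 share the boundary cp(k), deleting that single cut point merges them, which is what cp([1 3 4]) does for k = 2. A minimal sketch with a hypothetical index k, using the CustAge counts from the 6x6 table:

```matlab
cp   = [25 37 49 65];      % cut points from the CustAge example
good = [9 125 340 298 31]; % counts per bin before merging
bad  = [8  92 183 108  6];

k = 2;                     % hypothetical: merge bins k and k+1
merged = cp;
merged(k) = [];            % same result as cp([1 3 4]) when k = 2

% The merged bin pools the counts of the two bins it replaces
mergedGood = [good(1:k-1), good(k) + good(k+1), good(k+2:end)];
mergedBad  = [bad(1:k-1),  bad(k)  + bad(k+1),  bad(k+2:end)];
```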

Display bin information for the categorical predictor ResStatus.

[bi,cg] = bininfo(sc,'ResStatus');
disp(bi)
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

Use the modifybins function to merge categories 2 and 3 ('Tenant' and 'Other') into bin 2.

cg.BinNumber(3) = 2;
sc = modifybins(sc,'ResStatus','CatGrouping',cg);
bininfo(sc,'ResStatus')
ans=3x6 table
      Bin       Good    Bad     Odds        WOE       InfoValue 
    ________    ____    ___    ______    _________    __________

    'Group1'    365     177    2.0621     0.019329     0.0001682
    'Group2'    438     220    1.9909    -0.015827    0.00013772
    'Totals'    803     397    2.0227          NaN    0.00030592

Create a creditscorecard object (using a dataset from Refaat 2011).

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0)
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

For the numeric predictor TmAtAddress, use the modifybins function to set the following cut points, along with a minimum value of 0 and a maximum value of 210:

cp = [30 80 120];
sc = modifybins(sc,'TmAtAddress','CutPoints',cp,'MinValue',0,'MaxValue',210);
bininfo(sc,'TmAtAddress')
ans=5x6 table
        Bin        Good    Bad     Odds        WOE       InfoValue 
    ___________    ____    ___    ______    _________    __________

    '[0,30)'       330     154    2.1429     0.057722     0.0013305
    '[30,80)'      379     201    1.8856    -0.070187     0.0024086
    '[80,120)'      78      36    2.1667     0.068771    0.00044396
    '[120,210]'     16       6    2.6667      0.27641     0.0013301
    'Totals'       803     397    2.0227          NaN     0.0055131

Use the modifybins function to split the 2nd bin, [30,80), by adding a cut point at 50.

sc = modifybins(sc,'TmAtAddress','CutPoints',[cp(1) 50 cp(2:end)]);
bininfo(sc,'TmAtAddress')
ans=6x6 table
        Bin        Good    Bad     Odds        WOE       InfoValue 
    ___________    ____    ___    ______    _________    __________

    '[0,30)'       330     154    2.1429     0.057722     0.0013305
    '[30,50)'      211     104    2.0288    0.0030488    2.4387e-06
    '[50,80)'      168      97     1.732     -0.15517      0.005449
    '[80,120)'      78      36    2.1667     0.068771    0.00044396
    '[120,210]'     16       6    2.6667      0.27641     0.0013301
    'Totals'       803     397    2.0227          NaN     0.0085559
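Splitting is the reverse operation: adding a cut point inside a bin divides it in two, and the counts of the new bins add up to the counts of the old bin (211 + 168 = 379 goods, 104 + 97 = 201 bads). A sketch, assuming a hypothetical new point newPoint; inserting it with unique keeps the cut points sorted wherever the point falls.

```matlab
cp = [30 80 120];          % cut points from the TmAtAddress example
newPoint = 50;             % hypothetical: splits [30,80) into [30,50) and [50,80)

% unique sorts the combined array and drops the point if it is already a cut point
splitCp = unique([cp newPoint]);

% Counts of the original 2nd bin and of the two bins that replace it
oldGood = 379;  oldBad = 201;
newGood = [211 168];
newBad  = [104  97];
isPartition = sum(newGood) == oldGood && sum(newBad) == oldBad;
```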

Display bin information for the categorical predictor ResStatus.

[bi,cg] = bininfo(sc,'ResStatus')
bi=4x6 table
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         3        

Use the modifybins function to merge categories 2 and 3 ('Tenant' and 'Other') into bin 2.

cg.BinNumber(3) = 2;
sc = modifybins(sc,'ResStatus','CatGrouping',cg);
bininfo(sc,'ResStatus')
ans=3x6 table
      Bin       Good    Bad     Odds        WOE       InfoValue 
    ________    ____    ___    ______    _________    __________

    'Group1'    365     177    2.0621     0.019329     0.0001682
    'Group2'    438     220    1.9909    -0.015827    0.00013772
    'Totals'    803     397    2.0227          NaN    0.00030592

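Use the modifybins function to split bin 2 again by moving 'Other' back to bin 3. Because each category is once more in its own bin, the bin labels show the category names.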

cg.BinNumber(3) = 3;
sc = modifybins(sc,'ResStatus','CatGrouping',cg);
[bi,cg] = bininfo(sc,'ResStatus')
bi=4x6 table
        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

cg=3x2 table
      Category      BinNumber
    ____________    _________

    'Home Owner'    1        
    'Tenant'        2        
    'Other'         3        

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData 
sc = creditscorecard(data);

Use modifybins to set the minimum value of the predictor CustIncome to 0 and create three bins, and then display the updated bin information.

sc = modifybins(sc,'CustIncome','MinValue',0,'CutPoints',[30000 50000]);
bi = bininfo(sc,'CustIncome')
bi=4x6 table
          Bin          Good    Bad     Odds        WOE       InfoValue
    _______________    ____    ___    _______    ________    _________

    '[0,30000)'         64      68    0.94118    -0.76504     0.070065
    '[30000,50000)'    628     296     2.1216    0.047762    0.0017421
    '[50000,Inf]'      111      33     3.3636      0.5086     0.028028
    'Totals'           803     397     2.0227         NaN     0.099836

Modify the bin labels and display updated bin information.

NewLabels = {'Up to 30k','30k to 50k','50k and more'};
sc = modifybins(sc,'CustIncome','BinLabels',NewLabels);
bi = bininfo(sc,'CustIncome')
bi=4x6 table
         Bin          Good    Bad     Odds        WOE       InfoValue
    ______________    ____    ___    _______    ________    _________

    'Up to 30k'        64      68    0.94118    -0.76504     0.070065
    '30k to 50k'      628     296     2.1216    0.047762    0.0017421
    '50k and more'    111      33     3.3636      0.5086     0.028028
    'Totals'          803     397     2.0227         NaN     0.099836

Bin labels should be the last bin-modification step. As in this example, user-defined bin labels often contain information about the cut points, minimum, or maximum values for numeric data, or information about category groupings for categorical data. To prevent situations where user-defined labels and cut points are inconsistent (and labels are misleading), the creditscorecard object overrides user-defined labels every time the bins are modified using modifybins.

To illustrate modifybins overriding user-defined labels every time the bins are modified, reset the first cut point to 31000 and display updated bin information. Note that the bin labels are reset to their default format and accurately reflect the change in the cut points.

sc = modifybins(sc,'CustIncome','CutPoints',[31000 50000]);
bi = bininfo(sc,'CustIncome')
bi=4x6 table
          Bin          Good    Bad     Odds        WOE       InfoValue
    _______________    ____    ___    _______    ________    _________

    '[0,31000)'         82      84    0.97619    -0.72852     0.079751
    '[31000,50000)'    610     280     2.1786    0.074251    0.0040364
    '[50000,Inf]'      111      33     3.3636      0.5086     0.028028
    'Totals'           803     397     2.0227         NaN      0.11182
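Moving the first cut point from 30000 to 31000 only reassigns observations between the first two bins: 18 goods and 16 bads move from [30000,50000) into the first bin, and the Good and Bad totals are unchanged. The sketch below checks this from the two tables and recomputes the Totals odds and InfoValue, assuming (as the tables suggest) that the Totals InfoValue is the sum of the per-bin values.

```matlab
% Counts before (cut points [30000 50000]) and after ([31000 50000])
goodBefore = [64 628 111];  badBefore = [68 296 33];
goodAfter  = [82 610 111];  badAfter  = [84 280 33];

% Observations that moved from bin 2 into bin 1 when C1 went from 30000 to 31000
movedGood = goodAfter(1) - goodBefore(1);   % 18
movedBad  = badAfter(1)  - badBefore(1);    % 16

% Totals row: counts add up, odds use the totals, InfoValue sums over bins
totalGood = sum(goodAfter);
totalBad  = sum(badAfter);
totalOdds = totalGood / totalBad;
pGood = goodAfter / totalGood;
pBad  = badAfter / totalBad;
totalIV = sum((pGood - pBad) .* log(pGood ./ pBad));
```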

Input Arguments

sc

Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

PredictorName

Name of the predictor, specified as a character vector. PredictorName is case-sensitive.

Data Types: char

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: sc = modifybins(sc,PredictorName,'MinValue',10,'CutPoints',[23, 44, 66, 88])

'MinValue'

Minimum acceptable value, specified as a numeric value (numeric predictors only). Values below this number are considered out of range.

Data Types: double

'MaxValue'

Maximum acceptable value, specified as a numeric value (numeric predictors only). Values above this number are considered out of range.

Data Types: double

'CutPoints'

Split points between bins, specified as a nondecreasing numeric array. If there are NumBins bins, there are n = NumBins - 1 cut points, C1, C2, ..., Cn, which describe the bin boundaries with the following convention:

  • The first bin includes any values >= MinValue, but < C1.

  • The second bin includes any values >= C1, but < C2.

  • The last bin includes any values >= Cn, and <= MaxValue.

Note

Cut points do not include MinValue or MaxValue.
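The convention above can be sketched in base MATLAB. This is an illustration, not the creditscorecard implementation: the sample values x are hypothetical, and marking out-of-range values with NaN is a choice made for this sketch, because this page states only that such values are out of range.

```matlab
minValue = 0;
maxValue = 75;
cp = [25 37 49 65];          % C1, ..., Cn

% Hypothetical predictor values, including two out-of-range ones
x = [-3 0 24.9 25 37 64 65 75 80];

bin = nan(size(x));          % NaN marks out-of-range values in this sketch
for i = 1:numel(x)
    if x(i) >= minValue && x(i) <= maxValue
        % Bins are closed on the left, so a value equal to Cj starts bin j+1;
        % the last bin also includes maxValue itself
        bin(i) = 1 + sum(x(i) >= cp);
    end
end
```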

By default, cut points are defined so that each observed value of the predictor is placed in a separate bin. If the sorted observed values are V1, …, VM, the default cut points are V2, …, VM, which define M bins.
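A short sketch of the default cut points for a hypothetical predictor with a repeated value. Here unique both sorts the observed values and removes duplicates, so each distinct value ends up in its own bin.

```matlab
% Hypothetical observed values of a numeric predictor (22000 appears twice)
values = [22000 18000 19000 22000 20000];

V = unique(values);              % sorted distinct values V1, ..., VM
defaultCp = V(2:end);            % default cut points V2, ..., VM
numBins = numel(defaultCp) + 1;  % M bins, one per distinct value
```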

Data Types: double

'CatGrouping'

Category grouping, specified as a table with two columns named Category and BinNumber. The first column contains an exhaustive list of the predictor's categories, and the second column contains the number of the bin to which each category belongs.

By default, each category is placed in a separate bin. If the observed categories are 'Cat1', ..., 'CatM', the default category grouping is as follows.

    Category    BinNumber
    ________    _________

    'Cat1'      1
    'Cat2'      2
    ...         ...
    'CatM'      M

Data Types: table
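Because the Category column must list every category, a hand-edited grouping can be checked before it is passed to modifybins. A minimal sketch with hypothetical data, holding the two table columns in plain arrays; how modifybins responds to an incomplete grouping is not described on this page.

```matlab
% Categories observed in the data (hypothetical example)
observed = {'Home Owner', 'Tenant', 'Other', 'Tenant'};

% Proposed grouping: Category and BinNumber columns, held in plain arrays here
category  = {'Home Owner', 'Tenant', 'Other'};
binNumber = [1 2 2];

% Observed categories that the grouping does not assign to any bin
missing = setdiff(unique(observed), category);
isExhaustive = isempty(missing);
```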

'BinLabels'

Bin labels for each bin, specified as a cell array of character vectors. Bin labels tag the bins in object functions such as bininfo, plotbins, and displaypoints. A creditscorecard object automatically sets default bin labels whenever the bins are modified. The default format for bin labels depends on the predictor's type.

The default format for BinLabels is:

  • Numeric data: Before any manual or automatic modification of the predictor bins, there is a bin for each observed predictor value by default, and the bin labels simply show the predictor values. Once the predictor bins have been modified, there are nondefault values for MinValue or MaxValue, or nondefault cut points C1, C2, ..., Cn. In that case, the bin labels are:

    • Bin 1 label: '[MinValue, C1)'

    • Bin 2 label: '[C1, C2)'

    • Last bin label: '[Cn, MaxValue]'

    For example, if there are three bins, MinValue is 0, MaxValue is 40, and the two cut points are 20 and 30, then the corresponding bin labels are:

    '[0,20)'
    '[20,30)'
    '[30,40]'

  • Categorical data: Before any modification of the predictor bins, there is one bin per category, and the bin labels simply show the predictor categories. Once the bins have been modified, the labels are set to 'Group1', 'Group2', and so on, for bin 1, bin 2, and so on. For example, suppose that the category grouping is as follows:

    Category    BinNumber
    ________    _________

    'Cat1'      1
    'Cat2'      2
    'Cat3'      2

    Bin 1 contains 'Cat1' only and its bin label is set to 'Group1'. Bin 2 contains 'Cat2' and 'Cat3' and its bin label is set to 'Group2'.
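The default label formats described above can be reproduced with a short sketch. It is not the toolbox's own code; formatting the numbers with num2str is an assumption that matches labels such as '[50000,Inf]' in the examples.

```matlab
minValue = 0;
maxValue = 40;
cp = [20 30];                 % C1, C2

% Numeric predictor: '[MinValue,C1)', '[C1,C2)', ..., '[Cn,MaxValue]'
edges = [minValue cp maxValue];
numBins = numel(edges) - 1;
numLabels = cell(1, numBins);
for b = 1:numBins
    if b < numBins
        closing = ')';        % interior bins exclude their upper edge
    else
        closing = ']';        % the last bin includes MaxValue
    end
    numLabels{b} = sprintf('[%s,%s%s', num2str(edges(b)), num2str(edges(b+1)), closing);
end

% Categorical predictor: one 'GroupK' label per bin number
binNumber = [1 2 2];          % 'Cat1' in bin 1; 'Cat2' and 'Cat3' in bin 2
catLabels = arrayfun(@(k) sprintf('Group%d', k), 1:max(binNumber), ...
                     'UniformOutput', false);
```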

Tip

Using BinLabels should be the last step (if needed) in modifying bins. BinLabels definitions are overridden each time that the bins are modified using the modifybins or autobinning functions.

Data Types: cell

Output Arguments

sc

Credit scorecard model, returned as an updated creditscorecard object. For more information on using the creditscorecard object, see creditscorecard.

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Introduced in R2014b