Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Case Study for a Credit Scorecard Analysis

This example shows how to create a creditscorecard object, bin data, display, and plot binned data information. This example also shows how to fit a logistic regression model, obtain a score for the scorecard model, and determine the probabilities of default and validate the credit scorecard model using three different metrics.

Step 1. Create a creditscorecard object.

Use the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). By default, 'ResponseVar' is set to the last column in the data ('status' in this example) and the 'GoodLabel' to the response value with the highest count (0 in this example). The syntax for creditscorecard indicates that 'CustID' is the 'IDVar' to remove from the list of predictors. Also, while not demonstrated in this example, when creating a creditscorecard object using creditscorecard, you can use the optional name-value pair argument WeightsVar to specify observation (sample) weights.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
sc = 

  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Perform some initial data exploration. Inquire about predictor statistics for the categorical variable 'ResStatus' and plot the bin information for 'ResStatus'.

bininfo(sc,'ResStatus')
plotbins(sc,'ResStatus')
ans =

  4x6 table

        Bin         Good    Bad     Odds        WOE       InfoValue
    ____________    ____    ___    ______    _________    _________

    'Home Owner'    365     177    2.0621     0.019329    0.0001682
    'Tenant'        307     167    1.8383    -0.095564    0.0036638
    'Other'         131      53    2.4717      0.20049    0.0059418
    'Totals'        803     397    2.0227          NaN    0.0097738

This bin information contains the frequencies of “Good” and “Bad,” and bin statistics. Avoid having bins with frequencies of zero because they lead to infinite or undefined (NaN) statistics. Use the modifybins or autobinning functions to bin the data accordingly.

For numeric data, a common first step is "fine classing." This means binning the data into several bins, defined with a regular grid. To illustrate this point, use the predictor 'CustIncome'.

cp = 20000:5000:60000;

sc = modifybins(sc,'CustIncome','CutPoints',cp);

bininfo(sc,'CustIncome')
plotbins(sc,'CustIncome')
ans =

  11x6 table

          Bin          Good    Bad     Odds         WOE       InfoValue 
    _______________    ____    ___    _______    _________    __________

    '[-Inf,20000)'       3       5        0.6      -1.2152      0.010765
    '[20000,25000)'     23      16     1.4375     -0.34151     0.0039819
    '[25000,30000)'     38      47    0.80851     -0.91698      0.065166
    '[30000,35000)'    131      75     1.7467     -0.14671      0.003782
    '[35000,40000)'    193      98     1.9694    -0.026696    0.00017359
    '[40000,45000)'    173      76     2.2763      0.11814     0.0028361
    '[45000,50000)'    131      47     2.7872      0.32063      0.014348
    '[50000,55000)'     82      24     3.4167      0.52425      0.021842
    '[55000,60000)'     21       8      2.625      0.26066     0.0015642
    '[60000,Inf]'        8       1          8        1.375      0.010235
    'Totals'           803     397     2.0227          NaN       0.13469

Step 2a. Automatically bin the data.

Use the autobinning function to perform automatic binning for every predictor variable, using the default 'Monotone' algorithm with default algorithm options.

sc = autobinning(sc);

After the automatic binning step, every predictor bin must be reviewed using the bininfo and plotbins functions and fine-tuned. A monotonic, ideally linear trend in the Weight of Evidence (WOE) is desirable for credit scorecards because this translates into linear points for a given predictor. The WOE trends can be visualized using plotbins.

plotbins(sc,sc.PredictorVars)

Unlike the initial plot of 'ResStatus' when the scorecard was created, the new plot for 'ResStatus' shows an increasing WOE trend. This is because the autobinning function, by default, sorts the order of the categories by increasing odds.

These plots show that the 'Monotone' algorithm does a good job finding monotone WOE trends for this dataset. To complete the binning process, it is necessary to make only a few manual adjustments for some predictors using the modifybins function.

Step 2b. Fine-tune the bins using manual binning.

Common steps to manually modify bins are:

  • Use the bininfo function with two output arguments where the second argument contains binning rules.

  • Manually modify the binning rules using the second output argument from bininfo.

  • Set the updated binning rules with modifybins and then use plotbins or bininfo to review the updated bins.

For example, based on the plot for 'CustAge' in Step 2a, bins number 1 and 2 have similar WOE's as do bins number 5 and 6. To merge these bins using the steps outlined above:

[bi,cp] = bininfo(sc,'CustAge');
cp([1 5]) = []; % To merge bins 1 and 2, and bins 5 and 6
sc = modifybins(sc,'CustAge','CutPoints',cp);
plotbins(sc,'CustAge')

For 'CustIncome', based on the plot above, it is best to merge bins 3, 4 and 5 because they have similar WOE's. To merge these bins:

[bi,cp] = bininfo(sc,'CustIncome');
cp([3 4]) = [];
sc = modifybins(sc,'CustIncome','CutPoints',cp);
plotbins(sc,'CustIncome')

For 'TmWBank', based on the plot above, it is best to merge bins 2 and 3 because they have similar WOE's. To merge these bins:

[bi,cp] = bininfo(sc,'TmWBank');
cp(2) = [];
sc = modifybins(sc,'TmWBank','CutPoints',cp);
plotbins(sc,'TmWBank')

For 'AMBalance', based on the plot above, it is best to merge bins 2 and 3 because they have similar WOE's. To merge these bins:

[bi,cp] = bininfo(sc,'AMBalance');
cp(2) = [];
sc = modifybins(sc,'AMBalance','CutPoints',cp);
plotbins(sc,'AMBalance')

Now that the binning fine-tuning is completed, the bins for all predictors have close-to-linear WOE trends.

Step 3. Fit a logistic regression model.

The fitmodel function fits a logistic regression model to the WOE data. fitmodel internally bins the training data, transforms it into WOE values, maps the response variable so that 'Good' is 1, and fits a linear logistic regression model. By default, fitmodel uses a stepwise procedure to determine which predictors should be in the model.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8954, Chi2Stat = 32.545914, PValue = 1.1640961e-08
2. Adding TmWBank, Deviance = 1467.3249, Chi2Stat = 23.570535, PValue = 1.2041739e-06
3. Adding AMBalance, Deviance = 1455.858, Chi2Stat = 11.466846, PValue = 0.00070848829
4. Adding EmpStatus, Deviance = 1447.6148, Chi2Stat = 8.2432677, PValue = 0.0040903428
5. Adding CustAge, Deviance = 1442.06, Chi2Stat = 5.5547849, PValue = 0.018430237
6. Adding ResStatus, Deviance = 1437.9435, Chi2Stat = 4.1164321, PValue = 0.042468555
7. Adding OtherCC, Deviance = 1433.7372, Chi2Stat = 4.2063597, PValue = 0.040272676

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate      SE       tStat       pValue  
                   ________    _______    ______    __________

    (Intercept)     0.7024       0.064    10.975    5.0407e-28
    CustAge        0.61562     0.24783    2.4841      0.012988
    ResStatus       1.3776     0.65266    2.1107      0.034799
    EmpStatus      0.88592     0.29296     3.024     0.0024946
    CustIncome     0.69836     0.21715     3.216     0.0013001
    TmWBank          1.106     0.23266    4.7538    1.9958e-06
    OtherCC         1.0933     0.52911    2.0662      0.038806
    AMBalance       1.0437     0.32292    3.2322     0.0012285


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.42e-16

Step 4. Review and format scorecard points.

After fitting the logistic model, by default the points are unscaled and come directly from the combination of WOE values and model coefficients. The displaypoints function summarizes the scorecard points.

p1 = displaypoints(sc);
disp(p1)
     Predictors            Bin             Points  
    ____________    __________________    _________

    'CustAge'       '[-Inf,37)'            -0.15314
    'CustAge'       '[37,40)'             -0.062247
    'CustAge'       '[40,46)'              0.045763
    'CustAge'       '[46,58)'               0.22888
    'CustAge'       '[58,Inf]'              0.48354
    'ResStatus'     'Tenant'              -0.031302
    'ResStatus'     'Home Owner'            0.12697
    'ResStatus'     'Other'                 0.37652
    'EmpStatus'     'Unknown'             -0.076369
    'EmpStatus'     'Employed'              0.31456
    'CustIncome'    '[-Inf,29000)'         -0.45455
    'CustIncome'    '[29000,33000)'         -0.1037
    'CustIncome'    '[33000,42000)'        0.077768
    'CustIncome'    '[42000,47000)'         0.24406
    'CustIncome'    '[47000,Inf]'           0.43536
    'TmWBank'       '[-Inf,12)'            -0.18221
    'TmWBank'       '[12,45)'             -0.038279
    'TmWBank'       '[45,71)'               0.39569
    'TmWBank'       '[71,Inf]'              0.95074
    'OtherCC'       'No'                     -0.193
    'OtherCC'       'Yes'                   0.15868
    'AMBalance'     '[-Inf,558.88)'          0.3552
    'AMBalance'     '[558.88,1597.44)'    -0.026797
    'AMBalance'     '[1597.44,Inf]'        -0.21168

This is a good time to modify the bin labels, if this is something of interest for cosmetic reasons. To do so, use modifybins to change the bin labels.

sc = modifybins(sc,'CustAge','BinLabels',...
{'Up to 36' '37 to 39' '40 to 45' '46 to 57' '58 and up'});

sc = modifybins(sc,'CustIncome','BinLabels',...
{'Up to 28999' '29000 to 32999' '33000 to 41999' '42000 to 46999' '47000 and up'});

sc = modifybins(sc,'TmWBank','BinLabels',...
{'Up to 11' '12 to 44' '45 to 70' '71 and up'});

sc = modifybins(sc,'AMBalance','BinLabels',...
{'Up to 558.87' '558.88 to 1597.43' '1597.44 and up'});

p1 = displaypoints(sc);
disp(p1)
     Predictors             Bin             Points  
    ____________    ___________________    _________

    'CustAge'       'Up to 36'              -0.15314
    'CustAge'       '37 to 39'             -0.062247
    'CustAge'       '40 to 45'              0.045763
    'CustAge'       '46 to 57'               0.22888
    'CustAge'       '58 and up'              0.48354
    'ResStatus'     'Tenant'               -0.031302
    'ResStatus'     'Home Owner'             0.12697
    'ResStatus'     'Other'                  0.37652
    'EmpStatus'     'Unknown'              -0.076369
    'EmpStatus'     'Employed'               0.31456
    'CustIncome'    'Up to 28999'           -0.45455
    'CustIncome'    '29000 to 32999'         -0.1037
    'CustIncome'    '33000 to 41999'        0.077768
    'CustIncome'    '42000 to 46999'         0.24406
    'CustIncome'    '47000 and up'           0.43536
    'TmWBank'       'Up to 11'              -0.18221
    'TmWBank'       '12 to 44'             -0.038279
    'TmWBank'       '45 to 70'               0.39569
    'TmWBank'       '71 and up'              0.95074
    'OtherCC'       'No'                      -0.193
    'OtherCC'       'Yes'                    0.15868
    'AMBalance'     'Up to 558.87'            0.3552
    'AMBalance'     '558.88 to 1597.43'    -0.026797
    'AMBalance'     '1597.44 and up'        -0.21168

Points are usually scaled and also often rounded. To do this, use the formatpoints function. For example, you can set a target level of points corresponding to a target odds level and also set the required points-to-double-the-odds (PDO).

TargetPoints = 500;
TargetOdds = 2;
PDO = 50; % Points to double the odds

sc = formatpoints(sc,'PointsOddsAndPDO',[TargetPoints TargetOdds PDO]);
p2 = displaypoints(sc);
disp(p2)
     Predictors             Bin            Points
    ____________    ___________________    ______

    'CustAge'       'Up to 36'             53.239
    'CustAge'       '37 to 39'             59.796
    'CustAge'       '40 to 45'             67.587
    'CustAge'       '46 to 57'             80.796
    'CustAge'       '58 and up'            99.166
    'ResStatus'     'Tenant'               62.028
    'ResStatus'     'Home Owner'           73.445
    'ResStatus'     'Other'                91.446
    'EmpStatus'     'Unknown'              58.777
    'EmpStatus'     'Employed'             86.976
    'CustIncome'    'Up to 28999'          31.497
    'CustIncome'    '29000 to 32999'       56.805
    'CustIncome'    '33000 to 41999'       69.896
    'CustIncome'    '42000 to 46999'       81.891
    'CustIncome'    '47000 and up'          95.69
    'TmWBank'       'Up to 11'             51.142
    'TmWBank'       '12 to 44'             61.524
    'TmWBank'       '45 to 70'             92.829
    'TmWBank'       '71 and up'            132.87
    'OtherCC'       'No'                   50.364
    'OtherCC'       'Yes'                  75.732
    'AMBalance'     'Up to 558.87'         89.908
    'AMBalance'     '558.88 to 1597.43'    62.353
    'AMBalance'     '1597.44 and up'       49.016

Step 5. Score the data.

The score function computes the scores for the training data. An optional data input can also be passed to score, for example, validation data. The points per predictor for each customer are provided as an optional output.

[Scores,Points] = score(sc);
disp(Scores(1:10))
disp(Points(1:10,:))
  528.2044
  554.8861
  505.2406
  564.0717
  554.8861
  586.1904
  441.8755
  515.8125
  524.4553
  508.3169

    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    80.796     62.028       58.777        95.69        92.829     75.732     62.353   
    99.166     73.445       86.976        95.69        61.524     75.732     62.353   
    80.796     62.028       86.976       69.896        92.829     50.364     62.353   
    80.796     73.445       86.976        95.69        61.524     75.732     89.908   
    99.166     73.445       86.976        95.69        61.524     75.732     62.353   
    99.166     73.445       86.976        95.69        92.829     75.732     62.353   
    53.239     73.445       58.777       56.805        61.524     75.732     62.353   
    80.796     91.446       86.976        95.69        61.524     50.364     49.016   
    80.796     62.028       58.777        95.69        61.524     75.732     89.908   
    80.796     73.445       58.777        95.69        61.524     75.732     62.353   

Step 6. Calculate the probability of default.

To calculate the probability of default, use the probdefault function.

pd = probdefault(sc);

Define the probability of being “Good” and plot the predicted odds versus the formatted scores. Visually analyze that the target points and target odds match and that the points-to-double-the-odds (PDO) relationship holds.

ProbGood = 1-pd;
PredictedOdds = ProbGood./pd;

figure
scatter(Scores,PredictedOdds)
title('Predicted Odds vs. Score')
xlabel('Score')
ylabel('Predicted Odds')

hold on

xLimits = xlim;
yLimits = ylim;

% Target points and odds
plot([TargetPoints TargetPoints],[yLimits(1) TargetOdds],'k:')
plot([xLimits(1) TargetPoints],[TargetOdds TargetOdds],'k:')

% Target points plus PDO
plot([TargetPoints+PDO TargetPoints+PDO],[yLimits(1) 2*TargetOdds],'k:')
plot([xLimits(1) TargetPoints+PDO],[2*TargetOdds 2*TargetOdds],'k:')

% Target points minus PDO
plot([TargetPoints-PDO TargetPoints-PDO],[yLimits(1) TargetOdds/2],'k:')
plot([xLimits(1) TargetPoints-PDO],[TargetOdds/2 TargetOdds/2],'k:')

hold off

Step 7. Validate the credit scorecard model using the CAP, ROC, and Kolmogorov-Smirnov statistic

The creditscorecard class supports three validation methods, the Cumulative Accuracy Profile (CAP), the Receiver Operating Characteristic (ROC), and the Kolmogorov-Smirnov (K-S) statistic. For more information on CAP, ROC, and KS, see Cumulative Accuracy Profile (CAP), Receiver Operating Characteristic (ROC), and Kolmogorov-Smirnov statistic (KS).

[Stats,T] = validatemodel(sc,'Plot',{'CAP','ROC','KS'});
disp(Stats)
disp(T(1:15,:))
           Measure             Value 
    ______________________    _______

    'Accuracy Ratio'          0.32225
    'Area under ROC curve'    0.66113
    'KS statistic'            0.22324
    'KS score'                 499.18

    Scores    ProbDefault    TrueBads    FalseBads    TrueGoods    FalseGoods    Sensitivity    FalseAlarm      PctObs  
    ______    ___________    ________    _________    _________    __________    ___________    __________    __________

     369.4     0.7535         0          1            802          397                   0      0.0012453     0.00083333
    377.86    0.73107         1          1            802          396           0.0025189      0.0012453      0.0016667
    379.78     0.7258         2          1            802          395           0.0050378      0.0012453         0.0025
    391.81    0.69139         3          1            802          394           0.0075567      0.0012453      0.0033333
    394.77    0.68259         3          2            801          394           0.0075567      0.0024907      0.0041667
    395.78    0.67954         4          2            801          393            0.010076      0.0024907          0.005
    396.95    0.67598         5          2            801          392            0.012594      0.0024907      0.0058333
    398.37    0.67167         6          2            801          391            0.015113      0.0024907      0.0066667
    401.26    0.66276         7          2            801          390            0.017632      0.0024907         0.0075
    403.23    0.65664         8          2            801          389            0.020151      0.0024907      0.0083333
    405.09    0.65081         8          3            800          389            0.020151       0.003736      0.0091667
    405.15    0.65062        11          5            798          386            0.027708      0.0062267       0.013333
    405.37    0.64991        11          6            797          386            0.027708       0.007472       0.014167
    406.18    0.64735        12          6            797          385            0.030227       0.007472          0.015
    407.14    0.64433        13          6            797          384            0.032746       0.007472       0.015833

Was this topic helpful?