formatpoints

Format scorecard points and scaling

Syntax

sc = formatpoints(sc,Name,Value)

Description

sc = formatpoints(sc,Name,Value) modifies the scorecard points and scaling using optional name-value pair arguments. For example, use optional name-value pair arguments to change the scaling of the scores or the rounding of the points.

Examples

This example shows how to use formatpoints to scale by providing the Worst and Best score values. By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'CustAge'       '[-Inf,33)'         -0.15894
    'CustAge'       '[33,37)'           -0.14036
    'CustAge'       '[37,40)'          -0.060323
    'CustAge'       '[40,46)'           0.046408
    'CustAge'       '[46,48)'            0.21445
    'CustAge'       '[48,58)'            0.23039
    'CustAge'       '[58,Inf]'             0.479
    'ResStatus'     'Tenant'           -0.031252
    'ResStatus'     'Home Owner'         0.12696
    'ResStatus'     'Other'              0.37641
    'EmpStatus'     'Unknown'          -0.076317
    'EmpStatus'     'Employed'           0.31449
    'CustIncome'    '[-Inf,29000)'      -0.45716
    'CustIncome'    '[29000,33000)'     -0.10466
    'CustIncome'    '[33000,35000)'     0.052329
    'CustIncome'    '[35000,40000)'     0.081611
      ⋮

MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the 'Worst' and 'Best' score values. The range used here, 300 to 850, is a common score range. Display the points information again to verify that the points are now scaled, and display the scaled minimum and maximum scores.

sc = formatpoints(sc,'WorstAndBestScores',[300 850]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin          Points
    ____________    _______________    ______

    'CustAge'       '[-Inf,33)'        46.396
    'CustAge'       '[33,37)'          48.727
    'CustAge'       '[37,40)'          58.772
    'CustAge'       '[40,46)'          72.167
    'CustAge'       '[46,48)'          93.256
    'CustAge'       '[48,58)'          95.256
    'CustAge'       '[58,Inf]'         126.46
    'ResStatus'     'Tenant'           62.421
    'ResStatus'     'Home Owner'       82.276
    'ResStatus'     'Other'            113.58
    'EmpStatus'     'Unknown'          56.765
    'EmpStatus'     'Employed'         105.81
    'CustIncome'    '[-Inf,29000)'     8.9706
    'CustIncome'    '[29000,33000)'    53.208
    'CustIncome'    '[33000,35000)'     72.91
    'CustIncome'    '[35000,40000)'    76.585
      ⋮

MinScore = 300.0000
MaxScore = 850

As expected, the values of MinScore and MaxScore correspond to the desired worst and best scores.
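
The linear transformation behind this result can be checked by hand. The following is a minimal sketch in base MATLAB, not toolbox code. It assumes that the scaled score equals shift + slope × unscaled score, and that the shift is split evenly across the seven retained predictors because base points are not reported separately. The variable names are illustrative, and the inputs are the rounded values displayed above, so the results agree with the displayed table only to about two decimals.

```matlab
% Unscaled extremes as displayed above (rounded to four decimals).
minUnscaled = -1.3100;
maxUnscaled = 3.0726;
worst = 300;
best = 850;
numPredictors = 7;   % predictors retained by fitmodel in this example

% Slope and shift that map [minUnscaled, maxUnscaled] onto [worst, best].
slope = (best - worst) / (maxUnscaled - minUnscaled);
shift = worst - slope * minUnscaled;

% Assumption: with base points not separated, the shift is spread evenly
% over the predictors, so each bin gets slope*points + shift/numPredictors.
custAgeUnscaled = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];
custAgeScaled = slope * custAgeUnscaled + shift / numPredictors;

fprintf('slope = %.4f, shift = %.4f\n', slope, shift);
disp(custAgeScaled)
```

Under these assumptions, the computed 'CustAge' points reproduce the scaled column of the table above up to rounding of the inputs.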

This example shows how to use formatpoints to scale by providing the Shift and Slope values. By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points by the formatpoints function.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'CustAge'       '[-Inf,33)'         -0.15894
    'CustAge'       '[33,37)'           -0.14036
    'CustAge'       '[37,40)'          -0.060323
    'CustAge'       '[40,46)'           0.046408
    'CustAge'       '[46,48)'            0.21445
    'CustAge'       '[48,58)'            0.23039
    'CustAge'       '[58,Inf]'             0.479
    'ResStatus'     'Tenant'           -0.031252
    'ResStatus'     'Home Owner'         0.12696
    'ResStatus'     'Other'              0.37641
    'EmpStatus'     'Unknown'          -0.076317
    'EmpStatus'     'Employed'           0.31449
    'CustIncome'    '[-Inf,29000)'      -0.45716
    'CustIncome'    '[29000,33000)'     -0.10466
    'CustIncome'    '[33000,35000)'     0.052329
    'CustIncome'    '[35000,40000)'     0.081611
      ⋮

MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the 'Shift' and 'Slope' values. In this example, there is an arbitrary choice of shift and slope. Display the points information again to verify that they are now scaled and also display the scaled minimum and maximum scores.

sc = formatpoints(sc,'ShiftAndSlope',[300 6]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin          Points
    ____________    _______________    ______

    'CustAge'       '[-Inf,33)'        41.904
    'CustAge'       '[33,37)'          42.015
    'CustAge'       '[37,40)'          42.495
    'CustAge'       '[40,46)'          43.136
    'CustAge'       '[46,48)'          44.144
    'CustAge'       '[48,58)'          44.239
    'CustAge'       '[58,Inf]'         45.731
    'ResStatus'     'Tenant'            42.67
    'ResStatus'     'Home Owner'       43.619
    'ResStatus'     'Other'            45.116
    'EmpStatus'     'Unknown'          42.399
    'EmpStatus'     'Employed'         44.744
    'CustIncome'    '[-Inf,29000)'     40.114
    'CustIncome'    '[29000,33000)'    42.229
    'CustIncome'    '[33000,35000)'    43.171
    'CustIncome'    '[35000,40000)'    43.347
      ⋮

MinScore = 292.1401
MaxScore = 318.4355
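
The scaled extremes follow directly from the shift and slope. The sketch below is a hand check in base MATLAB, assuming that the scaled score equals shift + slope × unscaled score and that the shift is divided evenly among the seven predictors when base points are not separated. The variable names are illustrative, and the unscaled inputs are the rounded values displayed above.

```matlab
shift = 300;
slope = 6;
numPredictors = 7;   % predictors retained by fitmodel in this example

% Total scores: apply the linear map to the unscaled extremes shown above.
minScaled = shift + slope * (-1.3100);
maxScaled = shift + slope * 3.0726;

% Points for one bin (assumption: the shift is split evenly per predictor).
custAgeFirstBin = slope * (-0.15894) + shift / numPredictors;

fprintf('MinScore = %.4f, MaxScore = %.4f\n', minScaled, maxScaled);
fprintf('CustAge [-Inf,33) points = %.3f\n', custAgeFirstBin);
```

Because the slope is small relative to the shift, the scaled scores span only about 26 points, which is why the bins within a predictor differ so little in this example.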

This example shows how to use formatpoints to scale by providing the points, odds levels, and PDO (points to double the odds). By using formatpoints to scale, you can put points and scores in a desired range that is more meaningful for practical purposes. Technically, this involves a linear transformation from the unscaled to the scaled points by the formatpoints function.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'CustAge'       '[-Inf,33)'         -0.15894
    'CustAge'       '[33,37)'           -0.14036
    'CustAge'       '[37,40)'          -0.060323
    'CustAge'       '[40,46)'           0.046408
    'CustAge'       '[46,48)'            0.21445
    'CustAge'       '[48,58)'            0.23039
    'CustAge'       '[58,Inf]'             0.479
    'ResStatus'     'Tenant'           -0.031252
    'ResStatus'     'Home Owner'         0.12696
    'ResStatus'     'Other'              0.37641
    'EmpStatus'     'Unknown'          -0.076317
    'EmpStatus'     'Employed'           0.31449
    'CustIncome'    '[-Inf,29000)'      -0.45716
    'CustIncome'    '[29000,33000)'     -0.10466
    'CustIncome'    '[33000,35000)'     0.052329
    'CustIncome'    '[35000,40000)'     0.081611
      ⋮

MinScore = -1.3100
MaxScore = 3.0726

Scale by providing the points, odds levels, and PDO (points to double the odds). Suppose that you want a score of 500 points to have odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points would have odds of 4).

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin          Points
    ____________    _______________    ______

    'CustAge'       '[-Inf,33)'        52.821
    'CustAge'       '[33,37)'          54.161
    'CustAge'       '[37,40)'          59.934
    'CustAge'       '[40,46)'          67.633
    'CustAge'       '[46,48)'          79.755
    'CustAge'       '[48,58)'          80.905
    'CustAge'       '[58,Inf]'         98.838
    'ResStatus'     'Tenant'           62.031
    'ResStatus'     'Home Owner'       73.444
    'ResStatus'     'Other'            91.438
    'EmpStatus'     'Unknown'          58.781
    'EmpStatus'     'Employed'         86.971
    'CustIncome'    '[-Inf,29000)'     31.309
    'CustIncome'    '[29000,33000)'    56.736
    'CustIncome'    '[33000,35000)'     68.06
    'CustIncome'    '[35000,40000)'    70.173
      ⋮

MinScore = 355.5051
MaxScore = 671.6403
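
The points, odds, and PDO triplet can be converted to an equivalent shift and slope. The sketch below is a hand check in base MATLAB under two assumptions: the unscaled score is the natural log of the odds of being good, and the scaled score equals shift + slope × log(odds). The variable names are illustrative, and the unscaled extremes are the rounded values displayed above.

```matlab
targetPoints = 500;   % score that should correspond to targetOdds
targetOdds = 2;
pdo = 50;             % points to double the odds

% Doubling the odds adds log(2) to the log-odds, which must add pdo points.
slope = pdo / log(2);
shift = targetPoints - slope * log(targetOdds);

% Check the doubling property: odds of 4 should land at 550 points.
scoreAtOdds4 = shift + slope * log(4);

% Apply the same map to the unscaled extremes shown above.
minScaled = shift + slope * (-1.3100);
maxScaled = shift + slope * 3.0726;

fprintf('slope = %.4f, shift = %.4f\n', slope, shift);
fprintf('MinScore = %.4f, MaxScore = %.4f\n', minScaled, maxScaled);
```

Under these assumptions, the results agree with the displayed MinScore and MaxScore up to rounding of the inputs.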

This example shows how to use formatpoints to separate the base points from the rest of the points assigned to each predictor variable. The formatpoints name-value pair argument 'BasePoints' serves this purpose.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'CustAge'       '[-Inf,33)'         -0.15894
    'CustAge'       '[33,37)'           -0.14036
    'CustAge'       '[37,40)'          -0.060323
    'CustAge'       '[40,46)'           0.046408
    'CustAge'       '[46,48)'            0.21445
    'CustAge'       '[48,58)'            0.23039
    'CustAge'       '[58,Inf]'             0.479
    'ResStatus'     'Tenant'           -0.031252
    'ResStatus'     'Home Owner'         0.12696
    'ResStatus'     'Other'              0.37641
    'EmpStatus'     'Unknown'          -0.076317
    'EmpStatus'     'Employed'           0.31449
    'CustIncome'    '[-Inf,29000)'      -0.45716
    'CustIncome'    '[29000,33000)'     -0.10466
    'CustIncome'    '[33000,35000)'     0.052329
    'CustIncome'    '[35000,40000)'     0.081611
      ⋮

MinScore = -1.3100
MaxScore = 3.0726

When the name-value pair argument BasePoints is set to true, the points information table reports the base points separately in the first row. The minimum and maximum possible scores are not affected by this option.

sc = formatpoints(sc,'BasePoints',true);
[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=31×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'BasePoints'    'BasePoints'         0.70239
    'CustAge'       '[-Inf,33)'         -0.25928
    'CustAge'       '[33,37)'           -0.24071
    'CustAge'       '[37,40)'           -0.16066
    'CustAge'       '[40,46)'          -0.053933
    'CustAge'       '[46,48)'            0.11411
    'CustAge'       '[48,58)'            0.13005
    'CustAge'       '[58,Inf]'           0.37866
    'ResStatus'     'Tenant'            -0.13159
    'ResStatus'     'Home Owner'        0.026616
    'ResStatus'     'Other'              0.27607
    'EmpStatus'     'Unknown'           -0.17666
    'EmpStatus'     'Employed'           0.21415
    'CustIncome'    '[-Inf,29000)'       -0.5575
    'CustIncome'    '[29000,33000)'       -0.205
    'CustIncome'    '[33000,35000)'    -0.048013
      ⋮

MinScore = -1.3100
MaxScore = 3.0726
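
The two tables are related by a constant offset per predictor. The sketch below is a hand check in base MATLAB, assuming that when base points are not separated, the model intercept (0.70239 in the fitted coefficients above) is spread evenly across the seven predictors. The variable names are illustrative.

```matlab
intercept = 0.70239;   % (Intercept) estimate from the fitmodel output
numPredictors = 7;     % predictors retained by fitmodel in this example

% CustAge points with the intercept folded in (BasePoints false).
custAgeCombined = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];

% Assumption: separating the base points removes an equal share of the
% intercept from every bin of every predictor.
custAgeSeparated = custAgeCombined - intercept / numPredictors;

disp(custAgeSeparated)
```

Because the same amount is moved from each predictor into the 'BasePoints' row, the total score for any observation is unchanged, which is consistent with MinScore and MaxScore staying the same.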

This example shows how to use formatpoints to round points. Rounding is usually applied after scaling. Otherwise, if the points for a particular predictor all fall in a small range, rounding could make the points for different bins identical. Also, rounding all the points may slightly change the minimum and maximum total points.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Display unscaled points for predictors retained in the fitting model and display the minimum and maximum possible unscaled scores.

[PointsInfo,MinScore,MaxScore] = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin           Points  
    ____________    _______________    _________

    'CustAge'       '[-Inf,33)'         -0.15894
    'CustAge'       '[33,37)'           -0.14036
    'CustAge'       '[37,40)'          -0.060323
    'CustAge'       '[40,46)'           0.046408
    'CustAge'       '[46,48)'            0.21445
    'CustAge'       '[48,58)'            0.23039
    'CustAge'       '[58,Inf]'             0.479
    'ResStatus'     'Tenant'           -0.031252
    'ResStatus'     'Home Owner'         0.12696
    'ResStatus'     'Other'              0.37641
    'EmpStatus'     'Unknown'          -0.076317
    'EmpStatus'     'Employed'           0.31449
    'CustIncome'    '[-Inf,29000)'      -0.45716
    'CustIncome'    '[29000,33000)'     -0.10466
    'CustIncome'    '[33000,35000)'     0.052329
    'CustIncome'    '[35000,40000)'     0.081611
      ⋮

MinScore = -1.3100
MaxScore = 3.0726

Scale points, and display the points information. By default, no rounding is applied.

sc = formatpoints(sc,'WorstAndBestScores',[300 850]);
PointsInfo = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin          Points
    ____________    _______________    ______

    'CustAge'       '[-Inf,33)'        46.396
    'CustAge'       '[33,37)'          48.727
    'CustAge'       '[37,40)'          58.772
    'CustAge'       '[40,46)'          72.167
    'CustAge'       '[46,48)'          93.256
    'CustAge'       '[48,58)'          95.256
    'CustAge'       '[58,Inf]'         126.46
    'ResStatus'     'Tenant'           62.421
    'ResStatus'     'Home Owner'       82.276
    'ResStatus'     'Other'            113.58
    'EmpStatus'     'Unknown'          56.765
    'EmpStatus'     'Employed'         105.81
    'CustIncome'    '[-Inf,29000)'     8.9706
    'CustIncome'    '[29000,33000)'    53.208
    'CustIncome'    '[33000,35000)'     72.91
    'CustIncome'    '[35000,40000)'    76.585
      ⋮

Use the name-value pair argument Round to apply rounding for all points and then display the points information again.

sc = formatpoints(sc,'Round','AllPoints');
PointsInfo = displaypoints(sc)
PointsInfo=30×3 table
     Predictors           Bin          Points
    ____________    _______________    ______

    'CustAge'       '[-Inf,33)'          46  
    'CustAge'       '[33,37)'            49  
    'CustAge'       '[37,40)'            59  
    'CustAge'       '[40,46)'            72  
    'CustAge'       '[46,48)'            93  
    'CustAge'       '[48,58)'            95  
    'CustAge'       '[58,Inf]'          126  
    'ResStatus'     'Tenant'             62  
    'ResStatus'     'Home Owner'         82  
    'ResStatus'     'Other'             114  
    'EmpStatus'     'Unknown'            57  
    'EmpStatus'     'Employed'          106  
    'CustIncome'    '[-Inf,29000)'        9  
    'CustIncome'    '[29000,33000)'      53  
    'CustIncome'    '[33000,35000)'      73  
    'CustIncome'    '[35000,40000)'      77  
      ⋮
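
To see why rounding is applied after scaling, compare rounding the scaled and the unscaled 'CustAge' points. The sketch below uses only the base MATLAB round function on the values displayed above. It illustrates the effect and does not reproduce how formatpoints applies rounding internally.

```matlab
% Scaled and unscaled CustAge points as displayed above.
custAgeScaled = [46.396 48.727 58.772 72.167 93.256 95.256 126.46];
custAgeUnscaled = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];

% Rounding scaled points keeps the bins distinguishable.
roundedScaled = round(custAgeScaled);      % [46 49 59 72 93 95 126]

% Rounding unscaled points collapses every bin to the same value.
roundedUnscaled = round(custAgeUnscaled);

fprintf('Distinct values after rounding scaled points: %d\n', numel(unique(roundedScaled)));
fprintf('All unscaled points round to zero: %d\n', all(roundedUnscaled == 0));
```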

This example shows how to use formatpoints to score missing or out-of-range data. When data is scored, some observations can be either missing (NaN, or undefined) or out of range. You will need to decide whether or not points are assigned to these cases. Use the name-value pair argument Missing to do so.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument in creditscorecard to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID');

Perform automatic binning for all predictors.

sc = autobinning(sc);

Indicate that the minimum allowed value for 'CustAge' is zero. This makes any negative values for age invalid or out-of-range.

sc = modifybins(sc,'CustAge','MinValue',0);

Fit a logistic regression model using default parameters.

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Suppose there are missing or out-of-range observations in the data that you want to score. Notice that, by default, the points and scores assigned to missing and out-of-range values are NaN.

% Set up a data set with missing and out of range data for illustration purposes
newdata = data(1:5,:);
newdata.CustAge(1) = NaN; % missing
newdata.CustAge(2) = -100; % invalid
newdata.ResStatus(3) = '<undefined>'; % missing
newdata.ResStatus(4) = 'House'; % invalid
disp(newdata)
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

      1         NaN          62         Tenant         Unknown        50000         55         Yes       1055.9        0.22        0   
      2        -100          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0   
      3          47          30         <undefined>    Employed       37000         61         No        877.23        0.29        0   
      4          50          75         House          Employed       53000         20         Yes       157.37        0.08        0   
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0   
[Scores,Points] = score(sc,newdata);
disp(Scores)
       NaN
       NaN
       NaN
       NaN
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank     OtherCC     AMBalance
    _______    _________    _________    __________    _________    ________    _________

        NaN    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
        NaN      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445          NaN      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039          NaN      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

Use the name-value pair argument Missing to replace NaN with points corresponding to a zero Weight-of-Evidence (WOE).

sc = formatpoints(sc,'Missing','ZeroWOE');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    0.9667
    1.0859
    0.8978
    1.5513
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank     OtherCC     AMBalance
    _______    _________    _________    __________    _________    ________    _________

    0.10034    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
    0.10034      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445      0.10034      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039      0.10034      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472
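
The value 0.10034 assigned to the missing entries can be reproduced by hand. The sketch below is a check in base MATLAB, assuming that unscaled points for a bin equal coefficient × WOE plus an equal share of the intercept, so that a zero WOE leaves only the intercept share. The variable names are illustrative.

```matlab
intercept = 0.70239;   % (Intercept) estimate from the fitmodel output
numPredictors = 7;     % predictors retained by fitmodel in this example

% Assumption: points = coefficient*WOE + intercept/numPredictors,
% so WOE = 0 yields the intercept share regardless of the coefficient.
zeroWoePoints = 0 + intercept / numPredictors;

% First observation: CustAge is missing; other points as displayed above.
row1Points = [zeroWoePoints -0.031252 -0.076317 0.43693 0.39607 0.15842 -0.017472];
row1Score = sum(row1Points);

fprintf('Zero-WOE points = %.5f, score for row 1 = %.4f\n', zeroWoePoints, row1Score);
```

Under this assumption, the same zero-WOE value applies to every predictor, which matches the identical entries shown for 'CustAge' and 'ResStatus'.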

Alternatively, use the name-value pair argument Missing to replace the missing value with the minimum points for the predictors that have the missing values.

sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    0.7074
    0.8266
    0.7662
    1.4197
    1.4535
disp(Points)
    CustAge     ResStatus    EmpStatus    CustIncome     TmWBank     OtherCC     AMBalance
    ________    _________    _________    __________    _________    ________    _________

    -0.15894    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
    -0.15894      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
     0.21445    -0.031252      0.31449     0.081611       0.39607    -0.19168    -0.017472
     0.23039    -0.031252      0.31449      0.43693     -0.044811     0.15842      0.35551
       0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

As a third alternative, use the name-value pair argument Missing to replace the missing value with the maximum points for the predictors that have the missing values.

sc = formatpoints(sc,'Missing','MaxPoints');
[Scores,Points] = score(sc,newdata);
disp(Scores)
    1.3454
    1.4646
    1.1739
    1.8273
    1.4535
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome     TmWBank     OtherCC     AMBalance
    _______    _________    _________    __________    _________    ________    _________

      0.479    -0.031252    -0.076317      0.43693       0.39607     0.15842    -0.017472
      0.479      0.12696      0.31449      0.43693     -0.033752     0.15842    -0.017472
    0.21445      0.37641      0.31449     0.081611       0.39607    -0.19168    -0.017472
    0.23039      0.37641      0.31449      0.43693     -0.044811     0.15842      0.35551
      0.479      0.12696      0.31449      0.43693     -0.044811     0.15842    -0.017472

Verify that the minimum and maximum points assigned to the missing data correspond to the minimum and maximum points for the corresponding predictors. The points for 'CustAge' are reported in the first seven rows of the points information table. For 'ResStatus' the points are in rows 8 through 10.

PointsInfo = displaypoints(sc);
PointsInfo(1:7,:)
ans=7×3 table
    Predictors       Bin         Points  
    __________    __________    _________

    'CustAge'     '[0,33)'       -0.15894
    'CustAge'     '[33,37)'      -0.14036
    'CustAge'     '[37,40)'     -0.060323
    'CustAge'     '[40,46)'      0.046408
    'CustAge'     '[46,48)'       0.21445
    'CustAge'     '[48,58)'       0.23039
    'CustAge'     '[58,Inf]'        0.479

min(PointsInfo.Points(1:7))
ans = -0.1589
max(PointsInfo.Points(1:7))
ans = 0.4790
PointsInfo(8:10,:)
ans=3×3 table
    Predictors         Bin          Points  
    ___________    ____________    _________

    'ResStatus'    'Tenant'        -0.031252
    'ResStatus'    'Home Owner'      0.12696
    'ResStatus'    'Other'           0.37641

min(PointsInfo.Points(8:10))
ans = -0.0313
max(PointsInfo.Points(8:10))
ans = 0.3764

This example describes the assignment of points for missing data when the 'BinMissingData' option is set to true.

  • Predictors that have missing data in the training set have an explicit bin for <missing> with corresponding points in the final scorecard. These points are computed from the Weight-of-Evidence (WOE) value for the <missing> bin and the logistic model coefficients. For scoring purposes, these points are assigned to missing values and to out-of-range values.

  • Predictors with no missing data in the training set have no <missing> bin, therefore no WOE can be estimated from the training data. By default, the points for missing and out-of-range values are set to NaN, and this leads to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.

Load the CreditCardData.mat file, which includes the table dataMissing containing missing values. This table is used to create the creditscorecard object.

load CreditCardData.mat 
head(dataMissing,5)
ans=5×11 table
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

      1          53          62         <undefined>    Unknown        50000         55         Yes       1055.9        0.22        0   
      2          61          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0   
      3          47          30         Tenant         Employed       37000         61         No        877.23        0.29        0   
      4         NaN          75         Home Owner     Employed       53000         20         Yes       157.37        0.08        0   
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0   

fprintf('Number of rows: %d\n',height(dataMissing))
Number of rows: 1200
fprintf('Number of missing values CustAge: %d\n',sum(ismissing(dataMissing.CustAge)))
Number of missing values CustAge: 30
fprintf('Number of missing values ResStatus: %d\n',sum(ismissing(dataMissing.ResStatus)))
Number of missing values ResStatus: 40

Use creditscorecard with the name-value argument 'BinMissingData' set to true to bin the missing numeric or categorical data in a separate bin. Apply automatic binning.

sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);

disp(sc)
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Set a minimum value of zero for CustAge and CustIncome. With this, any negative age or income information becomes invalid or "out-of-range". For scoring purposes, out-of-range values are given the same points as missing values.

sc = modifybins(sc,'CustAge','MinValue',0);
sc = modifybins(sc,'CustIncome','MinValue',0);

Display and plot bin information for numeric data for 'CustAge' that includes missing data in a separate bin labeled <missing>.

[bi,cp] = bininfo(sc,'CustAge');
disp(bi)
        Bin        Good    Bad     Odds       WOE       InfoValue 
    ___________    ____    ___    ______    ________    __________

    '[0,33)'        69      52    1.3269    -0.42156      0.018993
    '[33,37)'       63      45       1.4    -0.36795      0.012839
    '[37,40)'       72      47    1.5319     -0.2779     0.0079824
    '[40,46)'      172      89    1.9326    -0.04556     0.0004549
    '[46,48)'       59      25      2.36     0.15424     0.0016199
    '[48,51)'       99      41    2.4146     0.17713     0.0035449
    '[51,58)'      157      62    2.5323     0.22469     0.0088407
    '[58,Inf]'      93      25      3.72     0.60931      0.032198
    '<missing>'     19      11    1.7273    -0.15787    0.00063885
    'Totals'       803     397    2.0227         NaN      0.087112
plotbins(sc,'CustAge')

Display and plot bin information for categorical data for 'ResStatus' that includes missing data in a separate bin labeled <missing>.

[bi,cg] = bininfo(sc,'ResStatus');
disp(bi)
        Bin         Good    Bad     Odds        WOE       InfoValue 
    ____________    ____    ___    ______    _________    __________

    'Tenant'        296     161    1.8385    -0.095463     0.0035249
    'Home Owner'    352     171    2.0585     0.017549    0.00013382
    'Other'         128      52    2.4615      0.19637     0.0055808
    '<missing>'      27      13    2.0769     0.026469    2.3248e-05
    'Totals'        803     397    2.0227          NaN     0.0092627
plotbins(sc,'ResStatus')

For the 'CustAge' and 'ResStatus' predictors, there is missing data (NaNs and <undefined>) in the training data, and the binning process estimates WOE values of -0.15787 and 0.026469, respectively, for missing data in these predictors, as shown above.

For EmpStatus and CustIncome there is no explicit bin for missing values because the training data has no missing values for these predictors.

bi = bininfo(sc,'EmpStatus');
disp(bi)
       Bin        Good    Bad     Odds       WOE       InfoValue
    __________    ____    ___    ______    ________    _________

    'Unknown'     396     239    1.6569    -0.19947    0.021715 
    'Employed'    407     158    2.5759      0.2418    0.026323 
    'Totals'      803     397    2.0227         NaN    0.048038 
bi = bininfo(sc,'CustIncome');
disp(bi)
          Bin          Good    Bad     Odds         WOE       InfoValue 
    _______________    ____    ___    _______    _________    __________

    '[0,29000)'         53      58    0.91379     -0.79457       0.06364
    '[29000,33000)'     74      49     1.5102     -0.29217     0.0091366
    '[33000,35000)'     68      36     1.8889     -0.06843    0.00041042
    '[35000,40000)'    193      98     1.9694    -0.026696    0.00017359
    '[40000,42000)'     68      34          2    -0.011271    1.0819e-05
    '[42000,47000)'    164      66     2.4848      0.20579     0.0078175
    '[47000,Inf]'      183      56     3.2679      0.47972      0.041657
    'Totals'           803     397     2.0227          NaN       0.12285

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default). For predictors that have missing data, there is an explicit <missing> bin, with a corresponding WOE value computed from the data. When using fitmodel, the corresponding WOE value for the <missing> bin is applied when performing the WOE transformation.

[sc,mdl] = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1442.8477, Chi2Stat = 4.4974731, PValue = 0.033944979
6. Adding ResStatus, Deviance = 1438.9783, Chi2Stat = 3.86941, PValue = 0.049173805
7. Adding OtherCC, Deviance = 1434.9751, Chi2Stat = 4.0031966, PValue = 0.045414057

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70229     0.063959     10.98    4.7498e-28
    CustAge        0.57421      0.25708    2.2335      0.025513
    ResStatus       1.3629      0.66952    2.0356       0.04179
    EmpStatus      0.88373       0.2929    3.0172      0.002551
    CustIncome     0.73535       0.2159     3.406    0.00065929
    TmWBank         1.1065      0.23267    4.7556    1.9783e-06
    OtherCC         1.0648      0.52826    2.0156      0.043841
    AMBalance       1.0446      0.32197    3.2443     0.0011775


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 88.5, p-value = 2.55e-16

Scale the scorecard points by the "points, odds, and points to double the odds (PDO)" method using the 'PointsOddsAndPDO' argument of formatpoints. Suppose that you want a score of 500 points to have odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points would have odds of 4).

Display the scorecard showing the scaled points for predictors retained in the fitting model.

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);
PointsInfo = displaypoints(sc)
PointsInfo=33×3 table
     Predictors         Bin         Points
    ____________    ____________    ______

    'CustAge'       '[0,33)'        54.062
    'CustAge'       '[33,37)'       56.282
    'CustAge'       '[37,40)'       60.012
    'CustAge'       '[40,46)'       69.636
    'CustAge'       '[46,48)'       77.912
    'CustAge'       '[48,51)'        78.86
    'CustAge'       '[51,58)'        80.83
    'CustAge'       '[58,Inf]'       96.76
    'CustAge'       '<missing>'     64.984
    'ResStatus'     'Tenant'        62.138
    'ResStatus'     'Home Owner'    73.248
    'ResStatus'     'Other'         90.828
    'ResStatus'     '<missing>'     74.125
    'EmpStatus'     'Unknown'       58.807
    'EmpStatus'     'Employed'      86.937
    'CustIncome'    '[0,29000)'     29.375
      ⋮

Notice that points for the <missing> bin for CustAge and ResStatus are explicitly shown (as 64.9836 and 74.1250, respectively). These points are computed from the WOE value for the <missing> bin, and the logistic model coefficients.
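As a rough check, these two values can be reproduced by hand from the model coefficients and WOE values displayed earlier in this example. The sketch below assumes the default formatting (base points spread evenly across the seven predictors in the model) and derives Shift and Slope from the [500 2 50] setting as described under Algorithms. The variable names are illustrative and are not part of the toolbox.

```matlab
% Scaling implied by 'PointsOddsAndPDO' = [500 2 50]
Slope = 50/log(2);               % PDO / log(2)
Shift = 500 - Slope*log(2);      % Points - Slope*log(Odds)

% Values copied from the fitted model and the bin information above
b0   = 0.70229;                  % intercept
p    = 7;                        % number of predictors in the model
bAge = 0.57421;  woeAgeMissing = -0.15787;   % CustAge, <missing> bin
bRes = 1.3629;   woeResMissing = 0.026469;   % ResStatus, <missing> bin

% Default formatting: base points are spread evenly across the predictors
basePerPredictor = (Shift + Slope*b0)/p;
ptsAgeMissing = basePerPredictor + Slope*bAge*woeAgeMissing;
ptsResMissing = basePerPredictor + Slope*bRes*woeResMissing;

fprintf('CustAge <missing>:   %.2f\n', ptsAgeMissing);
fprintf('ResStatus <missing>: %.2f\n', ptsResMissing);
```

Because the inputs are copied from rounded display output, the results should agree with the displayed points only to within a small rounding error.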

For predictors that have no missing data in the training set, there is no explicit <missing> bin. By default the points are set to NaN for missing data and they lead to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.

For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data. Also introduce some invalid, or out-of-range values. For numeric data, values below the minimum (or above the maximum) allowed are considered invalid, such as a negative value for age (recall 'MinValue' was earlier set to 0 for CustAge and CustIncome). For categorical data, invalid values are categories not explicitly included in the scorecard, for example, a residential status not previously mapped to scorecard categories, such as "House", or a meaningless string such as "abc123".

tdata = dataMissing(11:18,mdl.PredictorNames); % Keep only the predictors retained in the model
% Set some missing values
tdata.CustAge(1) = NaN;
tdata.ResStatus(2) = '<undefined>';
tdata.EmpStatus(3) = '<undefined>';
tdata.CustIncome(4) = NaN;
% Set some invalid values
tdata.CustAge(5) = -100;
tdata.ResStatus(6) = 'House';
tdata.EmpStatus(7) = 'Freelancer';
tdata.CustIncome(8) = -1;
disp(tdata)
    CustAge     ResStatus      EmpStatus     CustIncome    TmWBank    OtherCC    AMBalance
    _______    ___________    ___________    __________    _______    _______    _________

      NaN      Tenant         Unknown          34000         44         Yes        119.8  
       48      <undefined>    Unknown          44000         14         Yes       403.62  
       65      Home Owner     <undefined>      48000          6         No        111.88  
       44      Other          Unknown            NaN         35         No        436.41  
     -100      Other          Employed         46000         16         Yes       162.21  
       33      House          Employed         36000         36         Yes       845.02  
       39      Tenant         Freelancer       34000         40         Yes       756.26  
       24      Home Owner     Employed            -1         19         Yes       449.61  

Score the new data. Points are assigned for missing and out-of-range CustAge and ResStatus values because these predictors have an explicit <missing> bin with points. For EmpStatus and CustIncome, however, the score function sets the points to NaN, which makes the total score NaN for those rows.

[Scores,Points] = score(sc,tdata);
disp(Scores)
  481.2231
  520.8353
       NaN
       NaN
  551.7922
  487.9588
       NaN
       NaN
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922  
     78.86      74.125       58.807        82.439      61.061     75.622      89.922  
     96.76      73.248          NaN        96.969      51.132     50.914      89.922  
    69.636      90.828       58.807           NaN      61.858     50.914      89.922  
    64.984      90.828       86.937        82.439      61.061     75.622      89.922  
    56.282      74.125       86.937        70.107      61.858     75.622      63.028  
    60.012      62.138          NaN        67.893      61.858     75.622      63.028  
    54.062      73.248       86.937           NaN      61.061     75.622      89.922  

Use the name-value argument 'Missing' in formatpoints to choose how to assign points to missing values for predictors that do not have an explicit <missing> bin. In this example, use the 'MinPoints' option for the 'Missing' argument. The minimum points for EmpStatus in the scorecard displayed above are 58.8072, and for CustIncome the minimum points are 29.3753.

sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,tdata);
disp(Scores)
  481.2231
  520.8353
  517.7532
  451.3405
  551.7922
  487.9588
  449.3577
  470.2267
disp(Points)
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922  
     78.86      74.125       58.807        82.439      61.061     75.622      89.922  
     96.76      73.248       58.807        96.969      51.132     50.914      89.922  
    69.636      90.828       58.807        29.375      61.858     50.914      89.922  
    64.984      90.828       86.937        82.439      61.061     75.622      89.922  
    56.282      74.125       86.937        70.107      61.858     75.622      63.028  
    60.012      62.138       58.807        67.893      61.858     75.622      63.028  
    54.062      73.248       86.937        29.375      61.061     75.622      89.922  

Input Arguments

Credit scorecard model, specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: sc = formatpoints(sc,'BasePoints',true,'Round','AllPoints','WorstAndBestScores',[100, 700])

Note

ShiftAndSlope, PointsOddsAndPDO, and WorstAndBestScores are scaling methods and you can use only one of these name-value pair arguments at one time. The other three name-value pair arguments (BasePoints, Missing, and Round) are not scaling methods and can be used together or with any one of the three scaling methods.

Indicator for separating base points, specified as the comma-separated pair consisting of 'BasePoints' and a logical scalar. If true, the scorecard explicitly separates base points. If false, the base points are spread across all variables in the creditscorecard object.

Data Types: logical

Indicator for points assigned to missing or out-of-range information when scoring, specified as the comma-separated pair consisting of 'Missing' and a character vector with a value of 'NoScore', 'ZeroWOE', 'MinPoints', or 'MaxPoints', where:

  • NoScore — Missing and out-of-range data do not get points assigned and points are set to NaN. Also, the total score is set to NaN.

  • ZeroWOE — Missing or out-of-range data get assigned a zero Weight-of-Evidence (WOE) value.

  • MinPoints — Missing or out-of-range data get the minimum possible points for that predictor. This penalizes the score if higher scores are better.

  • MaxPoints — Missing or out-of-range data get the maximum possible points for that predictor. This penalizes the score if lower scores are better.

    Note

    When using the creditscorecard name-value argument 'BinMissingData' with a value of true, missing data for numeric and categorical predictors is binned in a separate bin labeled <missing>. The <missing> bin only contains missing values for a predictor and does not contain invalid or out-of-range values for a predictor.

Data Types: char

Indicator for whether to round points or scores, specified as the comma-separated pair consisting of 'Round' and a character vector with a value of 'AllPoints', 'FinalScore', or 'None', where:

  • None — No rounding is applied.

  • AllPoints — Apply rounding to each predictor's points before adding up the total score.

  • FinalScore — Round the final score only (rounding is applied after all points are added up).

Data Types: char

Indicator for shift and slope scaling parameters for the credit scorecard, specified as the comma-separated pair consisting of 'ShiftAndSlope' and a numeric array with two elements [Shift, Slope]. Slope cannot be zero. The ShiftAndSlope values are used to scale the scoring model.

Note

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Indicator for target points (Points) for a given odds level (Odds) and the desired number of points to double the odds (PDO), specified as the comma-separated pair consisting of 'PointsOddsAndPDO' and a numeric array with three elements [Points,Odds,PDO]. Odds must be a positive number. The PointsOddsAndPDO values are used to find scaling parameters for the scoring model.

Note

The points to double the odds (PDO) may be positive or negative, depending on whether higher scores mean lower risk, or vice versa.

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Indicator for worst (highest risk) and best (lowest risk) scores in the scorecard, specified as the comma-separated pair consisting of 'WorstAndBestScores' and a numeric array with two elements [WorstScore,BestScore]. WorstScore and BestScore must be different values. These WorstAndBestScores values are used to find scaling parameters for the scoring model.

Note

WorstScore means the riskiest score, and its value could be lower or higher than the 'best' score. In other words, the 'minimum' score may be the 'worst' score or the 'best' score, depending on the desired scoring scale.

To remove a previous scaling and revert to unscaled scores, set ShiftAndSlope to [0,1].

Data Types: double

Output Arguments

Credit scorecard model, returned as an updated creditscorecard object. For more information on using the creditscorecard object, see creditscorecard.

Algorithms

The score of an individual i is given by the formula

Score(i) = Shift + Slope*(b0 + b1*WOE1(i) + b2*WOE2(i)+ ... +bp*WOEp(i))

where bj is the coefficient of the jth variable in the model, and WOEj(i) is the Weight of Evidence (WOE) value for the ith individual corresponding to the jth model variable. Shift and Slope are scaling constants further discussed below. The scaling constants can be controlled with formatpoints.

If the data for individual i is in the i-th row of a given dataset, to compute a score, the data(i,j) is binned using existing binning maps, and converted into a corresponding Weight of Evidence value WOEj(i). Using the model coefficients, the unscaled score is computed as

 s = b0 + b1*WOE1(i) + ... +bp*WOEp(i).

For simplicity, assume in the description above that the j-th variable in the model is the j-th column in the data input, although, in general, the order of variables in a given dataset does not have to match the order of variables in the model, and the dataset could have additional variables that are not used in the model.

The formatting options can be controlled using formatpoints. When the base points are reported separately (see the formatpoints parameter BasePoints), the base points are given by

Base Points = Shift + Slope*b0,
and the points for the j-th predictor, i-th row are given by
Points_ji = Slope*(bj*WOEj(i)).

By default, the base points are not reported separately, in which case

Points_ji = (Shift + Slope*b0)/p + Slope*(bj*WOEj(i)),
where p is the number of predictors in the scorecard model.
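The two conventions differ only in where the base points are reported, so the total score is the same either way. The following minimal sketch illustrates this with made-up numbers; Shift, Slope, b0, and the bj*WOEj(i) terms are hypothetical values chosen for illustration, not output from a fitted scorecard.

```matlab
% Hypothetical inputs, for illustration only
Shift = 450;  Slope = 72.1348;
b0    = 0.7;                     % intercept
bWOE  = [-0.09 0.02 0.15];       % bj*WOEj(i) for p = 3 predictors
p     = numel(bWOE);

% 'BasePoints' = true: base points are reported separately
basePoints     = Shift + Slope*b0;
pointsSeparate = Slope*bWOE;
scoreSeparate  = basePoints + sum(pointsSeparate);

% 'BasePoints' = false (default): base points are spread across predictors
pointsSpread = (Shift + Slope*b0)/p + Slope*bWOE;
scoreSpread  = sum(pointsSpread);

% Both totals equal Shift + Slope*(b0 + sum(bWOE))
fprintf('Separate: %.4f   Spread: %.4f\n', scoreSeparate, scoreSpread);
```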

By default, no rounding is applied to the points by the score function (Round is None). If Round is set to AllPoints using formatpoints, then the points for individual i for variable j are given by

 points if rounding is 'AllPoints': round( Points_ji )
and, if base points are reported separately, they are also rounded. This yields integer-valued points per predictor, hence also integer-valued scores. If Round is set to FinalScore using formatpoints, then the points per predictor are not rounded, and only the final score is rounded:
 score if rounding is 'FinalScore': round(Score(i)).
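The two rounding modes can give different totals. The sketch below applies plain MATLAB round and sum to the points displayed for the first test row in the missing-data example. It mimics the arithmetic described here and does not call score or formatpoints; the variable names are illustrative.

```matlab
% Points for the first test row of the scoring example (as displayed)
pts = [64.984 62.138 58.807 67.893 61.858 75.622 89.922];

scoreNone       = sum(pts);          % 'Round' is 'None'
scoreAllPoints  = sum(round(pts));   % 'Round' is 'AllPoints'
scoreFinalScore = round(sum(pts));   % 'Round' is 'FinalScore'

fprintf('None: %.3f  AllPoints: %d  FinalScore: %d\n', ...
    scoreNone, scoreAllPoints, scoreFinalScore);
```

For this row, rounding each predictor's points first gives 482, while rounding only the final score gives 481.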

The scaling parameters Shift and Slope can be set directly with the ShiftAndSlope parameter of formatpoints. Alternatively, you can use the WorstAndBestScores parameter of formatpoints. In this case, the parameters Shift and Slope are found internally by solving the system

Shift + Slope*smin = WorstScore,
Shift + Slope*smax = BestScore,
where WorstScore and BestScore are the first and second elements in the formatpoints parameter for WorstAndBestScores and smin and smax are the minimum and maximum possible unscaled scores:
smin = b0 + min(b1*WOE1) + ... +min(bp*WOEp),
smax = b0 + max(b1*WOE1) + ... +max(bp*WOEp).
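As a minimal sketch of this step, the system can be solved as a 2-by-2 linear system. The values of smin and smax below are hypothetical; in practice they depend on the fitted coefficients and bin WOE values. The target range [100, 700] is taken from the syntax example under Name-Value Pair Arguments.

```matlab
% Hypothetical unscaled score range, for illustration only
smin = -0.5;
smax =  2.5;
WorstScore = 100;
BestScore  = 700;

% Solve  [1 smin; 1 smax] * [Shift; Slope] = [WorstScore; BestScore]
x = [1 smin; 1 smax] \ [WorstScore; BestScore];
Shift = x(1);
Slope = x(2);

fprintf('Shift = %.4f, Slope = %.4f\n', Shift, Slope);
```

The system has a unique solution whenever smin and smax differ, and Slope is nonzero because WorstScore and BestScore must be different.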

A third alternative to scale scores is the PointsOddsAndPDO parameter in formatpoints. In this case, assume that the unscaled score s gives the log-odds for a row, and the Shift and Slope parameters are found by solving the following system

Points = Shift + Slope*log(Odds)
Points + PDO = Shift + Slope*log(2*Odds)
where Points, Odds, and PDO ("points to double the odds") are the first, second, and third elements in the PointsOddsAndPDO parameter.
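Subtracting the first equation from the second gives PDO = Slope*log(2), so Slope = PDO/log(2) and Shift = Points - Slope*log(Odds). The short sketch below works this out for the setting [500 2 50] used in the example above; the variable names are illustrative.

```matlab
Points = 500;  Odds = 2;  PDO = 50;

Slope = PDO/log(2);                 % from subtracting the two equations
Shift = Points - Slope*log(Odds);   % back-substitute into the first equation

% Check: doubling the odds should add PDO points
scoreAtDoubleOdds = Shift + Slope*log(2*Odds);

fprintf('Shift = %.4f, Slope = %.4f, score at odds %d = %.4f\n', ...
    Shift, Slope, 2*Odds, scoreAtDoubleOdds);
```

Here Shift works out to 450 and the score at odds of 4 to 550, consistent with the example description.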

Whenever a given dataset has a missing or out-of-range value data(i,j), the points for predictor j, for individual i, are set to NaN by default, which results in a missing score for that row (a NaN score). Using the Missing parameter of formatpoints, you can modify this behavior and set the corresponding Weight-of-Evidence (WOE) value to zero, or set the points to the minimum or maximum points for that predictor.
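The effect of each Missing option on a single predictor can be sketched as follows. This is an illustration of the described behavior, not the toolbox implementation. The bin points are the unscaled 'CustAge' points displayed in the examples above, and zeroWOEPoints is the value observed in that example for a missing 'CustAge' under 'ZeroWOE'. The names option, zeroWOEPoints, and missingPoints are hypothetical.

```matlab
% Unscaled bin points for 'CustAge', copied from the example output
binPoints = [-0.15894 -0.14036 -0.060323 0.046408 0.21445 0.23039 0.479];
zeroWOEPoints = 0.10034;    % points observed for a zero-WOE value in that example

option = 'MinPoints';       % 'NoScore', 'ZeroWOE', 'MinPoints', or 'MaxPoints'
switch option
    case 'NoScore'
        missingPoints = NaN;             % the total score also becomes NaN
    case 'ZeroWOE'
        missingPoints = zeroWOEPoints;   % only the base-point share remains
    case 'MinPoints'
        missingPoints = min(binPoints);
    case 'MaxPoints'
        missingPoints = max(binPoints);
end

fprintf('Points assigned to a missing CustAge: %g\n', missingPoints);
```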

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Introduced in R2014b