Probit

Create Probit model object for lifetime probability of default

Description

Create and analyze a Probit model object to calculate lifetime probability of default (PD) using this workflow:

Use fitLifetimePDModel to create a Probit model object.
Use predict to predict the conditional PD and predictLifetime to predict the lifetime PD.
Use modelDiscrimination to return AUROC and ROC data. You can plot the results using modelDiscriminationPlot.
Use modelCalibration to return the RMSE of observed and predicted PD data. You can plot the results using modelCalibrationPlot.

Creation

Syntax

ProbitPDModel = fitLifetimePDModel(data,ModelType)

ProbitPDModel = fitLifetimePDModel(___,Name,Value)

Description

ProbitPDModel = fitLifetimePDModel(data,ModelType) creates a Probit PD model object.

If you do not specify variable information for IDVar, AgeVar, LoanVars, MacroVars, and ResponseVar, then:

IDVar is set to the first column in the data input.
LoanVars is set to include all columns from the second to the second-to-last columns of the data input.
ResponseVar is set to the last column in the data input.
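
As an illustration of these defaults, the following sketch derives the same variable roles from the column order of a small table. The table T and its column names are hypothetical, and the code only mimics the documented column-order rule; it is not the toolbox implementation.

```matlab
% Hypothetical panel table: ID first, response last, loan variables between.
T = table([1;1;2],[700;700;650],[0.8;0.8;0.9],[0;0;1], ...
    'VariableNames',{'ID','Score','LTV','Default'});

names = string(T.Properties.VariableNames);

idVar       = names(1);         % first column
loanVars    = names(2:end-1);   % second through second-to-last columns
responseVar = names(end);       % last column

disp(idVar)
disp(loanVars)
disp(responseVar)
```

If the columns of your table are not in this order, specifying IDVar, LoanVars, and ResponseVar explicitly avoids relying on column position.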

ProbitPDModel = fitLifetimePDModel(___,Name,Value) specifies options using one or more name-value arguments in addition to the input arguments in the previous syntax. The optional name-value arguments set the model object properties. For example, ProbitPDModel = fitLifetimePDModel(data(TrainDataInd,:),"Probit",ModelID="Probit_A",Description="Probit_model",AgeVar="YOB",IDVar="ID",LoanVars="ScoreGroup",MacroVars={'GDP','Market'},ResponseVar="Default",WeightsVar="Weights") creates a ProbitPDModel object using a Probit model type.

Input Arguments

`data` — Data
table

Data, specified as a table in panel data form. The table must contain an ID column and a response variable. The response variable must be binary, taking the value 0 or 1, with 1 indicating default. If you do not specify the variable name-value arguments, the first column is treated as IDVar, the last column as ResponseVar, and all other columns as LoanVars.

Data Types: table

`ModelType` — Model type
string with value `"Probit"` | character vector with value `'Probit'`

Model type, specified as a string with the value "Probit" or a character vector with the value 'Probit'.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: ProbitPDModel = fitLifetimePDModel(data(TrainDataInd,:),"Probit",ModelID="Probit_A",Description="Probit_model",AgeVar="YOB",IDVar="ID",LoanVars="ScoreGroup",MacroVars={'GDP','Market'},ResponseVar="Default",WeightsVar="Weights")
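
The two name-value syntaxes can be sketched as follows, assuming data is the joined panel table from the RetailCreditPanelData example on this page. Both calls are intended to specify the same options; only the calling convention differs. The variable names pdModelA and pdModelB are illustrative.

```matlab
% Assumes the RetailCreditPanelData example data is on the MATLAB path.
load RetailCreditPanelData.mat
data = join(data,dataMacro);

% R2021a and later: Name=Value syntax.
pdModelA = fitLifetimePDModel(data,"Probit", ...
    AgeVar="YOB",IDVar="ID",LoanVars="ScoreGroup", ...
    MacroVars={'GDP','Market'},ResponseVar="Default");

% Before R2021a: comma-separated pairs with each Name in quotes.
pdModelB = fitLifetimePDModel(data,"Probit", ...
    'AgeVar','YOB','IDVar','ID','LoanVars','ScoreGroup', ...
    'MacroVars',{'GDP','Market'},'ResponseVar','Default');
```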

`ModelID` — User-defined model ID
`Probit` (default) | string | character vector

User-defined model ID, specified as the comma-separated pair consisting of 'ModelID' and a string or character vector. The software uses ModelID to format outputs, so the value is expected to be short.

Data Types: string | char

`Description` — User-defined description for model
`""` (default) | string | character vector

User-defined description for model, specified as the comma-separated pair consisting of 'Description' and a string or character vector.

Data Types: string | char

`IDVar` — ID variable indicating which column in `data` contains loan or borrower ID
1st column of `data` (default) | string | character vector

ID variable indicating which column in data contains the loan or borrower ID, specified as the comma-separated pair consisting of 'IDVar' and a string or character vector.

Data Types: string | char

`AgeVar` — Age variable indicating which column in `data` contains loan age information
if not specified, then `LoanVars` (default) | string | character vector

Age variable indicating which column in data contains the loan age information, specified as the comma-separated pair consisting of 'AgeVar' and a string or character vector.

Data Types: string | char

`LoanVars` — Loan variables indicating which column in `data` contains loan-specific information
all columns of `data` that are not the first or last column (default) | string array | cell array of character vectors

Loan variables indicating which column in data contains the loan-specific information, such as origination score or loan-to-value ratio, specified as the comma-separated pair consisting of 'LoanVars' and a string array or cell array of character vectors.

Data Types: string | cell

`MacroVars` — Macro variables indicating which column in `data` contains macroeconomic information
if not specified, then `LoanVars` (default) | string array | cell array of character vectors

Macro variables indicating which column in data contains the macroeconomic information, such as gross domestic product (GDP) growth or unemployment rate, specified as the comma-separated pair consisting of 'MacroVars' and a string array or cell array of character vectors.

Data Types: string | cell

`ResponseVar` — Variable indicating which column in `data` contains response variable
string | character vector

Variable indicating which column in data contains the response variable, specified as the comma-separated pair consisting of 'ResponseVar' and a string or character vector.

Note

The response variable values in the data must be a binary variable with 0 or 1 values, with 1 indicating default.

Data Types: string | char

`WeightsVar` — Column name containing weights
`""` (default) | string scalar

Column name of the input table containing weights, specified as a string scalar.

Note

The default value ("") results in a weight of 1 for each row in data. All weight values in data must be nonnegative.

For an example using WeightsVar, see Create Weighted Lifetime PD Model.

Data Types: string
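
A minimal sketch of fitting with observation weights, assuming the RetailCreditPanelData example data used on this page. The Weights column and the weighting scheme (defaults weighted twice as much as non-defaults) are hypothetical and chosen only to show the mechanics.

```matlab
% Assumes the RetailCreditPanelData example data is on the MATLAB path.
load RetailCreditPanelData.mat
data = join(data,dataMacro);

% Hypothetical weights: 1 for every row, 2 for rows with a default.
data.Weights = ones(height(data),1);
data.Weights(data.Default == 1) = 2;

pdModelW = fitLifetimePDModel(data,"Probit", ...
    AgeVar="YOB",IDVar="ID",LoanVars="ScoreGroup", ...
    MacroVars={'GDP','Market'},ResponseVar="Default", ...
    WeightsVar="Weights");
disp(pdModelW.WeightsVar)
```

All weights in this sketch are nonnegative, as the note above requires.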

`TimeInterval` — Time interval value
most frequent `AgeVar` increment (default) | positive numeric scalar

Time interval value, specified as a positive numeric scalar indicating the time interval used to define the 0-1 default indicator values in the response variable. The time interval typically coincides with the distance between age values in training data in the panel data input. For example, if the age data (AgeVar) is 1, 2, 3, ..., then the TimeInterval is 1; if the age data is 0.25, 0.5, 0.75, ..., then the TimeInterval is 0.25. For more information, see Time Interval for Probit Models and Lifetime Prediction and Time Interval.

By default, if you do not specify a TimeInterval when creating a Probit model, the TimeInterval is inferred from the increments in the AgeVar values in the training data. If AgeVar does not contain numeric values, TimeInterval is set to [].
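
To make "most frequent AgeVar increment" concrete, the following sketch computes age increments within each ID and takes their mode. The ID and YOB vectors are hypothetical, and this is an illustration of the idea under that reading of the default, not the toolbox's internal code.

```matlab
% Hypothetical panel data: loan 1 observed yearly, loan 2 with one gap.
ID  = [1;1;1;1;2;2;2];
YOB = [1;2;3;4;1;2;4];

ids = unique(ID);
increments = [];
for k = 1:numel(ids)
    ages = YOB(ID == ids(k));               % ages for one ID, in row order
    increments = [increments; diff(ages)];  %#ok<AGROW>
end

timeInterval = mode(increments);  % most frequent age increment
disp(timeInterval)
```

In this toy data the increments are 1, 1, 1, 1, and 2, so the most frequent increment is 1 even though one loan has a gap.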

Data Types: double

Properties

`ModelID` — User-defined Model ID
`Probit` (default) | string

User-defined model ID, returned as a string.

Data Types: string

`Description` — User-defined description
`""` (default) | string

User-defined description, returned as a string.

Data Types: string

`UnderlyingModel` — Underlying statistical model
compact generalized linear model

Underlying statistical model, returned as a compact generalized linear model object. For more information, see fitglm and CompactGeneralizedLinearModel.

Data Types: CompactGeneralizedLinearModel

`IDVar` — ID variable indicating which column in `data` contains loan or borrower ID
1st column of `data` (default) | string

ID variable indicating which column in data contains the loan or borrower ID, returned as a string.

Data Types: string

`AgeVar` — Age variable indicating which column in `data` contains loan age information
if not specified, then `LoanVars` (default) | string

Age variable indicating which column in data contains the loan age information, returned as a string.

Data Types: string

`LoanVars` — Loan variables indicating which column in `data` contains loan-specific information
all columns of `data` that are not the first or last column (default) | string array

Loan variables indicating which column in data contains the loan-specific information, returned as a string array.

Data Types: string

`MacroVars` — Macro variables indicating which column in `data` contains macroeconomic information
if not specified, then `LoanVars` (default) | string array

Macro variables indicating which column in data contains the macroeconomic information, returned as a string array.

Data Types: string

`ResponseVar` — Variable indicating which column in `data` contains response variable
string

Variable indicating which column in data contains the response variable, returned as a string.

Data Types: string

`WeightsVar` — Column name containing weights
`""` (default) | string scalar

Column name of the input table containing weights, returned as a string scalar.

Data Types: string

`TimeInterval` — Time interval value
most frequent `AgeVar` increment (default) | positive numeric scalar

Time interval value, returned as a positive numeric scalar.

Data Types: double

Object Functions

`predict`	Compute conditional PD
`predictLifetime`	Compute cumulative lifetime PD, marginal PD, and survival probability
`modelDiscrimination`	Compute AUROC and ROC data
`modelCalibration`	Compute RMSE of predicted and observed PDs on grouped data
`modelDiscriminationPlot`	Plot ROC curve
`modelCalibrationPlot`	Plot observed default rates compared to predicted PDs on grouped data

Examples

Create Probit Lifetime PD Model

This example shows how to use fitLifetimePDModel to create a Probit model using credit and macroeconomic data.

Load Data

Load the credit portfolio data.

load RetailCreditPanelData.mat
disp(head(data))

    ID    ScoreGroup    YOB    Default    Year
    __    __________    ___    _______    ____

    1      Low Risk      1        0       1997
    1      Low Risk      2        0       1998
    1      Low Risk      3        0       1999
    1      Low Risk      4        0       2000
    1      Low Risk      5        0       2001
    1      Low Risk      6        0       2002
    1      Low Risk      7        0       2003
    1      Low Risk      8        0       2004

disp(head(dataMacro))

    Year     GDP     Market
    ____    _____    ______

    1997     2.72      7.61
    1998     3.57     26.24
    1999     2.86      18.1
    2000     2.43      3.19
    2001     1.26    -10.51
    2002    -0.59    -22.95
    2003     0.63      2.78
    2004     1.85      9.48

Join the two data components into a single data set.

data = join(data,dataMacro);
disp(head(data))

    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______

    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48

Partition Data

Separate the data into training and test partitions.

uniqueIDs = unique(data.ID);
nIDs = numel(uniqueIDs); % number of distinct IDs (also valid if IDs are not 1..N)

rng('default'); % for reproducibility
c = cvpartition(nIDs,'HoldOut',0.4);

TrainIDInd = training(c);
TestIDInd = test(c);

TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));

Create a Probit Lifetime PD Model

Use fitLifetimePDModel to create a Probit model using the training data.

pdModel = fitLifetimePDModel(data(TrainDataInd,:),"Probit",...
    'AgeVar','YOB',...
    'IDVar','ID',...
    'LoanVars','ScoreGroup',...
    'MacroVars',{'GDP','Market'},...
    'ResponseVar','Default');
disp(pdModel)

  Probit with properties:

            ModelID: "Probit"
        Description: ""
    UnderlyingModel: [1×1 classreg.regr.CompactGeneralizedLinearModel]
              IDVar: "ID"
             AgeVar: "YOB"
           LoanVars: "ScoreGroup"
          MacroVars: ["GDP"    "Market"]
        ResponseVar: "Default"
         WeightsVar: ""
       TimeInterval: 1

Display the underlying model.

disp(pdModel.UnderlyingModel)

Compact generalized linear regression model:
    probit(Default) ~ 1 + ScoreGroup + YOB + GDP + Market
    Distribution = Binomial

Estimated Coefficients:
                               Estimate        SE         tStat       pValue   
                              __________    _________    _______    ___________

    (Intercept)                  -1.6267      0.03811    -42.685              0
    ScoreGroup_Medium Risk      -0.26542      0.01419    -18.704     4.5503e-78
    ScoreGroup_Low Risk         -0.46794     0.016364    -28.595     7.775e-180
    YOB                         -0.11421    0.0049724    -22.969    9.6208e-117
    GDP                        -0.041537     0.014807    -2.8052      0.0050291
    Market                    -0.0029609    0.0010618    -2.7885      0.0052954


388097 observations, 388091 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.85e+03, p-value = 0

Predict Conditional and Lifetime PD

Use the predict function to predict conditional PD values. The function returns one prediction for each row of the input.

dataCustomer1 = data(1:8,:);
CondPD = predict(pdModel,dataCustomer1)

CondPD = 8×1

    0.0095
    0.0054
    0.0045
    0.0039
    0.0036
    0.0036
    0.0017
    0.0009
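
The first conditional PD can be checked by hand. For a probit model, the PD is the standard normal CDF evaluated at the linear predictor. The sketch below uses the rounded coefficients from the displayed underlying model and the first row of dataCustomer1 (Low Risk, YOB = 1, GDP = 2.72, Market = 7.61), so it only approximates the value returned by predict. The helper stdNormCdf is a hypothetical name; it is written with erfc so that the sketch needs only base MATLAB.

```matlab
% Rounded coefficients from the displayed underlying model.
b0       = -1.6267;      % (Intercept)
bLowRisk = -0.46794;     % ScoreGroup_Low Risk
bYOB     = -0.11421;
bGDP     = -0.041537;
bMarket  = -0.0029609;

% First row of dataCustomer1.
YOB = 1; GDP = 2.72; Market = 7.61;

% Linear predictor for a Low Risk loan.
eta = b0 + bLowRisk + bYOB*YOB + bGDP*GDP + bMarket*Market;

% Standard normal CDF expressed through erfc (base MATLAB).
stdNormCdf = @(x) 0.5*erfc(-x/sqrt(2));

condPD = stdNormCdf(eta);
fprintf('%.4f\n',condPD)
```

The result is close to the first entry of CondPD (0.0095); small differences come from the rounded coefficients.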

Use predictLifetime to predict the lifetime cumulative PD values. The function can also compute marginal PD and survival probability values. The predictLifetime function uses the ID variable (see the IDVar property of the Probit object) to transform conditional PDs to cumulative PDs for each ID.

LifetimePD = predictLifetime(pdModel,dataCustomer1)

LifetimePD = 8×1

    0.0095
    0.0149
    0.0193
    0.0232
    0.0267
    0.0302
    0.0318
    0.0327
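
The relationship between the conditional and cumulative values can be sketched with the standard survival identity: the survival probability is the running product of one minus the conditional PDs, and the cumulative PD is one minus the survival probability. The sketch assumes this identity and uses the rounded CondPD values displayed above, so it matches LifetimePD only to about three decimal places.

```matlab
% Conditional PDs as displayed above (rounded to four decimals).
CondPD = [0.0095; 0.0054; 0.0045; 0.0039; 0.0036; 0.0036; 0.0017; 0.0009];

survival   = cumprod(1 - CondPD);                 % no default through each period
cumulative = 1 - survival;                        % cumulative lifetime PD
marginal   = [cumulative(1); diff(cumulative)];   % increase in cumulative PD per period

disp([cumulative marginal survival])
```

The marginal PDs sum to the final cumulative PD, which is a quick consistency check on any lifetime PD curve.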

Validate Model

Use modelDiscrimination to measure the ranking of customers by PD.

DiscMeasure = modelDiscrimination(pdModel,data(TestDataInd,:),DataID='test data');
disp(DiscMeasure)

                          AUROC 
                         _______

    Probit, test data    0.69984
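
AUROC can be read as the probability that a randomly chosen defaulter has a higher predicted PD than a randomly chosen non-defaulter. The following sketch computes that quantity directly by pairwise comparison on hypothetical data. It illustrates the metric and is not the method modelDiscrimination uses internally.

```matlab
% Hypothetical predicted PDs and default indicators (1 = default).
pd       = [0.10; 0.40; 0.35; 0.80; 0.20; 0.60];
defaults = [0;    0;    1;    1;    0;    1];

pdDef    = pd(defaults == 1);   % predicted PDs of defaulters
pdNonDef = pd(defaults == 0);   % predicted PDs of non-defaulters

% Compare every defaulter with every non-defaulter; ties count one half.
higher = pdDef > pdNonDef';
tied   = pdDef == pdNonDef';
auroc  = (sum(higher(:)) + 0.5*sum(tied(:))) / numel(higher);
disp(auroc)
```

A value of 0.5 corresponds to random ranking and a value of 1 to perfect separation of defaulters from non-defaulters.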

Use modelDiscriminationPlot to visualize the ROC curve.

modelDiscriminationPlot(pdModel,data(TestDataInd,:),DataID='test data');

Figure: ROC curve for the Probit model on the test data (AUROC = 0.69984). The x-axis shows the fraction of non-defaulters and the y-axis shows the fraction of defaulters.

Use modelCalibration to measure the calibration of the predicted PD values. The modelCalibration function requires a grouping variable and compares the observed default rate in each group with the average predicted PD for that group. For example, you can group by calendar year using the 'Year' variable.

CalMeasure = modelCalibration(pdModel,data(TestDataInd,:),'Year',DataID='test data');
disp(CalMeasure)

                                             RMSE   
                                          __________

    Probit, grouped by Year, test data    0.00039494
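
The grouped comparison can be sketched on hypothetical data as follows. The sketch computes the observed default rate and the mean predicted PD per group, then an unweighted RMSE across groups. The unweighted form is an assumption made for illustration; the exact weighting that modelCalibration applies may differ, so do not expect this to reproduce its output.

```matlab
% Hypothetical grouping variable, default indicators, and predicted PDs.
year     = [2001;2001;2001;2002;2002;2002];
defaults = [0;1;0;0;0;1];
predPD   = [0.2;0.4;0.3;0.1;0.3;0.2];

[g,groupYear] = findgroups(year);

obsRate = splitapply(@mean,defaults,g);   % observed default rate per group
meanPD  = splitapply(@mean,predPD,g);     % mean predicted PD per group

% Unweighted RMSE across groups (illustrative assumption).
rmseUnweighted = sqrt(mean((obsRate - meanPD).^2));

disp(table(groupYear,obsRate,meanPD))
disp(rmseUnweighted)
```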

Use modelCalibrationPlot to visualize the observed default rates compared to the predicted probabilities of default (PD).

modelCalibrationPlot(pdModel,data(TestDataInd,:),'Year',DataID='test data');

Figure: Scatter plot grouped by Year for the Probit model on the test data (RMSE = 0.00039494), showing observed default rates and predicted PDs. The x-axis shows Year and the y-axis shows PD.

More About

Time Interval for `Probit` Models

For Logistic and Probit models, there is a time interval implicit in the data, specifically in the definition of the default variable. For example, if the default indicator takes the value 1 when there is a default over a 3-month period, the time interval is 3 months. In this case, the predicted PD values are 3-month PD predictions. The PD for month 18 is then the conditional probability that there is a default between months 15 and 18, given that there has been no default in the first 15 months.
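
The month-18 example can be written compactly. The notation here is an assumption introduced for illustration: T denotes the time of default in months and Δ the time interval.

```latex
\mathrm{PD}(t) = P\left(t - \Delta < T \le t \;\middle|\; T > t - \Delta\right),
\qquad
\mathrm{PD}(18) = P\left(15 < T \le 18 \;\middle|\; T > 15\right)
\quad \text{for } \Delta = 3 \text{ months}.
```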

Because the data input for fitLifetimePDModel is in panel data form, there is an implicit or explicit time stamp for each row, and the time interval for the default definition should be the same as the time increments between consecutive rows. If there is an optional age variable (AgeVar) in the training data, the time interval is the same as the age increments (for the same ID) from one row to the next.

Logistic and Probit models infer the time interval from the increments in the age values in the training data when AgeVar is specified and contains numeric values. These models store the time interval value as the TimeInterval property. The predicted PD values returned by the predict function are consistent with the time interval implicit in the panel training data, which in turn should be the same as the time interval used to define the default variable. The TimeInterval property is also used to validate the data input to the predictLifetime function. For more information, see Validation of Data Input for Lifetime Prediction and Lifetime Prediction and Time Interval.

References

[1] Baesens, Bart, Daniel Roesch, and Harald Scheule. Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS. Wiley, 2016.

[2] Bellini, Tiziano. IFRS 9 and CECL Credit Risk Modelling and Validation: A Practical Guide with Examples Worked in R and SAS. San Diego, CA: Elsevier, 2019.

[3] Breeden, Joseph. Living with CECL: The Modeling Dictionary. Santa Fe, NM: Prescient Models LLC, 2018.

[4] Roesch, Daniel and Harald Scheule. Deep Credit Risk: Machine Learning with Python. Independently published, 2020.

Version History

Introduced in R2020b

R2024a: Added `TimeInterval` name-value argument for `Probit` model

The Probit model supports a TimeInterval name-value argument.

R2023b: Added `WeightsVar` name-value argument for `Probit` model

The Probit model supports a WeightsVar name-value argument for observation weights.

R2023a: `modelAccuracy` object function is renamed to `modelCalibration` function

The modelAccuracy object function is renamed to modelCalibration. The use of modelAccuracy is discouraged; use modelCalibration instead.

R2023a: `modelAccuracyPlot` object function is renamed to `modelCalibrationPlot` function

The modelAccuracyPlot object function is renamed to modelCalibrationPlot. The use of modelAccuracyPlot is discouraged; use modelCalibrationPlot instead.

R2023a: `Model` property renamed to `UnderlyingModel`

The Model property is renamed to UnderlyingModel.
