This example shows how to compare two lifetime PD models using cross-validation.
Load the portfolio data, which includes loan and macroeconomic information. This is a simulated data set used for illustration purposes.
load RetailCreditPanelData.mat
data = join(data,dataMacro);
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______

    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48
Because the data is panel data, there are multiple rows for each customer. Set up the cross-validation partitions over the customer IDs, not over the rows of the data set. In this way, each customer falls entirely in either a training set or a test set; the rows corresponding to the same customer are never split between training and testing.
nIDs = max(data.ID);
uniqueIDs = unique(data.ID);
NumFolds = 5;
rng('default'); % for reproducibility
c = cvpartition(nIDs,'KFold',NumFolds);
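To confirm that the partition separates customers rather than rows, you can verify that no customer ID appears in both the training and test rows of a fold. This is an optional sanity check, reusing the `data`, `uniqueIDs`, and `c` variables defined above:

```matlab
% Sanity check on fold 1: the sets of customer IDs in the training
% rows and in the test rows must be disjoint
TrainRows = ismember(data.ID,uniqueIDs(training(c,1)));
TestRows  = ismember(data.ID,uniqueIDs(test(c,1)));
assert(isempty(intersect(data.ID(TrainRows),data.ID(TestRows))))
```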
Compare Logistic and Probit lifetime PD models that use the same predictor variables.
CVModels = ["logistic";"probit"];
NumModels = length(CVModels);
AUROC = zeros(NumFolds,NumModels);
RMSE = zeros(NumFolds,NumModels);
for ii=1:NumFolds
   fprintf('Fitting models, fold %d\n',ii);
   % Get indices for ID partition
   TrainIDInd = training(c,ii);
   TestIDInd = test(c,ii);
   % Convert to row indices
   TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
   TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));
   % For each model, fit with training data, measure with test data
   for jj=1:NumModels
      % Fit model with training data
      pdModel = fitLifetimePDModel(data(TrainDataInd,:),CVModels(jj),...
         'IDVar','ID','AgeVar','YOB','LoanVars','ScoreGroup',...
         'MacroVars',{'GDP','Market'},'ResponseVar','Default');
      % Measure discrimination on test data
      DiscMeasure = modelDiscrimination(pdModel,data(TestDataInd,:));
      AUROC(ii,jj) = DiscMeasure.AUROC;
      % Measure accuracy on test data, grouping by YOB (age) and score group
      AccMeasure = modelAccuracy(pdModel,data(TestDataInd,:),["YOB" "ScoreGroup"]);
      RMSE(ii,jj) = AccMeasure.RMSE;
   end
end
Fitting models, fold 1
Fitting models, fold 2
Fitting models, fold 3
Fitting models, fold 4
Fitting models, fold 5
Using the discrimination and accuracy measures for the different folds, you can compare the models. In this example, the per-fold metrics are displayed. You can also compare the mean AUROC or mean RMSE, or the proportion of folds in which one model is superior in discrimination or accuracy. The two models in this example perform very similarly.
AUROCTable = array2table(AUROC,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("AUROC_",CVModels))
AUROCTable=5×2 table
AUROC_logistic AUROC_probit
______________ ____________
Fold 1 0.69558 0.6957
Fold 2 0.70265 0.70335
Fold 3 0.69055 0.69037
Fold 4 0.70268 0.70232
Fold 5 0.68784 0.68781
RMSETable = array2table(RMSE,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("RMSE_",CVModels))
RMSETable=5×2 table
RMSE_logistic RMSE_probit
_____________ ___________
Fold 1 0.0019412 0.0020972
Fold 2 0.0011167 0.0011644
Fold 3 0.0011536 0.0011802
Fold 4 0.0010269 0.00097877
Fold 5 0.0015965 0.001485
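As noted above, you can also summarize the folds by mean metric or by how often each model wins. This is a small sketch, assuming the `AUROC` and `RMSE` matrices computed in the loop above (column 1 is logistic, column 2 is probit):

```matlab
% Mean metric across folds
meanAUROC = mean(AUROC);   % higher is better
meanRMSE  = mean(RMSE);    % lower is better

% Fraction of folds in which the logistic model is superior
logisticWinsAUROC = mean(AUROC(:,1) > AUROC(:,2));
logisticWinsRMSE  = mean(RMSE(:,1)  < RMSE(:,2));

fprintf('Mean AUROC: logistic %.5f, probit %.5f\n',meanAUROC);
fprintf('Mean RMSE:  logistic %.5g, probit %.5g\n',meanRMSE);
fprintf('Logistic superior in %d%% of folds (AUROC) and %d%% of folds (RMSE)\n',...
   round(100*logisticWinsAUROC),round(100*logisticWinsRMSE));
```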
See Also: fitLifetimePDModel | Logistic | modelAccuracy | modelDiscrimination | predict | predictLifetime | Probit