This example shows how to compare two lifetime PD models using cross-validation.
Load the portfolio data, which includes loan and macroeconomic information. This is a simulated data set used for illustration purposes.
load RetailCreditPanelData.mat
data = join(data,dataMacro);
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______

    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48
Because the data is panel data, there are multiple rows for each customer. Set up the cross-validation partitions over the customer IDs, not over the rows of the data set. In this way, each customer falls entirely in either a training set or a test set; the rows corresponding to the same customer are never split between training and testing.
nIDs = max(data.ID);
uniqueIDs = unique(data.ID);
NumFolds = 5;
rng('default'); % for reproducibility
c = cvpartition(nIDs,'KFold',NumFolds);
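To confirm that the partition separates customers rather than rows, you can verify that no customer ID appears in both the training and test rows of a fold. This is an optional sanity check, reusing the `data`, `uniqueIDs`, and `c` variables defined above:

```matlab
% Sanity check on fold 1: the sets of customer IDs in the training
% rows and in the test rows must be disjoint
TrainRows = ismember(data.ID,uniqueIDs(training(c,1)));
TestRows  = ismember(data.ID,uniqueIDs(test(c,1)));
assert(isempty(intersect(data.ID(TrainRows),data.ID(TestRows))))
```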
Compare Logistic and Probit lifetime PD models that use the same predictor variables.
CVModels = ["logistic";"probit"];
NumModels = length(CVModels);
AUROC = zeros(NumFolds,NumModels);
RMSE = zeros(NumFolds,NumModels);
for ii=1:NumFolds
   fprintf('Fitting models, fold %d\n',ii);
   % Get indices for ID partition
   TrainIDInd = training(c,ii);
   TestIDInd = test(c,ii);
   % Convert to row indices
   TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
   TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));
   % For each model, fit with training data, measure with test data
   for jj=1:NumModels
      % Fit model with training data
      pdModel = fitLifetimePDModel(data(TrainDataInd,:),CVModels(jj),...
         'IDVar','ID','AgeVar','YOB','LoanVars','ScoreGroup',...
         'MacroVars',{'GDP','Market'},'ResponseVar','Default');
      % Measure discrimination on test data
      DiscMeasure = modelDiscrimination(pdModel,data(TestDataInd,:));
      AUROC(ii,jj) = DiscMeasure.AUROC;
      % Measure accuracy on test data, grouping by YOB (age) and score group
      AccMeasure = modelAccuracy(pdModel,data(TestDataInd,:),["YOB" "ScoreGroup"]);
      RMSE(ii,jj) = AccMeasure.RMSE;
   end
end
Fitting models, fold 1
Fitting models, fold 2
Fitting models, fold 3
Fitting models, fold 4
Fitting models, fold 5
Using the discrimination and accuracy measures for the different folds, you can compare the models. In this example, the per-fold metrics are displayed. You can also compare the mean AUROC or mean RMSE, or the proportion of folds in which one model is superior in discrimination or accuracy. The two models in this example perform very similarly.
AUROCTable = array2table(AUROC,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("AUROC_",CVModels))
AUROCTable=5×2 table
AUROC_logistic AUROC_probit
______________ ____________
Fold 1 0.69558 0.6957
Fold 2 0.70265 0.70335
Fold 3 0.69055 0.69037
Fold 4 0.70268 0.70232
Fold 5 0.68784 0.68781
RMSETable = array2table(RMSE,"RowNames",strcat("Fold ",string(1:NumFolds)),"VariableNames",strcat("RMSE_",CVModels))
RMSETable=5×2 table
RMSE_logistic RMSE_probit
_____________ ___________
Fold 1 0.0019412 0.0020972
Fold 2 0.0011167 0.0011644
Fold 3 0.0011536 0.0011802
Fold 4 0.0010269 0.00097877
Fold 5 0.0015965 0.001485
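As noted above, you can also summarize the folds by mean metric or by how often each model wins. This is a small sketch, assuming the `AUROC` and `RMSE` matrices computed in the loop above (column 1 is logistic, column 2 is probit):

```matlab
% Mean metric across folds
meanAUROC = mean(AUROC);   % higher is better
meanRMSE  = mean(RMSE);    % lower is better

% Fraction of folds in which the logistic model is superior
logisticWinsAUROC = mean(AUROC(:,1) > AUROC(:,2));
logisticWinsRMSE  = mean(RMSE(:,1)  < RMSE(:,2));

fprintf('Mean AUROC: logistic %.5f, probit %.5f\n',meanAUROC);
fprintf('Mean RMSE:  logistic %.5g, probit %.5g\n',meanRMSE);
fprintf('Logistic superior in %d%% of folds (AUROC) and %d%% of folds (RMSE)\n',...
   round(100*logisticWinsAUROC),round(100*logisticWinsRMSE));
```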
See Also: fitLifetimePDModel | Logistic | modelAccuracy | modelDiscrimination | predict | predictLifetime | Probit