Main Content

Compare Logisitic Model for Lifetime PD to Champion Model

This example shows how to compare a new Logistic model for lifetime PD against a "champion" model.

Load Data

Load the portfolio data, which includes loan and macro information.

load RetailCreditPanelData.mat
data = join(data,dataMacro);
disp(head(data))
    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______

    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48

nIDs = max(data.ID);
uniqueIDs = unique(data.ID);

rng('default'); % for reproducibility
c = cvpartition(nIDs,'HoldOut',0.4);

TrainIDInd = training(c);
TestIDInd = test(c);

TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));

Fit Logisitic Model

For this example, fit a new model using only score group information but no age information. First, you can validate this model in a standalone fashion. For more information, see Basic Lifetime PD Model Validation.

Age information is important in this data set. The new model does not perform as well as the champion model (which includes age, score group, and macro vars).

Fit a new Logistic model using fitLifetimePDModel.

ModelType = "logistic";
pdModel = fitLifetimePDModel(data(TrainDataInd,:),ModelType,...
   'ModelID','LogisticNoAge',...
   'IDVar','ID',...
   'LoanVars','ScoreGroup',...
   'MacroVars',{'GDP','Market'},...
   'ResponseVar','Default');
disp(pdModel)
  Logistic with properties:

        ModelID: "LogisticNoAge"
    Description: ""
          Model: [1x1 classreg.regr.CompactGeneralizedLinearModel]
          IDVar: "ID"
         AgeVar: ""
       LoanVars: "ScoreGroup"
      MacroVars: ["GDP"    "Market"]
    ResponseVar: "Default"

Compare Performance of the Logistic Model to Champion Model

To compare the new Logistic model to a champion model, you need access to the predictions of the champion model. The champion model might even have different predictors, so the mapping between the data being used and the exact inputs of the champion model might require an intermediate preprocessing step. This example assumes that you have a black-box tool to get the predictions from the champion model.

Compare the model performance for both models using modelDiscrimination.

DataSetChoice = "Testing";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end

ChampionPD = getChampionModelPDs(data(Ind,:));

[DiscMeasure,DiscData] = modelDiscrimination(pdModel,data(Ind,:),'DataID',DataSetChoice,...
   'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(DiscMeasure)
                               AUROC 
                              _______

    LogisticNoAge, Testing    0.66503
    Champion, Testing         0.70018
disp(head(DiscData))
        ModelID           X           Y           T    
    _______________    ________    ________    ________

    "LogisticNoAge"           0           0     0.02287
    "LogisticNoAge"     0.04673    0.090978     0.02287
    "LogisticNoAge"    0.064656     0.14922    0.022711
    "LogisticNoAge"     0.10982     0.22764    0.020553
    "LogisticNoAge"     0.14421       0.311    0.018483
    "LogisticNoAge"     0.19237     0.41454     0.01722
    "LogisticNoAge"     0.23558     0.43738    0.014125
    "LogisticNoAge"     0.27979     0.52037    0.012812
disp(tail(DiscData))
     ModelID         X          Y           T     
    __________    _______    _______    __________

    "Champion"    0.88743    0.98021     0.0032242
    "Champion"    0.90293    0.98477     0.0025583
    "Champion"    0.91884    0.98896     0.0023801
    "Champion"    0.93303    0.99239     0.0018756
    "Champion"    0.94995    0.99391     0.0017711
    "Champion"    0.96705    0.99695     0.0016436
    "Champion"    0.98295    0.99886     0.0012847
    "Champion"          1          1    0.00086887
IndModel = DiscData.ModelID=="LogisticNoAge";
plot(DiscData.X(IndModel),DiscData.Y(IndModel))
hold on
IndModel = DiscData.ModelID=="Champion";
plot(DiscData.X(IndModel),DiscData.Y(IndModel),':')
hold off
title(strcat("ROC ",pdModel.ModelID))
xlabel('Fraction of non-defaulters')
ylabel('Fraction of defaulters')
legend(strcat(DiscMeasure.Properties.RowNames,", AUROC = ",num2str(DiscMeasure.AUROC)),'Location','southeast')

[DiscMeasure,DiscData] = modelDiscrimination(pdModel,data(Ind,:),'SegmentBy','YOB','DataID',DataSetChoice,...
   'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(DiscMeasure)
                                      AUROC 
                                     _______

    LogisticNoAge, YOB=1, Testing    0.64879
    Champion, YOB=1, Testing         0.64972
    LogisticNoAge, YOB=2, Testing    0.65699
    Champion, YOB=2, Testing         0.66496
    LogisticNoAge, YOB=3, Testing    0.63508
    Champion, YOB=3, Testing         0.64774
    LogisticNoAge, YOB=4, Testing    0.62656
    Champion, YOB=4, Testing         0.66204
    LogisticNoAge, YOB=5, Testing     0.6205
    Champion, YOB=5, Testing         0.65439
    LogisticNoAge, YOB=6, Testing    0.61739
    Champion, YOB=6, Testing         0.63156
    LogisticNoAge, YOB=7, Testing    0.64016
    Champion, YOB=7, Testing         0.63117
    LogisticNoAge, YOB=8, Testing    0.63339
    Champion, YOB=8, Testing         0.63339
disp(head(DiscData))
        ModelID        YOB       X          Y           T    
    _______________    ___    _______    _______    _________

    "LogisticNoAge"     1           0          0     0.022711
    "LogisticNoAge"     1     0.12062    0.22401     0.022711
    "LogisticNoAge"     1     0.23459    0.41435     0.018483
    "LogisticNoAge"     1     0.33329    0.59151      0.01722
    "LogisticNoAge"     1     0.45578    0.69107      0.01151
    "LogisticNoAge"     1      0.5683    0.77452     0.009347
    "LogisticNoAge"     1     0.67031    0.84919    0.0087028
    "LogisticNoAge"     1     0.78943     0.9063    0.0064814
disp(tail(DiscData))
        ModelID        YOB       X         Y           T     
    _______________    ___    _______    ______    __________

    "LogisticNoAge"     8           0         0      0.014125
    "LogisticNoAge"     8     0.31762    0.5625      0.014125
    "LogisticNoAge"     8     0.65751    0.8125     0.0071273
    "LogisticNoAge"     8           1         1     0.0040058
    "Champion"          8           0         0     0.0040291
    "Champion"          8     0.31762    0.5625     0.0040291
    "Champion"          8     0.65751    0.8125     0.0017711
    "Champion"          8           1         1    0.00086887

Compare Accuracy Against Champion Model

Compare the accuracy of the two models with modelAccuracy.

GroupingVar = "YOB";
[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),GroupingVar,'DataID',DataSetChoice,...
   'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(AccMeasure)
                                                 RMSE   
                                              __________

    LogisticNoAge, grouped by YOB, Testing     0.0031021
    Champion, grouped by YOB, Testing         0.00046476
disp(head(AccData))
     ModelID      YOB       PD    
    __________    ___    _________

    "Observed"     1      0.017636
    "Observed"     2      0.013303
    "Observed"     3      0.010846
    "Observed"     4      0.010709
    "Observed"     5     0.0093528
    "Observed"     6     0.0060197
    "Observed"     7     0.0034776
    "Observed"     8     0.0012535
disp(tail(AccData))
     ModelID      YOB       PD    
    __________    ___    _________

    "Champion"     1      0.017244
    "Champion"     2      0.012999
    "Champion"     3      0.011428
    "Champion"     4      0.010693
    "Champion"     5     0.0085574
    "Champion"     6      0.005937
    "Champion"     7     0.0035193
    "Champion"     8     0.0021802
AccDataUnstacked = unstack(AccData,"PD","ModelID");
figure;
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.(pdModel.ModelID),'-o')
hold on
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.Observed,'*')
plot(AccDataUnstacked.(GroupingVar),AccDataUnstacked.("Champion"),':s')
hold off
title(strcat(AccMeasure.Properties.RowNames,", RMSE = ",num2str(AccMeasure.RMSE)))
xlabel(GroupingVar)
ylabel('PD')
legend(pdModel.ModelID,"Observed","Champion")
grid on

[AccMeasure,AccData] = modelAccuracy(pdModel,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice,...
   'ReferencePD',ChampionPD,'ReferenceID',"Champion");
disp(AccMeasure)
                                                            RMSE   
                                                          _________

    LogisticNoAge, grouped by YOB, ScoreGroup, Testing    0.0036974
    Champion, grouped by YOB, ScoreGroup, Testing         0.0010716
disp(head(AccData))
     ModelID      YOB    ScoreGroup        PD    
    __________    ___    ___________    _________

    "Observed"     1     High Risk       0.030877
    "Observed"     1     Medium Risk     0.013541
    "Observed"     1     Low Risk       0.0081449
    "Observed"     2     High Risk       0.022838
    "Observed"     2     Medium Risk     0.012376
    "Observed"     2     Low Risk       0.0046482
    "Observed"     3     High Risk       0.017651
    "Observed"     3     Medium Risk    0.0092652
unstack(AccData,'PD','ModelID')
ans=24×5 table
    YOB    ScoreGroup     Champion     LogisticNoAge    Observed 
    ___    ___________    _________    _____________    _________

     1     High Risk       0.028165       0.019641       0.030877
     1     Medium Risk     0.014833      0.0099388       0.013541
     1     Low Risk        0.008422      0.0055911      0.0081449
     2     High Risk        0.02167       0.019337       0.022838
     2     Medium Risk     0.011123      0.0098141       0.012376
     2     Low Risk       0.0061856      0.0055194      0.0046482
     3     High Risk       0.019285       0.020139       0.017651
     3     Medium Risk    0.0098085       0.010179      0.0092652
     3     Low Risk       0.0054096      0.0057356       0.005813
     4     High Risk       0.018136       0.019175       0.018562
     4     Medium Risk    0.0091921      0.0096563      0.0094929
     4     Low Risk       0.0050562      0.0054292       0.004392
     5     High Risk       0.014818       0.014806       0.016288
     5     Medium Risk    0.0072853       0.007454      0.0080033
     5     Low Risk       0.0039358      0.0041822      0.0041745
     6     High Risk        0.01049       0.012153      0.0096889
      ⋮

Compare Two Models Under Development

You can also compare two new models under development.

pdModelTTC = fitLifetimePDModel(data(TrainDataInd,:),"probit",...
   'ModelID','ProbitTTC',...
   'AgeVar','YOB',...
   'IDVar','ID',...
   'LoanVars','ScoreGroup',...
   'ResponseVar','Default',...
   'Description',"TTC model, no macro variables, probit.");
disp(pdModelTTC)
  Probit with properties:

        ModelID: "ProbitTTC"
    Description: "TTC model, no macro variables, probit."
          Model: [1x1 classreg.regr.CompactGeneralizedLinearModel]
          IDVar: "ID"
         AgeVar: "YOB"
       LoanVars: "ScoreGroup"
      MacroVars: ""
    ResponseVar: "Default"

Compare the accuracy.

[AccMeasureTTC,AccDataTTC] = modelAccuracy(pdModelTTC,data(Ind,:),["YOB","ScoreGroup"],'DataID',DataSetChoice,...
   'ReferencePD',predict(pdModel,data(Ind,:)),'ReferenceID',pdModel.ModelID);
disp(AccMeasureTTC)
                                                            RMSE   
                                                          _________

    ProbitTTC, grouped by YOB, ScoreGroup, Testing        0.0016726
    LogisticNoAge, grouped by YOB, ScoreGroup, Testing    0.0036974
unstack(AccDataTTC,'PD','ModelID')
ans=24×5 table
    YOB    ScoreGroup     LogisticNoAge    Observed     ProbitTTC
    ___    ___________    _____________    _________    _________

     1     High Risk         0.019641       0.030877     0.028114
     1     Medium Risk      0.0099388       0.013541     0.014865
     1     Low Risk         0.0055911      0.0081449    0.0087364
     2     High Risk         0.019337       0.022838     0.023239
     2     Medium Risk      0.0098141       0.012376     0.012053
     2     Low Risk         0.0055194      0.0046482    0.0069786
     3     High Risk         0.020139       0.017651     0.019096
     3     Medium Risk       0.010179      0.0092652    0.0097145
     3     Low Risk         0.0057356       0.005813    0.0055406
     4     High Risk         0.019175       0.018562     0.015599
     4     Medium Risk      0.0096563      0.0094929    0.0077825
     4     Low Risk         0.0054292       0.004392    0.0043722
     5     High Risk         0.014806       0.016288     0.012666
     5     Medium Risk       0.007454      0.0080033    0.0061971
     5     Low Risk         0.0041822      0.0041745    0.0034292
     6     High Risk         0.012153      0.0096889     0.010223
      ⋮

Black-Box Champion Prediction Function

function PD = getChampionModelPDs(data)

m = load('LifetimeChampionModel.mat');
PD = predict(m.pdModel,data);

end

See Also

| | | | | |

Related Topics