fitmodel

Fit logistic regression model to Weight of Evidence (WOE) data

Syntax

sc = fitmodel(sc)
[sc,mdl] = fitmodel(sc)
[sc,mdl] = fitmodel(___,Name,Value)

Description

sc = fitmodel(sc) fits a logistic regression model to the Weight of Evidence (WOE) data and stores the model predictor names and corresponding coefficients in the creditscorecard object.

fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic or manual binning process. The response variable is mapped so that "Good" is 1, and "Bad" is 0. This implies that higher (unscaled) scores correspond to better (less risky) individuals (smaller probability of default).
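The WOE transformation and response mapping described above can be sketched in base MATLAB. This is only an illustration under stated assumptions: the bin counts, bin memberships, and variable names are hypothetical, and the formula is the commonly used WOE definition (log of the ratio of each bin's share of goods to its share of bads), not code taken from fitmodel itself.

```matlab
% Hypothetical good/bad counts for the three bins of one predictor.
good = [100 250 150];
bad  = [ 60  90  30];

% Share of all goods and share of all bads that fall in each bin.
distGood = good / sum(good);
distBad  = bad  / sum(bad);

% WOE per bin: positive when a bin holds relatively more goods than bads.
woe = log(distGood ./ distBad);

% Each observation's predictor value is replaced by the WOE of its bin.
binIndex  = [1 3 2 2 1];      % hypothetical bin membership of five rows
woeValues = woe(binIndex);

% Response mapping used for the fit: "Good" -> 1, "Bad" -> 0.
isGood = [true false true true false];
y = double(isGood);
```

The logistic regression is then fit with y as the response and the WOE-coded columns as predictors.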

Alternatively, you can use setmodel to provide names of the predictors that you want in the logistic regression model, along with their corresponding coefficients.
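As a hedged sketch of the setmodel alternative, the following reuses the predictors and coefficients from a fitmodel call purely for illustration; in practice they could come from any externally fitted model. It assumes Financial Toolbox and the CreditCardData.mat sample file are available, and that setmodel accepts a coefficient vector with the intercept first. The variable names scFit, scSet, predictors, and coefficients are illustrative.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Fit once to obtain a GeneralizedLinearModel object.
[scFit,mdl] = fitmodel(sc);

% Predictor names in the fitted model (the first coefficient name is the intercept).
predictors   = mdl.CoefficientNames(2:end);
coefficients = mdl.Coefficients.Estimate;   % intercept first, then one per predictor

% Store the same predictors and coefficients in a scorecard without refitting.
scSet = setmodel(sc,predictors,coefficients);
```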

[sc,mdl] = fitmodel(sc) fits a logistic regression model to the Weight of Evidence (WOE) data and stores the model predictor names and corresponding coefficients in the creditscorecard object. fitmodel returns an updated creditscorecard object and a GeneralizedLinearModel object containing the fitted model.

[sc,mdl] = fitmodel(___,Name,Value) fits a logistic regression model to the Weight of Evidence (WOE) data using optional name-value pair arguments and stores the model predictor names and corresponding coefficients in the creditscorecard object. Using name-value pair arguments, you can select which generalized linear model to fit to the data. fitmodel returns an updated creditscorecard object and a GeneralizedLinearModel object containing the fitted model.

Examples

Load the data from the CreditCardData.mat file (a dataset from Refaat 2011) and create a creditscorecard object.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Perform automatic binning.

sc = autobinning(sc)
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default).

sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833      0.24932      2.44      0.014687
    ResStatus        1.377      0.65272    2.1097      0.034888
    EmpStatus      0.88565        0.293    3.0227     0.0025055
    CustIncome     0.70164      0.21844    3.2121     0.0013179
    TmWBank         1.1074      0.23271    4.7589    1.9464e-06
    OtherCC         1.0883      0.52912    2.0569      0.039696
    AMBalance        1.045      0.32214    3.2439     0.0011792


1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Use the CreditCardData.mat file to load the data (dataWeights) that contains a column (RowWeights) for the weights (using a dataset from Refaat 2011).

load CreditCardData

Create a creditscorecard object using the optional name-value pair argument for 'WeightsVar'.

sc = creditscorecard(dataWeights,'IDVar','CustID','WeightsVar','RowWeights')
sc = 

  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: 'RowWeights'
                 VarNames: {1x12 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x12 table]

Perform automatic binning.

sc = autobinning(sc)
sc = 

  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: 'RowWeights'
                 VarNames: {1x12 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x12 table]

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default). When the optional name-value pair argument 'WeightsVar' is used to specify observation (sample) weights, the mdl output uses the weighted counts with stepwiseglm and fitglm.

[sc,mdl] = fitmodel(sc);
1. Adding CustIncome, Deviance = 764.3187, Chi2Stat = 15.81927, PValue = 6.968927e-05
2. Adding TmWBank, Deviance = 751.0215, Chi2Stat = 13.29726, PValue = 0.0002657942
3. Adding AMBalance, Deviance = 743.7581, Chi2Stat = 7.263384, PValue = 0.007037455

Generalized linear regression model:
    logit(status) ~ 1 + CustIncome + TmWBank + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)    0.70642     0.088702     7.964    1.6653e-15
    CustIncome      1.0268      0.25758    3.9862    6.7132e-05
    TmWBank         1.0973      0.31294    3.5063     0.0004543
    AMBalance       1.0039      0.37576    2.6717     0.0075464


1200 observations, 1196 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 36.4, p-value = 6.22e-08

Load the data from the CreditCardData.mat file (a dataset from Refaat 2011) and create a creditscorecard object.

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Perform automatic binning.

sc = autobinning(sc,'Algorithm','EqualFrequency')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. Set the VariableSelection name-value pair argument to FullModel to specify that all predictors must be included in the fitted logistic regression model.

sc = fitmodel(sc,'VariableSelection','FullModel');
Generalized linear regression model:
    status ~ [Linear formula with 10 terms in 9 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat      pValue  
                   ________    ________    _______    _________

    (Intercept)    0.70262     0.063862     11.002    3.734e-28
    CustAge        0.57683      0.27064     2.1313     0.033062
    TmAtAddress     1.0653      0.55233     1.9287     0.053762
    ResStatus       1.4189      0.65162     2.1775     0.029441
    EmpStatus      0.89916      0.29217     3.0776     0.002087
    CustIncome     0.77506      0.21942     3.5323    0.0004119
    TmWBank         1.0826      0.26583     4.0727    4.648e-05
    OtherCC         1.1354      0.52827     2.1493     0.031612
    AMBalance      0.99315      0.32642     3.0425    0.0023459
    UtilRate       0.16723      0.55745    0.29999      0.76419


1200 observations, 1190 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 85.6, p-value = 1.25e-14

Input Arguments

Credit scorecard model (sc), specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: [sc,mdl] = fitmodel(sc,'VariableSelection','FullModel')

Predictor variables ('PredictorVars') for fitting the creditscorecard object, specified as a cell array of character vectors. When provided, the creditscorecard object property PredictorVars is updated. When not provided, the predictors used to create the creditscorecard object (by using creditscorecard) are used.
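A minimal usage sketch of restricting the candidate predictors, assuming Financial Toolbox and the CreditCardData.mat sample file. The three column names are taken from the example output on this page; the variable name candidates is illustrative.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Restrict the candidate predictors to three columns of the sample data.
candidates = {'CustAge','CustIncome','TmWBank'};
sc = fitmodel(sc,'PredictorVars',candidates);

% The PredictorVars property now reflects the restricted set.
disp(sc.PredictorVars)
```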

Variable selection method ('VariableSelection') used to fit the logistic regression model, specified as a character vector with value 'Stepwise' or 'FullModel':

  • Stepwise — Uses a stepwise selection method that calls the Statistics and Machine Learning Toolbox™ function stepwiseglm. Only variables in PredictorVars can potentially become part of the model. The StartingModel name-value pair argument selects the starting model.

  • FullModel — Fits a model with all predictor variables in the PredictorVars name-value pair argument and calls fitglm.

Note

Only variables in the PredictorVars property of the creditscorecard object can potentially become part of the logistic regression model. The model includes only linear terms, with no interactions or other higher-order terms.

The response variable is mapped so that “Good” is 1 and “Bad” is 0.

Data Types: char

Initial model ('StartingModel') for the Stepwise variable selection method, specified as a character vector with value 'Constant' or 'Linear'. This option determines the initial model (constant or linear) that the Statistics and Machine Learning Toolbox function stepwiseglm starts with.

  • Constant — Starts the stepwise method with an empty (constant only) model.

  • Linear — Starts the stepwise method from a full (all predictors in) model.

Note

StartingModel is used only for the Stepwise option of VariableSelection and has no effect for the FullModel option of VariableSelection.

Data Types: char
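As a hedged usage sketch, the following starts the stepwise search from the model with all predictors instead of the constant model. It assumes Financial Toolbox and the CreditCardData.mat sample file; which terms remain in the final model depends on the data and is not asserted here.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Start from the model with all predictors; stepwiseglm may then remove or add terms.
[sc,mdl] = fitmodel(sc,'VariableSelection','Stepwise','StartingModel','Linear');

% Names of the terms that remain in the final model.
disp(mdl.CoefficientNames)
```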

Indicator ('Display') to display model information at the command line, specified as a character vector with value 'On' or 'Off'.

Data Types: char

Output Arguments

Credit scorecard model (sc), returned as an updated creditscorecard object. The creditscorecard object contains information about the model predictors and coefficients used to fit the WOE data. For more information on using the creditscorecard object, see creditscorecard.

Fitted logistic model (mdl), returned as a GeneralizedLinearModel object containing the fitted model. For more information on a GeneralizedLinearModel object, see GeneralizedLinearModel.

Note

When creating the creditscorecard object with creditscorecard, if the optional name-value pair argument WeightsVar was used to specify observation (sample) weights, then mdl uses the weighted counts with stepwiseglm and fitglm.

More About

Using fitmodel with Weights

When observation weights are provided in the credit scorecard data, the weights are used to calibrate the model coefficients.

The underlying Statistics and Machine Learning Toolbox functionality for stepwiseglm and fitglm supports observation weights. The weights also affect the logistic model through the WOE values. The WOE transformation is applied to all predictors before fitting the logistic model. The observation weights directly impact the WOE values. For more information, see Using bininfo with Weights and Credit Scorecard Modeling Using Observation Weights.

Therefore, the credit scorecard points and final score depend on the observation weights through both the logistic model coefficients and the WOE values.
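The effect of observation weights on the WOE values can be sketched in base MATLAB. This is an illustration under stated assumptions, not the toolbox implementation: the data are hypothetical, and the weighted good and bad counts per bin are assumed to be the sums of the observation weights in that bin.

```matlab
% Hypothetical data: bin membership, good/bad flag, and observation weight per row.
binIndex = [1 1 2 2 3 3]';
isGood   = logical([1 0 1 0 1 0]');
w        = [1 2 3 1 2 2]';
nBins    = max(binIndex);

% Unweighted counts: every row counts once.
goodN = accumarray(binIndex, double(isGood),  [nBins 1]);
badN  = accumarray(binIndex, double(~isGood), [nBins 1]);
woeUnweighted = log((goodN / sum(goodN)) ./ (badN / sum(badN)));

% Weighted counts: sum the weights of the goods and of the bads in each bin.
goodW = accumarray(binIndex, w .* isGood,    [nBins 1]);
badW  = accumarray(binIndex, w .* (~isGood), [nBins 1]);

% Same WOE formula, applied to the weighted counts.
woeWeighted = log((goodW / sum(goodW)) ./ (badW / sum(badW)));
```

In this toy data every bin has one good and one bad row, so the unweighted WOE is zero in each bin, while the weighted WOE is not. The predictors passed to the logistic fit therefore change as soon as weights are introduced.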

Algorithms

For the logistic regression model used in the creditscorecard object, the probability of being “Bad” is given by

ProbBad = exp(-s) / (1 + exp(-s)),

where s is the unscaled score.
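The formula can be evaluated directly in base MATLAB. In this sketch, s is assumed to denote the unscaled score, and the score values are hypothetical.

```matlab
% Hypothetical unscaled scores.
s = [-2 0 1.5 4];

% Probability of being "Bad", evaluated element-wise from the formula.
probBad = exp(-s) ./ (1 + exp(-s));

% Algebraically equivalent form; the probability of being "Good" is the complement.
probBadAlt = 1 ./ (1 + exp(s));
probGood   = 1 - probBad;
```

Higher scores give a smaller probability of being "Bad", which matches the mapping of "Good" to 1 in the fitted model.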

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Introduced in R2014b
