This example shows how to fit a logistic regression model directly, without using the fitmodel function, and then use setmodel to set the new model predictors and coefficients back into the creditscorecard object. This approach gives more flexibility in controlling the stepwise procedure. The example fits a logistic regression model with nondefault values for the 'PEnter' and 'PRemove' parameters, the criteria to admit a predictor into, and remove a predictor from, the logistic regression model during the stepwise procedure.
Create a creditscorecard object using the data in the CreditCardData.mat file (a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.
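The text here shows only the resulting display, so the following is a sketch of the commands that would produce it. It assumes that CreditCardData.mat defines a table named data, which the text does not state explicitly.

```matlab
% Load the sample data (assumed to define the table "data").
load CreditCardData

% Mark CustID as an identifier so that it is not treated as a predictor.
sc = creditscorecard(data,'IDVar','CustID')
```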
sc =
creditscorecard with properties:
GoodLabel: 0
ResponseVar: 'status'
WeightsVar: ''
VarNames: {1x11 cell}
NumericPredictors: {1x6 cell}
CategoricalPredictors: {'ResStatus' 'EmpStatus' 'OtherCC'}
BinMissingData: 0
IDVar: 'CustID'
PredictorVars: {1x9 cell}
Data: [1200x11 table]
Perform automatic binning.
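A minimal sketch of this step, assuming the default binning algorithm is acceptable and that the table in CreditCardData.mat is named data. The bininfo call is an optional check added here for illustration.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');

% Bin all predictors with the default automatic binning settings.
sc = autobinning(sc);

% Optional: inspect the bins (and WOE values) found for one predictor.
bininfo(sc,'CustAge')
```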
The logistic regression model needs to be fit with Weight of Evidence (WOE) data. The WOE transformation is a special case of binning: the data is first binned, and then the binned information is mapped to the corresponding WOE values. The bindata function performs this transformation and has an argument that prepares the data for the model fitting step. When you set the bindata name-value pair argument 'OutputType' to 'WOEModelInput':
All predictors are converted to WOE values.
The output contains only predictors and response (no 'IDVar' or any unused variables).
Predictors with infinite or undefined (NaN) WOE values are discarded.
The response values are mapped so that "Good" is 1 and "Bad" is 0 (this implies that higher unscaled scores correspond to better, less risky customers).
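For reference, the WOE value that replaces each binned observation is commonly defined as follows. The symbols are notation introduced here, not taken from the text: G_i and B_i are the counts of "Good" and "Bad" observations in bin i, and G_total and B_total are the corresponding counts over the whole sample.

```latex
\mathrm{WOE}_i = \ln\!\left( \frac{G_i / G_{\text{total}}}{B_i / B_{\text{total}}} \right)
```

Under this definition a bin with a larger share of "Good" than "Bad" observations has a positive WOE, which is consistent with higher scores indicating less risky customers.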
For example, the first ten rows in the original data for the variables 'CustAge', 'ResStatus', 'CustIncome', and 'status' (response variable) look like this:
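One way to display these rows, assuming the table loaded from CreditCardData.mat is named data:

```matlab
load CreditCardData

% Show the first ten rows of three predictors and the response.
data(1:10,{'CustAge','ResStatus','CustIncome','status'})
```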
ans=10×4 table
CustAge    ResStatus     CustIncome    status
_______    __________    __________    ______
53         Tenant        50000         0
61         Home Owner    52000         0
47         Tenant        37000         0
50         Home Owner    53000         0
68         Home Owner    53000         0
65         Home Owner    48000         0
34         Home Owner    32000         1
50         Other         51000         0
50         Tenant        52000         1
49         Home Owner    53000         1
Here is how the same ten rows look after calling bindata with the name-value pair argument 'OutputType' set to 'WOEModelInput':
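A sketch of the call described above. The variable name trainingWOEData is chosen here for illustration, and the table loaded from CreditCardData.mat is assumed to be named data.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Convert the binned predictors to WOE values and keep only
% the predictors and the response, ready for model fitting.
trainingWOEData = bindata(sc,'OutputType','WOEModelInput');

% Show the same ten rows and variables as in the raw data.
trainingWOEData(1:10,{'CustAge','ResStatus','CustIncome','status'})
```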
ans=10×4 table
CustAge     ResStatus    CustIncome    status
________    _________    __________    ______
0.21378     -0.095564    0.47972       1
0.62245     0.019329     0.47972       1
0.18758     -0.095564    -0.026696     1
0.21378     0.019329     0.47972       1
0.62245     0.019329     0.47972       1
0.62245     0.019329     0.47972       1
-0.39568    0.019329     -0.29217      0
0.21378     0.20049      0.47972       1
0.21378     -0.095564    0.47972       0
0.21378     0.019329     0.47972       0
Fit a logistic regression model using a stepwise method with the Statistics and Machine Learning Toolbox™ function stepwiseglm, but use nondefault values for the 'PEnter' and 'PRemove' optional arguments. The predictors 'ResStatus' and 'OtherCC' would normally be included in the model under the default options for the stepwise procedure; with the nondefault values, they do not enter the model.
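A sketch of such a fit. The text does not state the threshold values, so 'PEnter' of 0.025 and 'PRemove' of 0.05 are assumptions chosen for illustration. They are consistent with the steps reported here, in which the last predictor admitted has a p-value of about 0.0207. The table loaded from CreditCardData.mat is assumed to be named data.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);
trainingWOEData = bindata(sc,'OutputType','WOEModelInput');

% Start from a constant model and allow linear terms only.
% PEnter and PRemove below are assumed values, not taken from the text.
mdl = stepwiseglm(trainingWOEData,'constant', ...
    'ResponseVar','status', ...
    'Distribution','binomial', ...
    'Upper','linear', ...
    'PEnter',0.025,'PRemove',0.05)
```

Lowering 'PEnter' makes the procedure stricter about admitting predictors, which is why fewer predictors are expected to enter than with the default options.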
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
mdl =
Generalized linear regression model:
logit(status) ~ 1 + CustAge + EmpStatus + CustIncome + TmWBank + AMBalance
Distribution = Binomial
Estimated Coefficients:
               Estimate    SE          tStat     pValue
               ________    ________    ______    __________
(Intercept)    0.70263     0.063759    11.02     3.0544e-28
CustAge        0.57265     0.2482      2.3072    0.021043
EmpStatus      0.88356     0.29193     3.0266    0.002473
CustIncome     0.70399     0.21781     3.2321    0.001229
TmWBank        1.1         0.23185     4.7443    2.0924e-06
AMBalance      1.0313      0.32007     3.2221    0.0012724
1200 observations, 1194 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 81.4, p-value = 4.18e-16
Use setmodel to update the model predictors and model coefficients in the creditscorecard object. The ModelPredictors input argument does not include an entry for the intercept. The ModelCoefficients input argument, however, carries the intercept as its first element, followed by one coefficient per predictor in the same order as ModelPredictors.
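A sketch of this step, taking the predictor names and coefficient estimates from the fitted model object. The stepwiseglm thresholds are the same assumed values as before and are not stated in the text.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);
trainingWOEData = bindata(sc,'OutputType','WOEModelInput');
mdl = stepwiseglm(trainingWOEData,'constant','Distribution','binomial', ...
    'Upper','linear','PEnter',0.025,'PRemove',0.05,'Verbose',0);

% Predictor names (no intercept entry) and estimates (intercept first).
ModelPredictors   = mdl.PredictorNames
ModelCoefficients = mdl.Coefficients.Estimate

% Store the externally fitted model in the scorecard object.
sc = setmodel(sc,ModelPredictors,ModelCoefficients);
```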
ModelPredictors = 5x1 cell
{'CustAge' }
{'EmpStatus' }
{'CustIncome'}
{'TmWBank' }
{'AMBalance' }
ModelCoefficients = 6×1
0.7026
0.5726
0.8836
0.7040
1.1000
1.0313
Verify that the desired model predictors are part of the scorecard predictors by displaying the scorecard points.
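The sketch below skips the refit and passes setmodel the predictor names and rounded coefficients printed in this example, then displays the points. Because the coefficients are rounded to four decimal places, the points should be close to, but not necessarily identical to, those in the table that follows. The variable name pointsInfo is used here instead of pi, the name in the displayed output, to avoid shadowing the built-in constant.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Predictors and coefficients copied from the printed output
% (intercept first, then one coefficient per predictor).
ModelPredictors   = {'CustAge';'EmpStatus';'CustIncome';'TmWBank';'AMBalance'};
ModelCoefficients = [0.7026; 0.5726; 0.8836; 0.7040; 1.1000; 1.0313];

sc = setmodel(sc,ModelPredictors,ModelCoefficients);

% Only the predictors set above should appear in the points table.
pointsInfo = displaypoints(sc)
```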
pi=30×3 table
Predictors Bin Points
______________ _________________ _________
{'CustAge' } {'[-Inf,33)' } -0.10354
{'CustAge' } {'[33,37)' } -0.086059
{'CustAge' } {'[37,40)' } -0.010713
{'CustAge' } {'[40,46)' } 0.089757
{'CustAge' } {'[46,48)' } 0.24794
{'CustAge' } {'[48,58)' } 0.26294
{'CustAge' } {'[58,Inf]' } 0.49697
{'CustAge' } {'<missing>' } NaN
{'EmpStatus' } {'Unknown' } -0.035716
{'EmpStatus' } {'Employed' } 0.35417
{'EmpStatus' } {'<missing>' } NaN
{'CustIncome'} {'[-Inf,29000)' } -0.41884
{'CustIncome'} {'[29000,33000)'} -0.065161
{'CustIncome'} {'[33000,35000)'} 0.092353
{'CustIncome'} {'[35000,40000)'} 0.12173
{'CustIncome'} {'[40000,42000)'} 0.13259
⋮
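As a sanity check, the CustAge points above can be reproduced by hand from the values printed earlier in this example. The sketch assumes that the unscaled points for a bin equal the predictor's coefficient times the bin's WOE value, plus an equal share of the intercept across the five model predictors. That formula is an assumption about how the points are computed and is not stated in the text. All numbers are copied from the outputs above.

```matlab
% Estimates copied from the fitted model output.
b0   = 0.70263;   % intercept
bAge = 0.57265;   % CustAge coefficient
p    = 5;         % number of predictors in the model

% CustAge WOE values from the WOE table, for ages 34, 47, 53, and 61.
woeAge = [-0.39568; 0.18758; 0.21378; 0.62245];

% Points reported for the matching bins:
% [33,37), [46,48), [48,58), and [58,Inf].
reported = [-0.086059; 0.24794; 0.26294; 0.49697];

% Assumed formula: coefficient * WOE + intercept / number of predictors.
computed = bAge .* woeAge + b0 / p;

disp(table(woeAge,computed,reported))
maxAbsDiff = max(abs(computed - reported));
```

If the assumed formula holds, maxAbsDiff is on the order of the rounding in the displayed values.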