Interpret Regression Models Trained in Regression Learner App
Understanding how some machine learning models make predictions can be difficult. Interpretability tools help reveal how predictors contribute (or do not contribute) to predictions. For trained regression models, partial dependence plots (PDPs) show the relationship between a predictor and the predicted response. The partial dependence on the selected predictor is defined by the averaged prediction obtained by marginalizing out the effect of the other predictors.
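The definition above can be written out as a formula. The notation here is introduced for illustration only and is not part of the original text: f is the prediction function of the trained model, x^S is the selected predictor, x_i^C holds the values of the other predictors in observation i, and N is the number of observations used.

```latex
f^{S}\left(x^{S}\right)
  = \mathbb{E}_{x^{C}}\!\left[ f\left(x^{S}, x^{C}\right) \right]
  \approx \frac{1}{N} \sum_{i=1}^{N} f\left(x^{S}, x_{i}^{C}\right)
```

In words, for each value of the selected predictor, the model predicts once per observation with the other predictors held at their observed values, and the predictions are averaged.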
This example shows how to train regression models in the Regression Learner app and interpret the best-performing models using PDPs. You can use PDP results to confirm that models use features as expected, or to remove unhelpful features from model training.
In the MATLAB® Command Window, load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Categorize the cars based on whether they were made in the USA.
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
Create a table containing the predictor variables Acceleration, Displacement, and so on, as well as the response variable MPG.
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
Remove rows of cars where the table has missing values.
cars = rmmissing(cars);
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
On the Regression Learner tab, in the File section, click New Session and select From Workspace.
In the New Session from Workspace dialog box, select the cars table from the Data Set Variable list. The app selects the response and predictor variables. The default response variable is MPG. The default validation option is 5-fold cross-validation, to protect against overfitting.
In the Test section, click the check box to set aside a test data set. Specify 15 percent of the imported data as a test set.
To accept the options and continue, click Start Session.
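The session options above (a 15 percent test set and 5-fold cross-validation on the rest) can be approximated at the command line. This is a minimal sketch, not the app's internal procedure: the variable names holdout, carsTrain, carsTest, and folds are hypothetical, the fixed random seed is an assumption, and the resulting split will not match the one the app creates.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed so the split is repeatable

% Reserve 15 percent of the observations as a test set.
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);
carsTest = cars(test(holdout),:);

% Define 5-fold cross-validation on the remaining 85 percent.
folds = cvpartition(height(carsTrain),"KFold",5);
```

The test rows are kept out of training and validation so that they can later serve as an independent check of model performance.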
Train all preset models. On the Regression Learner tab, in the Models section, click the arrow to open the gallery. In the Get Started group, click All. In the Train section, click Train All and select Train All. The app trains one of each preset model type, along with the default fine tree model, and displays the models in the Models pane.
If you have Parallel Computing Toolbox™, then the app has the Use Parallel button toggled on by default. After you click Train All and select Train All or Train Selected, the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can continue to interact with the app while models train in parallel.
If you do not have Parallel Computing Toolbox, then the app has the Use Background Training check box in the Train All menu selected by default. After you select an option to train models, the app opens a background pool. After the pool opens, you can continue to interact with the app while models train in the background.
Sort the trained models based on the validation root mean squared error (RMSE). In the Models pane, open the Sort by list and select RMSE (Validation).
In the Models pane, click the star icon next to the model with the lowest validation RMSE value. The app highlights the lowest validation RMSE by outlining it in a box. In this example, the trained Exponential GPR model has the lowest validation RMSE.
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
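For comparison, a Gaussian process regression (GPR) model with an exponential kernel can be trained and cross-validated at the command line. This is a rough sketch rather than a reproduction of the app's Exponential GPR preset: the Standardize setting, the random seed, and the variable names are assumptions, so the resulting RMSE will generally differ from the value shown in the app.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);

% GPR with an exponential kernel. Standardizing the numeric predictors
% is an assumption about the preset, not a confirmed app setting.
gprMdl = fitrgp(carsTrain,"MPG","KernelFunction","exponential", ...
    "Standardize",true);

% 5-fold cross-validation; kfoldLoss returns the mean squared error.
cvMdl = crossval(gprMdl,"KFold",5);
validationRMSE = sqrt(kfoldLoss(cvMdl));
fprintf("5-fold validation RMSE: %.3f\n",validationRMSE)
```

Rerunning the sketch with a different seed changes the fold assignments and therefore the RMSE, which illustrates the randomness described above.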
For the starred model, you can check the model performance by using various plots (for example, response, Predicted vs. Actual, and residuals plots). In the Models pane, select the model. On the Regression Learner tab, in the Plot and Interpret section, click the arrow to open the gallery. Then, click any of the buttons in the Validation Results group to open the corresponding plot.
After opening multiple plots, you can change the layout of the plots by using the Document Actions arrow located to the far right of the model plot tabs. For example, click the arrow, select the Sub-Tile option, and specify a layout. For more information on how to use and display validation plots, see Visualize and Assess Model Performance in Regression Learner.
To return to the original layout, you can click the Layout button in the Plot and Interpret section and select Single model (Default).
For the starred model, see how the model features relate to the model predictions by using partial dependence plots (PDPs). On the Regression Learner tab, in the Plot and Interpret section, click the arrow to open the gallery. In the Interpretation Results section, click Partial Dependence. The PDP allows you to visualize the marginal effect of each predictor on the predicted response of the trained model. To compute the partial dependence values, the app uses the model trained on the 85% of observations in cars not reserved for testing.
Examine the relationship between the model predictors and model predictions on the training data (that is, 85% of the observations in cars). Under Data, select Training set.
Look for features that seem to contribute to model predictions. For example, under Feature, select Weight.
The blue plotted line represents the averaged partial relationship between the Weight feature and the predicted MPG response. The tick marks along the x-axis indicate the unique Weight values in the training data set. According to this model (Model 2.18), the MPG (miles per gallon) value tends to decrease as the car weight increases.
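A similar plot for the Weight feature can be produced outside the app with the plotPartialDependence and partialDependence functions. The sketch below is an approximation under stated assumptions: the model is a command-line exponential GPR rather than the app's Model 2.18, and the split, seed, and variable names are hypothetical.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);

% Stand-in for the starred model (settings are assumptions).
gprMdl = fitrgp(carsTrain,"MPG","KernelFunction","exponential", ...
    "Standardize",true);

% Plot the partial dependence of predicted MPG on Weight,
% computed over the training observations.
plotPartialDependence(gprMdl,"Weight",carsTrain)

% Retrieve the same values numerically for further inspection.
[pd,x] = partialDependence(gprMdl,"Weight",carsTrain);
```

The returned pd values are the averaged predictions at the query points in x, which is what the plotted line shows.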
In general, consider the distribution of values when interpreting partial dependence plots. Results tend to be more reliable in intervals where you have sufficient observations whose predictor values are spread evenly.
You can tune your best-performing model by removing predictors that do not seem to contribute to model predictions. A PDP where the predicted response remains constant across all predictor values can indicate a poor predictor.
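One way to screen for nearly flat PDPs is to compute the range of the partial dependence values for each predictor. The following is an illustrative sketch only: the range is a crude summary chosen here for convenience, not a criterion used by the app, and the model, split, and variable names are assumptions.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);

% Stand-in for the starred model (settings are assumptions).
gprMdl = fitrgp(carsTrain,"MPG","KernelFunction","exponential", ...
    "Standardize",true);

% For each predictor, measure how much the averaged prediction
% changes across the predictor's values.
predictorNames = string(gprMdl.PredictorNames);
pdRange = zeros(numel(predictorNames),1);
for k = 1:numel(predictorNames)
    pd = partialDependence(gprMdl,predictorNames(k),carsTrain);
    pdRange(k) = max(pd) - min(pd);
end

% Predictors with a range near zero have an almost flat PDP.
summaryTbl = table(predictorNames(:),pdRange, ...
    "VariableNames",["Predictor","PDRange"]);
disp(sortrows(summaryTbl,"PDRange"))
```

A small range is only a hint. A predictor can also matter through interactions that a one-dimensional PDP averages away, so inspect the plots before removing anything.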
In this example, none of the predictors have a PDP where the plotted line is flat. However, two predictors, Displacement and Horsepower, show a similar relationship to the model predicted response as the Weight predictor. Under Feature, first select Displacement and then select Horsepower to see the relationships.
Remove the Displacement and Horsepower predictors from the best-performing model. Create a copy of the starred model. After selecting the model in the Models pane, click the Duplicate button in the Models section of the Regression Learner tab.
Then, in the model Summary tab, expand the Feature Selection section, and clear the Select check boxes for the Displacement and Horsepower features.
Train the new model. In the Train section of the Regression Learner tab, click Train All and select Train Selected.
In the Models pane, click the star icon next to the new model. To group the starred models together, open the Sort by list and select Favorites.
The model trained with fewer features, Model 3, performs slightly worse than the model trained with all features, Model 2.18.
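The comparison between the full and reduced feature sets can be sketched at the command line by cross-validating both models on the same folds. The model settings, seed, and variable names below are assumptions, and the RMSE values will not match those of Model 2.18 and Model 3 in the app.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);

% Drop the two predictors that behave similarly to Weight.
carsReduced = removevars(carsTrain,["Displacement","Horsepower"]);

% Use the same folds for both models so the comparison is fair.
folds = cvpartition(height(carsTrain),"KFold",5);

cvFull = fitrgp(carsTrain,"MPG","KernelFunction","exponential", ...
    "Standardize",true,"CVPartition",folds);
cvReduced = fitrgp(carsReduced,"MPG","KernelFunction","exponential", ...
    "Standardize",true,"CVPartition",folds);

rmseFull = sqrt(kfoldLoss(cvFull));
rmseReduced = sqrt(kfoldLoss(cvReduced));
fprintf("Validation RMSE, all features:     %.3f\n",rmseFull)
fprintf("Validation RMSE, reduced features: %.3f\n",rmseReduced)
```

If the two values are close, the simpler model is a reasonable candidate, which is the trade-off this step of the example explores.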
For each starred model, compute the RMSE of the model on the test set. First, select the model in the Models pane. Then, on the Regression Learner tab, in the Test section, click Test All and select Test Selected.
Compare the validation and test RMSE results for the starred models by using a table. On the Regression Learner tab, in the Models section, click Results Table. In the Results Table tab, click the "Select columns to display" button at the top right of the table.
In the Select Columns to Display dialog box, check the Select box for the Preset column, and clear the Select check boxes for the MSE (Validation), RSquared (Validation), MAE (Validation), MSE (Test), RSquared (Test), and MAE (Test) columns. Click OK.
In this example, both of the starred models perform well on the test set.
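The test-set check can also be sketched at the command line by predicting on the held-out rows and computing the RMSE directly. As before, this is an approximation: the models, split, seed, and variable names are assumptions rather than the app's exported models.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
carsTrain = cars(training(holdout),:);
carsTest = cars(test(holdout),:);

% Train the full and reduced models on the training rows only.
dropped = ["Displacement","Horsepower"];
mdlFull = fitrgp(carsTrain,"MPG","KernelFunction","exponential", ...
    "Standardize",true);
mdlReduced = fitrgp(removevars(carsTrain,dropped),"MPG", ...
    "KernelFunction","exponential","Standardize",true);

% Root mean squared error on the held-out test rows.
errFull = carsTest.MPG - predict(mdlFull,carsTest);
errReduced = carsTest.MPG - predict(mdlReduced,removevars(carsTest,dropped));
testRMSEFull = sqrt(mean(errFull.^2));
testRMSEReduced = sqrt(mean(errReduced.^2));
fprintf("Test RMSE, all features:     %.3f\n",testRMSEFull)
fprintf("Test RMSE, reduced features: %.3f\n",testRMSEReduced)
```

Comparing these values with the validation RMSE indicates whether the cross-validation estimate was optimistic.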
For the best-performing model, look at the PDPs on the test data set. Ensure that the partial relationships meet expectations.
For this example, because the model trained on fewer features still performs well on the test set, select this model (Model 3). Compare the training set and test set PDPs for the Acceleration feature and the Model 3 predicted response. In the Partial Dependence Plot tab, under Feature, select Acceleration. Under Data, select Training set and then select Test set to see each plot.
The PDPs have similar trends for the training and test data sets. However, the predicted response values vary slightly between the plots. This discrepancy might be due to a difference in the distribution of training set observations and test set observations.
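To compare the two curves on one set of axes, the partial dependence can be evaluated at the same query points for both data sets. This sketch assumes a command-line stand-in for Model 3. The query grid pts, the seed, and the variable names are hypothetical choices.

```matlab
% Rebuild the cars table as in the preceding steps.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,Origin,Weight,MPG);
cars = rmmissing(cars);

rng("default")  % assumption: fixed seed for a repeatable split
holdout = cvpartition(height(cars),"HoldOut",0.15);
dropped = ["Displacement","Horsepower"];
trainReduced = removevars(cars(training(holdout),:),dropped);
testReduced = removevars(cars(test(holdout),:),dropped);

% Stand-in for the reduced model (settings are assumptions).
mdlReduced = fitrgp(trainReduced,"MPG","KernelFunction","exponential", ...
    "Standardize",true);

% Evaluate both PDPs on a shared grid of Acceleration values.
pts = linspace(min(cars.Acceleration),max(cars.Acceleration),50)';
pdTrain = partialDependence(mdlReduced,"Acceleration",trainReduced, ...
    "QueryPoints",pts);
pdTest = partialDependence(mdlReduced,"Acceleration",testReduced, ...
    "QueryPoints",pts);

plot(pts,pdTrain(:),pts,pdTest(:))
legend("Training set","Test set")
xlabel("Acceleration")
ylabel("Predicted MPG")
```

Because both curves come from the same model, any gap between them reflects only the different observations being averaged over, consistent with the explanation above.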
If you are satisfied with the best-performing model, you can export the trained model to the workspace. For more information, see Export Model to Workspace. You can also export any of the partial dependence plots you create in Regression Learner. For more information, see Export Plots in Regression Learner App.