
Test Metrics in Modelscape

This example shows how to implement test metrics in MATLAB® using Modelscape™ software.

For information about test metrics from the model development and validation point of view, see Credit Scorecard Validation Metrics and Fairness Metrics in Modelscape.

Write Test Metrics

The basic building block of the Modelscape metrics framework is the TestMetric class. This class defines these properties:

  • Name — Name for the test metric, specified as a string scalar.

  • ShortName — Concise name for accessing metrics in MetricsHandler objects, specified as a valid MATLAB property name.

  • Value — Values that the metric carries, specified as a numeric scalar or row vector.

  • Keys — Value parameterization keys, specified as an m-by-n string array, where m is the length of Value and n is the number of keys that parameterize each value. The keys default to an empty string.

  • KeyNames — Key names, specified as a length-n string vector, where n is the number of columns of Keys. The default value is "Key".

  • Diagnostics — Diagnostics related to metric calculation, specified as a structure.

Any subclass of TestMetric must implement a constructor and a compute method to fill in these values.
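Because the Modelscape classes are MATLAB and proprietary, the property layout described above can be illustrated with a minimal Python sketch. All class names and SPD values here are illustrative assumptions, not the actual API.

```python
from dataclasses import dataclass, field

@dataclass
class MetricSketch:
    """Illustrative stand-in for the TestMetric property layout (not the real API)."""
    name: str          # Name: human-readable name of the test metric
    short_name: str    # ShortName: valid identifier for programmatic access
    value: list        # Value: scalar or row vector of metric values
    keys: list = field(default_factory=list)        # Keys: one row of key strings per value entry
    key_names: list = field(default_factory=list)   # KeyNames: one name per key column
    diagnostics: dict = field(default_factory=dict) # Diagnostics: calculation details

# Hypothetical SPD metric with two attribute-group combinations
spd = MetricSketch(
    name="Statistical Parity Difference",
    short_name="StatisticalParityDifference",
    value=[0.12, -0.05],
    keys=[["ResStatus", "Tenant"], ["ResStatus", "Owner"]],
    key_names=["SensitiveAttribute", "Group"],
)
```

Note how each row of keys parameterizes one entry of value, and key_names carries one name per key column, matching the dimension conventions described above.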

For example, the Modelscape statistical parity difference (SPD) metric for bias detection has the Name "Statistical Parity Difference" and the ShortName "StatisticalParityDifference". This table shows the arrangement of Keys and KeyNames.

The KeyNames property comprises the values "SensitiveAttribute" and "Group". The Keys property comprises the two columns with attribute-group combinations. The ShortName property value appears as the third header. The third column of the table carries the Value property value of the metric.

The base class has these methods, which you can override:

  • ComparisonValue(this) — Returns the value against which the software compares the thresholds. For example, in statistical hypothesis testing, this method returns the p-value of the computed statistic rather than the statistic itself.

  • formatResult(this) — Formats the metric data as a table for display, such as the table shown for the SPD metric.

  • project(this) — Restricts a nonscalar metric to a subset of keys. Extend the default implementation in a subclass to cover any diagnostic or auxiliary data that the subclass objects carry.

Write Metrics With Visualizations

To write test metrics equipped with visualizations, define metrics that inherit from the TestMetricWithVisualization class. This class adds a requirement to the TestMetric base class: implement a visualization method with the signature fig = visualize(this, options). Use options to define name-value arguments for the metric. For example, specify a sensitive attribute when visualizing the StatisticalParityDifference metric.

spdFig = visualize(spdMetric,SensitiveAttribute="ResStatus");

Write Metrics Projecting onto Selected Keys

The visualization shows the SPD metrics for the ResStatus attribute only. This plot uses the project method of the TestMetric class, which restricts a metric to selected keys. For a metric with N key names, project accepts an array of up to N strings as the Keys argument. The output restricts the metric to the key rows whose kth element matches the kth element of the array.

spdResStatus = project(spdMetric,Keys="ResStatus")

When you specify both keys, the result is a scalar metric:

spdTenant = project(spdMetric,Keys=["ResStatus","Tenant"])

The base class implementation of project does not handle diagnostics or other auxiliary data that a subclass carries. If necessary, implement this handling in the subclass using the secondary keySelection output of project.
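The projection rule described above can be sketched in a few lines of Python. This is an illustrative model of the matching logic, under the assumption that keys are matched positionally against the leading key columns; the function and variable names are hypothetical.

```python
def project_sketch(values, keys, selected):
    """Keep entries whose leading key elements match `selected`.

    Also returns the kept row indices, analogous to the secondary
    keySelection output mentioned above."""
    kept = [i for i, row in enumerate(keys)
            if all(row[k] == s for k, s in enumerate(selected))]
    return [values[i] for i in kept], [keys[i] for i in kept], kept

# Hypothetical SPD values keyed by (SensitiveAttribute, Group)
values = [0.12, -0.05, 0.08]
keys = [["ResStatus", "Tenant"],
        ["ResStatus", "Owner"],
        ["AgeBand", "Under30"]]

# One key string: all ResStatus rows survive
byAttr, _, _ = project_sketch(values, keys, ["ResStatus"])

# Both key strings: the result is a single (scalar) entry
scalar, _, idx = project_sketch(values, keys, ["ResStatus", "Tenant"])
```

Selecting with one key leaves a smaller nonscalar metric; selecting with all N keys pins down a single value, mirroring the scalar result of the spdTenant example above.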

Write Summarizable Metrics

Summary metrics reveal a different aspect of nonscalar metrics. For the SPD metric, the summary value is the value with the largest deviation from zero, the completely unbiased value, across all the attribute-group pairs.

spdSummary = summary(spdMetric)

Make a TestMetric class summarizable by inheriting from the TestMetricWithSummaryValue class and implementing the abstract summary method. This method returns a metric of the same type with a singleton Value property. The meaning of the summary value depends on the metric, so this method has no default implementation. However, you can use the protected summaryCore method of TestMetricWithSummaryValue as a helper.
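For the SPD case, the summarization rule stated above reduces to one line. This Python sketch is an assumption-laden illustration of that rule, not Modelscape code.

```python
def spd_summary_sketch(values):
    """Summary SPD: the value with the largest absolute deviation from zero,
    zero being the completely unbiased value."""
    return max(values, key=abs)

# Hypothetical per-group SPD values; -0.3 deviates most from zero
worst = spd_summary_sketch([0.12, -0.3, 0.08])  # -0.3
```

Using key=abs preserves the sign of the worst value, so the summary still indicates the direction of the largest bias.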

Write Test Thresholds

Test metrics are often compared against thresholds to qualitatively assess the inputs. For example, for certain models a model validator can require an area under the ROC curve (AUROC) of at least 0.8 for an acceptable model. Values under 0.7 are red flags, and values between 0.7 and 0.8 require a closer look.

Use the TestThresholds class to implement these thresholds. Encode the thresholds and classifications into a TestThresholds object.

aurocThresholds = TestThresholds([0.7, 0.8], ["Fail", "Undecided", "Pass"]);

These thresholds and labels govern the output of the status method of TestThresholds. For example, call status with an AUROC value of 0.72.

status(aurocThresholds, 0.72)

This call returns the status "Undecided", because 0.72 lies between the 0.7 and 0.8 thresholds. The Comment output indicates the interval to which the given input belongs.
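The classification step status performs is interval lookup: n thresholds split the number line into n+1 labeled intervals. This Python sketch shows that logic; the exact behavior at boundary values (here, bisect_left semantics) is an assumption, not necessarily what TestThresholds does.

```python
import bisect

def status_sketch(thresholds, labels, value):
    """Classify value into one of len(thresholds)+1 labeled intervals.

    labels must have exactly one more entry than thresholds. Boundary
    handling (bisect_left) is an illustrative assumption."""
    return labels[bisect.bisect_left(thresholds, value)]

# AUROC example: below 0.7 fails, 0.7-0.8 is undecided, above 0.8 passes
label = status_sketch([0.7, 0.8], ["Fail", "Undecided", "Pass"], 0.72)  # "Undecided"
```

The requirement that labels outnumber thresholds by exactly one mirrors the TestThresholds constructor call above, where two thresholds produce three classifications.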

Customize Thresholds

Implement custom thresholding regimes or different diagnostics as subclasses of TestThresholds. Implement the status method of the subclass to populate the Comment and Diagnostics properties according to your needs.

Write Statistical Hypothesis Tests

In some cases, notably in statistical hypothesis testing, the relevant quantity to compare against test thresholds is the associated p-value under a relevant null hypothesis. In these cases, override the ComparisonValue method of the test metric class to return the p-value instead of the Value property of the metric. For an example, see the Modelscape implementation of the Augmented Dickey-Fuller test.
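The override pattern can be sketched as follows in Python. The class names and the statistic and p-value numbers are hypothetical; the point is only that the metric keeps the test statistic in Value while thresholding uses the p-value.

```python
class MetricSketch:
    """Minimal base: thresholds compare against the metric value itself."""
    def __init__(self, value):
        self.value = value

    def comparison_value(self):
        return self.value

class HypothesisTestSketch(MetricSketch):
    """Hypothesis-test metric: Value holds the test statistic, but the
    value compared against thresholds is the associated p-value."""
    def __init__(self, statistic, p_value):
        super().__init__(statistic)
        self.p_value = p_value

    def comparison_value(self):
        # Override: thresholds see the p-value, not the statistic
        return self.p_value

# Hypothetical ADF-style result: statistic -3.1 with p-value 0.03
adf = HypothesisTestSketch(statistic=-3.1, p_value=0.03)
```

Thresholding code that calls comparison_value therefore works unchanged for both plain metrics and hypothesis-test metrics.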


Set the thresholds against which to compare the p-values. This TestThresholds object returns status as "Reject" for p-values less than 0.05 and "Accept" otherwise.

adfThreshold = TestThresholds(0.05, ["Reject", "Accept"]);