Group Distances

Select group distance to determine the separation between data groups for each pair of condition labels. This separation metric, the Kolmogorov-Smirnov (KS) statistic, indicates numerically how effective a feature is at differentiating between, for example, faulty and healthy data.

Group distance is especially useful when you have more than two labels for a condition variable, as histograms become harder to interpret when data from multiple color groups combine.

Select the feature you want to examine from Show grouping for feature. The table shows the KS statistic for each label pairing. This statistic ranges from 0 to 1.

  • A value of 0 means that the data groups are completely mixed, and therefore that the condition value is completely ambiguous. The associated feature has no differentiation capability for this data.

  • A value of 1 means that the data groups are completely separated, with no overlap between them, and that the associated feature has complete differentiation capability for this data.
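
To make the two extremes concrete, the following is a minimal Python sketch of how a table of pairwise KS statistics could be computed. It is an illustration only, not the implementation the app uses. The function names ks_statistic and group_distances, the labels, and the feature values are all hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # The empirical CDFs change only at observed values, so it is
    # enough to compare them at every value in either sample.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def group_distances(feature_by_label):
    """Map each unordered pair of labels to its KS statistic."""
    return {
        (first, second): ks_statistic(
            feature_by_label[first], feature_by_label[second]
        )
        for first, second in combinations(sorted(feature_by_label), 2)
    }


# Hypothetical feature values grouped by condition label.
feature_by_label = {
    "healthy": [0.9, 1.0, 1.1, 1.2],
    "worn": [1.0, 1.1, 1.2, 1.3],
    "faulty": [2.0, 2.1, 2.2, 2.3],
}

for pair, distance in group_distances(feature_by_label).items():
    print(pair, round(distance, 2))
```

With these made-up values, both pairs that include the faulty group give a statistic of 1, because the faulty values do not overlap the others. The healthy and worn groups overlap heavily and give 0.25. Two identical groups would give 0.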

Additional Information

The KS statistic measures how far apart the empirical cumulative distribution functions of the feature values for two states are. It is the largest vertical distance between the two functions, as computed by the two-sample Kolmogorov-Smirnov test. For more information on this test, see kstest2.
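
In symbols, the standard two-sample definition can be written as follows. The notation is an assumption introduced here for illustration: \hat{F}_1 and \hat{F}_2 denote the empirical cumulative distribution functions of the feature values for the two states, and D denotes the KS statistic.

```latex
\[
  D = \max_{x} \left| \hat{F}_1(x) - \hat{F}_2(x) \right|,
  \qquad 0 \le D \le 1
\]
```

D = 0 when the two empirical distribution functions coincide, and D = 1 when every value from one state lies below every value from the other.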