Evaluate quality of a classification feature based on distance matrix

Hello,
I am currently trying to do feature extraction on measurement data that have been preprocessed into normalized 1D histograms. Since I don't know in advance where relevant features might be (they are expected to lie somewhere within certain ranges of the histogram indices), I am "scanning" my data using custom distance metrics on data sets with known labels. Applying the metric yields a square distance matrix with pairwise distances as shown below. The upper left and lower right quadrants always contain the pairwise distances within the two respective clusters; the other two quadrants contain the distances between them. The indices belonging to each cluster are always known.
In order to get a quick evaluation of relevant features (as there is a lot of data to scan), I thought of a measure that somehow reflects the discriminative power of the feature observed:
  • score = D̄₁₂ / ((D̄₁₁ + D̄₂₂) / 2)
Although this score value works quite well in most cases and is easy and fast to compute, I wanted to check whether there might be a better and more expressive way that is computationally efficient, yields a single value, and is less sensitive to outliers in the data.
Unfortunately, the approaches I found use raw data (instead of distances, which are definitely the input here) and/or require an interpretation of the result.
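For concreteness, here is a minimal MATLAB sketch of such a quadrant-based score, assuming the score is the mean between-group distance divided by the average of the two within-group means. `D` is the full square distance matrix, and `idx1`/`idx2` are the (known) index vectors of the two groups; the function name is illustrative, not from the original post.

```matlab
function s = quadrant_score(D, idx1, idx2)
% QUADRANT_SCORE  Between/within ratio from a labeled distance matrix.
%   Sketch only: assumes D is symmetric with zeros on the diagonal.
    d11 = D(idx1, idx1);               % within-group 1 (upper left quadrant)
    d22 = D(idx2, idx2);               % within-group 2 (lower right quadrant)
    d12 = D(idx1, idx2);               % between groups (off-diagonal quadrant)

    m11 = mean(d11(~eye(size(d11))));  % exclude the zero self-distances
    m22 = mean(d22(~eye(size(d22))));
    m12 = mean(d12(:));

    s = m12 / ((m11 + m22) / 2);       % mean between / average mean within
end
```

A score well above 1 then indicates that the two groups are farther apart than they are spread internally.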
I hope y'all get what I am looking for, and I hope my approach is not fundamentally stupid. If it is, let me know :)
Thanks in advance!
Distance_Matrices.png

5 Comments

What are the features? The histogram vectors themselves? What is D? Does it somehow depend on the histogram values directly, like the square root of the sum of the squares of bin differences? What are the clusters? How are you getting clusters from a ton of image histograms?
Have you considered doing Principal Components Analysis and looking at which data fit the model and which have high Q residual scores or high T Hotelling scores?
Can you attach your data?
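The PCA route suggested above could be sketched in MATLAB as follows. This is only an illustration under assumptions: `X` is an n-by-p matrix with one histogram per row, and the number of retained components `k` is chosen arbitrarily here.

```matlab
% X: n-by-p data matrix, one (normalized) histogram per row -- assumption.
[coeff, score, ~, tsq] = pca(X);   % tsq is Hotelling's T-squared per sample

k = 2;                             % number of retained components (assumption)
% pca centers the data, so add the column means back when reconstructing:
Xhat = score(:, 1:k) * coeff(:, 1:k)' + mean(X, 1);

% Q residual: squared reconstruction error per sample in the residual space.
Q = sum((X - Xhat).^2, 2);
```

Samples with large `Q` or `tsq` values would then be the candidates for "does not fit the model".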
Actually, the histograms are the features - to be more precise, the distances computed between them. The histograms are spectra of measured vibration data: each bin represents a frequency band and the corresponding signal energy.
As the distinction between different "Conditions" is expected to lie within certain frequency intervals (consisting of multiple adjacent bins), the input data is cut and the distances within the cut intervals are computed.
The output for each of these steps is a distance matrix as shown above.
  • D_ij is the pairwise distance (e.g. Euclidean) between samples i and j, where each sample is represented by a normalized histogram
  • D̄₁₁ and D̄₂₂ are the mean values of all distances within the two groups that are compared, e.g. D̄₁₁ is the mean value of all distances from the upper left quadrant etc...
  • D̄₁₂ is the same for the lower left/upper right quadrant
To make it short: I don't have any clusters yet, but I know which data sets belong together. As the data within one Condition varies (influenced by certain measurement parameters), there is no single model data set.
I tried to figure out whether PCA makes sense for this, but in this application similarity or dissimilarity is defined by the distance computed from a range of bins, not by a combination of single features.
Attached is a sample file containing the histogram data for one sensor and two conditions.
When I normalize each vector over indices 66:75 and compute the pairwise Euclidean distances, I get the attached matrix. It shows good separation between Condition 1 and Condition 2, indicated by a score value of around 7.8 (from the equation above). Still, there might be a better way...
Show how you compute D from two histogram vectors.
And please insert screenshots as PNG files, so we can see them right here, instead of .fig files.
I accidentally uploaded the wrong distance matrix in the first place. Sorry for that.
Find attached the one matching the example data set, and the following distance computation between the first and second "runs" for indices 66:75 with Euclidean distance:
>> sample1=sampledata.DATA{1,1}(66:75)./sum(sampledata.DATA{1,1}(66:75))
>> sample2=sampledata.DATA{2,1}(66:75)./sum(sampledata.DATA{2,1}(66:75))
>> D_12=pdist2(sample1',sample2','euclidean')
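The snippet above computes a single pairwise distance. Extending it to all runs at once, to get the full distance matrix in one call, could look like this. A sketch under assumptions: `sampledata.DATA(:,1)` is taken to be a cell column with one histogram (column vector) per run.

```matlab
% Normalize the 66:75 slice of every run and stack them as rows (assumption:
% sampledata.DATA(:,1) is a cell column, one histogram column vector per cell).
H = cell2mat(cellfun(@(h) h(66:75)' ./ sum(h(66:75)), ...
                     sampledata.DATA(:, 1), 'UniformOutput', false));

% Full n-by-n pairwise distance matrix, as shown in the attached image.
D = pdist2(H, H, 'euclidean');
```

With the runs ordered by Condition, the within- and between-group quadrants then appear directly in `D`.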
distmatrix_euclidean_66_75.png
@ Image Analyst: Was that the input you were asking for or did I get it wrong? And thanks for the hint on PNGs, good call!


Answers (0)

Release: R2019b

Asked: 27 Nov 2019

Commented: 28 Nov 2019
