quantifying the similarity between data sets

Question

Daniel Mella on 14 Jul 2017

https://www.mathworks.com/matlabcentral/answers/348782-quantifying-the-similarity-between-data-sets

Commented: Kafayat Olayinka on 29 May 2020

data_sets.mat

Hi, I implemented an algorithm that tracks a particle in space and time. I applied it to two experiments and obtained two data sets, A=[X,Y] and B=[X,Y], of 8399 coordinate points each. The experiments were exactly the same. I plotted A and B, and there are clear differences between them, but overall the points lie within similar limits. Of course, they are never going to be exactly the same because of errors in the tracking algorithm. Still, given a certain criterion, is there a method that quantifies the difference between the data sets, so that I can say "ok, they are close enough" or "no, there is too much difference between them"?

P.S. I attached the data set I am currently analysing. Thank you.

Answer 1

Image Analyst on 14 Jul 2017

https://www.mathworks.com/matlabcentral/answers/348782-quantifying-the-similarity-between-data-sets#answer_274197

See https://www.mathworks.com/products/computer-vision/features.html#3-d-point-cloud-processing

2 Comments

Daniel Mella on 16 Jul 2017

Thanks for your answer.

I tried it but it is not what I am looking for. I need a way to quantify how similar or different my plots are.

I have been thinking of estimating the power spectra of A and B with the pwelch function and then calculating the cross-correlation between the spectra. I think that will give me the similarity in X and Y.

Image Analyst on 16 Jul 2017

Methods like SIFT and SURF first identify a set of "salient points" and then use point-matching algorithms to find subsets of points that align fairly well. If you don't like the ones in the Computer Vision System Toolbox, you can use some other one: https://www.google.com/#q=point+matching+algorithm

Or look into how "optical flow" (also in the CVSToolbox) works.

Answer 2

Star Strider on 16 Jul 2017

https://www.mathworks.com/matlabcentral/answers/348782-quantifying-the-similarity-between-data-sets#answer_274341

I can’t find anything online that addresses your problem, and there may be no consensus. Some exploration of your data reveals that the x-coordinates in both are (essentially) identically distributed, and the y-coordinates in both are (essentially) identically distributed. The x- and y-coordinates have different distributions, and none of them are normally distributed.

One approach therefore could be to do a Wilcoxon Rank Sum (Mann-Whitney U) test separately on the x-coordinates of the two data sets and on the y-coordinates of the two data sets. This tests the null hypothesis that the medians are the same, against the alternative hypothesis that they are different.

AB = load('data_sets.mat');
A = AB.A;
B = AB.B;
[p1,h1,stats1] = ranksum(A(:,1),B(:,1));
[p2,h2,stats2] = ranksum(A(:,2),B(:,2));

These results indicate that the medians do not differ significantly for either the x- or the y-coordinates.
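As a rough illustration of how the ranksum outputs are read, here is a minimal sketch on synthetic data. The vectors xA, xB and xC are made-up stand-ins, not the contents of data_sets.mat, and the sample size is only chosen to match the question. The function requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-ins for the x-coordinates of two runs (assumed data,
% not the contents of data_sets.mat).
rng(1);                          % fixed seed so the run is repeatable
xA = rand(8399,1);               % "run A", uniform on [0,1]
xB = rand(8399,1);               % "run B", drawn from the same distribution
xC = rand(8399,1) + 0.5;         % shifted copy, for contrast

[pSame,hSame] = ranksum(xA,xB);  % compares two same-distribution samples
[pDiff,hDiff] = ranksum(xA,xC);  % medians differ by about 0.5

fprintf('same:    p = %.3g, h = %d\n', pSame, hSame);
fprintf('shifted: p = %.3g, h = %d\n', pDiff, hDiff);
```

With the default 5% significance level, h = 1 means the hypothesis of equal medians is rejected and h = 0 means it is not rejected. A large p-value is an absence of evidence for a difference rather than proof that the two sets are the same.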

To demonstrate that the distributions of the x- and y-coordinates are not different would require a different test, such as a chi-square goodness-of-fit test of one x-coordinate distribution against the other, and similarly for the y-coordinates. (Use histogram or histcounts to generate the distributions.) You would have to write that code yourself, and then use the appropriate chi squared distribution function to calculate the p-values based on your calculated chi-square statistics and degrees-of-freedom.

Since a definitive discussion on this does not seem to exist, or at least has evaded my search for it, this is the best I can come up with.

3 Comments

Star Strider on 17 Jul 2017

My pleasure,

I experimented with the chi-square idea in the interim:

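% Common bin edges for both data sets (20 edges give 19 bins)
Xedges = linspace(min([A(:,1);B(:,1)]),max([A(:,1);B(:,1)]), 20);
Yedges = linspace(min([A(:,2);B(:,2)]),max([A(:,2);B(:,2)]), 20);
% Bin counts for each coordinate of each data set
[HXA,edgesx] = histcounts(A(:,1),Xedges);
[HXB,edgesx] = histcounts(B(:,1),Xedges);
[HYA,edgesy] = histcounts(A(:,2),Yedges);
[HYB,edgesy] = histcounts(B(:,2),Yedges);
% Relative frequencies; sqrt(eps) keeps empty bins from dividing by zero
FXA = HXA/sum(HXA)+sqrt(eps);
FXB = HXB/sum(HXB)+sqrt(eps);
FYA = HYA/sum(HYA)+sqrt(eps);
FYB = HYB/sum(HYB)+sqrt(eps);   % was HXA/sum(HYB), which mixed x and y counts
% Per-bin contributions for x, then the summed statistics for x and y
QX = (FXA(:)-FXB(:)).^2./FXA(:);
Chi2_X = sum((FXA(:)-FXB(:)).^2./FXA(:));
Chi2_Y = sum((FYA(:)-FYB(:)).^2./FYA(:));
df = size(FXA(:),1)-1;          % number of bins minus one
% chi2cdf(x,df) evaluates the chi-square CDF (lower tail) at x
P1 = chi2cdf(Chi2_X, df);
P2 = chi2cdf(Chi2_Y, df);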

I believe this is correct. I’ve not written code to calculate chi-square statistics in a while. Adding ‘sqrt(eps)’ prevents Inf and NaN values in the chi-square calculations, since some of the bins have zero values.

Unfortunately, the p-values are vanishingly small, meaning that the distributions are different (the probability of their being the same is essentially zero).
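Two details of the snippet above are worth double-checking before relying on that conclusion. First, chi2cdf(x,df) returns the lower-tail probability, whereas the p-value of a chi-square test is the upper tail, so a value near zero from chi2cdf corresponds to an upper-tail p-value near one. Second, Pearson's statistic is normally computed from bin counts, and using relative frequencies shrinks it by roughly the sample size. Below is a minimal sketch of a two-sample chi-square (homogeneity) test that uses counts and the upper tail. It runs on synthetic data, so xA, xB and the bin count are assumptions, and the result says nothing about the attached data set.

```matlab
% Synthetic stand-ins for one coordinate of each run (assumed data,
% not the contents of data_sets.mat).
rng(2);
xA = randn(8399,1);
xB = randn(8399,1);

edges = linspace(min([xA;xB]), max([xA;xB]), 20);   % 19 shared bins
nA = histcounts(xA, edges);                          % counts, not fractions
nB = histcounts(xB, edges);

keep = (nA + nB) > 0;            % drop bins that are empty in both samples
nA = nA(keep);
nB = nB(keep);

% Expected counts if both samples come from one common distribution
pooled = (nA + nB) / (sum(nA) + sum(nB));
eA = sum(nA) * pooled;
eB = sum(nB) * pooled;

chi2 = sum((nA - eA).^2 ./ eA) + sum((nB - eB).^2 ./ eB);
df   = numel(nA) - 1;                 % (2 samples - 1) * (bins - 1)
p    = chi2cdf(chi2, df, 'upper');    % upper tail: P(X >= chi2)

fprintf('chi2 = %.3f, df = %d, p = %.3g\n', chi2, df, p);
```

Sparse tail bins (expected counts below about 5) make the chi-square approximation unreliable, so merging them may be advisable. Successive positions along a tracked path are also unlikely to be independent, which these tests assume, so any p-value here is best treated as a rough guide.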

I would be hesitant to use pwelch on random spatial data. You might want to experiment with the fft2 function instead, and the image processing functions.

Yours appears to be a relatively new problem. I am not certain how to approach it, and the literature search I did turned up no relevant results.

Kafayat Olayinka on 29 May 2020

Can you show us how to plot this and what it'll look like? Thanks
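One possible way to plot the binned distributions side by side is sketched below. It is only an illustration on synthetic data: A and B here are random stand-ins for the matrices in data_sets.mat, and Xcenters is a helper name introduced for this example. The 'Normalization','probability' option makes histcounts return the fraction of points per bin directly.

```matlab
% Synthetic stand-ins for the two data sets (assumed data; replace with
% the A and B loaded from data_sets.mat).
rng(3);
A = randn(8399,2);
B = randn(8399,2);

% Shared bin edges so the two histograms are directly comparable
Xedges = linspace(min([A(:,1);B(:,1)]), max([A(:,1);B(:,1)]), 20);
FXA = histcounts(A(:,1), Xedges, 'Normalization','probability');
FXB = histcounts(B(:,1), Xedges, 'Normalization','probability');
Xcenters = Xedges(1:end-1) + diff(Xedges)/2;   % midpoint of each bin

figure
bar(Xcenters, [FXA(:) FXB(:)])    % grouped bars, one pair per bin
xlabel('x-coordinate')
ylabel('Fraction of points per bin')
legend('A','B')
title('Binned x-coordinates of A and B')
```

The same steps with column 2 and a second set of edges give the corresponding plot for the y-coordinates. Bins where the two bars differ most are the ones that contribute most to the chi-square statistic.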
