Finding a row with multiple "closest values" (values with the least combined error)

Greetings Matlabians. I've run into a bit of a puzzle. I have a 2D matrix whose rows contain values ranging from 0.00 to 0.99. For example...
ranges = [0.011, 0.052, 0.076,... 0.987; 0.013, 0.018, 0.050, 0.071,... 0.999;...]
All rows contain slightly different values in this range, and some rows have a narrower range. What I want to do is extract the index of the row whose values are, on average, closest to 0.05, 0.20, and 0.80. The optimum row will deviate from these three values the least. To measure deviation, I do not want to simply take the difference between the actual value and the desired value (0.51 - 0.50 = 0.01). Rather, it would be better to weight each difference by converting it to a percent difference (0.01/0.50 = 0.02). The row that has the smallest sum of all three percent differences wins. Also, note: the three values which are closest to my desired values (let's call them my approximate values) are found in different columns of each row. Finally, if it so happens that two values have the same magnitude of deviation from one of my desired values (i.e. there are two potential approximate values), I want to favor the smaller approximate value.

6 Comments

And your question is.....? By the way, you probably want to use abs() of the difference rather than the raw difference, so you can detect differences no matter whether they're above or below your target values.
Let's clarify "extract the index of the row that contains values closest to 0.05, 0.20, and 0.80". You say "index", meaning one, but you give three values, meaning that you would have 3 indexes - one for each target value. So which is it? Do you want the 1 index - whichever is closest to any target value? Or do you want the index that is closest to each target value?
Are ‘0.05, 0.20, and 0.80’ always in the same columns, or do we have to go looking for them? Is there more than one possible value in each vector? Are all of them always in every row vector?
For a distance metric, I would use the Euclidean metric (here, the sum of the squares of the differences). It’s easy, mathematically robust, and gives you an unbiased distance measure.
@Image Analyst I should clarify: "The row that has the smallest sum of all three percent differences wins." I want a single row index, one that references the row which has the smallest summed error. That is, each row will contain three values which are close to my three desired values, but since all of them deviate slightly, I want to find the row that matches these three values the best, on average. This process may involve first pulling the indices of the three closest values, but my end goal is to index a single row. I am currently struggling with figuring out how to code this. I've tried filling three matrices with each of my desired values, and subtracting those from my matrix containing rows with values 0.00 to 0.99. This allows me to find out which row has a value closest to one of the three desired values, but since I want the row that does the best on average (across all three desired values), this won't do.
@Start Strider The three values which are closest to 0.05, 0.20, and 0.80 (let's call them my approximate values) are not always in the same columns. Each vector (row) will typically have different approximate values. All vectors, by definition, contain one approximate value for each of my desired values (just whatever values are closest to each desired value). My specific question is: how can I find the row with the least deviation from these three values? Hope these clarifications help.
Is it certain that there will be three values that are close to your desired values? Is it impossible in this situation for there to be a 0.125 or a 0.50? Those are half-way between two target values and so equally close to both, reducing the number of matches.
@Walter Roberson I hadn't thought about that. Good point. It is possible. If such a case were to arise, I would favor the smaller approximate value.
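The "favor the smaller approximate value" rule can be sketched as below. This is only an illustration under stated assumptions: the row, the variable names, and the tolerance `tol` are all hypothetical, and the tolerance is there because two distances that look equal on paper can differ by rounding in floating point.

```matlab
% Hypothetical row in which 0.15 and 0.25 are both 0.05 away from the target.
target = 0.20;
row    = [0.25 0.15 0.60];

d    = abs(row - target);        % absolute deviation of every value from the target
tol  = 1e-12;                    % assumed tolerance for calling two deviations "tied"
tied = row(d <= min(d) + tol);   % all values whose deviation is within tol of the smallest

approx = min(tied);              % among tied candidates, keep the smaller value
```

Dividing `d` by `target` to get a percent difference does not change which values tie for a single target, so the same selection works for either error measure.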
So is only row 1 to be compared to 0.05, only row 2 to 0.20, and only row 3 to 0.80, like this?
row1Diffs = abs(ranges(1,:) - 0.05);
row2Diffs = abs(ranges(2,:) - 0.20);
row3Diffs = abs(ranges(3,:) - 0.80);
Or is each row (all 3 rows) to be compared to each number (all 3 numbers), like this?
% Compare row 1 to all of the numbers (one row of differences per target).
row1Diffs(1,:) = abs(ranges(1,:) - 0.05);
row1Diffs(2,:) = abs(ranges(1,:) - 0.20);
row1Diffs(3,:) = abs(ranges(1,:) - 0.80);
% Compare row 2 to all of the numbers.
row2Diffs(1,:) = abs(ranges(2,:) - 0.05);
row2Diffs(2,:) = abs(ranges(2,:) - 0.20);
row2Diffs(3,:) = abs(ranges(2,:) - 0.80);
% Compare row 3 to all of the numbers.
row3Diffs(1,:) = abs(ranges(3,:) - 0.05);
row3Diffs(2,:) = abs(ranges(3,:) - 0.20);
row3Diffs(3,:) = abs(ranges(3,:) - 0.80);
What set of comparisons do you want to do?
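If the second reading is the intended one (every row compared against all three targets, then one row chosen by summed percent difference), a minimal sketch might look like the following. The data in `ranges` is made up for illustration, and the names `targets`, `rowError`, and `approxVals` are hypothetical. It also assumes the same value may serve as the approximate value for more than one target, which the thread does not settle.

```matlab
% Hypothetical example data; the real matrix would replace this.
ranges  = [0.011 0.052 0.076 0.210 0.790 0.987;
           0.013 0.018 0.050 0.200 0.800 0.999;
           0.020 0.100 0.150 0.300 0.700 0.900];
targets = [0.05 0.20 0.80];

nRows      = size(ranges, 1);
rowError   = zeros(nRows, 1);                 % summed percent difference per row
approxVals = zeros(nRows, numel(targets));    % closest value found for each target

for r = 1:nRows
    sortedRow = sort(ranges(r, :));           % ascending order
    for t = 1:numel(targets)
        % Percent difference of every value in the row from this target.
        pctDiff = abs(sortedRow - targets(t)) / targets(t);
        % min returns the first minimum; in an ascending row that is the
        % smaller value whenever two deviations are exactly equal.
        [pctErr, idx]    = min(pctDiff);
        approxVals(r, t) = sortedRow(idx);
        rowError(r)      = rowError(r) + pctErr;
    end
end

% Index of the row with the smallest summed percent difference.
[bestError, bestRow] = min(rowError);
```

With this sample data the second row contains all three targets exactly, so it is the row selected. Loops are used instead of implicit expansion so the sketch does not depend on a recent MATLAB release.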

Answers (0)

Asked:

on 1 Nov 2015

Commented:

on 1 Nov 2015