find similar numbers within a matrix

Hi. I need to identify the position of a triplet of numbers (P) inside an array (M).
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];
I am using the 'find' function, but the numbers in P are only very close to (not exactly equal to) the corresponding numbers in M, so this comparison does not find the row of interest:
[r,c] = find(M(:,1) == P(1,1) & M(:,2) == P(1,2) & M(:,3) == P(1,3));
That is:
  • P(1,1) is equal to or similar to M(?,1);
  • P(1,2) is equal to or similar to M(?,2);
  • P(1,3) is equal to or similar to M(?,3);
Comparing only the first column, on the other hand, does find the row of interest:
[r,c] = find(M(:,1) == P(1,1));
P.S. The differences appear only after the decimal point.
Is there a way to find the row of M whose values are very close to those in P, for example by allowing a tolerance of ±0.01 on each value?

Accepted Answer

Dyuman Joshi on 11 Jan 2024
Edited: Dyuman Joshi on 11 Jan 2024
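isequal and eq (==) are not the best choices for comparing floating-point numbers: values that should be equal can differ by tiny roundoff errors, so an exact comparison fails. Compare against a tolerance instead.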
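A minimal illustration of the point above, using a generic floating-point example rather than the poster's data: two computations that should give the same value differ in the last bit, so == reports false while a tolerance test succeeds.

```matlab
% Generic example (not from M.mat): 0.1 + 0.2 is not exactly 0.3 in double precision.
a = 0.1 + 0.2;
b = 0.3;
exactMatch = (a == b)          % false: a and b differ by about 5.6e-17

tol = 0.01;                    % absolute tolerance, as suggested in the question
closeMatch = abs(a - b) < tol  % true
```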
arr = load('M.mat')
arr = struct with fields:
M: [61×3 double]
M = arr.M;
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];
tol = 0.01;
%Rows that satisfy the condition for all column elements
idx = all(abs(M-P)<tol, 2)
idx = 61×1 logical array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
%Get the output using the row indices
out = M(idx, :)
out = 1×3
-34.9709 -202.8989 -179.4810
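Since M.mat is not available here, the sketch below applies the same idea to a small hypothetical M (not the poster's data) in which row 2 equals P except for a roundoff-sized difference (3e-14) in the third column. It assumes MATLAB R2016b or later, where M - P uses implicit expansion to subtract P from every row, and it uses find to turn the logical mask into the row position the question asks for.

```matlab
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];   % P from the question

% Hypothetical stand-in for M.mat (not the poster's data): row 2 equals P
% except for a roundoff-sized difference in the third column.
M = [-28.1429 -199.4923 -189.3216
     P
     -30.0000 -190.0000 -170.0000];
M(2,3) = M(2,3) + 3e-14;

exactRows = find(all(M == P, 2));    % empty: the roundoff breaks exact equality

tol = 0.01;
idx = all(abs(M - P) < tol, 2);      % true where every column is within tol
rowNumber = find(idx)                % position of the matching row in M
```

With real data, more than one row can fall inside the tolerance, so rowNumber may be a vector rather than a single index.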

More Answers (4)

John D'Errico on 11 Jan 2024
Edited: John D'Errico on 11 Jan 2024
Lots of good answers have been posted, but I did not see this one.
load M.mat
size(M)
ans = 1×2
61 3
The array M has 61 rows.
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];
And we want to find the row of M that is closest to P.
help knnsearch
KNNSEARCH Find K nearest neighbors.
    IDX = KNNSEARCH(X,Y) finds the nearest neighbor in X for each point in Y. X is an MX-by-N matrix and Y is an MY-by-N matrix. Rows of X and Y correspond to observations and columns correspond to variables. IDX is a column vector with MY rows. Each row in IDX contains the index of the nearest neighbor in X for the corresponding row in Y.

    [IDX, D] = KNNSEARCH(X,Y) returns a MY-by-1 vector D containing the distances between each row of Y and its closest point in X.

    [IDX, D] = KNNSEARCH(X,Y,'NAME1',VALUE1,...,'NAMEN',VALUEN) specifies optional argument name/value pairs:

    Name           Value
    'K'            A positive integer, K, specifying the number of nearest neighbors in X to find for each point in Y. Default is 1. IDX and D are MY-by-K matrices. D sorts the distances in each row in ascending order. Each row in IDX contains the indices of K closest neighbors in X corresponding to the K smallest distances in D.
    'NSMethod'     Nearest neighbors search method. Value is either:
                     'kdtree'     - Creates and uses a kd-tree to find nearest neighbors. 'kdtree' is only valid when the distance metric is one of the following metrics: 'euclidean', 'cityblock', 'minkowski', 'chebychev'.
                     'exhaustive' - Uses the exhaustive search algorithm. The distance values from all the points in X to each point in Y are computed to find nearest neighbors.
                   Default is 'kdtree' when the number of columns of X is not greater than 10, X is not sparse, and the distance metric is one of the above 4 metrics; otherwise, default is 'exhaustive'.
    'IncludeTies'  A logical value indicating whether KNNSEARCH will include all the neighbors whose distance values are equal to the Kth smallest distance. Default is false. If the value is true, KNNSEARCH includes all these neighbors. In this case, IDX and D are MY-by-1 cell arrays. Each row in IDX and D contains a vector with at least K numeric numbers. D sorts the distances in each vector in ascending order. Each row in IDX contains the indices of the closest neighbors corresponding to these smallest distances in D.
    'Distance'     A string or a function handle specifying the distance metric. The value can be one of the following:
                     'euclidean'      - Euclidean distance (default).
                     'seuclidean'     - Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled by dividing by a scale value S. The default value of S is the standard deviation computed from X, S=NANSTD(X). To specify another value for S, use the 'Scale' argument.
                     'fasteuclidean'  - Euclidean distance computed by using an alternative algorithm that saves time. This faster algorithm can, in some cases, reduce accuracy.
                     'fastseuclidean' - Standardized Euclidean distance computed by using an alternative algorithm that saves time. This faster algorithm can, in some cases, reduce accuracy.
                     'cityblock'      - City Block distance.
                     'chebychev'      - Chebychev distance (maximum coordinate difference).
                     'minkowski'      - Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' argument.
                     'mahalanobis'    - Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by NANCOV(X). To specify another value for C, use the 'Cov' argument.
                     'cosine'         - One minus the cosine of the included angle between observations (treated as vectors).
                     'correlation'    - One minus the sample linear correlation between observations (treated as sequences of values).
                     'spearman'       - One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
                     'hamming'        - Hamming distance, percentage of coordinates that differ.
                     'jaccard'        - One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
                     function         - A distance function specified using @ (for example @DISTFUN). A distance function must be of the form function D2 = DISTFUN(ZI, ZJ), taking as arguments a 1-by-N vector ZI containing a single row of X or Y, an M2-by-N matrix ZJ containing multiple rows of X or Y, and returning an M2-by-1 vector of distances D2, whose Jth element is the distance between the observations ZI and ZJ(J,:).
    'P'            A positive scalar indicating the exponent of Minkowski distance. This argument is only valid when 'Distance' is 'minkowski'. Default is 2.
    'Cov'          A positive definite matrix indicating the covariance matrix when computing the Mahalanobis distance. This argument is only valid when 'Distance' is 'mahalanobis'. Default is NANCOV(X).
    'Scale'        A vector S containing non-negative values, with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of S. This argument is only valid when 'Distance' is 'seuclidean'. Default is NANSTD(X).
    'BucketSize'   The maximum number of data points in the leaf node of the kd-tree (default is 50). This argument is only meaningful when kd-tree is used for finding nearest neighbors.
    'SortIndices'  A flag to indicate if output distances and the corresponding indices should be sorted in the order of distances ranging from the smallest to the largest distance. Default is true.
    'CacheSize'    A positive scalar or 'maximal'. The default is 1e3. This argument is only meaningful when the alternative algorithm of computing Euclidean distance is used which requires an intermediate matrix (when 'NSMethod' is 'exhaustive', and 'Distance' is one of {'fasteuclidean','fastseuclidean'}). If numeric, this argument specifies the cache size in megabytes (MB) to allocate for an intermediate matrix. If 'maximal', knnsearch attempts to allocate enough memory for an entire intermediate matrix whose size is MX-by-MY (MX is the number of rows of the input data X, and MY is the number of rows of the input data Y). 'CacheSize' does not have to be large enough for an entire intermediate matrix, but must be at least large enough to hold an MX-by-1 vector. Otherwise, the regular algorithm of computing Euclidean distance will be used instead. If the specified cache size exceeds the available memory, MATLAB issues an out-of-memory error.

    Example:
       % Find 2 nearest neighbors in X and the corresponding values to each
       % point in Y using the distance metric 'cityblock'
       X = randn(100,5);
       Y = randn(25, 5);
       [idx, dist] = knnsearch(X,Y,'dist','cityblock','k',2);

    See also CREATENS, ExhaustiveSearcher, KDTreeSearcher, RANGESEARCH.

    Documentation for knnsearch
       doc knnsearch

    Other uses of knnsearch
       ExhaustiveSearcher/knnsearch  tall/knnsearch  gpuArray/knnsearch  textanalytics/knnsearch  KDTreeSearcher/knnsearch
[idx,D] = knnsearch(M,P)
idx = 16
D = 2.8422e-14
It tells us that row 16 of M is the closest one to P, and the distance between the two vectors is not exactly zero, but is very close.
format long g
M(idx,:)
ans = 1×3
-34.9709 -202.8989 -179.48105
P - M(idx,:)
ans = 1×3
1.0e+00 * 0 0 -2.8421709430404e-14
So it misses only in the third element, and only by a roundoff-sized amount. If the distance is too large to be acceptable, then D will tell you that. For example:
[idx,D] = knnsearch(M,[-100 -200 -300])
idx = 1
D = 131.959875743576
So that was a miss by a mile.
M(idx,:)
ans = 1×3
-28.1429 -199.4923 -189.3216
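knnsearch requires the Statistics and Machine Learning Toolbox. As a toolbox-free alternative (a swapped-in technique, not part of this answer), the nearest row can also be found with vecnorm (R2017b or later) and min, and then accepted or rejected with an explicit distance threshold. The matrix M and the threshold maxDist below are hypothetical; for a single query row this mirrors knnsearch's default Euclidean distance, though tie-breaking between equally close rows may differ.

```matlab
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];   % P from the question

% Hypothetical stand-in for M.mat (not the poster's data).
M = [-28.1429 -199.4923 -189.3216
     P
     -30.0000 -190.0000 -170.0000];
M(2,3) = M(2,3) + 3e-14;   % simulate a roundoff-sized difference

d = vecnorm(M - P, 2, 2);  % Euclidean distance from each row of M to P
[D, idx] = min(d);         % nearest row and its distance (first one if tied)

maxDist = 0.01;            % hypothetical acceptance threshold
if D <= maxDist
    fprintf('Row %d matches P (distance %g)\n', idx, D);
else
    fprintf('No row within %g of P; nearest is row %d (distance %g)\n', maxDist, idx, D);
end
```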

VINAYAK LUHA on 11 Jan 2024
Hi Alberto,
Try comparing each column of M with the corresponding element of P within a tolerance of 0.01:
[r, ~] = find(abs(M(:,1) - P(1,1)) <= 0.01 & abs(M(:,2) - P(1,2)) <= 0.01 & abs(M(:,3) - P(1,3)) <= 0.01);

Hassaan on 11 Jan 2024
% Define the matrix M and the vector P
loadedArray = load('M.mat');
M = loadedArray.M;
P = [-3.4970900e+01, -2.0289890e+02, -1.7948105e+02]; % vector P from the question
% Define the tolerance for similarity
tolerance = 0.01; % For example, 0.01 means we're allowing a difference of up to ±0.01
% Preallocate a logical array for row matches
row_matches = true(size(M, 1), 1);
% Loop through each element in P and update the row_matches array
for i = 1:length(P)
row_matches = row_matches & (abs(M(:, i) - P(i)) < tolerance);
end
% Find the row indices where all elements match P within the tolerance
matching_rows = find(row_matches);
% Display the matching row indices
disp('Matching row indices:');
disp(matching_rows);

Steven Lord on 11 Jan 2024
If you want to find a row of numbers that's close to (but perhaps not exactly equal to, due to roundoff error) another row of numbers, I would use the ismembertol function with the 'ByRows' option.
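A minimal sketch of that suggestion on a small hypothetical M (not the poster's data). By default ismembertol treats tol as relative to the magnitude of the input data; passing 'DataScale', 1 turns it into an absolute tolerance (abs(u - v) <= tol), which matches the ±0.01 asked for in the question. Calling ismembertol with the inputs swapped (P first) also returns the row of M where P was found.

```matlab
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];   % P from the question

% Hypothetical stand-in for M.mat: row 2 equals P up to roundoff.
M = [-28.1429 -199.4923 -189.3216
     P
     -30.0000 -190.0000 -170.0000];
M(2,3) = M(2,3) + 3e-14;
tol = 0.01;

% Logical mask over the rows of M; 'DataScale', 1 makes tol an absolute tolerance.
tf = ismembertol(M, P, tol, 'ByRows', true, 'DataScale', 1);

% With P as the first input, the second output is the index of a matching row in M.
[found, loc] = ismembertol(P, M, tol, 'ByRows', true, 'DataScale', 1);
fprintf('P found: %d, row of M: %d\n', found, loc);
```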
Dyuman Joshi on 11 Jan 2024
Edited: Dyuman Joshi on 16 Feb 2024
@Steven Lord, I don't know the inner workings of ismembertol, so I'll just ask you instead:
How similar is it to the code I've written? And if there is a significant difference, which code is better?
On the surface, it looks very similar:
arr = load('M.mat');
M = arr.M;
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];
tol = 0.01;
idx = all(abs(M-P)<tol, 2)
idx = 61×1 logical array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
out = M(idx, :)
out = 1×3
-34.9709 -202.8989 -179.4810
IDX = ismembertol(M, P, tol, 'ByRows', 1)
IDX = 61×1 logical array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
OUT = M(IDX,:)
OUT = 1×3
-34.9709 -202.8989 -179.4810
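The two give the same result on this data, but they are not equivalent in general: all(abs(M-P)<tol, 2) is an absolute, strict test, while ismembertol by default scales tol by the magnitude of the input data and uses <=. The hypothetical example below (not the poster's data) has a row whose first column differs from P by 0.2: the absolute test rejects it, default ismembertol accepts it, and ismembertol with 'DataScale', 1 rejects it again.

```matlab
P = [-3.4970900e+01 -2.0289890e+02 -1.7948105e+02];   % P from the question

% Hypothetical M: row 2 differs from P by 0.2 in column 1 and matches elsewhere.
M = [-28.1429 -199.4923 -189.3216
     -35.1709 -202.8989 -179.48105];
tol = 0.01;

% Absolute, strict test: |M - P| < 0.01 in every column.
idxAbs = all(abs(M - P) < tol, 2)

% Default ismembertol: tol is scaled by the data magnitude, so a 0.2 gap counts as close.
idxRel = ismembertol(M, P, tol, 'ByRows', true)

% ismembertol with DataScale 1: absolute test |u - v| <= 0.01.
idxAbsTol = ismembertol(M, P, tol, 'ByRows', true, 'DataScale', 1)
```

For an absolute ±0.01 criterion like the one in the question, either the all(abs(...)) form or ismembertol with 'DataScale', 1 fits; the default ismembertol call is the better fit when the acceptable error should grow with the size of the values.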

Release

R2021b