Extracting data from classifier objects

I am using MATLAB's kNN classifier and would like to extract, for each grid point, the list of distances to its k nearest neighbors (or something along those lines). I have looked through the properties and methods of the classifier object returned by fitcknn() but cannot find this data. I do not know whether it is stored; I would like to use it elsewhere, and if fitcknn() already calculates these distances, extracting them would be more efficient than recomputing them.
Thank you!

 Accepted Answer

Yes. I've done that. What you need to do is put a list of coordinates into vectors and then use pdist2(), if you have the Statistics and Machine Learning Toolbox. For example, here is code to find the distance from every blob centroid to every other blob centroid, and then to sort them by distance. After that, you can take columns to find the mean or distribution of distances to the first neighbor, second neighbor, third neighbor, etc. Of course, by taking the mean of different columns you can do things like compute the mean distance to the first 3 neighbors, etc.
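% Get the distances from every point to every other point.
% Requires pdist2() of the Statistics and Machine Learning Toolbox.
% xCentroids and yCentroids are assumed to be column vectors of equal length.
fontSize = 14; % Assumed font size for the titles and axis labels; adjust to taste.
xy = [xCentroids, yCentroids];
distances = pdist2(xy, xy);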
% Display it.
imshow(distances, []);
axis on;
title('Quantized Distances', 'FontSize', fontSize);
ylabel('From this particle', 'FontSize', fontSize);
xlabel('To this particle', 'FontSize', fontSize);
drawnow;
% Each particle (point) is a row in distances.
% Sort them from closest to farthest away.
distances = sort(distances, 2, 'ascend'); % Sort along dimension 2, i.e. within each row.
% Display it.
imshow(distances, []);
axis on;
title('Sorted Quantized Distances', 'FontSize', fontSize);
ylabel('From this particle', 'FontSize', fontSize);
xlabel('To this particle', 'FontSize', fontSize);
drawnow;
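% Let's sort again, just for fun. sortrows() on the last column orders the particles by the
% distance to their farthest neighbor, so the particles closest to all the others come first.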
distances = sortrows(distances, size(distances, 2));
% Display it.
imshow(distances, []);
axis on;
numberOfPoints = size(distances, 1);
caption = sprintf('Sorted Quantized Distances of %d Particles', numberOfPoints);
title(caption, 'FontSize', fontSize);
ylabel('From this particle', 'FontSize', fontSize);
xlabel('To this particle', 'FontSize', fontSize);
drawnow;
% Let's look at this histogram of distances for the n'th closest other particle.
% We can do this by taking the n'th column of distances and putting it into histogram().
% Remember column 1 is always 0 because it's the distance of the particle to itself.
% So the nearest neighbor distances are in column 2, and the 2nd closest neighbor distances are in column 3, etc.
n = 1; % 1 = first closest (which will be in column 2). 2 = 2nd closest (which will be in column 3). etc.
nthNeighborDistances = distances(:, n+1);
% Display it.
hold off;
histogram(nthNeighborDistances, 'Normalization', 'pdf');
axis on;
grid on;
suffix = 'th';
if n == 1
suffix = 'st';
elseif n == 2
suffix = 'nd';
elseif n == 3
suffix = 'rd';
end
caption = sprintf('Histogram of %d%s closest quantized distances', n, suffix);
title(caption, 'FontSize', fontSize);
ylabel('Probability density', 'FontSize', fontSize); % The histogram uses 'pdf' normalization, so the y axis is a density, not a count.
xlabel('Distance', 'FontSize', fontSize);
drawnow;
% Fit a distribution
pd = fitdist(nthNeighborDistances, 'LogNormal')
xl = xlim;
x_values = linspace(xl(1), xl(2), 100);
y = pdf(pd, x_values);
hold on;
plot(x_values, y, 'LineWidth',2);
legend('Actual', 'Log-Normal Fit', 'Location', 'east');
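
The remark above about averaging several columns (for example, the mean distance to the first 3 neighbors) can be sketched as follows. This is only an illustration: the random points stand in for the blob centroids, and the names sortedDistances, meanDistanceToFirstK, and overallMean are made up for this example.

```matlab
% Hypothetical sample data standing in for the blob centroids.
rng(0);                       % Fixed seed so the sketch is repeatable.
xy = rand(50, 2);             % 50 points, one (x, y) pair per row.

% Pairwise distances, with each row sorted from closest to farthest.
% Requires pdist2() of the Statistics and Machine Learning Toolbox.
sortedDistances = sort(pdist2(xy, xy), 2, 'ascend');

% Column 1 is the zero self-distance, so neighbors 1..k sit in columns 2..k+1.
k = 3;
meanDistanceToFirstK = mean(sortedDistances(:, 2:k+1), 2);  % One value per point.
overallMean = mean(meanDistanceToFirstK);                   % Single summary number.
fprintf('Mean distance to the first %d neighbors: %.4f\n', k, overallMean);
```

Changing k changes how many neighbor columns are averaged; k must be smaller than the number of points.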

3 Comments

Ah, pdist2() is precisely what I am using now but I was wondering if there was a way to extract this data from the kNN classifier as I figured it might be doing something similar in its process anyways. I appreciate the quick and detailed reply!
knnsearch() gives you the distances from some cluster centroid that you must specify. What my code did was to find distances between all coordinates and all other coordinates. They're different. I don't know how you define "grid point" but if you put the grid points as the first arg to pdist2() and your other points as the second arg, it would do what you said. I'm not really sure what you want.
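
The suggestion of passing the grid points and the other points as separate arguments to pdist2() might look like the sketch below. The data are random placeholders, "grid point" is assumed to mean a node of a regular meshgrid, and the variable names (dataPoints, gridPoints, kNearest, kNearestAlt) are invented for illustration.

```matlab
% Hypothetical data: 40 scattered points and a 5-by-5 grid over the unit square.
rng(1);
dataPoints = rand(40, 2);
[gx, gy] = meshgrid(linspace(0, 1, 5));
gridPoints = [gx(:), gy(:)];          % 25-by-2, one grid point per row.
k = 3;                                % Number of nearest neighbors to keep.

% Row i holds the distances from grid point i to every data point.
allDistances = pdist2(gridPoints, dataPoints);
kNearest = sort(allDistances, 2, 'ascend');
kNearest = kNearest(:, 1:k);          % 25-by-k, closest first.

% The 'Smallest' option of pdist2() returns only the k smallest distances.
% It returns one column per grid point (k-by-25), hence the transpose.
kNearestAlt = pdist2(dataPoints, gridPoints, 'euclidean', 'Smallest', k)';
```

Because the grid points are not among the data points here, there is no zero self-distance column to skip, unlike in the all-to-all case.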
I was hoping to find a way to access the results of something such as knnsearch() just from the kNN model since I am assuming it already has calculated that data. But this has still been very helpful, thank you!
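
For what it's worth, a kNN model is a lazy learner: the object returned by fitcknn() keeps the training data (in its X and Y properties) rather than a table of precomputed distances, so the distances have to be computed when they are needed. A minimal sketch, assuming the default Euclidean distance, no standardization, and made-up training data and grid, would reuse the model's own data and settings with knnsearch():

```matlab
% Hypothetical two-class training data.
rng(2);
X = [randn(30, 2); randn(30, 2) + 3];
Y = [ones(30, 1); 2 * ones(30, 1)];
mdl = fitcknn(X, Y, 'NumNeighbors', 5);

% Hypothetical grid of query points covering the data.
[gx, gy] = meshgrid(linspace(-2, 5, 10));
gridPoints = [gx(:), gy(:)];          % 100-by-2

% Reuse the training data, k, and distance metric stored in the model.
[neighborIdx, neighborDist] = knnsearch(mdl.X, gridPoints, ...
    'K', mdl.NumNeighbors, 'Distance', mdl.Distance);

% neighborDist(i, :) holds the distances from grid point i to its k nearest
% training points; neighborIdx(i, :) holds their row indices in mdl.X.
neighborLabels = mdl.Y(neighborIdx);  % Class labels of those neighbors.
```

If the model was trained with standardization or a custom distance function, the query points would need the same treatment before calling knnsearch(), which this sketch does not cover.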
