Pairwise Haversine Distance Matrix...Speed Up?
Show older comments
Hello,
I have the following code which produces an N x N matrix of pairwise distances using the Haversine formula where the inputs are the latitude and longitude coordinates of a certain geography, e.g. counties. Here is the function:
function W = knn_weight_matrix(xc,yc,k)
%%PURPOSE: The function builds a k nearest neighbor N x N row-stochastic spatial weight matrix
% The function returns a sparse matrix W representing the spatial weight
% matrix
%%USAGE: W = knn_weight_matrix(xc,yc,k)
% where the following holds:
% xc = an (n x 1) vector of longitude coordinates (in degrees)
% yc = an (n x 1) vecotor of latitude coordinates (in degrees)
% k = the number of nearest neighbors
%%NOTES: The code uses the haversine distance formula to create an
% N x N matrix of pairwise diatances
%%Rearrange coordinates and pre-allocate matrices
coordinates = [yc xc];
n = length(xc);
tempw = zeros(n,n);
nnlist = zeros(n,k);
temp2 = zeros(n,n);
%%Create pairwise distance matrix using Haversine distance calculation
for i=1:n
for j=1:n
temp1 = haversine(coordinates(i,1),coordinates(i,2),coordinates(j,1),coordinates(j,2));
temp2(i,j) = temp1;
end
end
%%Find indices for the nearest neighbors
for i=1:n;
temp1 = temp2(i,:);
[~,xind] = sort(temp1);
nnlist(i,1:k) = xind(1,2:k+1);
end
%%Clear the distance matrix
clear temp2
%%Assign a value of 1 to nearest neighbors
for i=1:n
tempw(i,nnlist(i,:)) = 1;
end
%%Create row-stochastic and sparse spatial weight matrix
W = (tempw./k);
end
%%Haversine Function
% This function uses the Haversine formula to calculate the pairwise
% distances used in the knn_weight_matrix formula.
function [dist] = haversine(lat1,lon1,lat2,lon2)
dist = 2 * 6372.8 * asin(sqrt(sind((lat2-lat1)/2)^2 + cosd(lat1) * cosd(lat2) * sind((lon2 - lon1)/2)^2));
end
Is there a way to "bsxfun" or otherwise speed up the code? It runs pretty briskly with a small number of coordinates but it gets fairly slow when N>50,000 or greater.
Any help would be greatly appreciated and please use the code if you find it useful.
Answers (1)
Geoff Hayes
on 1 Feb 2015
Donald - before considering using something like bsxfun, why not try and optimize the code that you already have? For example, the following calculates the pairwise distances between two coordinates
%%Create pairwise distance matrix using Haversine distance calculation
for i=1:n
for j=1:n
temp1 = haversine(coordinates(i,1),coordinates(i,2),coordinates(j,1),coordinates(j,2));
temp2(i,j) = temp1;
end
end
It seems that you could reduce the above calculations in half because the haversine distance between coordinates i and j is the same as the distance between coordinates j and i. So you could do the following instead
temp2 = zeros(n,n);
for u=1:n
for v=r+1:n
temp1 = haversine(coordinates(u,1),coordinates(u,2),coordinates(v,1),coordinates(v,2));
temp2(u,v) = temp1;
temp2(v,u) = temp1;
end
end
Note that the above uses u and v instead of i and j as MATLAB uses the latter two as representations of the imaginary number.
You may also be able to simplify
%%Find indices for the nearest neighbors
for i=1:n;
temp1 = temp2(i,:);
[~,xind] = sort(temp1);
nnlist(i,1:k) = xind(1,2:k+1);
end
to something like
% sort all rows of temp2
[~,xind] = sort(temp2,2);
% extract first k columns of each (ignoring the first element which
% is the distance from a coordinate to itself)
nnlist = xind(:,2:k+1);
Start with the above and see if it helps improve performance.
1 Comment
Donald Lacombe
on 3 Feb 2015
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!