Find not fast enough - is there a speedier solution for large matrices?

My code works, but given the size of my data matrices it is too slow, despite access to a fairly powerful machine. I'm sure the MATLAB community has a good, quick fix for my woes. It currently takes about 0.3 s per iteration, so we are talking days or weeks of computing to run my code. I think my main problem lies with my use of the function 'find', and I need a more elegant solution, perhaps vectorizing or using the Parallel Computing Toolbox (available, but new to me).
Thanks in advance!
The problem:
I have 36 sampling dates and a large matrix (1942242*2) of xy sample coordinates ('locmat'). My code, pasted below, reads in a three-column matrix for each sample date in turn. These matrices have similar but different lengths to 'locmat' and consist of xy coordinate data (read into 'xydat') and a data measurement at that xy location (read into 'fetcol'). All coordinates in 'xydat' have an exact match in 'locmat', but they are indexed differently depending on the sample date, and not all xy coordinates in 'locmat' are to be found in 'xydat'. I am trying to index the data in the sample files to 'locmat' based on the xy locations, producing a single matrix (1942242*36) called 'fetmat'. Any coordinate with no data on a given date is stored as -999.
Code:
nosamp = 36;
fetpath = 'C:\Data\dat_text\';
locfnam = 'C:\Data\srchmat\locmat.csv';
locmat = csvread(locfnam);
fn = dir(fetpath);
ns = {fn.name};
ns = sort(ns);
ns = char(ns(3:end));
fetmat = zeros(length(locmat),nosamp);
for q = 1:size(ns,1);
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);
    xydat = fetdat(:,2:3);
    fetcol = fetdat(:,1);
    clear fetdat;
    for s = 1:length(locmat);
        xysrch = locmat(s,:);
        xyrep = repmat(xysrch,length(locmat),1);
        ids = find(locmat == xyrep) ;
        if isempty(ids)
            fetmat(s,q) = -999;
        else
            fetmat(s,q) = fetcol(ids(1));
        end
    end
end
  1 Comment
Roger Stafford on 20 Dec 2012
I don't entirely understand your code. In spite of the statement about 'xydat' having a match in 'locmat' there is no reference to 'xydat' within your for-loops. Instead you seem to be searching for duplications in 'locmat' itself. Perhaps I haven't understood your description correctly.
However, I can make a general comment concerning the use of the 'find' function. When you have a long list to be repeatedly searched for specific items, it is best not to use 'find' if you can possibly avoid it. If you use a sorted list instead, there are some much faster methods of finding a match. With your 'locmat' at a length of 1,942,242 rows, a binary search can take only about log2(1,942,242) ≈ 21 comparisons, rather than the 1,942,242 of a linear scan. I am fairly sure the MATLAB function 'ismember' uses just such a method in finding elements of one set which lie in another set. Of course, you are apparently trying to match a pair of values, x and y, but I am sure there is a way of making use of the binary search technique which would apply here.
You don't want to be scanning 'locmat' from one end to the other repeatedly 1,942,242 x 36 times. That's over 100 trillion comparisons!
Roger Stafford
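As a rough illustration of the sorted-list idea in the comment above, the sketch below sorts a small coordinate list with sortrows and then binary-searches it for one xy pair. The toy values and the names 'xysorted', 'order', 'target', 'idx' and 'nsteps' are assumptions for illustration, not part of the original code.

```matlab
% Toy stand-in for 'xydat' (hypothetical values)
xydat = [2 2; 1 1; 3 1; 1 3];

% Sort rows lexicographically; 'order' maps sorted rows back to xydat rows
[xysorted, order] = sortrows(xydat);

target = [2 2];   % the xy pair to look up
idx = 0;          % row of 'target' in xydat, 0 if it is absent

lo = 1;
hi = size(xysorted, 1);
while lo <= hi
    mid = floor((lo + hi) / 2);
    row = xysorted(mid, :);
    if isequal(row, target)
        idx = order(mid);
        break
    elseif row(1) < target(1) || (row(1) == target(1) && row(2) < target(2))
        lo = mid + 1;   % 'row' sorts before 'target': search the upper half
    else
        hi = mid - 1;   % 'row' sorts after 'target': search the lower half
    end
end

% Halvings needed to narrow a list of 1,942,242 rows down to one
nsteps = ceil(log2(1942242));
```

Here idx comes out as 1, the row of [2 2] in the unsorted 'xydat', and nsteps is 21. In practice a built-in such as ismember with the 'rows' option is likely to be much faster than a hand-written loop like this.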


Accepted Answer

Matt J on 20 Dec 2012
Edited: Matt J on 20 Dec 2012
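for q = 1:size(ns,1);
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);
    xydat = fetdat(:,2:3);
    fetcol = fetdat(:,1);
    clear fetdat;
    % Second output of ismember: row index into xydat of each locmat row (0 if no match)
    [~,fetmat(:,q)]=ismember(locmat,xydat,'rows');
end
% Mark locmat rows with no match on a given date
fetmat(~fetmat)=-999;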
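Note that the second output of ismember is a row index into 'xydat', not the measurement itself. If the measurement values are wanted in 'fetmat', one possible adaptation is to use that index to look up 'fetcol', as in this minimal sketch. The toy data and the name 'col' are assumptions for illustration; inside the loop above, 'col' would correspond to fetmat(:,q).

```matlab
% Toy stand-ins (hypothetical values)
locmat = [1 1; 1 2; 2 1; 2 2; 3 1];
fetdat = [10.5 2 2; 20.0 1 1; 0 3 1];   % columns: measurement, x, y
xydat  = fetdat(:, 2:3);
fetcol = fetdat(:, 1);

% Start with the missing-data flag everywhere
col = -999 * ones(size(locmat, 1), 1);

% tf marks locmat rows present in xydat; loc gives their row index in xydat
[tf, loc] = ismember(locmat, xydat, 'rows');

% Copy the measurement for every matched coordinate
col(tf) = fetcol(loc(tf));
% col is now [20; -999; -999; 10.5; 0]
```

Filling with -999 first and assigning only where tf is true also keeps genuine zero measurements intact, which a test such as ~fetmat would overwrite.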
