Code covered by the BSD License  

Highlights from
quickfindstr

Be the first to rate this file! 5 Downloads (last 30 days) File Size: 2.56 KB File ID: #40628

quickfindstr

by Manel Soria

 

05 Mar 2013

Finds the index of an element in a SORTED list of strings in O(log(N)) time

| Watch this File

File Information
Description

loc=quickfindstr(el,list)
 
Finds the index of an element el in a SORTED list of strings.
 
Uses strcmpc by S. Helsen 23-09-96, you can download it from:
  http://www.mathworks.es/matlabcentral/fileexchange/3987-compare-strings-c-convention/content/Strcmpc.m

It doesn't check if the list is sorted, because that would take O(N) time and this is meant to run in O(log(N)). If you give it an unsorted list, its behaviour is undefined
 
If you give it an element that's not in the sorted list, it returns -1
 
Example of usage:
 
  lis={'aa','zz','fdf','fdeeg','aaff'}
  lis=sort(lis); % must be sorted !
  quickfindstr('fefe',lis) % returns -1 (not found)
  quickfindstr('zz',lis) % returns 2
 
Time comparison for a long list (about 150000 elements)

  tic; strmatch(totr{end},totr,'exact'); toc -> 1.30562 seconds
  tic; quickfindstr(totr{end},totr); toc -> 0.002985 seconds (about 400
  times faster)
  
  by Manel Soria
  manel -dot- soria -at- gmail -dot- com
  LLOP - grup Obert de Programari Lliure
  ETSIAT - UPC

  Adapted fom quickfind by Peter O'Connor,
  http://www.mathworks.es/matlabcentral/fileexchange/30112-quickfind

  Peter
  oconnorp -at- ethz -dot- ch

 
Please let me know if this was useful !

Manel

Acknowledgements

Quickfind inspired this file.

Required Products MATLAB
MATLAB release MATLAB 7.13 (R2011b)
Tags for This File  
Everyone's Tags
data import, demo, statistics
Tags I've Applied
Add New Tags Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (5)
15 Apr 2013 Jan Simon

@Manel: Thanks for fixing the typo. I've meaured the timings on a Core2Duo/2009a/64/Win7 system.

strcmp(c{150000}, c) needs longer than for c{1}, because the contents of c{1} is shorter. There are only 9 strings in c, which have one character only, while for C{150000}='150000' there are a lot of strings with size 6, such that an explicit comparison is started. In addition, a comparison of a shorter string is faster than of a longer string, because less bytes have to be processed.

08 Mar 2013 Manel Soria

First, thanks for your comment, I see that strmatch isn't a good choice at all.
It seems there is a small mistake in your code, should it be c{k} = sprintf('%d', k); in the second line ?
My timings are different:
tic; for i=1:100:n; f= quickfindstr(c{i}, c); end; toc
% 4.697042
tic; for i=1:100:n; f= find(strcmp(c{i}, c)); end; toc
% 6.770262
tic; for i=1:100:n; f= strmatch(c{i}, c, 'exact'); end; toc
% 48.737699
What Matlab version are you using ? On what machine ? I'm running 2011b on a late 2009 Imac
To conclude, I tested the following
>> tic; for i=1:100; f= find(strcmp(c{1}, c)); end; toc
Elapsed time is 0.385195 seconds.
>> tic; for i=1:100; f= find(strcmp(c{150000}, c)); end; toc
Elapsed time is 0.469383 seconds.
Thus, strcmp isn't just a linear search algorithm as I expected. I wonder how is it working

07 Mar 2013 Jan Simon

Some tests under Matlab 2009a/64/Win7:

n = 150000; c = cell(1, n);
for k = 1:n; c = sprintf('%d', k); end;
c = sort(c);
tic; for i=1:100:n; f= quickfindstr(c{i}, c); end; toc
% 6.043004 seconds
tic; for i=1:100:length(c); f= find(strcmp(c{i}, c)); end; toc
% 7.903860 seconds
tic; for i=1:100:length(c); f= strmatch(c{i}, c, 'exact'); end; toc
% 56.037904 seconds

For n=15'000 the timings are:
quickfindstr: 0.437317 seconds
find(strcmp): 0.054181 seconds
strmatch: 0.417264 seconds

This shows, that for 150'000 strings you methods beats find(strcmp).

06 Mar 2013 Manel Soria

Thanks for the comment !
Can you please post how you do it and the time it takes ?
In any case, a sequential strcmp will be a O(N) operation and for large sets this method, O(logN) should be faster

06 Mar 2013 Jan Simon

The comparison with the very slow STRMATCH is not fair. On one hand this command is deprecated now, on the other FIND(STRCMP) is much faster.

Contact us