File Exchange Binary Search for numeric vector

version 1.1.0.0 (2.5 KB) by Dr. Murtaza Khan

Dr. Murtaza Khan (view profile)

Search given value in a sorted vector, returns the index of location where the value is found.

Updated 09 Jul 2009

INPUT:
x: vector of numeric values, x should already be sorted in ascending order
(e.g. 2,7,20,...120)
sval: numeric value to be search in x

OUTPUT:
index: index of sval with respect to x. If sval is not found in x then index is empty.

John D'Errico

John D'Errico (view profile)

Actually, the limit for a floating point integer as a double is 2^53-1.

>> (2^53-1) == 2^53
ans =
0

>> (2^53+1) == 2^53
ans =
1

dmitry.grigoryev@uha.fr

dmitry.grigoryev@uha.fr (view profile)

Derek O'Connor,

Matlab numbers keep integer precision for values up to 2^51, which means the overflow will only happen on a computer with 16384TB of RAM. Despite advances in computer technology, I think we won't be seeing such monsters for a few decades.

C. Chaya

C. Chaya (view profile)

Works great. Want to use 'index' rather than 'x(index)' to find the location, as in :

x = sort(randint(1,100,[0,400]));
sval=12;
index = binarysearchasc(x,sval);
index

Derek O'Connor

Derek O'Connor (view profile)

Dear Dr Khan,
The comma should not have been included at the end of the link

http://www.derekroconnor.net/NA/Notes/NA-4-Latex.pdf

Please note that your Binary Search of rows of m by n matrix
also suffers from this error.

http://www.mathworks.com/matlabcentral/fileexchange/9095

Derek O'Connor

Derek O'Connor

Derek O'Connor (view profile)

Dear Dr Khan,

Your Binary Search has an error that occurs not only
in most student programs but in some of the best books.
See, for example, Jon Bentley, "Programming Pearls",
Addison-Wesley, 1986, pages 35 to 48.

The midpoint of an interval [a,b] may be calculated as
m1 = (a+b)/2 or as m2 = a+(b-a)/2. Mathematically, m1 == m2,
but computationally this may not be true.

Your program uses the first calculation and it goes wrong
when a and b are so large that (a+b) overflows. Here is an example, using your variable names:

>> from = intmax-10; to = intmax;
>> mid1 = round((from + to)/2); [from mid1 to]
ans =
2147483637 1073741824 2147483647
>> mid2 = round(from+(to-from)/2); [from mid2 to]
ans =
2147483637 2147483642 2147483647

See the notes on this topic at :

http://www.derekroconnor.net/NA/Notes/NA-4-Latex.pdf,
pages 25, 26, 40, 41

Derek O'Connor

Johnty R

Yang Zang

excellent

dsgs gsdgs

Marek Petrik

Not a production quality, but it is still useful for large data-sets.

M Khan

----------------------------------
1.The complexity of MATLAB 'find' is O(n) (i.e. linear search), where as the complexity of binaraysearchasc.m is O(logn). Therefore for any large data 'find' would be extremely slow than binary search. For example if n=100000 the find would take maximum 100000 operations, where as binary search would take about 20 operations.

2.In numerous real life applications, we need to sort data only once but search it countless time. For example telephone directory or dictionary. Therefore it is extremely useful to sort the data once and then apply binary search to find the records. Will it be wise to sort the telephone directory each time before search? Therefore would be useless to incorporate the sort method in search method. This will add the sorting complexity in search method each time.

3. The method deal with empty input and it states clearly that it is for numeric vector.

4. The binary search given the location of item, then more than one record can be find be searching backward and forward from that location linearly to find all those records , if needed but this will make the method more complex and will deviate from the principal binary search algorithm. Then this would be binary search+linear search. The method is basic binary search not binary search+linear search.

Richie Cotton

This sort of thing is handled more comprehensively by the MATLAB function 'find', but if you want to improve your function as a technical exercise, here are a couple of pointers.
1. 'binary' isn't spelt 'binaray'.
2. It would be more helpful to the user if the function sorted the input X, rather than making them do it themselves.
3. If the search value appears in the array more than once, can you get it to output all the locations, rather than just the first that it finds?
4. You might also want to type-check the inputs. What if it is empty/ complex/ a char/ an array etc.?