Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
buffer of sampling vectors

Subject: buffer of sampling vectors

From: Michal Kvasnicka

Date: 20 Apr, 2011 19:14:17

Message: 1 of 5

My problem is the effective storage (buffer) of the huge number (about
1e6, for example) of unique n-dimensional sampling vectors (10<n<100,
elements of vector are integer numbers <= n) and effectively designed
two basic functions:

1. addtobuffer(buff, vec) ... this function add the vector "vec" to
buffer "buff" in the case that vector "vec" is not already in the
buffer. The buffer is empty at the begining. Some kind of vector
ordering (lexicographic) would be beneficial.

2. pos = isinbuffer(buff,vec) ... this function check if the vector
"vec" is in buffer "buff", and returns possition "pos" of the vector
in the buffer. pos = 0, if the vector "vec" is not in buffer.

I will be very happy for any hints how to solve this problem in
MATLAB.

The problem is motivated by optimization of the very cpu time
consuming black-box function. In this case is very important to avoid
repeated evaluation randomly generated input sampling vectors by
properly designed buffering.

Michal

Subject: buffer of sampling vectors

From: Michal Kvasnicka

Date: 20 Apr, 2011 19:45:50

Message: 2 of 5

Just a brief clarification ...

Of course, I know that this problem can be solved by some trivial
techniques in MATLAB, but I am looking for something more
sophisticated. Because for large number of vectors in buffer and n>5
the problem becomes numerically nontrivial.

Subject: buffer of sampling vectors

From: Roger Stafford

Date: 21 Apr, 2011 00:56:04

Message: 3 of 5

"Michal Kvasnicka" wrote in message <7a51ea38-5435-464d-a3e5-3b87a7619966@2g2000vbl.googlegroups.com>...
> My problem is the effective storage (buffer) of the huge number (about
> 1e6, for example) of unique n-dimensional sampling vectors (10<n<100,
> elements of vector are integer numbers <= n) and effectively designed
> two basic functions:
>
> 1. addtobuffer(buff, vec) ... this function add the vector "vec" to
> buffer "buff" in the case that vector "vec" is not already in the
> buffer. The buffer is empty at the begining. Some kind of vector
> ordering (lexicographic) would be beneficial.
>
> 2. pos = isinbuffer(buff,vec) ... this function check if the vector
> "vec" is in buffer "buff", and returns possition "pos" of the vector
> in the buffer. pos = 0, if the vector "vec" is not in buffer.
>
> I will be very happy for any hints how to solve this problem in
> MATLAB.
>
> The problem is motivated by optimization of the very cpu time
> consuming black-box function. In this case is very important to avoid
> repeated evaluation randomly generated input sampling vectors by
> properly designed buffering.
>
> Michal
- - - - - - - - - - - -
  I am not sure what you mean by "trivial techniques", Michal. Unless you envision the use of mex files, the ordinary functions of matlab are all that you have available for use. The idea is to use these in the most efficient manner one can think of.

  For a determination of equality status, lexicographically speaking, between two vectors v1 and v2 I would recommend doing this

 f = find(v1~=v2,1,'first');

If f comes up empty, the vectors are equal. Otherwise, the 'find' scanning has not had to go past the first point of inequality. In that case a subsequent comparison of the single elements v1(f) and v2(f) will reveal which way the inequality goes.

  I would then recommend a binary search technique using this inequality method to determine in the least number of steps where, in an ordered "buffer" of vectors, a new vector is either already located or where it could be inserted so as to preserve its ordering. There is nothing very sophisticated about such a technique, but it has the virtue of being of order O(n*log(m)) where n is the number of elements in the vectors and m is the number of vectors present in the buffer.

  My final recommendation is that, in view of the large size of your vectors, you actually add new vectors to your buffer sequentially as they appear without regard to order, but instead maintain a set of sequential pointers which contain the index of the vector in the buffer that would be first in an ordered arrangement, then the second and so forth. This gives your buffer a "virtual" ordering and avoids having to carry out massive shifts of data every time an insertion is to be made, and yet it still enables binary searching. Only the pointer list would undergo a large shift when a virtual insertion is made. If the time ever comes when you need the buffer to actually be ordered, this could be carried out comparatively efficiently using this pointer list, provided it doesn't occur very often.

  If possible you should provide an initial memory allocation to your buffer with the 'zeros' function sufficiently large to accommodate all anticipated additions, and keep track of the number of vectors actually present with some counter. Otherwise you run into serious delays memory-wise from overly frequent expansions of the buffer.

  Of course if you merely want to order a fixed set of vectors lexicographically all at one time, the best way would be using the 'unique' function with the 'rows' option. That guarantees that the vectors will be ordered and without repetition. However that does not appear to be the problem you are faced with, and you don't want to find yourself repeatedly sorting the same set of vectors over and over again. Hence the need for binary searching as the vectors are presented one at a time.

Roger Stafford

Subject: buffer of sampling vectors

From: Michal Kvasnicka

Date: 22 Apr, 2011 10:01:04

Message: 4 of 5

"Roger Stafford" wrote in message <ionvb4$9dv$1@fred.mathworks.com>...
> I am not sure what you mean by "trivial techniques", Michal. Unless you envision the use of mex files, the ordinary functions of matlab are all that you have available for use. The idea is to use these in the most efficient manner one can think of.
>
> For a determination of equality status, lexicographically speaking, between two vectors v1 and v2 I would recommend doing this
>
> f = find(v1~=v2,1,'first');
>
> If f comes up empty, the vectors are equal. Otherwise, the 'find' scanning has not had to go past the first point of inequality. In that case a subsequent comparison of the single elements v1(f) and v2(f) will reveal which way the inequality goes.
>
> I would then recommend a binary search technique using this inequality method to determine in the least number of steps where, in an ordered "buffer" of vectors, a new vector is either already located or where it could be inserted so as to preserve its ordering. There is nothing very sophisticated about such a technique, but it has the virtue of being of order O(n*log(m)) where n is the number of elements in the vectors and m is the number of vectors present in the buffer.
>
> My final recommendation is that, in view of the large size of your vectors, you actually add new vectors to your buffer sequentially as they appear without regard to order, but instead maintain a set of sequential pointers which contain the index of the vector in the buffer that would be first in an ordered arrangement, then the second and so forth. This gives your buffer a "virtual" ordering and avoids having to carry out massive shifts of data every time an insertion is to be made, and yet it still enables binary searching. Only the pointer list would undergo a large shift when a virtual insertion is made. If the time ever comes when you need the buffer to actually be ordered, this could be carried out comparatively efficiently using this pointer list, provided it doesn't occur very often.
>
> If possible you should provide an initial memory allocation to your buffer with the 'zeros' function sufficiently large to accommodate all anticipated additions, and keep track of the number of vectors actually present with some counter. Otherwise you run into serious delays memory-wise from overly frequent expansions of the buffer.
>
> Of course if you merely want to order a fixed set of vectors lexicographically all at one time, the best way would be using the 'unique' function with the 'rows' option. That guarantees that the vectors will be ordered and without repetition. However that does not appear to be the problem you are faced with, and you don't want to find yourself repeatedly sorting the same set of vectors over and over again. Hence the need for binary searching as the vectors are presented one at a time.
>
> Roger Stafford

Roger, thank you for your interesting hints and recomendations. First of all, I tried to find some simple and compact Matlab solution of my problem. The function "ismember" is probably very good candidate for this kind of problem.

Lets start with testing buffer respresented by matrix M:
m = 1000000;
n = 20;
M = randi([1 n],m,n,'uint8');

and then the function which is able to find the location of unique row vector x within matrix M may looks like:
testloc = 123456;
x = M(testloc,:);
[tf loc] = ismember(x,M,'rows');

Is possible to find locations of multiple vectors, too:
[tf loc] = ismember([x;y;z],M,'rows');

This solution is very elegant, but I need something faster. For large m > 1e6 and m>20 is typical response time about 1sec. In a case of multiple vectors is the response time about 10times longer.

Is there any chance to speedup this searching method based on "ismember" function?

Michal

Subject: buffer of sampling vectors

From: Roger Stafford

Date: 22 Apr, 2011 22:41:05

Message: 5 of 5

"Michal Kvasnicka" wrote in message <iorjl0$m13$1@fred.mathworks.com>...
> Roger, thank you for your interesting hints and recomendations. First of all, I tried to find some simple and compact Matlab solution of my problem. The function "ismember" is probably very good candidate for this kind of problem.
>
> Lets start with testing buffer respresented by matrix M:
> m = 1000000;
> n = 20;
> M = randi([1 n],m,n,'uint8');
>
> and then the function which is able to find the location of unique row vector x within matrix M may looks like:
> testloc = 123456;
> x = M(testloc,:);
> [tf loc] = ismember(x,M,'rows');
>
> Is possible to find locations of multiple vectors, too:
> [tf loc] = ismember([x;y;z],M,'rows');
>
> This solution is very elegant, but I need something faster. For large m > 1e6 and m>20 is typical response time about 1sec. In a case of multiple vectors is the response time about 10times longer.
>
> Is there any chance to speedup this searching method based on "ismember" function?
>
> Michal
- - - - - - - -
  I seriously doubt that you will find any other function that will do better than 'ismember' in locating whether and where any single copy of a given vector is located in an array M if the rows of M occur in random order. It is an inherently order O(m) algorithm where m is the number of rows in M, since it requires m comparisons to successfully carry out the task. However, as I said previously, if M were to be presented in some kind of sorted state, it is easy to write an algorithm that takes advantage of this. For your m = 1000000 case there would only be 21 comparisons necessary to locate a copy if one is present or to locate an insertion point if none is present, using a binary search technique. As you can imagine, there is a vast difference between requiring 1000000 comparisons and only 21.

  Of course it should be remembered that there is a cost to maintaining a sorted state. Each new insertion requires a shift of at least the elements of a long "pointer" vector such as I mentioned before, but I suspect the cost of such a shift would be markedly less than the above million comparisons.

  To give you confidence in binary searching, below is an example of searching for a scalar quantity in a list of scalars. (Using the vector comparison I mentioned earlier it could easily be modified to do a binary search on a lexicographically ordered array of unique vectors.) Let M be a vector of unique scalar quantities and x an arbitrary scalar. Then do:

  n = length(M);
  a = 0; b = n; c = ceil((a+b)/2);
  for k = 1:ceil(log2(n+1))
   if x <= M(c), b = c; else a = c; end
   c = ceil((a+b)/2);
  end
  t = (a<n); if t, t = (x==M(c)); else, c = n+1; end

When this finishes, the logical variable 't' is true if the value x has been found in M. The index 'c' will point to where the copy of x is located in M if t were true, and to where it could be inserted, with the remainder of the list being shifted right one place to maintain a sorted M, if t were false. In the special case c equal to n+1, the "insertion" point would be an appending beyond the end of the current list.

  You will note that the for-loop only has to execute ceil(log2(n+1)) times, which for n = 1000000 would be 20 times plus one more to test for equality. Even when this is modified to accept vectors for x, it should be faster than the 'ismember' method by a large factor. It will be the insertion shifts necessary to maintain order in M that will then probably take the most time, but very likely less than with the use of an unsorted M.

Roger Stafford

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us