Documentation Center

  • Trial Software
  • Product Updates

Data Collection

Parallel Collection

Suppose the data that you want to collect is generated element-by-element and you know in advance how many elements will be generated. The intuitive approach for collecting such data is to create an empty list and append each new element to the end of the list. For example, this procedure uses this approach to collect random integers generated by random:

col :=
proc(n)
  local L, i;
begin
  L := [];
  for i from 1 to n do
    L := L.[random()];
  end_for;
end:

The procedure generates random integers and collects them in a list:

col(5)

To estimate the performance of this approach, use the procedure col to generate a list of 50,000 random numbers:

time(col(50000))

The time function returns results measured in milliseconds.

Now, check how much time the procedure actually spends generating random numbers:

time(random() $ i = 1..50000)

Thus, the procedure spends most of the time appending the newly generated numbers to a list. In MuPAD®, appending a new entry to a list of n entries takes time proportional to n. Therefore, run time of col(n) is proportional to n2. You can visualize this dependency by plotting the times that col(n) spends when creating lists of 1 to 50,000 entries:

plotfunc2d(n -> time(col(n)), n = 1..50000,
               Mesh = 20, AdaptiveMesh = 0)

When appending a new entry to a list, MuPAD allocates space for the new, longer list. Then it copies all entries of the old list plus a new entry to this new list. The faster approach is to create the entire list at once, without adding each new entry separately. This approach is called parallel collection because you create and collect data simultaneously. Use the sequence operator $ to implement this approach:

col := proc(n)
  local i;
begin
  [random() $ i = 1..n];
end:

This procedure spends most of its time generating random numbers:

time(col(50000))

Fixed-Length Collection

Suppose you know how many elements you will generate, but you cannot generate them all at once. In this case, the best strategy is to create a list of the required length filling it with some constant, such as 0 or NIL. Then you can replace any entry of this list with the generated value. In this case, you do not need to generate elements in the order in which you want them to appear in the list.

For example, use this procedure to generate the list of the first n Lucas numbers. The procedure creates a list of n entries, where each entry is 0. Then it assigns the values to the first two entries. To replace all other entries of the list with the Lucas numbers, the procedure uses the for loop:

lucas :=
proc(n)
  local L, i;
begin
  L := [0 $ n];
  L[1] := 1;
  L[2] := 3;
  for i from 3 to n do
    L[i] := L[i-1] + L[i-2];
  end_for;
  L
end:

Measure the time needed to create a list of 10,000 Lucas numbers:

time(lucas(10000))

If you use the procedure that creates an empty list and appends each generated Lucas number to this list, then creating a list of 10,000 Lucas numbers takes much longer:

lucas :=
proc(n)
  local L, i;
begin
  L := [];
  L :=L.[1];
  L := L.[3];
  for i from 3 to n do
    L := L.[L[i-1] + L[i-2]];
  end_for;
  L
end:
time(lucas(10000))

Known-Maximum-Length Collection

If you cannot predict the number of elements that you will generate, but have a reasonable upper limit on this number, use this strategy:

  1. Create a list with the number of entries equal to or greater than the upper limit.

  2. Generate the data and populate the list.

  3. Discard the unused part of the list.

For example, use the following procedure to create a list. The entries of this list are modular squares of a number a (a2 mod n). You cannot predict the number of entries in the resulting list because it depends on the parameters a and n. Nevertheless, you can see that in this case the number of entries in the list cannot exceed n:

modPowers :=
proc(a, n)
  local L, i;
begin
  L := [0 $ n];
  L[1] := a;
  L[2] := a^2 mod n;
  i := 2;
  while L[i] <> a do
    L[i + 1] := a*L[i] mod n;
    i := i + 1;
  end_while;
  L := L[1..i - 1];
end:

When you call modPowers for a = 3 and a = 2, it creates two lists of different lengths:

modPowers(3, 14);
modPowers(2, 14)

Unknown-Maximum-Length Collection

Often, you cannot predict the number of elements and cannot estimate the upper limit on this number before you start generating actual data. One way of dealing with this problem is to choose some upper limit, and use the strategy described in Known Maximum Length Collection. If that limit is reached, then:

  1. Choose a larger limit.

  2. Create a new list with the number of elements corresponding to the new limit.

  3. Copy existing collected data to the new list.

Typically, increasing the list length by a constant factor results in better performance than increasing it by a constant number of entries:

rndUntil42 :=
proc()
  local L, i;
begin
  i := 1;
  L := [random()];
  while L[i] mod 42 <> 0 do
    if i = nops(L) then
      L := L.L;
    end_if;
    i := i+1;
    L[i] := random();
  end_while;
  L[1..i];
end:
SEED := 123456789:
rndUntil42()

SEED := 123456789:
time(rndUntil42() $ i = 1..500)

Alternatively, if you cannot predict the number of elements that you will need to collect, then use a table that grows automatically (a hash table):

rndUntil42 :=
proc()
  local L, i, j;
begin
  i := 1;
  L := table(1 = random());
  while L[i] mod 42 <> 0 do
    i := i+1;
    L[i] := random();
  end_while;
  [L[j] $ j=1..i];
end:
SEED := 123456789:
time(rndUntil42() $ i = 1..500)

For this example, using the table is slightly faster. If you change the value 42 to another value, using the list might be faster. In general, tables are preferable when you collect large amounts of data. Choose the approach that works best for solving your problem.

Was this topic helpful?