Suppose the data that you want to collect is generated element-by-element
and you know in advance how many elements will be generated. The intuitive
approach for collecting such data is to create an empty list and append
each new element to the end of the list. For example, this procedure
uses this approach to collect random integers generated by `random`

:

col := proc(n) local L, i; begin L := []; for i from 1 to n do L := L.[random()]; end_for; end:

The procedure generates random integers and collects them in a list:

col(5)

To estimate the performance of this approach, use the procedure `col`

to
generate a list of 50,000 random numbers:

time(col(50000))

The `time`

function
returns results measured in milliseconds.

Now, check how much time the procedure actually spends generating random numbers:

time(random() $ i = 1..50000)

Thus, the procedure spends most of the time appending the newly
generated numbers to a list. In MuPAD^{®}, appending a new entry
to a list of `n`

entries takes time proportional
to `n`

. Therefore, run time of `col(n)`

is
proportional to *n*^{2}.
You can visualize this dependency by plotting the times that `col(n)`

spends
when creating lists of 1 to 50,000 entries:

plotfunc2d(n -> time(col(n)), n = 1..50000, Mesh = 20, AdaptiveMesh = 0)

When appending a new entry to a list, MuPAD allocates space
for the new, longer list. Then it copies all entries of the old list
plus a new entry to this new list. The faster approach is to create
the entire list at once, without adding each new entry separately.
This approach is called *parallel collection* because
you create and collect data simultaneously. Use the sequence operator `$`

to
implement this approach:

col := proc(n) local i; begin [random() $ i = 1..n]; end:

This procedure spends most of its time generating random numbers:

time(col(50000))

Suppose you know how many elements you will generate, but you
cannot generate them all at once. In this case, the best strategy
is to create a list of the required length filling it with some constant,
such as 0 or `NIL`

.
Then you can replace any entry of this list with the generated value.
In this case, you do not need to generate elements in the order in
which you want them to appear in the list.

For example, use this procedure to generate the list of the
first `n`

Lucas numbers. The procedure creates a
list of `n`

entries, where each entry is 0. Then
it assigns the values to the first two entries. To replace all other
entries of the list with the Lucas numbers, the procedure uses the `for`

loop:

lucas := proc(n) local L, i; begin L := [0 $ n]; L[1] := 1; L[2] := 3; for i from 3 to n do L[i] := L[i-1] + L[i-2]; end_for; L end:

Measure the time needed to create a list of 10,000 Lucas numbers:

time(lucas(10000))

If you use the procedure that creates an empty list and appends each generated Lucas number to this list, then creating a list of 10,000 Lucas numbers takes much longer:

lucas := proc(n) local L, i; begin L := []; L :=L.[1]; L := L.[3]; for i from 3 to n do L := L.[L[i-1] + L[i-2]]; end_for; L end:

time(lucas(10000))

If you cannot predict the number of elements that you will generate, but have a reasonable upper limit on this number, use this strategy:

Create a list with the number of entries equal to or greater than the upper limit.

Generate the data and populate the list.

Discard the unused part of the list.

For example, use the following procedure to create a list. The
entries of this list are modular squares of a number `a`

(*a*^{2} mod *n*).
You cannot predict the number of entries in the resulting list because
it depends on the parameters `a`

and `n`

.
Nevertheless, you can see that in this case the number of entries
in the list cannot exceed `n`

:

modPowers := proc(a, n) local L, i; begin L := [0 $ n]; L[1] := a; L[2] := a^2 mod n; i := 2; while L[i] <> a do L[i + 1] := a*L[i] mod n; i := i + 1; end_while; L := L[1..i - 1]; end:

When you call `modPowers`

for ```
a =
3
```

and `a = 2`

, it creates two lists of
different lengths:

modPowers(3, 14); modPowers(2, 14)

Often, you cannot predict the number of elements and cannot estimate the upper limit on this number before you start generating actual data. One way of dealing with this problem is to choose some upper limit, and use the strategy described in Known Maximum Length Collection. If that limit is reached, then:

Choose a larger limit.

Create a new list with the number of elements corresponding to the new limit.

Copy existing collected data to the new list.

Typically, increasing the list length by a constant factor results in better performance than increasing it by a constant number of entries:

rndUntil42 := proc() local L, i; begin i := 1; L := [random()]; while L[i] mod 42 <> 0 do if i = nops(L) then L := L.L; end_if; i := i+1; L[i] := random(); end_while; L[1..i]; end:

SEED := 123456789: rndUntil42()

SEED := 123456789: time(rndUntil42() $ i = 1..500)

Alternatively, if you cannot predict the number of elements
that you will need to collect, then use a `table`

that grows automatically (a hash
table):

rndUntil42 := proc() local L, i, j; begin i := 1; L := table(1 = random()); while L[i] mod 42 <> 0 do i := i+1; L[i] := random(); end_while; [L[j] $ j=1..i]; end:

SEED := 123456789: time(rndUntil42() $ i = 1..500)

For this example, using the table is slightly faster. If you change the value 42 to another value, using the list might be faster. In general, tables are preferable when you collect large amounts of data. Choose the approach that works best for solving your problem.

Was this topic helpful?