## Lazy Evaluation of Tall Arrays

One of the differences between tall arrays and in-memory MATLAB® arrays is that tall arrays typically remain *unevaluated* until you request that calculations be performed. (The exceptions to this rule include plotting functions like `plot` and `histogram` and some statistical fitting functions like `fitlm`, which automatically evaluate tall array inputs.) While a tall array is in an unevaluated state, MATLAB might not know its size, its data type, or the specific values it contains. However, you can still use unevaluated arrays in your calculations as if the values were known. This allows you to work quickly with large data sets instead of waiting for each command to execute. For this reason, it is recommended that you use `gather` only when you require output.

MATLAB keeps track of all the operations you perform on unevaluated tall arrays as you enter them. When you eventually call `gather` to evaluate the queued operations, MATLAB uses the history of unevaluated commands to optimize the calculation by minimizing the number of passes through the data. Used properly, this *lazy evaluation* can save huge amounts of execution time by eliminating unnecessary passes through large data sets.
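As a minimal sketch of this behavior, the following queues several operations on a tall array and evaluates them with a single `gather` call. The data is a small random in-memory matrix converted with `tall`, chosen only for illustration; the variable names are hypothetical.

```matlab
% Illustrative data only: a small in-memory matrix converted to a tall array.
A = tall(rand(1000, 3));

% Each statement below is recorded rather than computed immediately.
B = A .* 2;
C = B + 1;
colMeans = mean(C, 1);      % reduction to a 1-by-3 result, still a tall array

% gather evaluates the queued operations and returns an in-memory array.
result = gather(colMeans);
```

Because `rand` returns values between 0 and 1, each entry of `result` lies between 1 and 3.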

### Display of Unevaluated Tall Arrays

The display of unevaluated tall arrays varies depending on how much MATLAB knows about the array and its values. There are three pieces of information reflected in the display:

- **Array size**: Unknown dimension sizes are represented by the variables `M` or `N` in the display. If no dimension sizes are known, then the size appears as `MxNx...`.
- **Array data type**: If the array has an unknown underlying data type, then its type appears as `tall array`. If the type is known, it is listed as, for example, `tall double array`.
- **Array values**: If the array values are unknown, then they appear as `?`. Known values are displayed.

MATLAB might know all, some, or none of these pieces of information about a given tall array, depending on the nature of the calculation.
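The underlying data type shown in the display can also be queried in code. The sketch below assumes the `classUnderlying` function, which reports the class of the data held by a tall array, and wraps its result in `gather` so that the answer is available in memory. The input data is hypothetical.

```matlab
% Illustrative data only: a small single-precision column vector.
t = tall(single([1; 2; 3]));

% class reports the container type, not the type of the underlying data.
containerClass = class(t);

% classUnderlying reports the type of the data inside the tall array.
% Wrapping it in gather brings the answer into memory.
underlyingClass = gather(classUnderlying(t));
```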

For example, if the array has a known data type but unknown size and values, then the unevaluated tall array might look like this:

    M×N×... tall double array

        ?    ?    ?    ...
        ?    ?    ?    ...
        ?    ?    ?    ...
        :    :    :
        :    :    :

If the type and relative size are known, then the display could be:

    1×N tall char array

        ?    ?    ?    ...

If some of the data is known, then MATLAB displays the known values:

    100×3 tall double matrix

        0.8147    0.1622    0.6443
        0.9058    0.7943    0.3786
        0.1270    0.3112    0.8116
        0.9134    0.5285    0.5328
        0.6324    0.1656    0.3507
        0.0975    0.6020    0.9390
        0.2785    0.2630    0.8759
        0.5469    0.6541    0.5502
          :         :         :
          :         :         :

### Evaluation with `gather`

The `gather` function is used to evaluate tall arrays. `gather` accepts tall arrays as inputs and returns in-memory arrays as outputs. For this reason, you can think of this function as a bridge between tall arrays and in-memory arrays. For example, you cannot control `if` or `while` loop statements using a tall logical array, but once the array is evaluated with `gather` it becomes an in-memory logical value that you can use in these contexts.
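A minimal sketch of this bridging role follows. The data and variable names are hypothetical; the point is that the condition is gathered into an in-memory logical value before it is used in `if`.

```matlab
% Illustrative data only.
t = tall([3; 7; 11]);

% The comparison and the reduction produce a tall logical scalar.
anyLarge = any(t > 10);

% Evaluate the condition to obtain an in-memory logical value,
% which can then control the if statement.
flag = gather(anyLarge);
if flag
    disp('At least one value exceeds 10.');
end
```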

`gather` performs all queued operations on a tall array and returns the *entire* result in memory. Since `gather` returns results as in-memory MATLAB arrays, standard memory considerations apply. MATLAB might run out of memory if the result returned by `gather` is too large.

Most of the time you can use `gather` to see the entire result of a calculation, particularly if the calculation includes a reduction operation such as `sum` or `mean`. However, if the result is too large to fit in memory, then you can use `gather(head(X))` or `gather(tail(X))` to perform the calculation and look at only the first or last few rows of the result.
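The following sketch shows both forms on a small illustrative tall array. The data is hypothetical; `head` returns the first eight rows by default, and the optional second argument to `tail` requests a specific number of rows.

```matlab
% Illustrative data only: the integers 1 through 1000 as a tall column vector.
X = tall((1:1000)');
Y = X .^ 2;

% Bring only a few rows of the result into memory.
firstRows = gather(head(Y));      % first 8 rows (the default for head)
lastRows  = gather(tail(Y, 3));   % last 3 rows
```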

### Resolve Errors with `gather`

If you enter an erroneous command and `gather` fails to evaluate a tall array variable, then you must delete the variable from your workspace and recreate the tall array using *only* valid commands. This is because MATLAB keeps track of all the operations you perform on unevaluated tall arrays as you enter them. The only way to make MATLAB “forget” about an erroneous statement is to reconstruct the tall array from scratch.
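The recovery steps can be sketched as follows. The erroneous command itself is not shown, since which commands fail only at evaluation time depends on the operation; the data and variable names are hypothetical.

```matlab
% Illustrative data only.
t = tall([1; 2; 3]);

% Suppose an invalid operation was queued on t at this point and a
% later call to gather(t) failed (the invalid command is not shown).

% Delete the variable so that its queued history is discarded.
clear t

% Recreate the tall array from the original source using only valid commands.
t = tall([1; 2; 3]);
total = gather(sum(t));
```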

### Example: Calculate Size of Tall Array

This example shows what an unevaluated tall array looks like, and how to evaluate the array.

Create a datastore for the data set `airlinesmall.csv`. Convert the datastore into a tall table and then calculate the size.

    varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
    ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
        'SelectedVariableNames', varnames);
    tt = tall(ds)

    tt =

      M×4 tall table

        ArrDelay    DepDelay    Origin    Dest
        ________    ________    ______    _____

            8          12       'LAX'     'SJC'
            8           1       'SJC'     'BUR'
           21          20       'SAN'     'SMF'
           13          12       'BUR'     'SJC'
            4          -1       'SMF'     'LAX'
           59          63       'LAX'     'SJC'
            3          -2       'SAN'     'SFO'
           11          -1       'SEA'     'LAX'
            :          :          :         :
            :          :          :         :

    s = size(tt)

    s =

      1×2 tall double row vector

        ?    ?

    Preview deferred. Learn more.

Calculating the size of a tall array returns a small answer (a 1-by-2 vector), but the display indicates that an entire pass through the data is still required to calculate the size of `tt`.

Use the `gather` function to fully evaluate the tall array and bring the results into memory. As the command executes, there is a dynamic progress display in the command window that is particularly helpful with long calculations.

**Note**

Always ensure that the result returned by `gather` will be able to fit in memory. If you use `gather` directly on a tall array without reducing its size using a function such as `mean`, then MATLAB might run out of memory.

    tableSize = gather(s)

    Evaluating tall expression using the Local MATLAB Session:
    - Pass 1 of 1: Completed in 0.42 sec
    Evaluation completed in 0.48 sec

    tableSize =

          123523           4

### Example: Multi-pass Calculations with Tall Arrays

This example shows how several calculations can be combined to minimize the total number of passes through the data.

Create a datastore for the data set `airlinesmall.csv`. Convert the datastore into a tall table.

    varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
    ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
        'SelectedVariableNames', varnames);
    tt = tall(ds)

    tt =

      M×4 tall table

        ArrDelay    DepDelay    Origin    Dest
        ________    ________    ______    _____

            8          12       'LAX'     'SJC'
            8           1       'SJC'     'BUR'
           21          20       'SAN'     'SMF'
           13          12       'BUR'     'SJC'
            4          -1       'SMF'     'LAX'
           59          63       'LAX'     'SJC'
            3          -2       'SAN'     'SFO'
           11          -1       'SEA'     'LAX'
            :          :          :         :
            :          :          :         :

Subtract the mean value of `DepDelay` from `ArrDelay` to create a new variable `AdjArrDelay`. Then calculate the mean value of `AdjArrDelay` and subtract this mean value from `AdjArrDelay`. If these calculations were all evaluated separately, then MATLAB would require four passes through the data.

    AdjArrDelay = tt.ArrDelay - mean(tt.DepDelay,'omitnan');
    AdjArrDelay = AdjArrDelay - mean(AdjArrDelay,'omitnan')

    AdjArrDelay =

      M×1 tall double column vector

        ?
        ?
        ?
        :
        :

    Preview deferred. Learn more.

Evaluate `AdjArrDelay` and view the first few rows. Because some calculations can be combined, only three passes through the data are required.

    gather(head(AdjArrDelay))

    Evaluating tall expression using the Local MATLAB Session:
    - Pass 1 of 3: Completed in 0.4 sec
    - Pass 2 of 3: Completed in 0.39 sec
    - Pass 3 of 3: Completed in 0.23 sec
    Evaluation completed in 1.2 sec

    ans =

        0.8799
        0.8799
       13.8799
        5.8799
       -3.1201
       51.8799
       -4.1201
        3.8799
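A related way to let MATLAB combine work is to request several results in one call. The sketch below assumes the multiple-input form of `gather`, which takes several tall arrays and returns one in-memory output per input. The data is hypothetical, and the number of passes reported in your session depends on the calculation.

```matlab
% Illustrative data only.
X = tall(rand(500, 1));

% Queue two independent reductions on the same data.
lo = min(X);
hi = max(X);

% Gather both results together so that MATLAB can evaluate them in the
% same evaluation instead of making a separate one for each result.
[minValue, maxValue] = gather(lo, hi);
```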

### Summary of Behavior and Recommendations

- Tall arrays remain unevaluated until you request output using `gather`, an optimization called *lazy evaluation*.
- Use `gather` in most cases to evaluate tall array calculations. If you believe the result of the calculations might not fit in memory, then use `gather(head(X))` or `gather(tail(X))` instead.
- Work primarily with unevaluated tall arrays and request output only when necessary. The more queued calculations there are that are unevaluated, the more optimization MATLAB can do to minimize the number of passes through the data.
- If you enter an erroneous tall array command and `gather` fails to evaluate a tall array variable, then you must delete the variable from your workspace and recreate the tall array using *only* valid commands.