tall

Create tall array

Description

t = tall(ds) creates a tall array on top of datastore ds.

  • If ds is a datastore for tabular data (so that the read and readall methods of datastore return tables or timetables), then t is a tall table or tall timetable, depending on what the datastore is configured to return. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries.

  • Otherwise, t is a tall cell array.
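As a rough illustration of the two cases above, the following sketch builds one tabular and one non-tabular datastore over the same file and checks what tall returns for each. It assumes airlinesmall.csv (the sample file used in the examples on this page) is on the MATLAB path. The fileDatastore with fileread as its read function is only a convenient stand-in for a non-tabular source and is not taken from this page.

```matlab
% Tabular case: read returns a table, so tall returns a tall table.
tds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});
tTab = tall(tds);

% Non-tabular case (assumption for illustration): a fileDatastore whose
% read function returns the raw file text, so tall returns a tall cell array.
fds = fileDatastore(which('airlinesmall.csv'), 'ReadFcn', @fileread);
tCell = tall(fds);

% classUnderlying reports the type of the data stored inside each tall array.
tabClass = gather(classUnderlying(tTab));    % should be 'table'
cellClass = gather(classUnderlying(tCell));  % should be 'cell'
```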

t = tall(A) converts the in-memory array A into a tall array. The underlying data type of t is the same as class(A). This syntax is useful when you need to quickly create a tall array, such as for debugging or prototyping algorithms.

In R2019b and later, you can convert in-memory arrays into tall arrays for more efficient operations on the array. After you convert the array into a tall array, MATLAB® avoids making temporary copies of the whole array and works on the data in smaller blocks. This enables you to perform a wider range of operations on the array without running out of memory.
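A minimal sketch of the in-memory syntax, using a small single-precision matrix chosen only for illustration. It checks that the result is a tall array and that its underlying data type matches class(A).

```matlab
A = single(magic(4));      % small in-memory array of class single
tA = tall(A);              % convert it into a tall array

isTallArray = istall(tA);                   % true for tall arrays
underlying = gather(classUnderlying(tA));   % type of the data inside tA
sameClass = strcmp(underlying, class(A));   % compare against class(A)
```

The variable names here are illustrative. The same check applies to any of the supported in-memory data types.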

Examples

Convert a datastore into a tall array.

First, create a datastore for the data set. Use datastore(location) to create the datastore, where location is either a full or relative file location for the data set. The location argument can specify:

  • A single file, such as 'airlinesmall.csv'

  • Several files with the same extension, such as '*.csv'

  • An entire folder of files, such as 'C:\MyData'

tabularTextDatastore also has several options to specify file and text format properties when you create the datastore.

Create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Select a small subset of the variables to work with.

varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);

Use tall to create a tall array for the data in the datastore. Since the data in ds is tabular, the result is a tall table. If the data is not tabular, then tall creates a tall cell array instead.

T = tall(ds)
T =

  Mx4 tall table

    ArrDelay    DepDelay    Origin      Dest  
    ________    ________    _______    _______

        8          12       {'LAX'}    {'SJC'}
        8           1       {'SJC'}    {'BUR'}
       21          20       {'SAN'}    {'SMF'}
       13          12       {'BUR'}    {'SJC'}
        4          -1       {'SMF'}    {'LAX'}
       59          63       {'LAX'}    {'SJC'}
        3          -2       {'SAN'}    {'SFO'}
       11          -1       {'SEA'}    {'LAX'}
       :           :           :          :
       :           :           :          :

You can use many common MATLAB® operators and functions to work with tall arrays. To see if a function works with tall arrays, check the Extended Capabilities section at the bottom of the function reference page.
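For instance, ordinary indexing and reduction functions can be applied to the tall table created above. The sketch below is one possible follow-up, not part of the original example. The 15-minute threshold and the variable names lateFlights and meanLateDelay are arbitrary choices for illustration.

```matlab
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
T = tall(ds);

% Logical row indexing works as it does for in-memory tables.
% The result is still an unevaluated tall table.
lateFlights = T(T.ArrDelay > 15, :);

% mean is also deferred; no data has been read yet.
meanLateDelay = mean(lateFlights.ArrDelay);

% gather runs the calculation and returns an in-memory scalar.
meanLateDelay = gather(meanLateDelay);
```

Rows with a missing ArrDelay are excluded by the comparison, because NaN > 15 is false.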

Convert a datastore into a tall table, calculate its size using a deferred calculation, and then perform the calculation and return the result in memory.

First, create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Set the text format of a few columns so that they are read as a cell array of character vectors. Convert the datastore into a tall table.

ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'TailNum')} = '%s';
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'CancellationCode')} = '%s';
T = tall(ds)
T =

  Mx29 tall table

    Year    Month    DayofMonth    DayOfWeek    DepTime    CRSDepTime    ArrTime    CRSArrTime    UniqueCarrier    FlightNum    TailNum    ActualElapsedTime    CRSElapsedTime    AirTime    ArrDelay    DepDelay    Origin      Dest      Distance    TaxiIn    TaxiOut    Cancelled    CancellationCode    Diverted    CarrierDelay    WeatherDelay    NASDelay    SecurityDelay    LateAircraftDelay
    ____    _____    __________    _________    _______    __________    _______    __________    _____________    _________    _______    _________________    ______________    _______    ________    ________    _______    _______    ________    ______    _______    _________    ________________    ________    ____________    ____________    ________    _____________    _________________

    1987     10          21            3          642          630         735          727          {'PS'}          1503       {'NA'}             53                 57            NaN          8          12       {'LAX'}    {'SJC'}      308        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          26            1         1021         1020        1124         1116          {'PS'}          1550       {'NA'}             63                 56            NaN          8           1       {'SJC'}    {'BUR'}      296        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          23            5         2055         2035        2218         2157          {'PS'}          1589       {'NA'}             83                 82            NaN         21          20       {'SAN'}    {'SMF'}      480        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          23            5         1332         1320        1431         1418          {'PS'}          1655       {'NA'}             59                 58            NaN         13          12       {'BUR'}    {'SJC'}      296        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          22            4          629          630         746          742          {'PS'}          1702       {'NA'}             77                 72            NaN          4          -1       {'SMF'}    {'LAX'}      373        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          28            3         1446         1343        1547         1448          {'PS'}          1729       {'NA'}             61                 65            NaN         59          63       {'LAX'}    {'SJC'}      308        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10           8            4          928          930        1052         1049          {'PS'}          1763       {'NA'}             84                 79            NaN          3          -2       {'SAN'}    {'SFO'}      447        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
    1987     10          10            6          859          900        1134         1123          {'PS'}          1800       {'NA'}            155                143            NaN         11          -1       {'SEA'}    {'LAX'}      954        NaN        NaN          0             {'NA'}            0            NaN             NaN           NaN            NaN                NaN       
     :        :          :             :           :           :            :           :               :              :           :               :                  :              :          :           :           :          :          :          :          :           :               :               :             :               :             :              :                  :
     :        :          :             :           :           :            :           :               :              :           :               :                  :              :          :           :           :          :          :          :          :           :               :               :             :               :             :              :                  :

The display of the tall table indicates that MATLAB® does not yet know how many rows of data are in the table.

Calculate the size of the tall table. Since calculating the size of a tall array requires a full pass through the data, MATLAB does not immediately calculate the value. Instead, like most operations with tall arrays, the result is an unevaluated tall array whose values and size are currently unknown.

s = size(T)
s =

  1x2 tall double row vector

    ?    ?

Use the gather function to perform the deferred calculation and return the result in memory. The result returned by size is a trivially small 1-by-2 vector, which fits in memory.

sz = gather(s)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.68 sec
Evaluation completed in 0.87 sec
sz = 1×2

      123523          29

If you use gather on an unreduced tall array, then the result might not fit in memory. If you are unsure whether the result returned by gather can fit in memory, use gather(head(X)) or gather(tail(X)) to bring only a small portion of the calculation result into memory.
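A small sketch of that pattern, using a tall array built from in-memory data so that it runs without any files. The names tX and tY are illustrative.

```matlab
tX = tall((1:1000)');     % tall column vector built from in-memory data
tY = tX .^ 2;             % deferred elementwise operation, not yet evaluated

% Bring only a few rows of the result into memory.
firstRows = gather(head(tY));      % head returns the first 8 rows by default
lastRows = gather(tail(tY, 3));    % tail with a row count returns the last 3 rows
```

Here firstRows holds the squares of 1 through 8 and lastRows holds the squares of 998 through 1000, so the full 1000-row result never has to be gathered.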

Create an in-memory array of random numbers, and then convert it into a tall array. Creating tall arrays from in-memory arrays in this manner is useful for debugging or prototyping new programs. The in-memory array is still bound by normal memory constraints, and even after it is converted into a tall array, it cannot grow beyond the limits of memory.

A = rand(100,4);
tA = tall(A)
tA =

  100x4 tall double matrix

    0.8147    0.1622    0.6443    0.0596
    0.9058    0.7943    0.3786    0.6820
    0.1270    0.3112    0.8116    0.0424
    0.9134    0.5285    0.5328    0.0714
    0.6324    0.1656    0.3507    0.5216
    0.0975    0.6020    0.9390    0.0967
    0.2785    0.2630    0.8759    0.8181
    0.5469    0.6541    0.5502    0.8175
      :         :         :         :
      :         :         :         :

In R2019b and later releases, when you convert in-memory arrays into tall arrays, you can perform calculations on the array without requiring extra memory for temporary copies of the data. For example, this code centers the data in a large matrix by subtracting the mean of each column and then sums the result over both rows and columns, that is, over every element. The sum is therefore zero up to round-off error. An in-memory version of this calculation needs to not only store the array but also have enough memory available to create temporary copies of the array.

N = 5000;
tA = tall(rand(N));
tB = tA - mean(tA);
S = gather(sum(tB, [1,2]))
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0.37 sec
- Pass 2 of 2: Completed in 0.38 sec
Evaluation completed in 1.3 sec
S = 
-1.0004e-11

If you adjust the value of N so that there is enough memory to store tA, but not enough memory for copies, the calculation still executes successfully.

Input Arguments

ds: Input datastore, specified as a datastore object. See Datastore for more information on creating a datastore object for your data set.

Tall arrays work only with datastores that are deterministic. That is, if you use read on the datastore, reset the datastore with reset, and then read the datastore again, then the data returned must be the same in both cases. Tall array calculations involving a datastore that is not deterministic can produce unpredictable results. See Select Datastore for File Format or Application for more information.

Example: ds = tabularTextDatastore('airlinesmall.csv') specifies a single file.

Example: ds = tabularTextDatastore('*.csv') specifies a collection of .csv files.

Example: ds = spreadsheetDatastore('C:\MyData') specifies a folder of spreadsheet files.

Example: ds = datastore('hdfs:///data/') specifies a data set in an HDFS file system.
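The determinism requirement described above can be spot-checked by following the read, reset, read sequence literally. This is a minimal sketch that compares only the first block of data, so it can reveal a problem but cannot prove that a datastore is deterministic. It assumes airlinesmall.csv is on the MATLAB path.

```matlab
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});

firstRead = read(ds);     % read the first block of data
reset(ds);                % return the datastore to its initial state
secondRead = read(ds);    % read the first block again

% isequaln treats NaN values as equal, which matters here because
% 'NA' entries are read as NaN.
sameData = isequaln(firstRead, secondRead);

reset(ds);                % leave the datastore ready for tall(ds)
```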

A: In-memory variable, specified as an array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | table | timetable | string | cell | categorical | datetime | duration | calendarDuration
Complex Number Support: Yes

Output Arguments

t: Tall array, returned as one of these types:

  • When converting a datastore, t is a tall table or tall timetable for tabular datastores. Otherwise, t is a tall cell array.

  • When converting an in-memory array, the underlying data type of t is the same as class(A).

See Lazy Evaluation of Tall Arrays for information about how to effectively work with tall arrays.

Tips

  • See Extend Tall Arrays with Other Products for information on how to use tall arrays with:

    • Statistics and Machine Learning Toolbox™

    • Parallel Computing Toolbox™

    • MATLAB Parallel Server™

    • Database Toolbox™

    • MATLAB Compiler™

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Version History

Introduced in R2016b