tall

Create tall array

Syntax

t = tall(ds)
t = tall(A)

Description

t = tall(ds) creates a tall array on top of datastore ds.

  • If ds is a datastore for tabular data (so that the read and readall methods of datastore return tables), then t is a tall table. Tabular data is data that is arranged in a rectangular fashion with each row having the same number of entries.

  • Otherwise, t is a tall cell array.

t = tall(A) converts the in-memory array A into a tall array. The underlying data type of t is the same as class(A).
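The relationship between class(A) and the underlying type of t can be checked with a short sketch. The variable names and the int8 sample data are illustrative, and the sketch assumes classUnderlying is available in your release and returns a tall result that must be gathered.

```matlab
% Hypothetical sample data; any in-memory array would do.
A = int8([1 2 3; 4 5 6]);
t = tall(A);

% The tall wrapper itself always reports the class 'tall'.
wrapperClass = class(t);

% The underlying class is evaluated lazily, so gather the result
% to bring it into memory as an ordinary character vector.
underlyingName = gather(classUnderlying(t));

disp(wrapperClass)
disp(underlyingName)
```

Here the underlying name should match class(A), while class(t) reports only that t is a tall array.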

Examples

Convert a datastore into a tall array.

First, create a datastore for the data set. You can specify either a full or relative file location for the data set using datastore(location) to create the datastore. The location argument can specify:

  • A single file, such as 'airlinesmall.csv'

  • Several files with the same extension, such as '*.csv'

  • An entire folder of files, such as 'C:\MyData'

datastore also has several options to specify file and text format properties when you create the datastore.

Create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Select a small subset of the variables to work with.

varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);

Use tall to create a tall array for the data in the datastore. Since the data in ds is tabular, the result is a tall table. If the data is not tabular, then tall creates a tall cell array instead.

T = tall(ds)
T =

  Mx4 tall table

    ArrDelay    DepDelay    Origin    Dest 
    ________    ________    ______    _____

        8          12       'LAX'     'SJC'
        8           1       'SJC'     'BUR'
       21          20       'SAN'     'SMF'
       13          12       'BUR'     'SJC'
        4          -1       'SMF'     'LAX'
       59          63       'LAX'     'SJC'
        3          -2       'SAN'     'SFO'
       11          -1       'SEA'     'LAX'
       :           :          :         :
       :           :          :         :

You can use many common MATLAB® operators and functions to work with tall arrays.
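As a minimal sketch of working with a tall table using ordinary operators and functions, the following recreates the four-variable datastore and computes two summary values. The 60-minute threshold and the variable names meanArrDelay and numLate are illustrative choices, not part of the original example.

```matlab
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
T = tall(ds);

% Both expressions are deferred: nothing is read from disk yet.
meanArrDelayTall = mean(T.ArrDelay, 'omitnan');   % ignore missing delays
numLateTall = sum(T.ArrDelay > 60);               % illustrative threshold

% Gathering both results in one call lets MATLAB share the pass
% through the data instead of reading the file once per result.
[meanArrDelay, numLate] = gather(meanArrDelayTall, numLateTall);
```

Grouping the deferred results into a single gather call is a design choice that tends to reduce the number of passes over the data.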

Convert a datastore into a tall table, calculate its size using a deferred calculation, and then perform the calculation and return the result in memory.

First, create a datastore for the airlinesmall.csv data set. Treat 'NA' values as missing data so that they are replaced with NaN values. Set the text format of a few columns so that they are read as a cell array of character vectors. Convert the datastore into a tall table.

ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'TailNum')} = '%s';
ds.SelectedFormats{strcmp(ds.SelectedVariableNames, 'CancellationCode')} = '%s';
T = tall(ds)
T =

  Mx29 tall table

    Year    Month    DayofMonth    DayOfWeek    DepTime    CRSDepTime    ArrTime    CRSArrTime    UniqueCarrier    FlightNum    TailNum    ActualElapsedTime    CRSElapsedTime    AirTime    ArrDelay    DepDelay    Origin    Dest     Distance    TaxiIn    TaxiOut    Cancelled    CancellationCode    Diverted    CarrierDelay    WeatherDelay    NASDelay    SecurityDelay    LateAircraftDelay
    ____    _____    __________    _________    _______    __________    _______    __________    _____________    _________    _______    _________________    ______________    _______    ________    ________    ______    _____    ________    ______    _______    _________    ________________    ________    ____________    ____________    ________    _____________    _________________

    1987     10          21            3          642          630         735          727           'PS'           1503        'NA'              53                 57            NaN          8          12       'LAX'     'SJC'      308        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          26            1         1021         1020        1124         1116           'PS'           1550        'NA'              63                 56            NaN          8           1       'SJC'     'BUR'      296        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          23            5         2055         2035        2218         2157           'PS'           1589        'NA'              83                 82            NaN         21          20       'SAN'     'SMF'      480        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          23            5         1332         1320        1431         1418           'PS'           1655        'NA'              59                 58            NaN         13          12       'BUR'     'SJC'      296        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          22            4          629          630         746          742           'PS'           1702        'NA'              77                 72            NaN          4          -1       'SMF'     'LAX'      373        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          28            3         1446         1343        1547         1448           'PS'           1729        'NA'              61                 65            NaN         59          63       'LAX'     'SJC'      308        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10           8            4          928          930        1052         1049           'PS'           1763        'NA'              84                 79            NaN          3          -2       'SAN'     'SFO'      447        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
    1987     10          10            6          859          900        1134         1123           'PS'           1800        'NA'             155                143            NaN         11          -1       'SEA'     'LAX'      954        NaN        NaN          0              'NA'             0            NaN             NaN           NaN            NaN                NaN       
     :        :          :             :           :           :            :           :               :              :           :               :                  :              :          :           :          :         :         :          :          :           :               :               :             :               :             :              :                  :
     :        :          :             :           :           :            :           :               :              :           :               :                  :              :          :           :          :         :         :          :          :           :               :               :             :               :             :              :                  :

The display of the tall table indicates that MATLAB® does not yet know how many rows of data are in the table.

Calculate the size of the tall table. Since calculating the size of a tall array requires a full pass through the data, MATLAB does not immediately calculate the value. Instead, like most operations with tall arrays, the result is an unevaluated tall array whose values and size are currently unknown.

s = size(T)
s =

  1x2 tall double row vector

    ?    ?

Use the gather function to perform the deferred calculation and return the result in memory. The result returned by size is a trivially small 1-by-2 vector, which fits in memory.

sz = gather(s)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.3 sec
Evaluation completed in 1.6 sec
sz = 1×2

      123523          29

If you use gather on an unreduced tall array, then the result might not fit in memory. If you are unsure whether the result returned by gather can fit in memory, use gather(head(X)) or gather(tail(X)) to bring only a small portion of the calculation result into memory.
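A minimal sketch of this precaution, assuming the airlinesmall.csv data set and an illustrative two-variable selection:

```matlab
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});
T = tall(ds);

% head returns the first 8 rows by default; the result is still tall,
% so gather brings only those few rows into memory.
firstRows = gather(head(T));

% tail accepts an explicit row count; 5 is an arbitrary choice here.
lastRows = gather(tail(T, 5));
```

Both results are small in-memory tables, regardless of how many rows the full tall table contains.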

Create an in-memory array of random numbers, and then convert it into a tall array. Creating tall arrays from in-memory arrays is useful for debugging or prototyping new programs.

A = rand(100,4);
tA = tall(A)
tA =

  100x4 tall double matrix

    0.8147    0.1622    0.6443    0.0596
    0.9058    0.7943    0.3786    0.6820
    0.1270    0.3112    0.8116    0.0424
    0.9134    0.5285    0.5328    0.0714
    0.6324    0.1656    0.3507    0.5216
    0.0975    0.6020    0.9390    0.0967
    0.2785    0.2630    0.8759    0.8181
    0.5469    0.6541    0.5502    0.8175
      :         :         :         :
      :         :         :         :
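One way to use this for prototyping is to run the same calculation on the tall array and on the original in-memory array, then compare the results. The following is a sketch under that assumption; the names colMeansTall, colMeansMem, and maxDiff are illustrative.

```matlab
A = rand(100, 4);
tA = tall(A);

% Column means computed through the tall array machinery (deferred,
% then gathered) and directly on the in-memory array.
colMeansTall = gather(mean(tA, 1));
colMeansMem = mean(A, 1);

% The two results can differ by floating-point round-off, because the
% tall calculation may accumulate the sum in a different order.
maxDiff = max(abs(colMeansTall - colMeansMem));
```

If maxDiff is at round-off level, the tall version of the calculation behaves as expected and can then be pointed at a datastore.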

Input Arguments

ds: Input datastore, specified as a datastore object. Use the datastore function to create a datastore object for your data set.

Tall arrays work only with datastores that are deterministic. That is, if you use read on the datastore, reset the datastore with reset, and then read the datastore again, then the data returned must be the same in both cases. Tall array calculations involving a datastore that is not deterministic can produce unpredictable results. See Select Datastore for File Format or Application for more information.
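The read, reset, read sequence described above can be sketched as a quick spot check. This compares only the first block of data returned by read, so it can reveal a nondeterministic datastore but cannot prove that one is deterministic; the variable names are illustrative.

```matlab
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});

firstRead = read(ds);    % first block of data
reset(ds);               % return the datastore to its initial state
secondRead = read(ds);   % the same block, read again

% isequaln treats NaN values as equal, which matters here because
% missing delays are imported as NaN.
sameData = isequaln(firstRead, secondRead);
```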

Example: ds = datastore('airlinesmall.csv') specifies a single file.

Example: ds = datastore('*.csv') specifies a collection of .csv files.

Example: ds = datastore('C:\MyData') specifies a folder of files.

Example: ds = datastore('hdfs:///data/') specifies a data set in an HDFS file system.

A: In-memory variable, specified as an array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | table | string | cell | categorical | datetime | duration | calendarDuration
Complex Number Support: Yes

Output Arguments

t: Tall array.

  • When converting a datastore, t is a tall table for tabular datastores. Otherwise, t is a tall cell array.

  • When converting an in-memory array, the underlying data type of t is the same as class(A).

Tips

  • See Extend Tall Arrays with Other Products for information on how to use tall arrays with:

    • Statistics and Machine Learning Toolbox™

    • Parallel Computing Toolbox™

    • MATLAB® Distributed Computing Server™

    • Database Toolbox™

    • MATLAB Compiler™

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Introduced in R2016b