This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English version of the page.

Note: This page has been translated by MathWorks. Click here to see
To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

spreadsheetDatastore

Datastore for spreadsheet files

Description

Use a SpreadsheetDatastore object to manage large collections of spreadsheet files where the collection does not necessarily fit in memory. You can create a SpreadsheetDatastore object using the spreadsheetDatastore function, specify its properties, and then import the data using object functions.

Creation

Syntax

ssds = spreadsheetDatastore(location)
ssds = spreadsheetDatastore(location,Name,Value)

Description

example

ssds = spreadsheetDatastore(location) creates a spreadsheet datastore from the collection of data specified by location.

ssds = spreadsheetDatastore(location,Name,Value) specifies additional parameters and properties for ssds using one or more name-value pair arguments. For example, spreadsheetDatastore(location,'FileExtensions',{'.xlsx','.xls'}) specifies which files to include in the datastore depending on the file extensions.

Input Arguments

expand all

Files or folders to include in the datastore, specified as a character vector, cell array of character vectors, string scalar, or string array. If the files are not in the current folder, then location must be full or relative paths. Files within subfolders of the specified folder are not automatically included in the datastore. Supported file extensions are .xls, .xlsx, .xlsm, .xltx, and .xltm.

You can use the wildcard character (*) when specifying location. This character specifies that the datastore include all matching files or all files in the matching folders.

If the files are not available locally, then the full path of the files or folders must be an internationalized resource identifier (IRI) of the form
hdfs:///path_to_file.

For information on using datastore with Amazon S3™, Windows Azure® Blob Storage, and HDFS™, see Work with Remote Data.

When location represents a folder, the datastore includes only supported spreadsheet file formats and ignores any other format.

Example: 'file1.xlsx'

Example: '../dir/data/file1.xlsx'

Example: {'C:\dir\data\file1.xlsx','C:\dir\data\file2.xlsx'}

Example: 's3://bucketname/path_to_files/*.xls'

Data Types: char | cell | string

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: ssds = spreadsheetDatastore('C:\dir\spreadsheetdata','FileExtensions',{'.xls','.xlsm'})

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true, false, 0, or 1. Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

If you do not specify 'IncludeSubfolders', then the default value is false.

Example: 'IncludeSubfolders',true

Data Types: logical | double

Spreadsheet file extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, string scalar, or string array.

  • If you do not specify 'FileExtensions', then spreadsheetDatastore automatically includes all files with .xls, .xlsx, .xlsm, .xltx, and .xltm extensions in the specified path.

  • If you want to include spreadsheets with non-standard file extensions in the SpreadsheetDatastore, then specify those extensions explicitly.

  • If you want to create a SpreadsheetDatastore for files without any extensions, then specify 'FileExtensions' as an empty character vector, ''.

Example: 'FileExtensions',''

Example: 'FileExtensions','.xls'

Example: 'FileExtensions',{'.xlsx','.xlsm'}

Data Types: char | cell | string

Alternate file system root paths, specified as the comma-separated pair consisting of 'AlternateFileSystemRoots' and a string vector or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine, but need to access and process the data on another machine (possibly of a different operating system). Also, when processing data using the Parallel Computing Toolbox™ and the MATLAB® Distributed Computing Server™, and the data is stored on your local machines with a copy of the data available on different platform cloud or cluster machines, you must use 'AlternateFileSystemRoots' to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify 'AlternateFileSystemRoots' as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

Output data type of text variables, specified as the comma-separated pair consisting of 'TextType' and either 'char' or 'string'.

  • If the output table from the read, readall, or preview functions contains text variables, then 'TextType' specifies the data type of those variables.

  • If 'TextType' is 'char', then the output is a cell array of character vectors.

  • If 'TextType' is 'string', then the output has type string.

Data Types: char | string

In addition to these name-value pairs, you also can specify any of the properties on this page as name-value pairs. Except for the Files property.

Properties

expand all

SpreadsheetDatastore properties describe the format of the files in a datastore object, and control how the data is read from the datastore. You can specify the value of SpreadsheetDatastore properties using name-value pair arguments when you create the datastore object. Except for the Files property. To view or modify a property after creating the object, use the dot notation.

File Properties

Files included in the datastore, resolved as a cell array of character vectors or a string array, where each character vector or string is a full path to a file. The location argument in the spreadsheetDatastore and datastore functions define these files.

The first file specified in the cell array determines the variable names and format information for all files in the datastore.

Example: {'C:\dir\data\file1.xls';'C:\dir\data\file2.xls'}

Data Types: cell | string

Number of lines to skip at the beginning of each sheet when reading, specified as a positive integer. When you also specify the Range property, NumHeaderLines is the number of lines to skip at the beginning of the specified block of data.

Data Types: double

Sheets in files, specified as a character vector, cell array of character vectors, string scalar, or string array containing sheet names, or as a numeric vector of sheet indices. The empty character vector '' indicates that all sheets in the files are included.

Example: {'sheet1','sheet7'}

Example: [3 5 7]

Data Types: char | cell | string | double

Row and column bounds, specified as a character vector or string scalar that defines a rectangular block of data in the sheets. The empty character vector '' indicates that the bounds are the beginning of the file and the end of the data.

Example: 'B1:T7'

Example: 'A:C'

Data Types: char | string

Indicator for reading the first row of the first file in the datastore as variable names, specified as either true (1) or false (0).

  • If true, then the first nonheader row of the first file determines the variable names for the data.

  • If false, then the first nonheader row of the first file contains the first row of data. Default variable names are assigned as Var1, Var2, and so on.

Data Types: logical | double

Names of variables in the datastore, specified as a character vector, cell array of character vectors, string scalar, or string array. Specify the variable names in the order in which they appear in the files. If you do not specify the variable names, they are detected from the first nonheader line in the first file of the datastore. You can specify VariableNames with a character vector or string scalar, however it is automatically converted to a cell array of character vectors or a string array. When modifying the VariableNames property, the number of new variable names must match the number of original variable names.

If ReadVariableNames is false, then VariableNames defaults to {'Var1','Var2', ...}.

Example: {'Time','Date','Quantity'}

Data Types: char | cell | string

Variable types, specified as 'double', 'char', 'string', 'categorical', or 'datetime', which indicates the type of each variable when reading the data.

The list of variable types corresponds with the variables in VariableNames. Types double, char, and datetime can be automatically detected from the data. You can specify VariableTypes as a character vector or string scalar, however spreadsheetDatastore automatically converts it to a cell array of character vectors or a string array.

Example: {'char','categorical'}

Data Types: char | cell | string

Properties for preview, read, readall Table

Variables to read from the file, specified as a character vector, cell array of character vectors, string scalar, or string array. Each character vector or string contains the name of one variable. You can specify the variable names in any order. You can specify SelectedVariableNames with a character vector or string scalar, however spreadsheetDatastore automatically converts it to a cell array of character vectors or a string array.

Example: {'Var3','Var7','Var4'}

Data Types: char | cell | string

Selected variable types, specified as 'double', 'char', 'string', 'categorical', or 'datetime', which indicates the type of each selected variable when reading the data. The list of variable types corresponds with the variables in SelectedVariableNames. Types double, char, and datetime can be automatically detected from the data. You can specify SelectedVariableTypes as a character vector or string scalar, however it is automatically converted to a cell array of character vectors or string array.

Example: {'double','datetime'}

Data Types: char | cell | string

Amount of data to read in a call to the read function, specified as 'file' or 'sheet', or as a positive integer scalar.

  • If ReadSize is 'file', then each call to read reads all the data one file at a time.

  • If ReadSize is 'sheet', then each call to read reads all the data one sheet at a time.

  • If ReadSize is a positive integer, then each call to read reads the rows specified by ReadSize, or fewer if it reaches the end of the data.

When you change ReadSize from an integer scalar to 'file' or 'sheet', or conversely, the datastore resets using the reset function.

Data Types: char | string | double

Object Functions

hasdataDetermine if data is available to read
numpartitionsNumber of datastore partitions
partitionPartition a datastore
previewSubset of data in datastore
readRead data in datastore
readallRead all data in datastore
resetReset datastore to initial state
sheetnamesQuery sheet names from datastore

Examples

collapse all

ssds = datastore('airlinesmall_subset.xlsx')
ssds = 

  SpreadsheetDatastore with properties:

                      Files: {
                             ' ...\matlab\toolbox\matlab\demos\airlinesmall_subset.xlsx'
                             }
                     Sheets: ''
                      Range: ''

  Sheet Format Properties:
             NumHeaderLines: 0
          ReadVariableNames: true
              VariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
              VariableTypes: {'double', 'double', 'double' ... and 26 more}

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
      SelectedVariableTypes: {'double', 'double', 'double' ... and 26 more}
                   ReadSize: 'file'

Create a SpreadsheetDatastore object containing the file airlinesmall_subset.xlsx.

ssds = spreadsheetDatastore('airlinesmall_subset.xlsx')
ssds = 

  SpreadsheetDatastore with properties:

                      Files: {
                             ' ...\matlab\toolbox\matlab\demos\airlinesmall_subset.xlsx'
                             }
                     Sheets: ''
                      Range: ''

  Sheet Format Properties:
             NumHeaderLines: 0
          ReadVariableNames: true
              VariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
              VariableTypes: {'double', 'double', 'double' ... and 26 more}

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
      SelectedVariableTypes: {'double', 'double', 'double' ... and 26 more}
                   ReadSize: 'file'

Display the sheet names for the file. The file contains one sheet per year.

sheetnames(ssds,1)
ans = 

  Columns 1 through 7

    '1996'    '1997'    '1998'    '1999'    '2000'    '2001'    '2002'

  Columns 8 through 13

    '2003'    '2004'    '2005'    '2006'    '2007'    '2008'

Specify the variable FlightNum in the second sheet as the data of interest, and preview the first eight rows.

ssds.Sheets = 2;
ssds.SelectedVariableNames = 'FlightNum';
preview(ssds)
ans = 

    FlightNum
    _________

    1014     
    1201     
     702     
    1184     
    1310     
    1759     
    1242     
    1558     

Read only the first three rows of variables DepTime and ArrTime in the first sheet.

ssds.ReadSize = 3;
ssds.Sheets = 1;
ssds.SelectedVariableNames = {'DepTime','ArrTime'};
read(ssds)
ans = 

    DepTime    ArrTime
    _______    _______

    2117       2305   
    1252       1511   
    1441       1708   

Read all of sheets four, five, and six.

ssds.Sheets = 4:6;
readall(ssds);

Alternatives

You also can create a SpreadsheetDatastore object using the datastore function. For example, ds = datastore(location,'Type','spreadsheet') creates a datastore from a collection of files specified by location.

Introduced in R2016a