datastore

Create datastore for large collections of data

Syntax

ds = datastore(location)
ds = datastore(location,Name,Value)

Description

ds = datastore(location) creates a datastore from the collection of data specified by location. A datastore is a repository for collections of data that are too large to fit in memory. After creating ds, you can read and process the data.

ds = datastore(location,Name,Value) specifies additional parameters for ds using one or more name-value pair arguments. For example, you can create a datastore for image files by specifying 'Type','image'.

Examples

Create a datastore associated with the sample file airlinesmall.csv. This file contains airline data from the years 1987 through 2008.

To control how missing data in numeric columns is imported, use the 'TreatAsMissing' name-value pair argument. In this example, specifying 'NA' for 'TreatAsMissing' replaces every instance of 'NA' in the numeric columns with NaN, which is the value of the 'MissingValue' property of the datastore.

ds = datastore('airlinesmall.csv', ...
                   'TreatAsMissing','NA')
ds = 
  TabularTextDatastore with properties:

                      Files: {
                             ' .../devel/bat/Bdoc17a/build/matlab/toolbox/matlab/demos/airlinesmall.csv'
                             }
               FileEncoding: 'UTF-8'
          ReadVariableNames: true
              VariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}

  Text Format Properties:
             NumHeaderLines: 0
                  Delimiter: ','
               RowDelimiter: '\r\n'
             TreatAsMissing: 'NA'
               MissingValue: NaN

  Advanced Text Format Properties:
            TextscanFormats: {'%f', '%f', '%f' ... and 26 more}
                   TextType: 'char'
         ExponentCharacters: 'eEdD'
               CommentStyle: ''
                 Whitespace: ' \b\t'
    MultipleDelimitersAsOne: false

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'Year', 'Month', 'DayofMonth' ... and 26 more}
            SelectedFormats: {'%f', '%f', '%f' ... and 26 more}
                   ReadSize: 20000 rows

datastore creates a TabularTextDatastore.
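
Once the datastore exists, the data can be read one chunk at a time instead of all at once. The following is a minimal sketch of that pattern, assuming the sample file contains a numeric variable named ArrDelay (a name that does not appear in the truncated display above) and using a running row count and maximum as an illustrative computation.

```matlab
% Sketch: process airlinesmall.csv in chunks rather than loading it whole.
ds = datastore('airlinesmall.csv','TreatAsMissing','NA');
ds.SelectedVariableNames = {'Year','ArrDelay'};  % ArrDelay is assumed to exist
ds.ReadSize = 20000;                             % rows returned by each read

totalRows = 0;
maxDelay = -Inf;
while hasdata(ds)
    t = read(ds);                                % table with up to ReadSize rows
    totalRows = totalRows + height(t);
    maxDelay = max(maxDelay, max(t.ArrDelay));   % max ignores NaN values
end
fprintf('Read %d rows in total.\n', totalRows);
```

To inspect a few rows without advancing through the data, call preview(ds) instead of read(ds).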

Create a datastore containing all .tif files in the toolbox/matlab folder under matlabroot, including its subfolders.

ds = datastore(fullfile(matlabroot,'toolbox','matlab'), ...
               'IncludeSubfolders',true,'FileExtensions','.tif','Type','image')
ds = 

  ImageDatastore with properties:

       Files: {
              ' ...\matlab\toolbox\matlab\demos\example.tif';
              ' ...\matlab\toolbox\matlab\imagesci\corn.tif'
              }
    ReadSize: 1
      Labels: {}
     ReadFcn: @readDatastoreImage

Input Arguments

The location argument specifies the files or folders to include in the datastore, as a character vector or cell array of character vectors. If the files are not in the current folder, then location must contain full or relative paths. Files within subfolders of the specified folder are not automatically included in the datastore; to include them, set 'IncludeSubfolders' to true.

You can use the wildcard character (*) when specifying location. This character indicates that all matching files or all files in the matching folders are included in the datastore.

If the files are not available locally, then the full path of the files or folders must be an internationalized resource identifier (IRI), such as
hdfs://hostname:portnumber/path_to_file.

For information on using datastore with Amazon S3™ and HDFS™, see Read Remote Data.

    Note:   When reading from HDFS or when reading Sequence files locally, the datastore function calls the javaaddpath command. This command does the following:

    • Clears the definitions of all Java® classes defined by files on the dynamic class path

    • Removes all global variables and variables from the base workspace

    • Removes all compiled scripts, functions, and MEX-functions from memory

    To prevent persistent variables, code files, or MEX-files from being cleared, use the mlock function.

For KeyValueDatastore, the files must be MAT-files or Sequence files generated by the mapreduce function. MAT-files must be in a local file system or in a network file system. Sequence files can be in a local, network, or HDFS file system. For DatabaseDatastore, the location argument need not be files. For more information, see DatabaseDatastore.

Example: 'file1.csv'

Example: '../dir/data/file1.jpg'

Example: {'C:\dir\data\file1.xls','C:\dir\data\file2.xlsx'}

Example: 'C:\dir\data\*.mat'

Example: 'hdfs://myserver:7867/data/file1.txt'

Data Types: char | cell
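
The wildcard form of location can be tried out on a small set of generated files. The sketch below builds a scratch folder first; the folder, the file names, and the variable names x and y are all hypothetical and exist only for illustration.

```matlab
% Build a hypothetical scratch folder with three CSV files and one text file.
folder = tempname;
mkdir(folder);
for k = 1:3
    fid = fopen(fullfile(folder,sprintf('part%d.csv',k)),'w');
    fprintf(fid,'x,y\n%d,%d\n%d,%d\n',k,2*k,k+10,2*k+20);
    fclose(fid);
end
fid = fopen(fullfile(folder,'notes.txt'),'w');
fprintf(fid,'a,b\n1,2\n');
fclose(fid);

% The wildcard matches only the .csv files, so notes.txt is left out.
ds = datastore(fullfile(folder,'*.csv'));
numFiles = numel(ds.Files);
data = readall(ds);               % rows of all matched files in one table
```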

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'FileExtensions',{'.jpg','.tif'} includes all files with a .jpg or .tif extension for an ImageDatastore object.

Type of datastore, specified as the comma-separated pair consisting of 'Type' and one of the following:

Value of 'Type'    Description
'tabulartext'      Text files containing tabular data. The encoding of the data must be ASCII or UTF-8.
'image'            Image files in a format such as JPEG or PNG. Supported formats are those listed by imformats.
'spreadsheet'      Spreadsheet files containing one or more sheets.
'keyvalue'         Key-value pair data contained in MAT-files or Sequence files with data generated by mapreduce.
'file'             Custom format files, which require a specified read function to read the data. For more information, see FileDatastore.
'tall'             MAT-files or Sequence files produced by the write function of the tall data type. For more information, see TallDatastore.
'database'         Data stored in a database. Requires Database Toolbox™ and additional input arguments when you use the 'Type' parameter. For more information, see DatabaseDatastore.

  • If there are multiple types that support the format of the files, then use the 'Type' argument to specify a datastore type.

  • If you do not specify a value for 'Type', then datastore automatically determines the appropriate type of datastore to create based on the extensions of the files.

Data Types: char
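
The difference between automatic type detection and an explicit 'Type' can be sketched with a single file. The scratch file and its contents are hypothetical, and using fileread as the read function for the 'file' type is only one possible choice.

```matlab
% Write a small hypothetical CSV file to a temporary location.
f = [tempname '.csv'];
fid = fopen(f,'w');
fprintf(fid,'x,y\n1,2\n3,4\n');
fclose(fid);

% No 'Type': the .csv extension leads to a tabular text datastore.
dsAuto = datastore(f);

% Explicit 'Type': treat the same file as a custom format and read it
% as raw text with a user-chosen read function.
dsFile = datastore(f,'Type','file','ReadFcn',@fileread);

disp(class(dsAuto));
disp(class(dsFile));
txt = read(dsFile);               % character vector with the file contents
```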

Include subfolders within a folder, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true (1) or false (0). Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

If you do not specify 'IncludeSubfolders', the default value is false.

The 'IncludeSubfolders' name-value pair is only valid when creating these objects:

  • TabularTextDatastore

  • ImageDatastore

  • SpreadsheetDatastore

  • FileDatastore

  • KeyValueDatastore

Example: 'IncludeSubfolders',true

Data Types: logical | double
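
The effect of 'IncludeSubfolders' can be checked by counting the files a datastore picks up. This sketch uses a hypothetical scratch folder containing one file at the top level and one file in a subfolder.

```matlab
% Hypothetical layout: root/a.csv and root/sub/b.csv
root = tempname;
mkdir(root);
mkdir(fullfile(root,'sub'));
names = {fullfile(root,'a.csv'), fullfile(root,'sub','b.csv')};
for k = 1:numel(names)
    fid = fopen(names{k},'w');
    fprintf(fid,'x,y\n%d,%d\n',k,k+1);
    fclose(fid);
end

dsTop = datastore(root,'Type','tabulartext');              % top level only
dsAll = datastore(root,'Type','tabulartext', ...
                  'IncludeSubfolders',true);               % also root/sub

nTop = numel(dsTop.Files);
nAll = numel(dsAll.Files);
```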

Extensions of files, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector or cell array of character vectors. When specifying 'FileExtensions', also specify 'Type'. Use empty quotes ('') to represent files without extensions.

If 'FileExtensions' is not specified, then datastore automatically includes all supported file extensions depending on the datastore type. If you want to include unsupported extensions, then specify each extension you want to include individually.

  • For TabularTextDatastore objects, supported extensions include .txt, .csv, .dat, .dlm, .asc, .text, and no extension.

  • For ImageDatastore objects, supported extensions include all imformats extensions.

  • For SpreadsheetDatastore objects, supported extensions include .xls, .xlsx, .xlsm, .xltx, and .xltm.

  • For TallDatastore objects, supported extensions include .mat and .seq.

The 'FileExtensions' name-value pair is only valid when creating these objects:

  • TabularTextDatastore

  • ImageDatastore

  • SpreadsheetDatastore

  • FileDatastore

  • KeyValueDatastore

Example: 'FileExtensions','.jpg'

Example: 'FileExtensions',{'.txt','.text'}

Data Types: char | cell
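
An extension outside the supported set has to be listed explicitly, together with 'Type'. The sketch below assumes a hypothetical folder holding a comma-delimited .log file next to an unrelated .md file.

```matlab
% Hypothetical folder with one delimited .log file and one unrelated file.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder,'run1.log'),'w');
fprintf(fid,'t,v\n0,1.5\n1,2.5\n');
fclose(fid);
fid = fopen(fullfile(folder,'readme.md'),'w');
fprintf(fid,'not data\n');
fclose(fid);

% .log is not a default extension for tabular text, so name it explicitly.
ds = datastore(folder,'Type','tabulartext','FileExtensions','.log');
numFiles = numel(ds.Files);
data = readall(ds);
```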

Output data type of text variables, specified as the comma-separated pair consisting of 'TextType' and either 'char' or 'string'. If the output table from the read, readall, or preview functions contains text variables, then 'TextType' specifies the data type of those variables for TabularTextDatastore and SpreadsheetDatastore objects only. If 'TextType' is 'char', then the output is a cell array of character vectors. If 'TextType' is 'string', then the output has type string.

Data Types: char
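
The two 'TextType' settings can be compared on the same file. In this sketch the file and its variable names (name, score) are hypothetical, and the string data type requires a release that supports it.

```matlab
% Hypothetical file with one text column and one numeric column.
f = [tempname '.csv'];
fid = fopen(f,'w');
fprintf(fid,'name,score\nalice,1\nbob,2\n');
fclose(fid);

dsChar = datastore(f,'TextType','char');
dsStr  = datastore(f,'TextType','string');

tChar = read(dsChar);
tStr  = read(dsStr);

isCellOfChars = iscellstr(tChar.name);   % text as cell array of char vectors
isStringArray = isstring(tStr.name);     % text as string array
```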

Type for imported date and time data, specified as the comma-separated pair consisting of 'DatetimeType' and one of these values: 'datetime' or 'text'. The 'DatetimeType' argument only applies when creating a TabularTextDatastore object.

Value         Type for Imported Date and Time Data
'datetime'    MATLAB® datetime data type. For more information, see datetime.
'text'        If 'DatetimeType' is specified as 'text', then the type for imported date and time data depends on the value specified in the 'TextType' parameter:
                • If 'TextType' is 'char', then the datastore returns dates as a cell array of character vectors.
                • If 'TextType' is 'string', then the datastore returns dates as an array of strings.

Example: 'DatetimeType','datetime'

Data Types: char
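
A sketch of the two 'DatetimeType' settings follows. The file and its variable names (when, value) are hypothetical, and whether a column is recognized as date and time data depends on its format; the ISO-style dates used here are assumed to be recognized.

```matlab
% Hypothetical file whose first column holds dates.
f = [tempname '.csv'];
fid = fopen(f,'w');
fprintf(fid,'when,value\n2017-01-15,1\n2017-02-20,2\n');
fclose(fid);

dsDate = datastore(f,'DatetimeType','datetime');
dsText = datastore(f,'DatetimeType','text');   % 'TextType' left at 'char'

tDate = read(dsDate);
tText = read(dsText);

disp(class(tDate.when));    % datetime if the column was recognized as dates
disp(class(tText.when));    % cell, holding character vectors
```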

In addition to these name-value pairs, you can also specify any property of the datastore type being created as a name-value pair, except for the Files property. For the available properties, see the properties page of that datastore type.

Output Arguments

The output ds is the datastore for a collection of data, returned as one of these objects: TabularTextDatastore, ImageDatastore, SpreadsheetDatastore, KeyValueDatastore, FileDatastore, TallDatastore, or DatabaseDatastore. The type of the datastore depends on the type of files or the location argument. For more information, see the page for the datastore named in the following table:

Type                                                                                 Output
Text files                                                                           TabularTextDatastore
Image files                                                                          ImageDatastore
Spreadsheet files                                                                    SpreadsheetDatastore
MAT-files or Sequence files produced by mapreduce                                    KeyValueDatastore
Custom format files                                                                  FileDatastore
MAT-files or Sequence files produced by the write function of the tall data type     TallDatastore
Database                                                                             DatabaseDatastore

For each of these datastore types, the Files property is a cell array of character vectors. Each character vector is an absolute path to a file resolved by the location argument.
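
The resolution of location into the Files property can be observed by passing a relative path. This sketch uses a hypothetical scratch folder and temporarily changes the current folder so that the relative name resolves.

```matlab
% Hypothetical file in a scratch folder.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder,'file1.csv'),'w');
fprintf(fid,'x,y\n1,2\n');
fclose(fid);

oldFolder = cd(folder);           % switch folders, remembering the old one
ds = datastore('file1.csv');      % relative location
cd(oldFolder);                    % restore the original current folder

resolved = ds.Files{1};           % path stored by the datastore
disp(resolved);
```

Because the stored path no longer depends on the current folder, the datastore remains usable after the folder is changed back.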

Introduced in R2014b
