fileDatastore

Datastore with custom file reader

Description

Use a FileDatastore object to manage large collections of custom format files where the collection does not necessarily fit in memory. You can create a FileDatastore object using the fileDatastore function, specify its properties, and then import and process the data using object functions.

Creation

Syntax

fds = fileDatastore(location,'ReadFcn',@fcn)
fds = fileDatastore(location,Name,Value)

Description

fds = fileDatastore(location,'ReadFcn',@fcn) creates a datastore from the collection of files specified by location and uses the function fcn to read the data from the files.

fds = fileDatastore(location,Name,Value) specifies additional parameters and properties for fds using one or more name-value pair arguments. For example, you can specify which files to include in the datastore depending on their extensions with fileDatastore(location,'ReadFcn',@customreader,'FileExtensions',{'.exts','.extx'}).

Input Arguments

location

Files or folders to include in the datastore, specified as a character vector, cell array of character vectors, string scalar, or string array. If the files are not in the current folder, then location must be full or relative paths. Files within subfolders of the specified folder are not automatically included in the datastore.

You can use the wildcard character (*) when specifying location. This character indicates that all matching files or all files in the matching folders are included in the datastore.

If the files are not available locally, then the full path of the files or folders must be an internationalized resource identifier (IRI) of the form
hdfs:///path_to_file.

For information on using datastore with Amazon S3™, Windows Azure® Blob Storage, and HDFS™, see Read Remote Data.

Example: 'file1.ext'

Example: '../dir/data/file1.ext'

Example: {'C:\dir\data\file1.exts','C:\dir\data\file2.extx'}

Example: 'C:\dir\data\*.ext'

Data Types: char | cell | string
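As a rough illustration of the wildcard form of location, the following sketch builds a scratch folder and selects only the files matching *.ext. The folder and file names are hypothetical, and fileread is used as the reader only because the sample files are plain text.

```matlab
% Build a scratch folder with three small files (hypothetical names).
folder = tempname;
mkdir(folder);
names = {'file1.ext', 'file2.ext', 'readme.txt'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'sample text');
    fclose(fid);
end

% The wildcard limits the datastore to files whose names end in .ext.
fds = fileDatastore(fullfile(folder, '*.ext'), 'ReadFcn', @fileread);
matched = fds.Files;   % full paths of the matching files
```

The Files property holds full paths, so matched can be inspected to confirm which files the pattern picked up.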

@fcn

Function that reads the file data, specified as a function handle. At a minimum, the function takes a file name as input, and then it outputs the corresponding file data. For example, if customreader is the specified function to read the file, then it must have a signature similar to the following:

function data = customreader(filename)
...
end

If there is more than one output argument, then the datastore uses only the first argument and ignores the rest.

The value specified in @fcn sets the value of the ReadFcn property.

Example: @customreader

Data Types: function_handle
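A minimal sketch of a reader function in use, assuming plain-text input files. The scratch folder, the file names, and the anonymous function standing in for customreader are all hypothetical; any function that accepts a file name and returns data would serve.

```matlab
% Create a scratch folder with two small text files (hypothetical names).
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('note%d.txt', k)), 'w');
    fprintf(fid, 'contents of file %d', k);
    fclose(fid);
end

% The reader takes a file name and returns the file data. Here it
% returns the whole file as a character vector.
customreader = @(filename) fileread(filename);

fds = fileDatastore(folder, 'ReadFcn', customreader);
first = read(fds);     % data from one file
second = read(fds);    % data from the next file
```

An anonymous function keeps the sketch self-contained; a named function in its own file works the same way when passed as @customreader.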

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: fds = fileDatastore('C:\dir\data','FileExtensions',{'.exts','.extx'})

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true, false, 0, or 1. Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

If you do not specify 'IncludeSubfolders', then the default value is false.

Example: 'IncludeSubfolders',true

Data Types: logical | double
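The effect of the flag can be seen with a small scratch folder that contains one file at the top level and one in a subfolder. This is only a sketch; the folder layout and file names are made up for the illustration.

```matlab
% Scratch layout: root/top.txt and root/sub/nested.txt (hypothetical).
root = tempname;
mkdir(fullfile(root, 'sub'));
files = {fullfile(root, 'top.txt'), fullfile(root, 'sub', 'nested.txt')};
for k = 1:numel(files)
    fid = fopen(files{k}, 'w');
    fprintf(fid, 'sample text');
    fclose(fid);
end

% Default behavior: only files directly inside root are included.
topOnly = fileDatastore(root, 'ReadFcn', @fileread);

% With the flag set, files in subfolders are included as well.
withSubs = fileDatastore(root, 'ReadFcn', @fileread, 'IncludeSubfolders', true);

nTop = numel(topOnly.Files);
nAll = numel(withSubs.Files);
```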

Custom format file extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, string scalar, or string array.

When you specify a file extension, the fileDatastore function creates a datastore object only for files with the specified extension. You can also create a datastore for files without any extensions by specifying 'FileExtensions' as an empty character vector, ''. If you do not specify 'FileExtensions', then fileDatastore automatically includes all files within a folder.

Example: 'FileExtensions',''

Example: 'FileExtensions','.ext'

Example: 'FileExtensions',{'.exts','.extx'}

Data Types: char | cell | string
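The following sketch shows extension filtering on a scratch folder. The extensions .exts and .extx come from the examples above; the file names and the use of fileread as the reader are assumptions made for the illustration.

```matlab
% Scratch folder with three files, one of which has an unwanted extension.
folder = tempname;
mkdir(folder);
names = {'a.exts', 'b.extx', 'c.other'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'sample text');
    fclose(fid);
end

% Only files with the listed extensions become part of the datastore.
fds = fileDatastore(folder, 'ReadFcn', @fileread, ...
    'FileExtensions', {'.exts', '.extx'});

% Collect the extensions of the files that were included.
[~, ~, ext] = cellfun(@fileparts, fds.Files, 'UniformOutput', false);
```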

Alternate file system root paths, specified as the comma-separated pair consisting of 'AlternateFileSystemRoots' and a string vector or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Distributed Computing Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines that run a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify 'AlternateFileSystemRoots' as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

Properties

FileDatastore properties describe the files associated with a FileDatastore object. Except for the Files property, you can specify the value of FileDatastore properties using name-value pair arguments when you create the object. To view or modify a property after creating the object, use dot notation.

Files

Files included in the datastore, resolved as a character vector, cell array of character vectors, string scalar, or string array, where each character vector or string is a full path to a file. The location argument in the fileDatastore and datastore functions defines Files when the datastore is created.

Example: {'C:\dir\data\file1.ext';'C:\dir\data\file2.ext'}

Example: 'hdfs:///data/*.mat'

Data Types: char | cell | string

ReadFcn

Function that reads the file data, specified as a function handle. The function must take a file name as input, and then output the corresponding file data. For example, if customreader is the specified function to read the file, then it must have a signature similar to the following:

function data = customreader(filename)
...
end

If there is more than one output argument, then only the first is used and the rest are ignored.

Example: @customreader

Data Types: function_handle

UniformRead

This property is read-only.

Vertically concatenateable flag, specified as a logical true or false. Specify the value of this property when you first create the FileDatastore object.

  • true: Multiple reads of the FileDatastore object return uniform data that is vertically concatenateable. When the UniformRead property value is true:

    • The ReadFcn function must return data that is vertically concatenateable; otherwise, the readall method returns an error.

    • The underlying data type of the output of the tall function is the same as the data type of the output from ReadFcn.

  • false (default): Multiple reads of the FileDatastore object do not return uniform data that is vertically concatenateable. When the UniformRead property value is false:

    • readall returns a cell array.

    • tall returns a tall cell array.

Example: fds = fileDatastore(location,'ReadFcn',@load,'UniformRead',true)

Data Types: logical | double
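To make the difference concrete, the sketch below writes three small MAT-files and reads them with and without 'UniformRead'. The folder, the file names, the variable name x, and the helper readX are hypothetical choices for this illustration.

```matlab
% Write three MAT-files, each holding a 2-by-2 matrix in variable x.
folder = tempname;
mkdir(folder);
for k = 1:3
    x = k * ones(2, 2);
    save(fullfile(folder, sprintf('part%d.mat', k)), 'x');
end

% Reader that returns the matrix itself rather than the struct from load.
readX = @(filename) getfield(load(filename), 'x');

% UniformRead true: readall vertically concatenates the per-file results.
uniformDs = fileDatastore(folder, 'ReadFcn', readX, 'UniformRead', true);
stacked = readall(uniformDs);

% UniformRead false (default): readall returns one cell per file.
cellDs = fileDatastore(folder, 'ReadFcn', readX);
pieces = readall(cellDs);
```

Here stacked is a single numeric matrix with the three blocks stacked vertically, while pieces keeps each file's matrix in its own cell.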

Object Functions

hasdata          Determine if data is available to read
numpartitions    Number of datastore partitions
partition        Partition a datastore
preview          Subset of data in datastore
read             Read data in datastore
readall          Read all data in datastore
reset            Reset datastore to initial state
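The object functions listed above can be exercised together as in the following sketch. It assumes a scratch folder of four plain-text files with hypothetical names and uses fileread as the reader.

```matlab
% Scratch folder with four small text files (hypothetical names).
folder = tempname;
mkdir(folder);
for k = 1:4
    fid = fopen(fullfile(folder, sprintf('item%d.txt', k)), 'w');
    fprintf(fid, 'item %d', k);
    fclose(fid);
end
fds = fileDatastore(folder, 'ReadFcn', @fileread);

sample = preview(fds);        % look at a subset of the data

% Read file by file until the datastore is exhausted, then rewind.
count = 0;
while hasdata(fds)
    data = read(fds);
    count = count + 1;
end
reset(fds);

% Split the datastore into two parts, for example to process separately.
n = numpartitions(fds);       % default number of partitions
p1 = partition(fds, 2, 1);    % first of two partitions
p2 = partition(fds, 2, 2);    % second of two partitions
```

Each partition is itself a datastore, so read, hasdata, and readall apply to p1 and p2 in the same way.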

Examples

Create a datastore containing all .mat files within the MATLAB® demos folder, specifying the load function to read the file data.

fds = fileDatastore(fullfile(matlabroot,'toolbox','matlab','demos'),'ReadFcn',@load,'FileExtensions','.mat')
fds = 

  FileDatastore with properties:

      Files: {
             ' ...\matlab\toolbox\matlab\demos\accidents.mat';
             ' ...\matlab\toolbox\matlab\demos\airfoil.mat';
             ' ...\matlab\toolbox\matlab\demos\airlineResults.mat'
              ... and 35 more
             }
    ReadFcn: @load

Read the first file in the datastore, and then read the second file.

data1 = read(fds);
data2 = read(fds);

Read all of the files in the datastore at once and store the result.

alldata = readall(fds);

Initialize a cell array to hold the data and counter i.

dataarray = cell(numel(fds.Files), 1);
i = 1;

Reset the datastore to the first file and read the files one at a time until there is no data left. Assign the data to the array dataarray.

reset(fds);
while hasdata(fds)
    dataarray{i} = read(fds);
    i = i+1;
end

Create a datastore for files in the MATLAB® demos folder that have a .mat extension.

fds = fileDatastore(fullfile(matlabroot,'toolbox','matlab','demos'),'ReadFcn',@load,'FileExtensions','.mat')
fds = 

  FileDatastore with properties:

      Files: {
             ' ...\matlab\toolbox\matlab\demos\accidents.mat';
             ' ...\matlab\toolbox\matlab\demos\airfoil.mat';
             ' ...\matlab\toolbox\matlab\demos\airlineResults.mat'
              ... and 35 more
             }
    ReadFcn: @load

Tips

  • The FileDatastore object is designed to read data from files and reads one complete file at a time. To read a subset of data from a large file or to read from a data stream, you must build your own custom datastore. For more information, see Develop Custom Datastore.

Alternatives

You can also create a FileDatastore object using the datastore function. For example, ds = datastore(location,'Type','file','ReadFcn',@fcn) creates a datastore from a collection of files specified by location.
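A short sketch of this alternative, assuming a scratch folder with a single plain-text file (the folder and file names are hypothetical):

```matlab
% Scratch folder with one text file.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'sample.txt'), 'w');
fprintf(fid, 'sample text');
fclose(fid);

% Request a file-based datastore through the general datastore function.
ds = datastore(folder, 'Type', 'file', 'ReadFcn', @fileread);
kind = class(ds);     % class of the object that datastore returned
data = read(ds);      % uses the same object functions as fileDatastore
```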

Introduced in R2016a
