matlab.io.datastore.DsFileSet class

Package: matlab.io.datastore

File-set object for collection of files in datastore

Description

The DsFileSet object helps you manage the iterative processing of large collections of files. Use the DsFileSet object together with the DsFileReader object to manage and read files from your datastore.
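As a rough sketch of how the two objects might work together, the following loop asks the file set for each file in turn and reads its bytes with a DsFileReader. The demos folder and the .mat filter are borrowed from the example on this page; the variable names (info, fr, totalBytes) are illustrative only, and the sketch assumes the reader returns raw bytes by default.

```matlab
% Sketch: pair DsFileSet (which file is next?) with DsFileReader (read it).
folder = fullfile(matlabroot,'toolbox','matlab','demos');   % sample location
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');

totalBytes = 0;
while hasfile(fs)
    info = nextfile(fs);                                     % one-row table
    fr = matlab.io.datastore.DsFileReader(info.FileName);
    seek(fr,info.Offset,'Origin','start-of-file');           % start of this file
    data = read(fr,info.SplitSize);                          % SplitSize equals FileSize here
    totalBytes = totalBytes + numel(data);
end
fprintf('Read %d bytes from %d files.\n',totalBytes,fs.NumFiles);
```

The file set only tracks which file (or chunk) comes next; the actual I/O stays in the reader, so the same loop shape can be reused with a different read strategy.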

Construction

fs = matlab.io.datastore.DsFileSet(location) returns a DsFileSet object for a collection of files based on the specified location.

fs = matlab.io.datastore.DsFileSet(location,Name,Value) specifies additional parameters for the DsFileSet object using one or more name-value pair arguments. Name can also be a property name, and Value is the corresponding value. Name must appear inside single quotes (''). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments


Files or folders to include in the file-set object (the location argument), specified as a character vector, cell array of character vectors, string, or struct. If the files are not in the current folder, then location must specify full or relative paths. Files within subfolders of the specified folder are not automatically included in the file-set object.

Typically for a Hadoop® workflow, when you specify location as a struct, it must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopFileBased class. For an example, see Add Support for Hadoop.

You can use the wildcard character (*) when specifying location. Specifying this character includes all matching files or all files in the matching folders in the file-set object.

If the files are not available locally, then the full path of the files or folders must be an internationalized resource identifier (IRI), such as
hdfs://hostname:portnumber/path_to_file.

Data Types: char | cell | string | struct
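The following sketch shows three of the location forms described above: a folder, a wildcard pattern, and an explicit list of files. The demos folder is only a convenient sample location, and the file list is taken from the wildcard result so that the paths are known to exist; the variable names are illustrative.

```matlab
demoFolder = fullfile(matlabroot,'toolbox','matlab','demos');

% 1. A folder: files directly inside it (subfolders are not searched by default).
fsFolder = matlab.io.datastore.DsFileSet(demoFolder);

% 2. A wildcard pattern: only the matching files.
fsWild = matlab.io.datastore.DsFileSet(fullfile(demoFolder,'*.mat'));

% 3. An explicit list of files, given as a cell array of character vectors.
listing = resolve(fsWild);                       % table describing the matched files
fsList = matlab.io.datastore.DsFileSet(cellstr(listing.FileName(1)));

fprintf('Folder: %d files, wildcard: %d files, list: %d file.\n', ...
    fsFolder.NumFiles,fsWild.NumFiles,fsList.NumFiles);
```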

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'FileExtensions',{'.jpg','.tif'} includes all files with a .jpg or .tif extension in the DsFileSet object.

File extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a character vector, cell array of character vectors, or string. You can use the empty quotes '' to represent files without extensions.

If 'FileExtensions' is not specified, then DsFileSet automatically includes all file extensions.

Example: 'FileExtensions','.jpg'

Example: 'FileExtensions',{'.txt','.csv'}

Data Types: char | cell | string

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true or false. Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

Example: 'IncludeSubfolders',true

Data Types: logical | double
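To see how 'FileExtensions' and 'IncludeSubfolders' interact, the sketch below builds two file sets over the same folder and compares their NumFiles values. The folder and the extension list are arbitrary choices for illustration.

```matlab
folder = fullfile(matlabroot,'toolbox','matlab','demos');
exts = {'.mat','.m'};                            % keep only these extensions

% Only files directly inside the folder.
fsTop = matlab.io.datastore.DsFileSet(folder, ...
    'FileExtensions',exts, ...
    'IncludeSubfolders',false);

% Files inside the folder and all of its subfolders.
fsAll = matlab.io.datastore.DsFileSet(folder, ...
    'FileExtensions',exts, ...
    'IncludeSubfolders',true);

fprintf('Top level: %d files, with subfolders: %d files.\n', ...
    fsTop.NumFiles,fsAll.NumFiles);
```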

Alternate file system root paths, specified as the comma-separated pair consisting of 'AlternateFileSystemRoots' and a string vector or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Distributed Computing Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines that run a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify 'AlternateFileSystemRoots' as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell arrays of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell
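A minimal sketch of passing one set of equivalent roots at construction time follows. It assumes the local demos folder stands in for the data root on this machine, and "/mynetwork/datasets" is a hypothetical path for the same data on a cluster; substitute the roots that apply to your setup.

```matlab
% Root of the data as seen from this machine (sample location).
localRoot = fullfile(matlabroot,'toolbox','matlab','demos');

% One row of equivalent roots: the local root plus a hypothetical cluster root.
altRoots = [string(localRoot),"/mynetwork/datasets"];

fs = matlab.io.datastore.DsFileSet(localRoot, ...
    'FileExtensions','.mat', ...
    'AlternateFileSystemRoots',altRoots);

fprintf('File set created with %d files.\n',fs.NumFiles);
```

At least one of the listed roots must point to where the files actually are on the machine that creates the file set, which is why the local root appears in the vector.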

Properties


This property is read-only.

Number of files in the file-set object, specified as a numeric scalar.

Example: fs.NumFiles

Data Types: double

This property is read-only.

Split size, specified as 'file' or a numeric scalar.

The value assigned to FileSplitSize dictates the output from the nextfile method.

  • If FileSplitSize is 'file', then the nextfile method returns a table with FileName, FileSize, Offset, and SplitSize. The value of SplitSize is set equal to the FileSize.

  • If FileSplitSize is a numeric scalar n, then the nextfile method returns a table with FileName, FileSize, Offset, and SplitSize. The value of SplitSize is set equal to the FileSplitSize. This information is used to read n bytes of the file. Subsequent calls to the nextfile method return information to help read the next n bytes of the same file until the end of the file.

Example: 'FileSplitSize',20

Data Types: double | char
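The chunked behavior can be sketched as follows. With a numeric 'FileSplitSize', repeated calls to nextfile are expected to stay on the same file until it is exhausted; the loop below counts how many calls refer to the first file. The chunk size of 1024 bytes and the sample folder are arbitrary, and the variable names are illustrative.

```matlab
folder = fullfile(matlabroot,'toolbox','matlab','demos');
chunkSize = 1024;                                % bytes per split (arbitrary)
fs = matlab.io.datastore.DsFileSet(folder, ...
    'FileExtensions','.mat', ...
    'FileSplitSize',chunkSize);

first = nextfile(fs);                            % first chunk of the first file
fprintf('%s: offset %d, file size %d\n', ...
    first.FileName,first.Offset,first.FileSize);

% Count further chunks that still belong to the same file.
chunks = 1;
while hasfile(fs)
    info = nextfile(fs);
    if ~strcmp(info.FileName,first.FileName)
        break                                    % moved on to the next file
    end
    chunks = chunks + 1;
end
fprintf('The first file was delivered in %d chunk(s).\n',chunks);
```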

Methods

matlab.io.datastore.DsFileSet.hasfile          Determine if more files are available in file-set object
matlab.io.datastore.DsFileSet.maxpartitions    Maximum number of partitions
matlab.io.datastore.DsFileSet.nextfile         Information on next file or file chunk
matlab.io.datastore.DsFileSet.partition        Partition file-set object
matlab.io.datastore.DsFileSet.reset            Reset the file-set object
matlab.io.datastore.DsFileSet.resolve          Information on all files in file-set object
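As an illustration of maxpartitions and partition, the sketch below splits a file set into at most three parts and reports how many files each part holds. It assumes the partition(fs,n,index) calling form; the cap of three and the sample folder are arbitrary choices.

```matlab
folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,'FileExtensions','.mat');

n = min(3,maxpartitions(fs));                    % never ask for more parts than allowed
total = 0;
for k = 1:n
    part = partition(fs,n,k);                    % k-th of n partitions
    total = total + part.NumFiles;
    fprintf('Partition %d of %d holds %d file(s).\n',k,n,part.NumFiles);
end
fprintf('%d of %d files covered.\n',total,fs.NumFiles);
```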

Examples


Create a file-set object, get file information one file at a time, or get information for all the files in the file-set object.

Create a file-set object for all the .mat files from the demos folder.

folder = fullfile(matlabroot,'toolbox','matlab','demos');
fs = matlab.io.datastore.DsFileSet(folder,...
                 'IncludeSubfolders',true,...
                 'FileExtensions','.mat');

Obtain information for the first and second file from the file-set object.

fTable1 = nextfile(fs); % first file
fTable2 = nextfile(fs); % second file

Obtain information on all the files by getting information for one file at a time and collecting the information into a table.

ft = cell(fs.NumFiles,1); % using cell for efficiency
i = 1;
reset(fs); % reset to the beginning of the fileset
while hasfile(fs)                 
    ft{i} = nextfile(fs);
    i = i + 1;
end
allFiles = vertcat(ft{:});

Alternatively, obtain information on all files at the same time.

allfiles = resolve(fs);

Tips

  • If you use the DsFileSet object as a property in your custom datastore, then implement the copyElement method. Implementing the copyElement method enables you to create a deep copy of the datastore object. For more information, see Customize Copy Operation. For an example implementation of the copyElement method, see Develop Custom Datastore.
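A minimal sketch of such a copyElement implementation is shown below. FileSetHolder is a hypothetical class used only for illustration; it is not a complete custom datastore, and a real one would also derive from the datastore base classes and implement their required methods. Save the class in a file named FileSetHolder.m.

```matlab
classdef FileSetHolder < matlab.mixin.Copyable
    % Hypothetical class that stores a DsFileSet in a property.
    properties
        FileSet                                  % matlab.io.datastore.DsFileSet object
    end

    methods
        function obj = FileSetHolder(location)
            obj.FileSet = matlab.io.datastore.DsFileSet(location);
        end
    end

    methods (Access = protected)
        function objCopy = copyElement(obj)
            % Shallow copy of all properties first ...
            objCopy = copyElement@matlab.mixin.Copyable(obj);
            % ... then replace the shared handle with an independent file set.
            objCopy.FileSet = copy(obj.FileSet);
        end
    end
end
```

With this in place, b = copy(a) yields an object whose FileSet can be iterated or reset without affecting the FileSet of a.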

Introduced in R2017b