matlab.io.datastore.BlockedFileSet
Description
The matlab.io.datastore.BlockedFileSet object helps you process a large collection of blocks within files when moving through the files iteratively. Use the BlockedFileSet object together with the DsFileReader object to manage and read files from your datastore.
Creation
Syntax
bs = matlab.io.datastore.BlockedFileSet(location)
bs = matlab.io.datastore.BlockedFileSet(location,Name,Value)
Description
bs = matlab.io.datastore.BlockedFileSet(location) creates a BlockedFileSet object for a collection of blocks within files, based on the specified location.
bs = matlab.io.datastore.BlockedFileSet(location,Name,Value) specifies the file extensions or subfolder inclusion, or sets object properties, using one or more name-value arguments.
Input Arguments
location — Files or folders to include
character vector | cell array of character vectors | string array | structure
Files or folders to include in the BlockedFileSet object, specified as a character vector, cell array of character vectors, string array, or structure. If the files are not in the current folder, then location must be a full or relative path. Files within subfolders of the specified folder are not automatically included in the BlockedFileSet object; to include them, set IncludeSubfolders to true.
In a typical Hadoop® workflow, when you specify location as a structure, the structure must contain the fields FileName, Offset, and Size. This requirement enables you to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.
You can use the wildcard character (*) when specifying location. The file-set object then includes all matching files, or all files in the matching folders.
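As a minimal sketch of a wildcard location, the code below creates a scratch folder with tempname, writes three hypothetical files (part1.bin, part2.bin, notes.txt), and selects only the .bin files. The file names and sizes are illustrative assumptions. With the default BlockSize ('file'), each matched file is expected to form one block, so NumBlocks should equal the number of matches.

```matlab
% Hypothetical scratch folder holding two .bin files and one .txt file
dataDir = tempname;
mkdir(dataDir);
fileNames = ["part1.bin", "part2.bin", "notes.txt"];
for k = 1:numel(fileNames)
    fid = fopen(fullfile(dataDir, fileNames(k)), 'w');
    fwrite(fid, zeros(1, 100*k, 'uint8'));   % file k holds 100*k bytes
    fclose(fid);
end

% The wildcard matches only the .bin files
bs = matlab.io.datastore.BlockedFileSet(fullfile(dataDir, '*.bin'));
numMatched = bs.NumBlocks;   % default BlockSize 'file': one block per file

rmdir(dataDir, 's');         % remove the scratch folder
```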
If the files are not available locally, then the full path of the files or folders must be a uniform resource locator (URL), such as hdfs://hostname:portnumber/path_to_file.
Data Types: char | cell | string | struct
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: bs = matlab.io.datastore.BlockedFileSet(location,'IncludeSubfolders',true)
FileExtensions — File extensions
character vector | cell array of character vectors | string array
File extensions, specified as a character vector, cell array of character vectors, or string array. You can use empty quotes ('') to represent files without extensions.
If 'FileExtensions' is not specified, then BlockedFileSet automatically includes all file extensions.
Example: 'FileExtensions','.jpg'
Example: 'FileExtensions',{'.txt','.csv'}
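The following sketch filters a folder by extension. It assumes a scratch folder containing three hypothetical files (a.txt, b.csv, c.dat) and keeps only the .txt and .csv files, so with the default BlockSize the file-set is expected to contain two blocks.

```matlab
% Hypothetical scratch folder with three files of different types
dataDir = tempname;
mkdir(dataDir);
fileNames = ["a.txt", "b.csv", "c.dat"];
for k = 1:numel(fileNames)
    fid = fopen(fullfile(dataDir, fileNames(k)), 'w');
    fwrite(fid, uint8('hello'));             % 5 bytes per file
    fclose(fid);
end

% Keep only .txt and .csv files; c.dat is excluded
bs = matlab.io.datastore.BlockedFileSet(dataDir, 'FileExtensions', {'.txt', '.csv'});
numSelected = bs.NumBlocks;                  % one block per file with the default BlockSize

rmdir(dataDir, 's');
```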
IncludeSubfolders — Subfolder inclusion flag
0 or false (default) | 1 or true
Subfolder inclusion flag, specified as a numeric or logical 1 (true) or 0 (false). Specify true to include all files and subfolders within each folder, or false to include only the files within each folder.
Example: 'IncludeSubfolders',true
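A small sketch contrasting the two settings, assuming a hypothetical scratch folder with one file at the top level (top.bin) and one in a subfolder (sub/nested.bin). Without IncludeSubfolders only the top-level file should be picked up; with it set to true, both files should be.

```matlab
% Hypothetical scratch folder: one top-level file and one file in a subfolder
dataDir = tempname;
subDir = fullfile(dataDir, 'sub');
mkdir(dataDir);
mkdir(subDir);
fileList = {fullfile(dataDir, 'top.bin'), fullfile(subDir, 'nested.bin')};
for k = 1:numel(fileList)
    fid = fopen(fileList{k}, 'w');
    fwrite(fid, zeros(1, 10, 'uint8'));      % 10 bytes each
    fclose(fid);
end

bsTop = matlab.io.datastore.BlockedFileSet(dataDir);                             % top level only
bsAll = matlab.io.datastore.BlockedFileSet(dataDir, 'IncludeSubfolders', true);  % includes sub
numTop = bsTop.NumBlocks;
numAll = bsAll.NumBlocks;

rmdir(dataDir, 's');
```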
Properties
BlockSize — Block size
'file' (default) | numeric scalar
Block size in bytes used to split file information, specified as one of these values:
- 'file' — Use the size of the next file in the collection.
- numeric scalar — Use the specified value in bytes.
Example: 'BlockSize',2000
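With a numeric BlockSize, each file appears to be split on its own (in the example on this page, the last block of earth.mat is only 522 bytes), so the block count can be estimated as a per-file ceiling: the sum over files of ceil(FileSize/BlockSize). The sketch below checks that arithmetic on two hypothetical files of 5000 and 1500 bytes, passed as an explicit list so the block order follows the list.

```matlab
% Two hypothetical files of known size in a scratch folder
dataDir = tempname;
mkdir(dataDir);
sizes = [5000 1500];                                 % bytes
fileList = {fullfile(dataDir, 'file1.bin'), fullfile(dataDir, 'file2.bin')};
for k = 1:numel(fileList)
    fid = fopen(fileList{k}, 'w');
    fwrite(fid, zeros(1, sizes(k), 'uint8'));
    fclose(fid);
end

blockSize = 2000;
bs = matlab.io.datastore.BlockedFileSet(fileList, 'BlockSize', blockSize);

% Per-file ceiling: ceil(5000/2000) + ceil(1500/2000) = 3 + 1
expectedBlocks = sum(ceil(sizes / blockSize));
actualBlocks = bs.NumBlocks;

% The third block is the short tail of file1.bin
lastOfFirst = bs.BlockInfo(3);
tailOffset = lastOfFirst.Offset;                     % 4000
tailSize = lastOfFirst.BlockSize;                    % 1000

rmdir(dataDir, 's');
```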
AlternateFileSystemRoots — Alternate file system root paths
string array | cell array
Alternate file system root paths, specified as a string array or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine but need to access and process the data on another machine, possibly one with a different operating system. Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machine with a copy of the data available on cloud or cluster machines of a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.
- To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string array. For example:
  ["Z:\datasets","/mynetwork/datasets"]
- To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows, where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string array or a cell array of character vectors. For example:
  - Specify 'AlternateFileSystemRoots' as a cell array of string arrays:
    {["Z:\datasets", "/mynetwork/datasets"];...
     ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
  - Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell arrays of character vectors:
    {{'Z:\datasets','/mynetwork/datasets'};...
     {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
The value of 'AlternateFileSystemRoots' must satisfy these conditions:
- Contains one or more rows, where each row specifies a set of equivalent root paths.
- Each row specifies multiple root paths, and each root path must contain at least two characters.
- Root paths are unique and are not subfolders of one another.
- Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
NumBlocks — Number of blocks
numeric scalar
This property is read-only.
Number of blocks in the blocked file-set object, specified as a numeric scalar.
Example: bs.NumBlocks
Data Types: double
NumBlocksRead — Number of blocks read
numeric scalar
This property is read-only.
Number of blocks read from the BlockedFileSet object, specified as a numeric scalar.
Example: bs.NumBlocksRead
Data Types: double
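A short sketch of how NumBlocksRead tracks iteration, assuming a single hypothetical 5000-byte file split into 2000-byte blocks. Each call to nextblock should advance the counter (the file-set is updated in place, as the example on this page suggests), and reset should return it to zero.

```matlab
% One hypothetical 5000-byte file, split into blocks of at most 2000 bytes
dataDir = tempname;
mkdir(dataDir);
fname = fullfile(dataDir, 'data.bin');
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, 5000, 'uint8'));
fclose(fid);

bs = matlab.io.datastore.BlockedFileSet(fname, 'BlockSize', 2000);
nextblock(bs);
nextblock(bs);
readAfterTwo = bs.NumBlocksRead;       % two blocks consumed

reset(bs);                             % rewind to the first block
readAfterReset = bs.NumBlocksRead;

rmdir(dataDir, 's');
```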
BlockInfo — Information about blocks
matlab.io.datastore.BlockedInfo object
This property is read-only.
Information about blocks in the matlab.io.datastore.BlockedFileSet object, returned as a matlab.io.datastore.BlockedInfo object with these properties:
- Filename — Name of the file in the BlockedFileSet object. The name contains the full path of the file.
- FileSize — Size of the file in bytes.
- Offset — Starting offset, in bytes, within the file to be read.
- BlockSize — Size of the block in bytes.
For information about a specific block, specify the block index. For example, bs.BlockInfo(2) returns information for the second block. If you call bs.BlockInfo with the index (:), or without an index, it returns information for all of the blocks.
Example: bs.BlockInfo(2)
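Because each BlockInfo entry records Filename, Offset, and BlockSize, it is enough to read one block directly. As a sketch, the code below uses the standard low-level fopen, fseek, and fread functions (not the DsFileReader object this page recommends) to read only the second block of a hypothetical 5000-byte file whose byte i holds mod(i,256).

```matlab
% Hypothetical 5000-byte file where byte i (0-based) holds mod(i, 256)
dataDir = tempname;
mkdir(dataDir);
fname = fullfile(dataDir, 'data.bin');
fid = fopen(fname, 'w');
fwrite(fid, uint8(mod(0:4999, 256)));
fclose(fid);

bs = matlab.io.datastore.BlockedFileSet(fname, 'BlockSize', 2000);

% Look up the second block and read exactly its bytes
blk = bs.BlockInfo(2);
fid = fopen(blk.Filename, 'r');
fseek(fid, blk.Offset, 'bof');                % jump to the block's first byte
bytes = fread(fid, blk.BlockSize, '*uint8');  % read BlockSize bytes as uint8
fclose(fid);
blockOffset = blk.Offset;

rmdir(dataDir, 's');
```

The first byte read should be mod(2000,256), that is 208, confirming that the offset points into the middle of the file rather than its start.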
Object Functions
hasPreviousBlock | Determine if blocked file-set has previous block
previousblock | Information on previous block in blocked file-set
hasNextBlock | Determine if blocked file-set has another block
nextblock | Information on next block in blocked file-set
progress | Determine how many blocks or files have been read
maxpartitions | Maximum number of partitions
partition | Partition file-set object
subset | Create subset of datastore or FileSet
reset | Reset the file-set object
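The object functions above combine into a typical iteration pattern: loop while hasNextBlock is true, fetch each block with nextblock, and read its bytes. The sketch below assumes three hypothetical files and swaps in fopen, fseek, and fread for the DsFileReader object to keep the reading step explicit. It then splits the same file-set with partition, assuming the two partitions together cover every block.

```matlab
% Three hypothetical files of known size
dataDir = tempname;
mkdir(dataDir);
sizes = [5000 1500 2000];                    % bytes
fileList = strings(1, numel(sizes));
for k = 1:numel(sizes)
    fileList(k) = fullfile(dataDir, sprintf('file%d.bin', k));
    fid = fopen(fileList(k), 'w');
    fwrite(fid, zeros(1, sizes(k), 'uint8'));
    fclose(fid);
end

bs = matlab.io.datastore.BlockedFileSet(fileList, 'BlockSize', 2000);
numBlocks = bs.NumBlocks;                    % 3 + 1 + 1 blocks

% Visit every block once and count the bytes read
totalBytes = 0;
while hasNextBlock(bs)
    blk = nextblock(bs);
    fid = fopen(blk.Filename, 'r');
    fseek(fid, blk.Offset, 'bof');
    chunk = fread(fid, blk.BlockSize, '*uint8');
    fclose(fid);
    totalBytes = totalBytes + numel(chunk);
end

% Split the file-set into two partitions, e.g. for two parallel workers
p1 = partition(bs, 2, 1);
p2 = partition(bs, 2, 2);
partitionBlocks = p1.NumBlocks + p2.NumBlocks;

rmdir(dataDir, 's');
```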
Examples
Create a Blocked File-Set and Get Information on All Files
Create a blocked file-set and query information for specific blocks in the blocked file-set.
Create a blocked file-set bs for a collection of files and specify the block size.
folder = {'accidents.mat','airlineResults.mat','census.mat','earth.mat'}
folder = 1x4 cell
{'accidents.mat'} {'airlineResults.mat'} {'census.mat'} {'earth.mat'}
bs = matlab.io.datastore.BlockedFileSet(folder,'BlockSize',2000)
bs = 
  BlockedFileSet with properties:

                   NumBlocks: 98
               NumBlocksRead: 0
                   BlockSize: 2000
                   BlockInfo: BlockInfo for all 98 blocks
    AlternateFileSystemRoots: {}
Obtain information for specific blocks either by using the nextblock function or by querying the BlockInfo property with an index. To obtain information for consecutive blocks, use nextblock. For example, obtain information for the first two blocks in the set.
blk1 = nextblock(bs)
blk1 = 
  1x1 BlockInfo

    Filename    FileSize    Offset    BlockSize
    "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/accidents.mat"    7343    0    2000
blk2 = nextblock(bs)
blk2 = 
  1x1 BlockInfo

    Filename    FileSize    Offset    BlockSize
    "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/accidents.mat"    7343    2000    2000
Query the BlockInfo property to get information about the last block in the set.
lastblk = bs.BlockInfo(98)
lastblk = 
  1x1 BlockInfo

    Filename    FileSize    Offset    BlockSize
    "/mathworks/devel/bat/filer/batfs2561-0/Bdoc24b.2679053/build/runnable/matlab/toolbox/matlab/demos/earth.mat"    32522    32000    522
Version History
Introduced in R2020a
See Also