matlab.io.Datastore class

Package: matlab.io

Base datastore class

Description

matlab.io.Datastore is an abstract class for creating a custom datastore. A datastore helps access large collections of data iteratively, especially when data is too large to fit in memory. The Datastore abstract class declares and captures the interface expected for all custom datastores in MATLAB®. Derive your class using this syntax:

classdef MyDatastore < matlab.io.Datastore
    ...
end

To implement your custom datastore:

  • Inherit from the class matlab.io.Datastore

  • Define the four required methods: hasdata, read, reset, and progress

For more details and steps to create your custom datastore, see Develop Custom Datastore.

Methods

read

Read data from the datastore.

[data,info] = read(ds)

The data output can be any data type and must be vertically concatenable. It is recommended that the info output is a structure.

The data type of data dictates the data type of the output of tall.
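
As a minimal sketch of what "vertically concatenable" means in practice, the following stacks two hypothetical chunks of the kind that successive read calls might return. The variable names and values are illustrative assumptions, not part of the Datastore interface.

```matlab
% Two hypothetical chunks, as successive calls to read might return them.
chunk1 = uint8([113; 180; 251]);
chunk2 = uint8([91; 29]);

% Chunks must have matching sizes in every dimension except the first
% so that they can be stacked row-wise.
combined = vertcat(chunk1, chunk2);

disp(size(combined))    % row count is the sum of the chunk heights
disp(class(combined))   % the class is preserved by the concatenation
```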

Access: Public, Abstract: true

hasdata

Determine if data is available to read. The output is of type logical.

tf = hasdata(ds)

Access: Public, Abstract: true

reset

Reset the datastore to an initial state where no data has been read.

reset(ds)

Access: Public, Abstract: true

progress

Determine how much data has been read.

The output is a scalar double between 0 and 1. A return value of 0.55 means that you have read 55% of the data.

p = progress(ds)

Access: Public, Abstract: true
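
One simple way to compute such a value, sketched here under the assumption that the datastore tracks how many bytes it has consumed, is to divide the amount read by the total and clamp the result to the range [0, 1]. The variables bytesRead and totalBytes are hypothetical.

```matlab
totalBytes = 2000;   % hypothetical total size of the data collection
bytesRead  = 1100;   % hypothetical amount consumed so far

% Fraction of the data read, clamped to the range [0, 1].
p = min(max(bytesRead / totalBytes, 0), 1);

fprintf('Read %.0f%% of the data\n', 100*p);
```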

preview

Return a subset of the data.

data = preview(ds)

The default implementation returns the first 8 rows of data. The output has the same data type as the output of read.

The default implementation is not optimized for tall array construction. For improved tall array performance, implement a more efficient version of this method.

Access: Public
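
As a rough illustration of "the first 8 rows", the sketch below takes a subset of a hypothetical chunk while guarding against chunks shorter than 8 rows. It is an assumption about how such a subset could be taken, not a description of how the default preview is implemented.

```matlab
data = uint8((1:20)');              % hypothetical result of one read call

% Take at most 8 rows, or fewer if the chunk is shorter than that.
numRows = min(8, size(data, 1));
subset  = data(1:numRows, :);

disp(size(subset))
```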

readall

Read all data in the datastore.

data = readall(ds)

The output has the same data type as the output of read. If the data does not fit in memory, readall issues an error.

The default implementation is not optimized for tall array construction. For improved tall array performance, implement a more efficient version of this method.

Access: Public

Properties

If you add handle properties to your custom datastore, then you must implement the copyElement method. For example, if your custom datastore stores a DsFileSet object in a property, then it must implement copyElement. For more information on customizing copy operations, see Customize Copy Operation. For an example implementation of the copyElement method, see Develop Custom Datastore.
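
The underlying reason is that copying a handle copies the reference rather than the data it points to, so a copied datastore would otherwise share state with the original. The following sketch illustrates handle semantics with containers.Map, a built-in handle class chosen here purely for illustration; it is not part of the datastore interface.

```matlab
% containers.Map is a handle class: assignment copies the reference.
original = containers.Map('KeyType', 'char', 'ValueType', 'double');
original('filesRead') = 1;

alias = original;          % both variables now refer to the same map
alias('filesRead') = 5;    % the change is visible through either variable

disp(original('filesRead'))
```

A copyElement implementation avoids this kind of sharing by explicitly copying each handle property, in the manner of the statement dscopy.FileSet = copy(ds.FileSet).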

Attributes

Sealed: false

To learn about attributes of classes, see Class Attributes.

Examples

Build a datastore to bring your custom or proprietary data into MATLAB® for serial processing, and then preview and read the data.

This example uses a simple data set to illustrate a workflow that you can use to build a custom datastore for your own data. The data set is a collection of 15 binary (.bin) files, each containing one column (1 variable) and 10000 rows (records) of unsigned integers.

dir('*.bin')
binary_data01.bin  binary_data05.bin  binary_data09.bin  binary_data13.bin  
binary_data02.bin  binary_data06.bin  binary_data10.bin  binary_data14.bin  
binary_data03.bin  binary_data07.bin  binary_data11.bin  binary_data15.bin  
binary_data04.bin  binary_data08.bin  binary_data12.bin  

Implement your custom datastore in your working folder or in a folder that is on the MATLAB® path. Create a new file, MyDatastore.m, that contains the code implementing your custom datastore. The name of the file must match the name of the class and its constructor function. For example, if you want your constructor function to have the name MyDatastore, then the name of the file must be MyDatastore.m. The file must contain the following steps:

  • Step 1: Inherit from the datastore classes.

  • Step 2: Define the constructor and the required methods.

  • Step 3: Define your custom file reading function.

%% STEP 1: INHERIT FROM DATASTORE CLASSES
classdef MyDatastore < matlab.io.Datastore
    
    properties(Access = private)
        CurrentFileIndex double
        FileSet matlab.io.datastore.DsFileSet
    end
    
    
%% STEP 2: DEFINE THE CONSTRUCTOR AND THE REQUIRED METHODS
    methods
        % Define your datastore constructor
        function myds = MyDatastore(location)
            myds.FileSet = matlab.io.datastore.DsFileSet(location,...
                'FileExtensions','.bin', ...
                'FileSplitSize',8*1024);
            myds.CurrentFileIndex = 1;
            reset(myds);
        end
        
        % Define the hasdata method
        function tf = hasdata(myds)
            % Return true if more data is available
            tf = hasfile(myds.FileSet);
        end
        
        % Define the read method
        function [data,info] = read(myds)
            % Read data and information about the extracted data
            % See also: MyFileReader()
            if ~hasdata(myds)
                error('No more data');
            end
            
            fileInfoTbl = nextfile(myds.FileSet);
            data = MyFileReader(fileInfoTbl);
            info.Size = size(data);
            info.FileName = fileInfoTbl.FileName;
            info.Offset = fileInfoTbl.Offset;
            
            % Update CurrentFileIndex for tracking progress
            if fileInfoTbl.Offset + fileInfoTbl.SplitSize >= ...
                    fileInfoTbl.FileSize
                myds.CurrentFileIndex = myds.CurrentFileIndex + 1 ;
            end
        end
        
        % Define the reset method
        function reset(myds)
            % Reset to the start of the data
            reset(myds.FileSet);
            myds.CurrentFileIndex = 1;
        end
        
        % Define the progress method
        function frac = progress(myds)
            % Determine the fraction of data (between 0 and 1) that
            % you have read from the datastore
            frac = (myds.CurrentFileIndex-1)/myds.FileSet.NumFiles;
        end
    end
    
    
    methods(Access = protected)
        % If you use the FileSet property in the datastore,
        % then you must define the copyElement method. The
        % copyElement method allows methods such as readall
        % and preview to remain stateless
        function dscopy = copyElement(ds)
            dscopy = copyElement@matlab.mixin.Copyable(ds);
            dscopy.FileSet = copy(ds.FileSet);
        end
                
    end
end

%% STEP 3: IMPLEMENT YOUR CUSTOM FILE READING FUNCTION
function data = MyFileReader(fileInfoTbl)
% create a reader object using FileName
reader = matlab.io.datastore.DsFileReader(fileInfoTbl.FileName);

% seek to the offset
seek(reader,fileInfoTbl.Offset,'Origin','start-of-file');

% read fileInfoTbl.SplitSize amount of data
data = read(reader,fileInfoTbl.SplitSize);

end

Create a datastore object from your custom datastore, pointing it at the .bin files in the current folder.

folder = fullfile('*.bin');
ds = MyDatastore(folder) ;

Preview the data from the datastore.

preview(ds)
ans =

  8x1 uint8 column vector

   113
   180
   251
    91
    29
    66
   254
   214

Read the data in a while loop and use the hasdata method to check if more data is available to read.

while hasdata(ds)
    data = read(ds);
    % do something
end
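
The % do something placeholder typically holds a per-chunk computation whose results are combined across reads. As a hedged sketch, the following computes an overall mean incrementally. The cell array chunks is a hypothetical stand-in for the values that successive read(ds) calls would return.

```matlab
% Hypothetical stand-in for the chunks returned by successive reads.
chunks = {uint8([113; 180; 251]), uint8([91; 29]), uint8([66; 254; 214])};

runningSum = 0;
numRecords = 0;
for k = 1:numel(chunks)
    data = chunks{k};
    % Convert to double first: uint8 arithmetic saturates at 255.
    runningSum = runningSum + sum(double(data));
    numRecords = numRecords + numel(data);
end

overallMean = runningSum / numRecords;
```

Keeping only a running sum and a count means that no more than one chunk needs to be held in memory at a time.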

Reset the datastore to its initial state and read the data from the start of the datastore.

reset(ds);
data = read(ds);

Alternatively, if your data collection fits in memory, then read all the data in the datastore. Since the folder contains 15 files with 10000 records in each file, the size of the output should be 150000 records.

dataAll = readall(ds);
whos dataAll
  Name              Size             Bytes  Class    Attributes

  dataAll      150000x1             150000  uint8              
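
The sizes above can be cross-checked with a little arithmetic. The sketch below assumes that each record occupies one byte (consistent with the uint8 class shown) and that DsFileSet divides each file into splits of at most FileSplitSize bytes, so the figures for splits and reads are an expectation under those assumptions rather than confirmed output.

```matlab
numFiles       = 15;
recordsPerFile = 10000;     % one uint8 record is one byte (assumption)
splitSize      = 8*1024;    % FileSplitSize passed to DsFileSet

totalRecords  = numFiles * recordsPerFile;                  % rows from readall
splitsPerFile = ceil(recordsPerFile / splitSize);           % reads per file
lastSplitSize = recordsPerFile - (splitsPerFile - 1) * splitSize;
expectedReads = numFiles * splitsPerFile;                   % reads in the while loop

fprintf('%d records, %d reads expected\n', totalRecords, expectedReads);
```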

Introduced in R2017b
