

Importing Scientific Data Files

Importing Common Data File Format (CDF) Files

CDF was created by the National Space Science Data Center (NSSDC) to provide a self-describing format for storing and manipulating data that matches the structure of scientific data and applications (for example, statistical and numerical methods, visualization, and management). For more information about this format, see the CDF Web site.

To import data into the MATLAB workspace from a Common Data Format (CDF) file, use the cdfread function. Using this function, you can import all the data in the file, specific variables, specific records, or subsets of the data in a specific variable. The following examples illustrate some of these capabilities.

  1. To get information about the contents of a CDF file, such as the names of variables in the CDF file, use the cdfinfo function. The cdfinfo function returns a structure containing general information about the file and detailed information about the variables and attributes in the file.

    In this example, the Variables field indicates that the file contains five variables. The first variable, Time, is made up of 24 records containing CDF epoch data. The next two variables, Longitude and Latitude, have only one associated record containing int8 data. For details about how to interpret the data returned in the Variables field, see cdfinfo.

      Note   Because cdfinfo creates temporary files, make sure that your current working directory is writable before attempting to use the function.

    info = cdfinfo('example.cdf')
    
    info = 
    
                  Filename: 'example.cdf'
               FileModDate: '09-Mar-2001 16:45:22'
                  FileSize: 1240
                    Format: 'CDF'
             FormatVersion: '2.7.0'
              FileSettings: [1x1 struct]
                  Subfiles: {}
                 Variables: {5x6 cell}
          GlobalAttributes: [1x1 struct]
        VariableAttributes: [1x1 struct]
    
    vars = info.Variables
    
    vars = 
    
      Columns 1 through 5
    
       'Time'                [1x2 double]   [24]   'epoch'    'T/'
       'Longitude'           [1x2 double]   [ 1]   'int8'     'F/FT'
       'Latitude'            [1x2 double]   [ 1]   'int8'     'F/TF'
       'Data'                [1x3 double]   [ 1]   'double'   'T/TTT'
       'multidimensional'    [1x4 double]   [ 1]   'uint8'    'T/TTTT'
    
      Column 6
    
        'Full'
        'Full'
        'Full'
        'Full'
        'Full'
  2. To read all of the data in the CDF file, use the cdfread function. The function returns the data in a 24-by-5 cell array. The five columns correspond to the five variables, and the 24 rows correspond to the 24 records of the Time variable. Variables with fewer records are padded to fill the remaining rows; the padding value is specified in the CDF file.

    data = cdfread('example.cdf');
    
    whos data
      Name       Size            Bytes  Class    Attributes
    
      data      24x5             14784  cell   
  3. To read the data associated with particular variables, use the 'Variable' parameter, specifying a cell array of variable names as its value. Variable names are case sensitive. For example, the following code reads the Longitude and Latitude variables from the file. The return value is a 24-by-2 cell array, one column per requested variable, where each cell contains int8 data.

    var_time = cdfread('example.cdf','Variable',{'Longitude','Latitude'});
    
    whos var_time
      Name           Size            Bytes  Class    Attributes
    
      var_time      24x1              4608  cell               
    

Speeding Up Read Operations

The cdfread function offers two ways to speed up read operations when working with large data sets:

To reduce the number of elements in the returned cell array, specify the 'CombineRecords' parameter. By default, cdfread creates a cell array with a separate element for every record of every variable, padding the records dimension to create a rectangular cell array. For example, reading all the data from the example file produces a 24-by-5 output cell array, where the columns represent variables and the rows represent records. When you set the 'CombineRecords' parameter to true, cdfread still creates a separate element for each variable, but saves time by putting all the records associated with a variable in a single cell array element. Thus, reading the data from the example file with 'CombineRecords' set to true produces a 1-by-5 cell array, as shown below.

data_combined = cdfread('example.cdf','CombineRecords',true);

whos
  Name                Size            Bytes  Class    Attributes

  data               24x5             14784  cell               
  data_combined       1x5              2364  cell               

When combining records, note that the dimensions of the data in the cell change. For example, if a variable has 20 records, each of which is a scalar value, the combined cell array element contains a 20-by-1 vector of values. If each record is a 3-by-4 array, the cell array element contains a 20-by-3-by-4 array. In other words, for combined data, cdfread adds a new first dimension that indexes the records.
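To make the dimension change concrete, the following sketch mimics the combined layout in plain MATLAB: it stacks 20 made-up 3-by-4 records along a new first dimension, and 20 made-up scalar records into a column. It does not call cdfread, and the record values are hypothetical; it only illustrates the resulting array shapes.

```matlab
% Illustration only: mimic the layout that 'CombineRecords' produces.
% The records below are made-up values; cdfread is not called.
nRecords = 20;
perRecord = cell(nRecords, 1);              % one cell per record (default layout)
for k = 1:nRecords
    perRecord{k} = k * ones(3, 4);          % each record is a 3-by-4 array
end

% Prepend a singleton dimension to each record (1-by-3-by-4), then stack
% the records along that new first dimension.
withRecDim = cellfun(@(r) reshape(r, [1 size(r)]), perRecord, ...
    'UniformOutput', false);
combined = cat(1, withRecDim{:});           % 20-by-3-by-4
combinedSize = size(combined);

% Indexing the first dimension recovers a single record.
record5 = squeeze(combined(5, :, :));       % 3-by-4 array, all values 5

% Scalar records stack into a 20-by-1 column vector instead.
scalarRecords = num2cell((1:nRecords)');
combinedScalars = cat(1, scalarRecords{:});
```

Indexing combined(k, :, :) and squeezing it is one way to get back the k-th record in its original 3-by-4 shape.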

Another way to speed up read operations is to read CDF epoch values as MATLAB serial date numbers. By default, cdfread creates a MATLAB cdfepoch object for each CDF epoch value in the file. If you set the 'ConvertEpochToDatenum' parameter to true, cdfread returns CDF epoch values as MATLAB serial date numbers instead. For more information about working with MATLAB cdfepoch objects, see Representing CDF Time Values.

data_datenums = cdfread('example.cdf','ConvertEpochToDatenum',true);

whos
  Name                Size            Bytes  Class    Attributes

  data               24x5             14784  cell               
  data_combined       1x5              2364  cell               
  var_time           24x1              4608  cell       

Representing CDF Time Values

CDF represents time differently from MATLAB. CDF represents a date and time as the number of milliseconds since 1-Jan-0000; in CDF terminology, this is called an epoch. MATLAB represents a date and time as a serial date number, which is the number of days since 0-Jan-0000. To represent CDF dates, MATLAB uses a CDF epoch object (class cdfepoch). To access the time information in a CDF epoch object, use the object's todatenum method.

For example, this code extracts the date information from a CDF epoch object:

  1. Extract the date information from the CDF epoch object returned in the cell array data (see Importing Common Data File Format (CDF) Files). Use the todatenum method of the CDF epoch object to get the date information, which is returned as a MATLAB serial date number.

    m_date = todatenum(data{1});
  2. View the MATLAB serial date number as a string.

    datestr(m_date)
    ans =
    
    01-Jan-2001
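Because both representations count from the start of year 0000, converting between them is simple arithmetic. The following sketch assumes, per the definitions above, that a CDF epoch counts milliseconds from midnight on 1-Jan-0000 and that 1-Jan-0000 is serial date number 1. It illustrates the relationship only and is not necessarily how todatenum is implemented.

```matlab
% Sketch: CDF epoch (ms since 1-Jan-0000) <-> MATLAB serial date number
% (days since 0-Jan-0000, so 1-Jan-0000 is day 1). Assumes both start at midnight.
msPerDay = 24 * 60 * 60 * 1000;        % 86,400,000 ms per day

% Build the epoch value for 1-Jan-2001 from its serial date number.
dn = datenum(2001, 1, 1);
epochMs = (dn - 1) * msPerDay;         % whole days since 1-Jan-0000, in ms

% Convert back: elapsed days since 1-Jan-0000, plus one.
dnBack = epochMs / msPerDay + 1;
dateText = datestr(dnBack);            % '01-Jan-2001'
```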

Importing Network Common Data Form (netCDF) Files

MATLAB provides access to the routines in the netCDF C library that you can use to read data from netCDF files and write data to netCDF files. MATLAB provides this access through a set of MATLAB functions that correspond to the functions in the netCDF C library. MATLAB groups the functions into a package, called netcdf. To call one of the functions in the package, you must specify the package name. For a complete list of all the functions, see netcdf.

This section does not attempt to describe all features of the netCDF library or explain basic netCDF programming concepts. To use the MATLAB netCDF functions effectively, you should be familiar with the information about netCDF contained in the NetCDF C Interface Guide for version 3.6.2.

Mapping netCDF API Syntax to MATLAB Function Syntax

For the most part, the MATLAB netCDF functions correspond directly to routines in the netCDF C library. For example, the MATLAB function netcdf.open corresponds to the netCDF library routine nc_open. In some cases, one MATLAB function corresponds to a group of netCDF library functions. For example, instead of creating MATLAB versions of every netCDF library nc_put_att_type function, where type represents a data type, MATLAB uses one function, netcdf.putAtt, to handle all supported data types.
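For example, where the C library has separate routines such as nc_put_att_text, nc_put_att_int, and nc_put_att_double, a single netcdf.putAtt call handles character, integer, and floating-point values. The sketch below writes three global attributes with hypothetical names to a temporary file (it assumes the temporary folder is writable) and reads them back with netcdf.getAtt to show the MATLAB class of each value.

```matlab
% Sketch: one function, netcdf.putAtt, writes attributes of several types.
% The netCDF type follows the MATLAB class of the value. Names are hypothetical.
fname = [tempname '.nc'];
ncid = netcdf.create(fname, 'NC_CLOBBER');
gid = netcdf.getConstant('NC_GLOBAL');             % ID for global attributes

netcdf.putAtt(ncid, gid, 'title', 'demo file');    % char   -> NC_CHAR
netcdf.putAtt(ncid, gid, 'version', int32(3));     % int32  -> NC_INT
netcdf.putAtt(ncid, gid, 'scale', 0.5);            % double -> NC_DOUBLE
netcdf.endDef(ncid);
netcdf.close(ncid);

% Read the attributes back and inspect the MATLAB class of each value.
ncid = netcdf.open(fname, 'NC_NOWRITE');
titleVal = netcdf.getAtt(ncid, gid, 'title');
versionVal = netcdf.getAtt(ncid, gid, 'version');
scaleVal = netcdf.getAtt(ncid, gid, 'scale');
netcdf.close(ncid);
delete(fname);
```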

The syntax of the MATLAB functions is similar to the netCDF library routines, with some exceptions. For example, the netCDF C library routines use pointer arguments to return data, while their MATLAB counterparts use one or more return values. Consider the following function signature of the nc_open routine in the netCDF library. Note how the netCDF file identifier is returned in the ncidp argument.

int nc_open (const char *path, int omode, int *ncidp); /* C syntax */

The following shows the signature of the corresponding MATLAB function, netcdf.open. Like its netCDF C library counterpart, the MATLAB netCDF function accepts a character string that specifies the file name and a constant that specifies the access mode. Note, however, that the MATLAB netcdf.open function returns the file identifier, ncid, as a return value.

ncid = netcdf.open(filename, mode)

To see a list of all the functions in the MATLAB netCDF package, see the netCDF reference page.

Exploring the Contents of a netCDF File

This example shows how to use the MATLAB netCDF functions to explore the contents of a netCDF file. It uses the example netCDF file included with MATLAB, example.nc. For an example of reading data from a netCDF file, see Reading Data from a netCDF File.

  1. Open the netCDF file using the netcdf.open function. This function returns an identifier that you use thereafter to refer to the file.

    ncid = netcdf.open('example.nc','NC_NOWRITE');
  2. Explore the contents of the file using the netcdf.inq function. This function returns the number of dimensions, variables, and global attributes in the file, as well as the identifier of the unlimited dimension, that is, the dimension that can grow.

    [ndims,nvars,natts,unlimdimID]= netcdf.inq(ncid)
    ndims =
    
         4
    
    
    nvars =
    
         4
    
    
    natts =
    
         1
    
    
    unlimdimID =
    
         3
    
  3. Get more information about the dimensions, variables, and global attributes in the file by using netCDF inquiry functions. For example, to get information about the global attribute, first get the name of the attribute, using the netcdf.inqAttName function. After you get the name, 'creation_date' in this case, you can use the netcdf.inqAtt function to get information about the data type and length of the attribute.

    To get the name of an attribute, you must specify the ID of the variable the attribute is associated with and the attribute number. A global attribute is not associated with a particular variable, so use the value of the NC_GLOBAL constant, returned by netcdf.getConstant('NC_GLOBAL'), as the variable ID. The attribute number is a zero-based index that identifies the attribute: the first attribute has the index value 0, and so on.

    global_att_name = netcdf.inqAttName(ncid,netcdf.getConstant('NC_GLOBAL'),0)
    
    global_att_name =
    
    creation_date
    
    [xtype attlen] = netcdf.inqAtt(ncid,netcdf.getConstant('NC_GLOBAL'),global_att_name)
    
    xtype =
    
         2
    
    
    attlen =
    
        11
  4. Get the value of the attribute, using the netcdf.getAtt function.

    global_att_value = netcdf.getAtt(ncid,netcdf.getConstant('NC_GLOBAL'),global_att_name)
    
    global_att_value =
    
    09-Jun-2008
  5. Get information about the dimensions defined in the file through a series of calls to netcdf.inqDim. This function returns the name and length of the dimension. The netcdf.inqDim function requires the dimension ID, which is a zero-based index that identifies the dimensions. For example, the first dimension has the index value 0, and so on.

    [dimname, dimlen] = netcdf.inqDim(ncid,0)
    
    dimname =
    
    x
    
    dimlen =
    
        50

    The following table describes the dimensions in the example file.

    Dimension Name   Dimension Length
    x                50
    y                50
    z                5
    t                0 (unlimited)

  6. Get information about the variables in the file through a series of calls to netcdf.inqVar. This function returns the name, data type, dimension IDs, and number of attributes of the variable. The netcdf.inqVar function requires the variable ID, which is a zero-based index that identifies the variables. For example, the first variable has the index value 0, and so on.

    [varname, vartype, dimids, natts] = netcdf.inqVar(ncid,0)
    
    varname =
    
    avagadros_number
    
    
    vartype =
    
         6
    
    
    dimids =
    
         []
    
    
    natts =
    
         1

    The following table describes the variables in the example file. The data type information is the numeric value of the netCDF data type constants, such as NC_INT and NC_BYTE. See the official netCDF documentation for information about these constants.

    Variable Name      Variable Type   Dimension IDs   Number of Attributes
    avagadros_number   6               []              1
    temperature        3               0               4
    peaks              5               [0 1]           1
    time_series        4               [2 3]           1
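Because dimension and variable IDs are zero-based, you can enumerate everything in a file with simple loops over netcdf.inqDim and netcdf.inqVar. The following sketch does not use example.nc; it creates a small hypothetical file with one fixed dimension and one unlimited dimension, writes three records so that the unlimited dimension grows to length 3, and then lists the contents. It assumes the temporary folder is writable.

```matlab
% Sketch: enumerate dimensions and variables of a netCDF file by zero-based ID.
% The file, dimension, and variable names here are hypothetical.
fname = [tempname '.nc'];
ncid = netcdf.create(fname, 'NC_CLOBBER');
xid = netcdf.defDim(ncid, 'x', 50);
tid = netcdf.defDim(ncid, 't', netcdf.getConstant('NC_UNLIMITED'));
vid = netcdf.defVar(ncid, 'time_series', 'NC_DOUBLE', [xid tid]);  % unlimited dim last
netcdf.endDef(ncid);

% Write 3 records; the unlimited dimension t grows from length 0 to 3.
netcdf.putVar(ncid, vid, [0 0], [50 3], rand(50, 3));
netcdf.close(ncid);

% Reopen read-only and walk every dimension and variable.
ncid = netcdf.open(fname, 'NC_NOWRITE');
[nDims, nVars, nGlobalAtts, unlimDimID] = netcdf.inq(ncid);
dimLens = zeros(1, nDims);
for dimid = 0:nDims-1
    [dimName, dimLens(dimid+1)] = netcdf.inqDim(ncid, dimid);
    fprintf('dimension %d: %s, length %d\n', dimid, dimName, dimLens(dimid+1));
end
for varid = 0:nVars-1
    [varName, xtype, dimids, nAtts] = netcdf.inqVar(ncid, varid);
    fprintf('variable %d: %s, type %d, dimension IDs [%s], %d attributes\n', ...
        varid, varName, xtype, num2str(dimids), nAtts);
end
netcdf.close(ncid);
delete(fname);
```

The output variables are named nDims and nVars rather than ndims and nvars so that they do not shadow the built-in ndims function.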

Reading Data from a netCDF File

After you have explored the contents of a netCDF file with the inquiry functions, you can retrieve the data from the variables and attributes in the file. To read the data associated with the variable avagadros_number in the example file, use the netcdf.getVar function. The following example uses the netCDF file identifier returned in the previous section, Exploring the Contents of a netCDF File. The variable ID is a zero-based index that identifies the variables; the first variable has the index value 0, and so on. (To learn how to write data to a netCDF file, see Storing Data in a netCDF File.)

A_number = netcdf.getVar(ncid,0)

A_number =

  6.0221e+023

The netCDF functions automatically choose the MATLAB class that best matches the netCDF data type, but you can also specify the class of the return data by using an optional argument to netcdf.getVar. The following table shows the default mapping. For more information about netCDF data types, see the NetCDF C Interface Guide for version 3.6.2.

netCDF Data Type   MATLAB Class    Notes
NC_BYTE            int8 or uint8   netCDF interprets byte data as either signed or unsigned.
NC_CHAR            char
NC_SHORT           int16
NC_INT             int32
NC_FLOAT           single
NC_DOUBLE          double
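The sketch below shows the default mapping and the optional class argument side by side. Rather than using example.nc, it creates a temporary file with a hypothetical scalar NC_DOUBLE variable, looks up the variable ID by name with netcdf.inqVarID, and reads the value twice: once with the default class and once requesting single.

```matlab
% Sketch: read a scalar NC_DOUBLE variable with the default class, then with
% an explicit output class. The file and variable name are hypothetical.
fname = [tempname '.nc'];
ncid = netcdf.create(fname, 'NC_CLOBBER');
vid = netcdf.defVar(ncid, 'avogadro', 'NC_DOUBLE', []);  % [] = scalar variable
netcdf.endDef(ncid);
netcdf.putVar(ncid, vid, 6.0221e23);
netcdf.close(ncid);

ncid = netcdf.open(fname, 'NC_NOWRITE');
varid = netcdf.inqVarID(ncid, 'avogadro');       % look up the ID by name
asDefault = netcdf.getVar(ncid, varid);           % NC_DOUBLE -> double
asSingle = netcdf.getVar(ncid, varid, 'single');  % request class single
netcdf.close(ncid);
delete(fname);
```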

Importing Flexible Image Transport System (FITS) Files

The FITS file format is the standard data format used in astronomy, endorsed by both NASA and the International Astronomical Union (IAU). For more information about the FITS standard, go to the official FITS Web site, http://fits.gsfc.nasa.gov/.

The FITS file format is designed to store scientific data sets consisting of multidimensional arrays (1-D spectra, 2-D images, or 3-D data cubes) and two-dimensional tables containing rows and columns of data. A data file in FITS format can contain multiple components, each marked by an ASCII text header followed by binary data. In FITS terminology, the first component in a FITS file is called the primary, and any components that follow it are called extensions.

To get information about the contents of a FITS file, use the fitsinfo function. The fitsinfo function returns a structure containing general information about the file and detailed information about the data in the file.

To import data into the MATLAB workspace from a FITS file, use the fitsread function. Using this function, you can import the data in the PrimaryData section of the file or the data in any of the extensions in the file, such as the Image extension. This example illustrates how to use the fitsread function to read data from a sample FITS file included with MATLAB:

  1. Determine which extensions the FITS file contains, using the fitsinfo function.

    info = fitsinfo('tst0012.fits')
    
    info = 
    
           Filename: 'tst0012.fits'
        FileModDate: '12-Mar-2001 18:37:46'
           FileSize: 109440
           Contents: {1x5 cell}
        PrimaryData: [1x1 struct]
        BinaryTable: [1x1 struct]
            Unknown: [1x1 struct]
              Image: [1x1 struct]
         AsciiTable: [1x1 struct]

    The info structure shows that the file contains several extensions including the BinaryTable, AsciiTable, and Image extensions.

  2. Read data from the file.

    To read the PrimaryData in the file, specify the filename as the only argument:

    pdata = fitsread('tst0012.fits');

    To read any of the extensions in the file, you must specify the name of the extension as an additional argument. This example reads the BinaryTable extension from the FITS file:

    bindata = fitsread('tst0012.fits','bintable');
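If you want to experiment without the sample file, you can round-trip your own data. This sketch assumes your MATLAB release includes fitswrite, which this section does not otherwise cover. It writes a small made-up image as the primary data of a temporary file, inspects it with fitsinfo, and reads it back with fitsread. The values are compared as double because fitsread may return a different numeric class than the one written.

```matlab
% Sketch: round-trip a small image through a FITS file.
% Assumes fitswrite is available; the file name comes from tempname.
fname = [tempname '.fits'];
img = reshape(single(1:12), 3, 4);      % made-up 3-by-4 image
fitswrite(img, fname);                  % img becomes the primary data

info = fitsinfo(fname);                 % header and layout information
disp(info.Contents);                    % list of components in the file

pdata = fitsread(fname);                % no extension name -> primary data
delete(fname);
```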

Importing Hierarchical Data Format (HDF5) Files

Hierarchical Data Format, Version 5 (HDF5) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). HDF5 is used across a wide range of engineering and scientific fields that need a standard way to store data so that it can be shared. For more information about the HDF5 file format, read the HDF5 documentation available at the HDF Web site (http://www.hdfgroup.org).

The MATLAB high-level HDF5 function hdf5read provides an easy way to import data from an HDF5 file. In addition, you can use hdf5info to get information about an HDF5 file. These functions are discussed in the following sections.

MATLAB also provides direct access to more than 200 functions in the HDF5 library through low-level functions that correspond to the library functions. In this way, you can access features of the HDF5 library from MATLAB, such as reading and writing complex data types and using the HDF5 subsetting capabilities. For more information, see Using the MATLAB Low-Level HDF5 Functions.

Determining the Contents of an HDF5 File

HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and metadata in a hierarchical structure similar to the hierarchical structure of a UNIX® file system.

In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference data without having to make a copy of the data.

A data type describes the data in a data set or attribute; it tells how to interpret that data. For example, a file might contain a data type called "Reading" that consists of three elements: a longitude value, a latitude value, and a temperature value.

To explore the hierarchical organization of an HDF5 file, use the hdf5info function. For example, to find out what the sample HDF5 file, example.h5, contains, use this syntax:

fileinfo = hdf5info('example.h5');

hdf5info returns a structure that contains various information about the HDF5 file, including the name of the file and the version of the HDF5 library that MATLAB is using:

fileinfo = 

          Filename: 'example.h5'
        LibVersion: '1.8.1'
            Offset: 0
          FileSize: 8172
    GroupHierarchy: [1x1 struct]

In the information returned by hdf5info, look at the GroupHierarchy field. This field is a structure that describes the top-level group in the file, called the root group. Using the UNIX convention, HDF5 names this top-level group / (forward slash), as shown in the Name field of the GroupHierarchy structure.

toplevel = fileinfo.GroupHierarchy

toplevel = 

      Filename: 'C:\matlab\toolbox\matlab\demos\example.h5'
          Name: '/'
        Groups: [1x2 struct]
      Datasets: []
     Datatypes: []
         Links: []
    Attributes: [1x2 struct]

By looking at the Groups and Attributes fields, you can see that the file contains two groups and two attributes. The Datasets, Datatypes, and Links fields are all empty, indicating that the root group does not contain any data sets, data types, or links.
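To see how root-group attributes appear in practice, the following sketch builds a small file of its own. It uses the h5create, h5writeatt, h5info, and h5readatt functions, which later MATLAB releases provide as the recommended alternative to hdf5info and hdf5read, so it assumes a release that includes them; the group, attribute names, and values are hypothetical. h5info returns a structure for the root group whose Name, Groups, and Attributes fields play the same role as those in the GroupHierarchy structure above.

```matlab
% Sketch: attach two attributes to the root group and inspect them with h5info.
% Names and values are hypothetical; the file is temporary.
fname = [tempname '.h5'];
h5create(fname, '/g1/dset1', 5);              % also creates the group /g1
h5writeatt(fname, '/', 'creator', 'demo');    % attribute on the root group /
h5writeatt(fname, '/', 'version', int32(2));

info = h5info(fname);                         % describes the root group
rootName = info.Name;                         % '/'
attNames = {info.Attributes.Name};            % names of the root-group attributes
nGroups = numel(info.Groups);                 % groups directly under the root
versionAtt = h5readatt(fname, '/', 'version');
delete(fname);
```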

The following figure illustrates the organization of the root group in the sample HDF5 file example.h5.

Organization of the Root Group of the Sample HDF5 File

To explore the contents of the sample HDF5 file further, examine one of the two structures in the Groups field of the GroupHierarchy structure. Each structure in this field represents a group contained in the root group. The following example shows the contents of the second structure in this field.

level2 = toplevel.Groups(2)

level2 = 

      Filename: 'C:\matlab\toolbox\matlab\demos\example.h5'
          Name: '/g2'
        Groups: []
      Datasets: [1x2 struct]
     Datatypes: []
         Links: []
    Attributes: []

In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.

Organization of the Data Set /g2 in the Sample HDF5 File

To get information about a data set, look at either of the structures returned in the Datasets field. These structures provide information about the data set, such as its name, dimensions, and data type.

dataset1 = level2.Datasets(1)

dataset1 = 
      Filename: 'L:\matlab\toolbox\matlab\demos\example.h5'
          Name: '/g2/dset2.1'
          Rank: 1
      Datatype: [1x1 struct]
          Dims: 10
       MaxDims: 10
        Layout: 'contiguous'
    Attributes: []
         Links: []
     Chunksize: []
     Fillvalue: []

By examining the structures at each level of the hierarchy, you can traverse the entire file. The following figure describes the complete hierarchical organization of the sample file example.h5.

Hierarchical Structure of example.h5 HDF5 File
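The traversal can be automated. The sketch below, which assumes the h5create and h5info functions are available, builds a hypothetical file with nested groups and then visits every group using an explicit stack instead of recursion, collecting the full path of each data set. In the h5info output, group names are full paths while data set names are relative to their parent group, so the code joins the two.

```matlab
% Sketch: list the full path of every data set by walking the group hierarchy.
% Group and data set names are hypothetical; the file is temporary.
fname = [tempname '.h5'];
h5create(fname, '/g1/g1.1/dset1.1.1', 10);
h5create(fname, '/g2/dset2.1', 10, 'Datatype', 'single');
h5create(fname, '/g2/dset2.2', [5 3]);

pending = {h5info(fname)};          % stack of groups still to visit, root first
paths = {};
while ~isempty(pending)
    g = pending{end};               % pop the most recently added group
    pending(end) = [];
    for k = 1:numel(g.Datasets)
        % Group names are full paths; data set names are relative to the group.
        if strcmp(g.Name, '/')
            paths{end+1} = ['/' g.Datasets(k).Name]; %#ok<AGROW>
        else
            paths{end+1} = [g.Name '/' g.Datasets(k).Name]; %#ok<AGROW>
        end
    end
    for k = 1:numel(g.Groups)
        pending{end+1} = g.Groups(k); %#ok<AGROW>
    end
end
delete(fname);
paths = sort(paths);
disp(paths');
```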

Importing Data from an HDF5 File

To read data or metadata from an HDF5 file, use the hdf5read function. As arguments, you must specify the name of the HDF5 file and the name of the data set or attribute. Alternatively, you can specify just the field in the structure returned by hdf5info that contains the name of the data set or attribute; hdf5read can determine the file name from the Filename field in the structure. For more information about finding the name of a data set or attribute in an HDF5 file, see Determining the Contents of an HDF5 File.

To illustrate, this example reads the data set /g2/dset2.1 from the HDF5 sample file example.h5.

data = hdf5read('example.h5','/g2/dset2.1');

The return value contains the values in the data set, in this case a 10-element vector of single-precision values:

data =

    1.0000
    1.1000
    1.2000
    1.3000
    1.4000
    1.5000
    1.6000
    1.7000
    1.8000
    1.9000
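For comparison, the sketch below writes ten single-precision values like those above to a temporary file and reads them back. It uses h5create, h5write, and h5read instead of hdf5write and hdf5read, assuming a MATLAB release that provides them. The data set is created as 10-by-1 so that the written column vector matches its size exactly.

```matlab
% Sketch: write ten single values to /g2/dset2.1 in a new file and read them back.
% The file is temporary; the values mirror the output shown above.
fname = [tempname '.h5'];
vals = single((10:19)' / 10);           % 1.0, 1.1, ..., 1.9 as a 10-by-1 column
h5create(fname, '/g2/dset2.1', [10 1], 'Datatype', 'single');
h5write(fname, '/g2/dset2.1', vals);

data = h5read(fname, '/g2/dset2.1');    % returned as class single
delete(fname);
```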

The hdf5read function maps HDF5 data types to appropriate MATLAB data types whenever possible. If the HDF5 file contains data types that cannot be represented in MATLAB, hdf5read uses one of the predefined MATLAB HDF5 data type objects to represent the data.

For example, if an HDF5 data set contains four array elements, hdf5read can return the data as a 1-by-4 array of hdf5.h5array objects:

whos 

Name    Size        Bytes     Class

data     1x4                hdf5.h5array

Grand total is 4 elements using 0 bytes

For more information about the MATLAB HDF5 data type objects, see Mapping HDF5 Data Types to MATLAB Data Types.

Mapping HDF5 Data Types to MATLAB Data Types

When the hdf5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, depending on whether the data in the data set is in an atomic data type or a nonatomic composite data type.

Atomic data types describe commonly used binary formats for numbers (integers and floating point) and characters (ASCII). Since MATLAB and HDF5 support similar data types, mapping atomic data types is typically straightforward.

Composite data types are aggregations of one or more atomic data types. Composite data types include structures, multidimensional arrays, and variable-length data types (one-dimensional arrays). The mapping is sometimes ambiguous between MATLAB classes and HDF5 data types. For example, in HDF5, a 5-by-5 data set containing a single uint8 value in each element is distinct from a 1-by-1 data set containing a 5-by-5 array of uint8 values. In the first case, the data set contains 25 observations of a single value; in the second case, the data set contains a single observation with 25 values. In MATLAB both of these data sets are represented by a 5-by-5 matrix.

Mapping Atomic Data Types.   HDF5 and MATLAB support similar atomic data types, mapped by hdf5read as shown in the table below.

Mapping Between HDF5 Atomic Data Types and MATLAB Data Types

HDF5 Atomic Data Type                MATLAB Data Type
Bit-field                            Array of packed 8-bit integers
Float                                MATLAB single and double types, provided that they occupy 64 bits or fewer
Integer types, signed and unsigned   Equivalent MATLAB integer types, signed and unsigned
Opaque                               Array of uint8 values
Reference                            Array of uint8 values
String                               MATLAB character arrays

To find information about the data types in HDF5 files, use the hdf5info function. Because different computing architectures and programming languages support different number and character representations, the HDF5 library provides platform-independent data types, which it then maps to an appropriate data type for each platform.

For example, the data set /g2/dset2.2 in the sample file example.h5 includes atomic data. The data type information is in a Datatype field:

fileinfo = hdf5info('example.h5');
dataset2 = fileinfo.GroupHierarchy.Groups(1,2).Datasets(1,2);

dtype = dataset2.Datatype 
dtype = 

        Name: []
       Class: 'H5T_IEEE_F32BE'
    Elements: []

The H5T_IEEE_F32BE class name indicates that the data is a 4-byte, big-endian, IEEE® floating-point data type. When hdf5read reads this data, MATLAB maps it to class single.
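You can check the same mapping on a file you create yourself. The sketch below, which assumes the h5create, h5write, h5info, and h5read functions are available, creates a single-precision data set and inspects it. h5info describes the type differently from hdf5info: its Datatype.Class field holds the HDF5 type class (for example 'H5T_FLOAT'). Because h5create uses the platform's native float type, the byte order is typically little-endian rather than the big-endian type in example.h5. Either way, the 4-byte float data is returned as class single.

```matlab
% Sketch: create a 5-by-3 single-precision data set, inspect its type with
% h5info, and read it back. The file is temporary.
fname = [tempname '.h5'];
h5create(fname, '/g2/dset2.2', [5 3], 'Datatype', 'single');
h5write(fname, '/g2/dset2.2', single(reshape(1:15, 5, 3)));

dsInfo = h5info(fname, '/g2/dset2.2');  % information about this data set only
typeClass = dsInfo.Datatype.Class;      % HDF5 type class, e.g. 'H5T_FLOAT'
data = h5read(fname, '/g2/dset2.2');    % 4-byte IEEE float -> MATLAB single
delete(fname);
```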

Mapping Composite Data Types.   A composite data type is an aggregation of one or more atomic data types. Composite data types include structures, multidimensional arrays, and variable-length data types (one-dimensional arrays).

To support reading HDF5 composite data types, or writing data to an HDF5 file, MATLAB includes a set of classes to represent HDF5 data types. If the data in the data set is stored in one of the HDF5 nonatomic data types and the data cannot be represented in the workspace using a native MATLAB data type, hdf5read uses one of a set of classes MATLAB defines to represent HDF5 data types. The following figure illustrates the hdf5 class and its subclasses.

To access the data in the data set in the MATLAB workspace, you must access the Data field in the object.

For example, if an HDF5 file contains a data set made up of an enumerated data type that cannot be represented in MATLAB, hdf5read uses the hdf5.h5enum class to represent the data. An h5enum object has data members that store the enumerations (text strings), their corresponding values, and the enumerated data.

This example converts a simple MATLAB vector into an h5array object and then displays the fields in the object:

vec = [ 1 2 3];

hhh = hdf5.h5array(vec)

hdf5.h5array:
 
    Name: ''
    Data: [1 2 3]

hhh.Data

ans =

     1     2     3

For more information about a specific MATLAB HDF5 data class, see the sections that follow.

To learn more about the HDF5 data types in general, see the HDF Web page at http://www.hdfgroup.org.

MATLAB HDF5 h5array Data Class.   The h5array data class associates a name with an array. The following tables list the class constructors, data members, and methods.

Constructors               Description
arr = hdf5.h5array         Creates an h5array object.
arr = hdf5.h5array(data)   Creates an h5array object, where data specifies the value of the Data member. data can be numeric, a cell array, or an HDF5 data type.

Data Members               Description
Data                       Multidimensional array
Name                       Text string specifying name of the object

Methods                    Description
setData(arr, data)         Sets the value of the Data member, where arr is an h5array object and data can be numeric, a cell array, or an HDF5 data type.
setName(arr, name)         Sets the value of the Name member, where arr is an h5array object and name is a string or cell array.

MATLAB HDF5 h5compound Data Class.   The h5compound data class associates a name with a structure. You can define the field names in the structure and their values. The following tables list the class constructors, data members, and methods.

Constructors                     Description
C = hdf5.h5compound              Creates an h5compound object.
C = hdf5.h5compound(n1,n2,...)   Creates an h5compound object, where n1, n2, and so on are text strings that specify field names. The constructor creates a corresponding data field for every member name.

Data Members                     Description
Data                             Multidimensional array
Name                             Text string specifying name of the object
MemberNames                      Text strings specifying the names of the fields in the structure

Methods                          Description
addMember(C, mName)              Creates a new field in the object C. mName specifies the name of the field.
setMember(C, mName, mData)       Sets the value of the Data element associated with the field specified by mName, where C is an h5compound object and mData can be numeric or an HDF5 data type.
setMemberNames(C, n1, n2,...)    Specifies the names of fields in the structure, where C is an h5compound object and n1, n2, and so on are text strings that specify field names. The method creates a corresponding data field for every name specified.
setName(C, name)                 Sets the value of the Name member, where C is an h5compound object and name is a string or cell array.

MATLAB HDF5 h5enum Data Class.   The h5enum data class defines an enumerated type. You can specify the enumerations (text strings) and the values they represent. The following tables list the class constructors, data members, and methods.

Constructors                     Description
E = hdf5.h5enum                  Creates an h5enum object.
E = hdf5.h5enum(eNames, eVals)   Creates an h5enum object, where eNames is a cell array of strings and eVals is a vector of integers. eNames and eVals must have the same number of elements.

Data Members                     Description
Data                             Multidimensional array
Name                             Text string specifying name of the object
EnumNames                        Text strings specifying the enumerations, that is, the text strings that represent values
EnumValues                       Values associated with enumerations

Methods                          Description
defineEnum(E, eNames, eVals)     Defines the set of enumerations and the integer values they represent, where eNames is a cell array of strings and eVals is a vector of integers. eNames and eVals must have the same number of elements.
enumdata = getString(E)          Returns a cell array containing the names of the enumerations, where E is an h5enum object.
setData(E, eData)                Sets the value of the object's Data member, where E is an h5enum object and eData is a vector of integers.
setEnumNames(E, eNames)          Specifies the enumerations, where E is an h5enum object and eNames is a cell array of strings.
setEnumValues(E, eVals)          Specifies the value associated with each enumeration, where E is an h5enum object and eVals is a vector of integers.
setName(E, name)                 Sets the value of the object's Name member, where E is an h5enum object and name is a string or cell array.

This example uses an HDF5 enumeration object.

  1. Create an HDF5 enumerated object.

    enum_obj = hdf5.h5enum;
  2. Define the enumerated values and their corresponding names.

    enum_obj.defineEnum({'RED' 'GREEN' 'BLUE'}, uint8([1 2 3]));
    

    enum_obj now contains the definition of the enumeration that associates the names RED, GREEN, and BLUE with the numbers 1, 2, and 3.

  3. Add enumerated data to the object.

    enum_obj.setData(uint8([2 1 3 3 2 3 2 1]));

    In the HDF5 file, these numeric values map to the enumerated values GREEN, RED, BLUE, BLUE, GREEN, BLUE, GREEN, and RED.

  4. Write the enumerated data to a data set named objects in the group /g1 of the HDF5 file myfile3.h5.

    hdf5write('myfile3.h5', '/g1/objects', enum_obj);
  5. Read the enumerated data set from the file.

    ddd = hdf5read('myfile3.h5','/g1/objects')
    
    hdf5.h5enum:
     
              Name: ''
              Data: [2 1 3 3 2 3 2 1]
         EnumNames: {'RED'  'GREEN'  'BLUE'}
        EnumValues: [1 2 3]
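The file stores only the integer codes, so translating them back to names is a simple lookup. The following sketch assumes that the Data, EnumNames, and EnumValues members shown in the output above can be read with dot notation; the variable names isKnown, loc, and names are illustrative only.

```matlab
% Rebuild the enumeration from the steps above: RED=1, GREEN=2, BLUE=3
enum_obj = hdf5.h5enum({'RED' 'GREEN' 'BLUE'}, uint8([1 2 3]));
enum_obj.setData(uint8([2 1 3 3 2 3 2 1]));

% Find where each stored code appears in EnumValues
[isKnown, loc] = ismember(enum_obj.Data, enum_obj.EnumValues);
if ~all(isKnown)
    error('Data contains codes that the enumeration does not define.');
end

% Use those positions to pick the matching names
names = enum_obj.EnumNames(loc)
```

The same lookup works on an object returned by hdf5read, because it carries the same members.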

MATLAB HDF5 h5string Data Class.   The h5string data class associates a name with a text string and provides optional padding behavior. The following tables list the class constructors, data members, and methods.

Constructors                          Description
str = hdf5.h5string                   Creates an h5string object.
str = hdf5.h5string(data)             Creates an h5string object, where data is a text string.
str = hdf5.h5string(data, padtype)    Creates an h5string object, where data is a text string and padtype specifies the type of padding to use.

Data Members    Description
Data            Multidimensional array
Name            Text string specifying the name of the object
Length          Scalar defining the length of the string
Padding         Type of padding to use: 'spacepad', 'nullterm', or 'nullpad'

Methods                     Description
setData(str, data)          Sets the value of the object's Data member, where str is an h5string object and data is a text string.
setLength(str, lenVal)      Sets the value of the object's Length member, where str is an h5string object and lenVal is a scalar.
setName(str, name)          Sets the value of the object's Name member, where str is an h5string object and name is a string or cell array.
setPadding(str, padType)    Specifies the value of the object's Padding member, where str is an h5string object and padType is a text string specifying one of the supported pad types.

This example uses an HDF5 string object.

  1. Create an HDF5 string object, specifying the text string you want it to contain.

    myH5str = hdf5.h5string('this is a string')
    
    hdf5.h5string:
     
           Name: ''
         Length: 16
        Padding: 'nullterm'
           Data: 'this is a string'
    
  2. Use whos to confirm that the object in the workspace is of class hdf5.h5string.

    whos
      Name        Size    Bytes   Class          Attributes
    
      myH5str     1x1             hdf5.h5string  
    
  3. Set the name of the object using its setName method, and view the object again.

    setName(myH5str, 'my H5 string object')
    
    myH5str
    
    hdf5.h5string:
     
           Name: 'my H5 string object'
         Length: 16
        Padding: 'nullterm'
           Data: 'this is a string'
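The constructor also accepts a padding type. As a minimal sketch (the text 'abc' and the object name are arbitrary), this creates a space-padded string object and names it with setName:

```matlab
% Create a string object that pads with spaces rather than null-terminating
padStr = hdf5.h5string('abc', 'spacepad');

% Name the object with the setName method from the table above
setName(padStr, 'space padded string');

disp(padStr.Padding)
disp(padStr.Name)
```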

MATLAB HDF5 h5vlen Data Class.   The h5vlen data class creates a variable-length array, that is, an array in which the elements can have different lengths. This is also called a ragged array. The following tables list the class constructors, data members, and methods.

Constructors             Description
V = hdf5.h5vlen          Creates an h5vlen object.
V = hdf5.h5vlen(data)    Creates an h5vlen object, where data specifies the value of the Data member. data can be numeric or an HDF5 data type.

Data Members    Description
Data            Multidimensional array
Name            Text string specifying the name of the object

Methods             Description
setData(V, data)    Sets the value of the object's Data member, where V is an h5vlen object and data can be a scalar, vector, text string, cell array, or HDF5 data type.
setName(V, name)    Sets the value of the object's Name member, where V is an h5vlen object and name is a string or cell array.

The following example creates an array of HDF5 h5vlen objects. The h5vlen objects contain numeric vectors of various lengths.

v(1) = hdf5.h5vlen(1:5);      % 5 elements: 1 2 3 4 5
v(2) = hdf5.h5vlen(7:-1:3);   % 5 elements: 7 6 5 4 3
v(3) = hdf5.h5vlen(1:2:8);    % 4 elements: 1 3 5 7
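To see that each element of an h5vlen array keeps its own length, you can build the array from a cell array of vectors and inspect the Data member of each element. This sketch assumes Data can be read with dot notation; the variables rows, w, and lens are illustrative.

```matlab
% Illustrative ragged data: numeric vectors of different lengths
rows = {1:5, 7:-1:3, 1:2:8, [10 20]};

% Wrap each vector in its own h5vlen object
for k = 1:numel(rows)
    w(k) = hdf5.h5vlen(rows{k});
end

% Each element keeps its own length
lens = zeros(1, numel(w));
for k = 1:numel(w)
    lens(k) = numel(w(k).Data);
end
disp(lens)
```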

Importing Hierarchical Data Format (HDF4) Files

Hierarchical Data Format (HDF4) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). For more information about these file formats, read the HDF documentation at the HDF Web site (www.hdfgroup.org).

HDF-EOS is an extension of HDF4 that was developed by the National Aeronautics and Space Administration (NASA) for storage of data returned from the Earth Observing System (EOS). For more information about this extension to HDF4, see the HDF-EOS documentation at the NASA Web site (www.hdfeos.org).

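MATLAB includes several options for importing HDF4 files, discussed in the following sections: Using the HDF Import Tool, Using the HDF Import Tool Subsetting Options, Using the MATLAB HDF4 High-Level Functions, and Using the HDF4 Low-Level Functions.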

Using the HDF Import Tool

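The HDF Import Tool is a graphical user interface that you can use to navigate through HDF4 or HDF-EOS files and import data from them. Importing data using the HDF Import Tool involves these steps:

  1. Opening an HDF4 file in the HDF Import Tool
  2. Selecting a data set in the file
  3. Specifying a subset of the data (optional)
  4. Importing the data and metadata
  5. Closing HDF files and the HDF Import Tool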

The following sections provide more detail about each of these steps.

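Step 1: Opening an HDF4 File in the HDF Import Tool.   You can open an HDF4 or HDF-EOS file in the HDF Import Tool in several ways. For example, at the MATLAB command line, call hdftool with the name of the file, as in hdftool('example.hdf').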

Viewing a File in the HDF Import Tool.  

When you open an HDF4 or HDF-EOS file in the HDF Import Tool, the tool displays the contents of the file in the Contents pane. You can use this pane to navigate within the file to see what data sets it contains. You can view the contents of HDF-EOS files either as HDF data sets or as HDF-EOS files. The icon in the Contents pane indicates the view, as illustrated in the following figure. These are two views of the same data.

Step 2: Selecting a Data Set in an HDF File.   To import a data set, you must first select it in the Contents pane of the HDF Import Tool. Use the Contents pane to view the contents of the file and navigate to the data set you want to import.

For example, the following figure shows the data set Example SDS selected in the HDF file. Once you select a data set, the Metadata panel displays information about the data set, and the Importing and Subsetting pane displays the subsetting options available for this type of HDF object.

Step 3: Specifying a Subset of the Data (Optional).   When you select a data set in the Contents pane, the Importing and Subsetting pane displays the subsetting options available for that type of HDF object; the options vary with the object type. For more information, see Using the HDF Import Tool Subsetting Options.

Step 4: Importing Data and Metadata.   To import the data set you have selected, click the Import button in the bottom right corner of the Importing and Subsetting pane. Using the Importing and Subsetting pane, you can

The following figure shows how to specify these options in the HDF Import Tool.

Step 5: Closing HDF Files and the HDF Import Tool.   To close a file, select the file in the contents pane and click Close File on the HDF Import Tool File menu.

To close all the files open in the HDF Import Tool, click Close All Files on the HDF Import Tool File menu.

To close the tool, click Close HDFTool on the HDF Import Tool File menu or click the Close button in the upper right corner of the tool.

If you used the hdftool syntax that returns a handle to the tool,

h = hdftool('example.hdf')

you can use the close(h) command to close the tool from the MATLAB command line.

Using the HDF Import Tool Subsetting Options

When you select a data set, the Importing and Subsetting pane displays the subsetting options available for that type of data set. The following sections provide information about these subsetting options for all supported data set types. For general information about the HDF Import Tool, see Using the HDF Import Tool.

HDF Scientific Data Sets (SD).   The HDF scientific data set (SD) is a group of data structures used to store and describe multidimensional arrays of scientific data. Using the HDF Import Tool subsetting parameters, you can import a subset of an HDF scientific data set by specifying the location, range, and number of values to be read along each dimension.

The subsetting parameters are:

HDF Vdata.   HDF Vdata data sets provide a framework for storing customized tables. A Vdata table consists of a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. Each field is identified by a name. The following figure illustrates a Vdata table.

You can import a subset of an HDF Vdata data set in the following ways:

The following figure shows how you specify these subsetting parameters for Vdata.

HDF-EOS Grid Data.   In HDF-EOS Grid data, a rectilinear grid overlays a map. The map uses a known map projection. The HDF Import Tool supports the following mutually exclusive subsetting options for Grid data:

To access these options, click the Subsetting method menu in the Importing and Subsetting pane.

Direct Index.  

You can import a subset of an HDF-EOS Grid data set by specifying the location, range, and number of values to be read along each dimension.

Each row represents a dimension in the data set and each column represents these subsetting parameters:

Geographic Box.  

You can import a subset of an HDF-EOS Grid data set by specifying the rectangular area of the grid that you are interested in. To define this rectangular area, you must specify two points, using longitude and latitude in decimal degrees. These points are two corners of the rectangular area. Typically, Corner 1 is the upper-left corner of the box, and Corner 2 is the lower-right corner of the box.

Optionally, you can further define the subset of data you are interested in by using Time parameters (see Time) or by specifying other User-Defined subsetting parameters (see User-Defined).

Interpolation.  

Interpolation is the process of estimating a pixel value at a location in between other pixels. In interpolation, the value of a particular pixel is determined by computing the weighted average of some set of pixels in the vicinity of the pixel.
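As a generic illustration of bilinear interpolation (not the HDF Import Tool's own code), the estimate at a point between four pixels is a weighted average in which each pixel's weight shrinks with its distance along each axis. The 2-by-2 pixel values below are made up; MATLAB's interp2 function with the 'linear' method performs the same computation on a 2-D grid, so the two results should agree.

```matlab
% Hypothetical 2-by-2 neighborhood of pixel values, indexed Z(row, column)
Z = [10 20; 30 40];

% Location to estimate, in fractional column (x) and row (y) coordinates
x = 1.25;
y = 1.5;

% Fractional distances from the upper-left pixel Z(1,1)
dx = x - 1;
dy = y - 1;

% Bilinear interpolation: weight each neighbor by its proximity along x and y
est = (1-dx)*(1-dy)*Z(1,1) + dx*(1-dy)*Z(1,2) ...
    + (1-dx)*dy*Z(2,1)     + dx*dy*Z(2,2);

% interp2 with the 'linear' method computes the same bilinear estimate
estInterp2 = interp2(Z, x, y, 'linear');

fprintf('by hand: %g, interp2: %g\n', est, estInterp2);
```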

You define the region used for bilinear interpolation by specifying two points that are corners of the interpolation area:

Pixels.  

You can import a subset of the pixels in a Grid data set by defining a rectangular area over the grid. You define the rectangular area by specifying two points that are corners of the area:

Tile.  

In HDF-EOS Grid data, a rectilinear grid overlays a map. Each rectangle defined by the horizontal and vertical lines of the grid is referred to as a tile. If the HDF-EOS Grid data is stored as tiles, you can import a subset of the data by specifying the coordinates of the tile you are interested in. Tile coordinates are 1-based, with the tile in the upper-left corner of a two-dimensional data set identified as 1,1. In a three-dimensional data set, this tile would be referenced as 1,1,1.

Time.  

You can import a subset of the Grid data set by specifying a time period. You must specify both the start time and the stop time (the endpoint of the time span). The units (hours, minutes, seconds) used to specify the time are defined by the data set.

Along with these time parameters, you can optionally further define the subset of data to import by supplying user-defined parameters.

User-Defined.  

You can import a subset of the Grid data set by specifying user-defined subsetting parameters.

When specifying user-defined parameters, you must first specify whether you are subsetting along a dimension or by field. Select the dimension or field by name using the Dimension or Field Name menu. Dimension names are prefixed with the characters DIM:.

Once you specify the dimension or field, you use Min and Max to specify the range of values that you want to import. For dimensions, Min and Max represent a range of elements. For fields, Min and Max represent a range of values.

HDF-EOS Point Data.   HDF-EOS Point data sets are tables. You can import a subset of an HDF-EOS Point data set by specifying field names and level. Optionally, you can refine the subsetting by specifying the range of records you want to import, by defining a rectangular area, or by specifying a time period. For information about specifying a rectangular area, see Geographic Box. For information about subsetting by time, see Time.

HDF-EOS Swath Data.   HDF-EOS Swath data is produced by a satellite as it traces a path over the Earth. This path is called its ground track. The sensor aboard the satellite takes a series of scans perpendicular to the ground track. Swath data can also include a vertical measure as a third dimension. For example, this vertical dimension can represent the height of the sensor above the Earth.

The HDF Import Tool supports the following mutually exclusive subsetting options for Swath data:

To access these options, click the Subsetting method menu in the Importing and Subsetting pane.

Direct Index.  

You can import a subset of an HDF-EOS Swath data set by specifying the location, range, and number of values to be read along each dimension.

Each row represents a dimension in the data set and each column represents these subsetting parameters:

Geographic Box.  

You can import a subset of an HDF-EOS Swath data set by specifying the rectangular area that you are interested in and by specifying the selection Mode.

You define the rectangular area by specifying two points that specify two corners of the box:

You specify the selection mode by choosing the type of Cross Track Inclusion and the Geolocation Mode. The Cross Track Inclusion value determines how much of the area of the geographic box that you define must fall within the boundaries of the swath.

Select from these values:

The Geolocation Mode value specifies whether geolocation fields and data must be in the same swath.

Select from these values:

Time.  

Optionally, you can also subset swath data by specifying a time period. The units used to specify the time (hours, minutes, seconds) are defined by the data set.

User-Defined.  

Optionally, you can also subset a swath data set by specifying user-defined parameters.

When specifying user-defined parameters, you must first specify whether you are subsetting along a dimension or by field. Select the dimension or field by name using the Dimension or Field Name menu. Dimension names are prefixed with the characters DIM:.

Once you specify the dimension or field, you use Min and Max to specify the range of values that you want to import. For dimensions, Min and Max represent a range of elements. For fields, Min and Max represent a range of values.

HDF Raster Image Data.   For 8-bit HDF raster image data, you can specify the colormap.

Using the MATLAB HDF4 High-Level Functions

To import data from an HDF4 or HDF-EOS file, you can use the MATLAB HDF4 high-level function hdfread. The hdfread function provides a programmatic way to import data from an HDF4 or HDF-EOS file that still hides many of the details that you need to know if you use the low-level HDF functions, described in Using the HDF4 Low-Level Functions. You can also import HDF4 data using an interactive GUI, described in Using the HDF Import Tool.

This section describes the high-level MATLAB HDF4 functions hdfinfo and hdfread, including how to use hdfread to read a subset of a data set.

To export data to an HDF4 file, you must use the MATLAB HDF4 low-level functions.

Using hdfinfo to Get Information About an HDF4 File.   To get information about the contents of an HDF4 file, use the hdfinfo function. The hdfinfo function returns a structure that contains information about the file and the data in the file.

This example returns information about a sample HDF4 file included with MATLAB:

info = hdfinfo('example.hdf')

info = 

    Filename: 'example.hdf'
         SDS: [1x1 struct]
       Vdata: [1x1 struct]

To get information about the data sets stored in the file, look at the SDS field.
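Because the SDS field is a structure array with one element per scientific data set, you can loop over it to list the data sets in a file. A minimal sketch, using the Name, Rank, and DataType fields shown for example.hdf in this section:

```matlab
info = hdfinfo('example.hdf');

% info.SDS has one element per scientific data set in the file
for k = 1:numel(info.SDS)
    fprintf('SDS %d: %s (rank %d, %s)\n', k, info.SDS(k).Name, ...
            info.SDS(k).Rank, info.SDS(k).DataType);
end
```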

Using hdfread to Import Data from an HDF4 File.   To use the hdfread function, you must specify the data set that you want to read. You can specify the filename and the data set name as arguments, or you can specify a structure returned by the hdfinfo function that contains this information. The following example shows both methods. For information about how to import a subset of the data in a data set, see Reading a Subset of the Data in a Data Set.

  1. Determine the names of data sets in the HDF4 file, using the hdfinfo function.

    info = hdfinfo('example.hdf')
    
    info = 
    
        Filename: 'example.hdf'
             SDS: [1x1 struct]
           Vdata: [1x1 struct]

    To determine the names and other information about the data sets in the file, look at the contents of the SDS field. The Name field in the SDS structure gives the name of the data set.

    dsets = info.SDS
    
    dsets = 
    
           Filename: 'example.hdf'
               Type: 'Scientific Data Set'
               Name: 'Example SDS'
               Rank: 2
           DataType: 'int16'
         Attributes: []
               Dims: [2x1 struct]
              Label: {}
        Description: {}
              Index: 0
  2. Read the data set from the HDF4 file, using the hdfread function. Specify the name of the data set as a parameter to the function. Note that the data set name is case sensitive. This example returns a 16-by-5 array:

    dset = hdfread('example.hdf', 'Example SDS')
    
    dset =
    
          3      4      5      6      7
          4      5      6      7      8
          5      6      7      8      9
          6      7      8      9     10
          7      8      9     10     11
          8      9     10     11     12
          9     10     11     12     13
         10     11     12     13     14
         11     12     13     14     15
         12     13     14     15     16
         13     14     15     16     17
         14     15     16     17     18
         15     16     17     18     19
         16     17     18     19     20
         17     18     19     20     21
         18     19     20     21     22

    Alternatively, you can pass hdfread the field of the structure returned by hdfinfo that describes the data set. For example, to read a scientific data set, use the SDS field.

    dset = hdfread(info.SDS);
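A quick way to confirm that the two calling styles are equivalent is to compare their results. This sketch reads the sample file twice; the variable names byName and byStruct are illustrative.

```matlab
info = hdfinfo('example.hdf');

byName   = hdfread('example.hdf', 'Example SDS');  % file name plus data set name
byStruct = hdfread(info.SDS);                       % structure returned by hdfinfo

% Both calls read the same data set, so the arrays should be identical
if ~isequal(byName, byStruct)
    error('The two calling styles returned different data.');
end
disp(size(byName))
```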
    
Reading a Subset of the Data in a Data Set.  

To read a subset of a data set, you can use the optional 'index' parameter. The value of the index parameter is a cell array of three vectors that specify the location in the data set to start reading, the skip interval (e.g., read every other data item), and the amount of data to read (e.g., the length along each dimension). In HDF4 terminology, these parameters are called the start, stride, and edge values.

For example, this code
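The following is a minimal sketch of the 'Index' parameter. It assumes that hdfread interprets start values as 1-based, as MATLAB indexing does (check the hdfread reference page for your release); the particular start, stride, and edge values are arbitrary choices for illustration.

```matlab
% Illustrative subsetting parameters, one value per dimension
start  = [2 3];   % begin at the second row and third column (assumed 1-based)
stride = [1 1];   % read every element along each dimension
edge   = [3 3];   % read three values along each dimension

subset = hdfread('example.hdf', 'Example SDS', ...
                 'Index', {start, stride, edge});
disp(subset)
```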

Using the HDF4 Low-Level Functions

This section describes how to use MATLAB functions to access the HDF4 Application Programming Interfaces (APIs). These APIs are libraries of C routines. To import or export data, you must use the functions in the HDF4 API associated with the particular HDF4 data type you are working with. Each API has a particular programming model, that is, a prescribed way to use the routines to read data from and write data to the file. To illustrate this concept, this section describes the programming model of one particular HDF4 API: the HDF4 Scientific Data (SD) API. For a complete list of the HDF4 APIs supported by MATLAB and the functions you use to access each one, see the hdf reference page.

This section includes the following:

Mapping HDF4 to MATLAB Syntax.   Each HDF4 API includes many individual routines that you use to read data from files, write data to files, and perform other related functions. For example, the HDF4 Scientific Data (SD) API includes separate C routines to open (SDopen), close (SDend), and read data (SDreaddata).

Instead of supporting each routine in the HDF4 APIs, MATLAB provides a single function that serves as a gateway to all the routines in a particular HDF4 API. For example, the HDF Scientific Data (SD) API includes the C routine SDend to close an HDF4 file:

status = SDend(sd_id); /* C code */

To call this routine from MATLAB, use the MATLAB function associated with the SD API, hdfsd. You must specify the name of the routine, minus the API acronym, as the first argument and pass any other required arguments to the routine in the order they are expected. For example,

status = hdfsd('end',sd_id); % MATLAB code

Some HDF4 API routines use output arguments to return data. Because MATLAB does not support output arguments passed by reference, as C does through pointers, you must specify these arguments as return values.

For example, the SDfileinfo routine returns data about an HDF4 file in two output arguments, ndatasets and nglobal_atts. Here is the C code:

status = SDfileinfo(sd_id, &ndatasets, &nglobal_atts); /* C code */

To call this routine from MATLAB, change the output arguments into return values:

[ndatasets, nglobal_atts, status] = hdfsd('fileinfo',sd_id);

Specify the return values in the same order as they appear as output arguments. The function status return value is always specified as the last return value.

Step 1: Opening the HDF4 File.  

To import an HDF4 SD data set, you must first open the file using the SD API routine SDstart. (In HDF4 terminology, the numeric arrays stored in HDF4 files are called data sets.) In MATLAB, you use the hdfsd function, specifying as arguments the name of the SD API routine, start; the name of the file; and the file access mode.

For example, this code opens the file mydata.hdf for read access:

sd_id = hdfsd('start','mydata.hdf','read');

If SDstart can find and open the file specified, it returns an HDF4 SD file identifier, named sd_id in the example. Otherwise, it returns -1.
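Because SDstart signals failure by returning -1 rather than raising an error, it is worth checking the identifier before continuing. A minimal sketch, using the sample file example.hdf in place of the hypothetical mydata.hdf:

```matlab
% example.hdf is the sample file used elsewhere in this section
sd_id = hdfsd('start', 'example.hdf', 'read');

% SDstart returns -1 instead of raising an error when it cannot open the file
if sd_id == -1
    error('Unable to open example.hdf for reading.');
end

% ... select and read data sets here ...

status = hdfsd('end', sd_id);   % close the file when finished
```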

Step 2: Retrieving Information About the HDF4 File.   To get information about an HDF4 file, you must use the SD API routine SDfileinfo. This function returns the number of data sets in the file and the number of global attributes in the file, if any. (For more information about global attributes, see Exporting to Hierarchical Data Format (HDF4) Files.) In MATLAB, you use the hdfsd function, specifying as arguments the name of the SD API routine, fileinfo, and the SD file identifier returned by SDstart.

In this example, the HDF4 file contains three data sets and one global attribute.

[ndatasets, nglobal_atts, status] = hdfsd('fileinfo',sd_id)

ndatasets =
    3

nglobal_atts =
    1

status =
    0

Step 3: Retrieving Attributes from an HDF4 File (Optional).   HDF4 files can optionally include information, called attributes, that describes the data the file contains. Attributes associated with an entire HDF4 file are called global attributes. Attributes associated with a data set are called local attributes. (You can also associate attributes with files or dimensions. For more information, see Step 4: Writing Metadata to an HDF4 File.)

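To retrieve attributes from an HDF4 file, use the SD API routine SDreadattr. In MATLAB, use the hdfsd function, specifying as arguments the name of the SD API routine, readattr; the identifier of the file (for a global attribute) or of the data set (for a local attribute); and the zero-based index of the attribute.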

For example, this code returns the contents of the first global attribute, which is the character string my global attribute:

attr_idx = 0;
[attr, status] = hdfsd('readattr', sd_id, attr_idx); 

attr =
    my global attribute
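To read every global attribute rather than only the first, you can combine SDfileinfo, which reports how many global attributes exist, with SDreadattr in a loop. A minimal sketch against the sample file example.hdf; the cell array globalAttrs is an illustrative name, and the loop does nothing if the file has no global attributes.

```matlab
sd_id = hdfsd('start', 'example.hdf', 'read');
[ndatasets, nglobal_atts, status] = hdfsd('fileinfo', sd_id);

% Attribute indexes are zero-based: 0 through nglobal_atts-1
globalAttrs = cell(1, nglobal_atts);
for attr_idx = 0:nglobal_atts-1
    [globalAttrs{attr_idx+1}, status] = hdfsd('readattr', sd_id, attr_idx);
end

status = hdfsd('end', sd_id);
```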

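Step 4: Selecting the Data Sets to Import.   To select a data set, use the SD API routine SDselect. In MATLAB, you use the hdfsd function, specifying as arguments the name of the SD API routine, select; the SD file identifier; and the zero-based index of the data set in the file.

If SDselect finds the specified data set in the file, it returns an HDF4 SD data set identifier, called sds_id in the following example, which selects the second data set in the file (index 1). If it cannot find the data set, it returns -1.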

sds_id = hdfsd('select',sd_id,1)

Step 5: Getting Information About a Data Set.   To read a data set, you must get information about the data set, such as its name, size, and data type. In the HDF4 SD API, you use the SDgetinfo routine to gather this information. In MATLAB, use the hdfsd function, specifying as arguments the name of the SD API routine, getinfo, and the data set identifier returned by SDselect.

This code retrieves information about the data set identified by sds_id:

[dsname, dsndims, dsdims, dstype, dsatts, stat] = ...
              hdfsd('getinfo',sds_id)
dsname =
      A

dsndims =
      2

dsdims =
      5     3

dstype =
      double

dsatts =
      0

stat =
      0

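Step 6: Reading Data from the HDF4 File.   To read data from an HDF4 file, you must use the SDreaddata routine. In MATLAB, use the hdfsd function, specifying as arguments the name of the SD API routine, readdata; the data set identifier; and the start, stride, and edge vectors that define the data to read.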

For example, to read the entire contents of a data set, use this code:

[ds_name, ds_ndims, ds_dims, ds_type, ds_atts, stat] = ...
    hdfsd('getinfo',sds_id);

ds_start = zeros(1,ds_ndims); % Creates the vector [0 0]
ds_stride = [];               % Empty: read every element
ds_edges = ds_dims;           % Read the full extent of each dimension

[ds_data, status] = ...
            hdfsd('readdata',sds_id,ds_start,ds_stride,ds_edges);

disp(ds_data)
     1     2     3     4     5
     6     7     8     9    10
    11    12    13    14    15

To read less than the entire data set, use the start, stride, and edges vectors to specify where you want to start reading data and how much data you want to read. For example, this code reads the entire second row of the sample data set:

ds_start = [0 1]; % Start reading at the first column, second row
ds_stride = []; % Read each element
ds_edges = [5 1]; % Read a 1-by-5 vector of data 

[ds_data, status] = ...
           hdfsd('readdata',sds_id,ds_start,ds_stride,ds_edges);

Step 7: Closing the HDF4 Data Set.   After reading data from a data set in an HDF4 file, you must close access to the data set. In the HDF4 SD API, you use the SDendaccess routine to close a data set. In MATLAB, use the hdfsd function, specifying as arguments the name of the SD API routine, endaccess, and the data set identifier.

For example, this code closes the data set:

stat = hdfsd('endaccess',sds_id);

You must close access to all the data sets in an HDF4 file before closing it.

Step 8: Closing the HDF4 File.   After reading data and closing all the data sets, you must also close the HDF4 file. In the HDF4 SD API, you use the SDend routine. In MATLAB, use the hdfsd function, specifying as arguments the name of the SD API routine, end, and the SD file identifier.

For example, this code closes the file:

stat = hdfsd('end',sd_id);
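The steps above can be combined into one loop that reads every data set in a file. This is a sketch, not a complete program: it uses the sample file example.hdf, reads each data set in full with the default stride, and omits the optional attribute handling of Step 3. The variable names are illustrative.

```matlab
% Open the file; example.hdf is the sample file used in this section
sd_id = hdfsd('start', 'example.hdf', 'read');
if sd_id == -1
    error('Unable to open example.hdf for reading.');
end
[ndatasets, nglobal_atts, status] = hdfsd('fileinfo', sd_id);

data = cell(1, ndatasets);
for sds_idx = 0:ndatasets-1                 % data set indexes are zero-based
    sds_id = hdfsd('select', sd_id, sds_idx);
    [ds_name, ds_rank, ds_dims, ds_type, ds_atts, status] = ...
        hdfsd('getinfo', sds_id);

    % Read everything: start at the origin, default stride, full edges
    [data{sds_idx+1}, status] = ...
        hdfsd('readdata', sds_id, zeros(1, ds_rank), [], ds_dims);
    fprintf('%s: %s, size %s\n', ds_name, ds_type, mat2str(ds_dims));

    status = hdfsd('endaccess', sds_id);    % close each data set first
end

status = hdfsd('end', sd_id);               % then close the file
```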
  


© 1984-2009 The MathWorks, Inc.