Importing Scientific Data Files

Importing Common Data Format (CDF) Files

CDF was created by the National Space Science Data Center (NSSDC) to provide a self-describing data storage and manipulation format that matches the structure of scientific data and applications (i.e., statistical and numerical methods, visualization, and management). For more information about this format, see the CDF Web site.

MATLAB provides two ways to access CDF files: a set of high-level functions and a package of low-level functions that provide direct access to the routines in the CDF C API library. The high-level functions provide a simpler interface for accessing CDF files. However, if you require more control over the import operation, such as data subsetting for large data sets, use the low-level functions. The following sections provide more information.

High-Level CDF Import Functions

MATLAB includes high-level functions that you can use to get information about the contents of a Common Data Format (CDF) file and then read data from the file. The following sections provide more information.

Getting Information about the Contents of a CDF File.  To get information about the contents of a CDF file, such as the names of variables in the CDF file, use the cdfinfo function. The cdfinfo function returns a structure containing general information about the file and detailed information about the variables and attributes in the file.

In this example, the Variables field is a 6-by-6 cell array with one row per variable, so the file contains six variables. Taking a closer look at the contents of this field, you can see that the first variable, Time, is made up of 24 records containing CDF epoch data. The next two variables, Longitude and Latitude, each have only one associated record containing int8 data. For details about how to interpret the data returned in the Variables field, see cdfinfo.

info = cdfinfo('example.cdf')

info = 

              Filename: 'example.cdf'
           FileModDate: '19-May-2010 12:03:11'
              FileSize: 1310
                Format: 'CDF'
         FormatVersion: '2.7.0'
          FileSettings: [1x1 struct]
              Subfiles: {}
             Variables: {6x6 cell}
      GlobalAttributes: [1x1 struct]
    VariableAttributes: [1x1 struct]

vars = info.Variables

vars = 

    'Time'                [1x2 double]    [24]    'epoch'     'T/'        'Full'
    'Longitude'           [1x2 double]    [ 1]    'int8'      'F/FT'      'Full'
    'Latitude'            [1x2 double]    [ 1]    'int8'      'F/TF'      'Full'
    'Data'                [1x3 double]    [ 1]    'double'    'T/TTT'     'Full'
    'multidimensional'    [1x4 double]    [ 1]    'uint8'     'T/TTTT'    'Full'
    'Temperature'         [1x2 double]    [10]    'int16'     'T/TT'      'Full'
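Because each row of the Variables cell array describes one variable, you can loop over the rows to summarize the file. The following is a minimal sketch that assumes the column order documented for cdfinfo (name, dimensions, number of records, data type, record/dimension variance, sparsity); the variable names used in the loop are illustrative.

```matlab
info = cdfinfo('example.cdf');
vars = info.Variables;              % one row per variable

for k = 1:size(vars, 1)
    name     = vars{k, 1};          % column 1: variable name
    nRecords = vars{k, 3};          % column 3: number of records
    dtype    = vars{k, 4};          % column 4: data type, such as 'int8' or 'epoch'
    fprintf('%-18s %4d record(s)  %s\n', name, nRecords, dtype);
end
```

For the example file, the loop prints one line for each of the six variables shown above.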

Reading Data from a CDF File.  To read all of the data in the CDF file, use the cdfread function. The function returns the data in a cell array. The columns of data correspond to the variables; the rows correspond to the records associated with a variable.

data = cdfread('example.cdf');

whos data
  Name       Size            Bytes  Class    Attributes

  data       24x6             16512  cell      

To read the data associated with one or more particular variables, use the 'Variable' parameter. Specify the names of the variables as text strings in a cell array. Variable names are case sensitive. The following example reads the Longitude and Latitude variables from the file.

var_long_lat = cdfread('example.cdf','Variable',{'Longitude','Latitude'});

whos var_long_lat
Name             Size            Bytes  Class    Attributes

var_long_lat     1x2              128    cell               

Speeding Up Read Operations.  The cdfread function offers two ways to speed up read operations when working with large data sets:

To reduce the number of elements in the returned cell array, specify the 'CombineRecords' parameter. By default, cdfread creates a cell array with a separate element for every variable and every record in each variable, padding the records dimension to create a rectangular cell array. For example, reading all the data from the example file produces a 24-by-6 cell array, where the columns represent variables and the rows represent the records for each variable. When you set the 'CombineRecords' parameter to true, cdfread creates a separate element for each variable but saves time by putting all the records associated with a variable in a single cell array element. Thus, reading the data from the example file with 'CombineRecords' set to true produces a 1-by-6 cell array, as shown below.

data_combined = cdfread('example.cdf','CombineRecords',true);

whos
  Name                Size            Bytes  Class    Attributes

  data               24x6             16512  cell               
  data_combined       1x6              2544  cell               

When combining records, note that the dimensions of the data in the cell change. For example, if a variable has 20 records, each of which is a scalar value, the combined cell array element contains a 20-by-1 vector of values. If each record is a 3-by-4 array, the element contains a 20-by-3-by-4 array. For combined data, cdfread adds a new first dimension that indexes the records.
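To make the layout concrete, the following sketch builds 20 hypothetical 3-by-4 records and stacks them so that the record index comes first, which is the shape the paragraph above describes for combined data. It only illustrates the resulting array layout; it is not how cdfread is implemented, and the record values are invented.

```matlab
nRec = 20;
records = cell(nRec, 1);
for r = 1:nRec
    records{r} = r * ones(3, 4);        % hypothetical 3-by-4 record number r
end

stacked  = cat(3, records{:});          % 3-by-4-by-20: records along dimension 3
combined = permute(stacked, [3 1 2]);   % 20-by-3-by-4: record index first

% Recover record 5 from the combined array.
rec5 = reshape(combined(5, :, :), 3, 4);
```

With this layout, combined(r, :, :) holds record r, so indexing the first dimension selects a record.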

Another way to speed up read operations is to read CDF epoch values as MATLAB serial date numbers. By default, cdfread creates a MATLAB cdfepoch object for each CDF epoch value in the file. If you set the 'ConvertEpochToDatenum' parameter to true, cdfread returns CDF epoch values as MATLAB serial date numbers instead. For more information about working with MATLAB cdfepoch objects, see Representing CDF Time Values.

data_datenums = cdfread('example.cdf','ConvertEpochToDatenum',true);

whos
  Name                Size            Bytes  Class    Attributes

  data               24x6             16512  cell                
  data_combined       1x6              2544  cell                
  data_datenums      24x6             13536  cell    

Representing CDF Time Values.  CDF represents time differently than MATLAB. CDF represents date and time as the number of milliseconds since 1-Jan-0000; in CDF terminology, this value is called an epoch. MATLAB represents date and time as a serial date number, which is the number of days since 0-Jan-0000. To represent CDF dates, MATLAB uses an object called a CDF epoch object. To access the time information in a CDF epoch object, use the object's todatenum method.

For example, this code extracts the date information from a CDF epoch object:

  1. Extract the date information from the CDF epoch object returned in the cell array data (see Reading Data from a CDF File). The todatenum method of the CDF epoch object returns the date information as a MATLAB serial date number.

    m_date = todatenum(data{1});
  2. View the MATLAB serial date number as a string.

    datestr(m_date)
    ans =
    
    01-Jan-2001
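The two definitions above imply a simple conversion: divide an epoch value by the number of milliseconds in a day, then add 1, because 01-Jan-0000 is day 1 in MATLAB but millisecond 0 in CDF. The following sketch applies that arithmetic directly. The function handle names are hypothetical, and this is not necessarily how todatenum is implemented.

```matlab
msPerDay = 86400000;                                  % 24*60*60*1000 milliseconds

% Hypothetical helpers derived from the two time definitions.
epochToDatenum = @(epochMs) epochMs ./ msPerDay + 1;  % 01-Jan-0000 is day 1
datenumToEpoch = @(dn) (dn - 1) .* msPerDay;

epoch2001 = datenumToEpoch(datenum(2001, 1, 1));      % about 6.3146e+13 ms
datestr(epochToDatenum(epoch2001))                    % 01-Jan-2001
```

The value of epoch2001 matches the raw epoch value that the low-level interface returns for the first Time record in the next section.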

Using the CDF Library Low-Level Functions to Import Data

To import (read) data from a Common Data Format (CDF) file, you can use the MATLAB low-level CDF functions. The MATLAB functions correspond to dozens of routines in the CDF C API library. For a complete list of all the MATLAB low-level CDF functions, see cdflib.

This section does not attempt to describe all features of the CDF library or explain basic CDF programming concepts. To use the MATLAB CDF low-level functions effectively, you must be familiar with the CDF C interface. Documentation about CDF, version 3.3.0, is available at the CDF Web site.

The following example shows how to use low-level functions to read data from a CDF file.

  1. Open the sample CDF file. For information about creating a new CDF file, see Using the Low-level CDF Functions to Export Data.

    cdfid = cdflib.open('example.cdf');
    
  2. Get some information about the contents of the file, such as the number of variables in the file, the number of global attributes, and the number of attributes with variable scope.

    info = cdflib.inquire(cdfid)
    
    info = 
    
         encoding: 'IBMPC_ENCODING'
         majority: 'ROW_MAJOR'
           maxRec: 23
          numVars: 6
        numvAttrs: 1
        numgAttrs: 3
    
  3. Get information about the individual variables in the file. Variable ID numbers start at zero.

    info  = cdflib.inquireVar(cdfid,0)
    
    info = 
    
               name: 'Time'
           datatype: 'cdf_epoch'
        numElements: 1
               dims: []
        recVariance: 1
        dimVariance: [] 
    
    info  = cdflib.inquireVar(cdfid,1)
    
    info = 
    
               name: 'Longitude'
           datatype: 'cdf_int1'
        numElements: 1
               dims: [2 2]
        recVariance: 0
        dimVariance: [1 0]
  4. Read the data in a variable into the workspace. The first variable contains CDF Epoch time values. The low-level interface returns these as double values.

    data_time = cdflib.getVarRecordData(cdfid,0,0)
    
    data_time =
    
      6.3146e+013
    
    % convert the time value to a time vector
    timeVec = cdflib.epochBreakdown(data_time)
    
    timeVec =
    
            2001
               1
               1
               0
               0
               0
               0
  5. Read a global attribute from the file.

    % Check the scope of attribute 0 (attribute numbers are zero-based).
    info = cdflib.inquireAttr(cdfid,0)
    
    info = 
    
             name: 'SampleAttribute'
            scope: 'GLOBAL_SCOPE'
        maxgEntry: 4
         maxEntry: -1
    
    % Read the value of the attribute. Note you must use the 
    % cdflib.getAttrgEntry function for global attributes.
    value = cdflib.getAttrgEntry(cdfid,0,0)
    
    value =
    
    This is a sample entry.
    
  6. Close the CDF file.

    cdflib.close(cdfid);
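The steps above inspect one variable, one record, and one attribute at a time. The following sketch generalizes them with loops over all variables, all written records of variable 0 (Time), and all attributes. It assumes that attribute numbers run consecutively from 0 to numvAttrs+numgAttrs-1, and it uses cdflib.getVarMaxWrittenRecNum, which does not appear in the steps above, to find the last record.

```matlab
cdfid = cdflib.open('example.cdf');
finfo = cdflib.inquire(cdfid);

% List every variable; variable IDs run from 0 to numVars-1.
for varNum = 0:finfo.numVars-1
    vinfo = cdflib.inquireVar(cdfid, varNum);
    fprintf('%d  %-18s %s\n', varNum, vinfo.name, vinfo.datatype);
end

% Read every written record of variable 0 and break each epoch
% value into [year month day hour minute second millisecond].
lastRec  = cdflib.getVarMaxWrittenRecNum(cdfid, 0);   % zero-based
timeVecs = zeros(lastRec + 1, 7);
for rec = 0:lastRec
    epoch = cdflib.getVarRecordData(cdfid, 0, rec);
    timeVecs(rec + 1, :) = cdflib.epochBreakdown(epoch)';
end

% Report which attributes have global scope.
for attrNum = 0:(finfo.numvAttrs + finfo.numgAttrs - 1)
    ainfo = cdflib.inquireAttr(cdfid, attrNum);
    if strcmp(ainfo.scope, 'GLOBAL_SCOPE')
        fprintf('global attribute %d: %s\n', attrNum, ainfo.name);
    end
end

cdflib.close(cdfid);
```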
    

Importing Network Common Data Form (NetCDF) Files and OPeNDAP Data

Network Common Data Form (NetCDF) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF is used by a wide range of engineering and scientific fields that want a standard way to store data so that it can be shared. For more information, read the NetCDF documentation available at the Unidata Web site.

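MATLAB provides two methods to import data from a NetCDF file or from an OPeNDAP source: high-level functions, and low-level functions that correspond to routines in the NetCDF C library. The following sections describe each method.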

Using the MATLAB High-Level NetCDF Functions to Import Data

MATLAB includes several functions that you can use to examine the contents of a NetCDF file and import data from the file into the MATLAB workspace.

For details about how to use these functions, see their reference pages, which include examples. The following section illustrates how to use these functions to perform a common task: finding all the unlimited dimensions in a NetCDF file.

Finding All Unlimited Dimensions in a NetCDF File.  This example shows how to find all unlimited dimensions in an existing NetCDF file, visually and programmatically.

  1. To determine which dimensions in a NetCDF file are unlimited, display the contents of the example NetCDF file using ncdisp. The ncdisp function identifies unlimited dimensions with the label UNLIMITED.

    ncdisp('example.nc')

    Source:
               \\matlabroot\toolbox\matlab\demos\example.nc
    Format:
               netcdf4
    Global Attributes:
               creation_date = '29-Mar-2010'
    Dimensions:
               x = 50
               y = 50
               z = 5
    .
    .
    .
    Groups:
        
        /grid2/
            Attributes:
                       description = 'This is another group attribute.'
            Dimensions:
                       x    = 360
                       y    = 180
                       time = 0     (UNLIMITED)
            Variables:
                temp
                       Size:       []
                       Dimensions: x,y,time
                       Datatype:   int16
  2. To determine all unlimited dimensions programmatically, first get information about the file using ncinfo. This example gets information about a particular group in the file.

    ginfo = ncinfo('example.nc','/grid2/');
    
  3. Get a vector of the Boolean values that indicate, for this group, which dimensions are unlimited.

    unlimDims = [ginfo.Dimensions.Unlimited]
    
    unlimDims =
    
         0     0     1
  4. Use this vector to display the unlimited dimension.

    disp(ginfo.Dimensions(unlimDims))
             Name: 'time'
           Length: 0
        Unlimited: 1
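The steps above examine a single group. To find unlimited dimensions in every group of a file, you can walk the whole structure that ncinfo returns. The following sketch uses an explicit stack instead of recursion so that it runs as a plain script; it labels each result with the group name exactly as ncinfo reports it, and the variable names are illustrative.

```matlab
finfo = ncinfo('example.nc');
stack = {finfo};          % groups still to visit, starting at the root
unlimited = {};           % 'group: dimension' labels found so far

while ~isempty(stack)
    grp = stack{end};
    stack(end) = [];
    if ~isempty(grp.Dimensions)
        isUnlim = logical([grp.Dimensions.Unlimited]);
        dims = grp.Dimensions(isUnlim);
        for k = 1:numel(dims)
            unlimited{end+1} = sprintf('%s: %s', grp.Name, dims(k).Name);
        end
    end
    for k = 1:numel(grp.Groups)   % queue any child groups
        stack{end+1} = grp.Groups(k);
    end
end
disp(unlimited')
```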
    

Using the MATLAB Low-Level NetCDF Functions to Import Data

MATLAB provides access to the routines in the NetCDF C library that you can use to read data from NetCDF files and write data to NetCDF files. MATLAB provides this access through a set of MATLAB functions that correspond to the functions in the NetCDF C library. MATLAB groups the functions into a package, called netcdf. To call one of the functions in the package, you must specify the package name. For a complete list of all the functions, see netcdf.

This section does not describe all features of the NetCDF library or explain basic NetCDF programming concepts. To use the MATLAB NetCDF functions effectively, you should be familiar with the information about NetCDF contained in the NetCDF C Interface Guide.

Mapping NetCDF API Syntax to MATLAB Function Syntax.  For the most part, the MATLAB NetCDF functions correspond directly to routines in the NetCDF C library. For example, the MATLAB function netcdf.open corresponds to the NetCDF library routine nc_open. In some cases, one MATLAB function corresponds to a group of NetCDF library functions. For example, instead of creating MATLAB versions of every NetCDF library nc_put_att_type function, where type represents a data type, MATLAB uses one function, netcdf.putAtt, to handle all supported data types.

The syntax of the MATLAB functions is similar to that of the NetCDF library routines, with some exceptions. For example, the NetCDF C library routines return data through pointer arguments, while their MATLAB counterparts use one or more return values. The following is the function signature of the nc_open routine in the NetCDF library. Note how the NetCDF file identifier is returned in the ncidp argument.

int nc_open (const char *path, int omode, int *ncidp); /* C syntax */

The following shows the signature of the corresponding MATLAB function, netcdf.open. Like its NetCDF C library counterpart, the MATLAB NetCDF function accepts a character string that specifies the file name and a constant that specifies the access mode. Note, however, that the MATLAB netcdf.open function returns the file identifier, ncid, as a return value.

ncid = netcdf.open(filename, mode)

To see a list of all the functions in the MATLAB NetCDF package, see the netcdf reference page.

Exploring the Contents of a NetCDF File.  This example shows how to use the MATLAB NetCDF functions to explore the contents of a NetCDF file. It uses the example NetCDF file included with MATLAB, example.nc, as an illustration. For an example of reading data from a NetCDF file, see Reading Data from a NetCDF File.

  1. Open the NetCDF file using the netcdf.open function. This function returns an identifier that you use thereafter to refer to the file. The example opens the file for read-only access, but you can specify other access modes. For more information about modes, see netcdf.open.

    ncid = netcdf.open('example.nc','NC_NOWRITE');
  2. Explore the contents of the file using the netcdf.inq function. This function returns the number of dimensions, variables, and global attributes in the file, and returns the identifier of the unlimited dimension in the file. (An unlimited dimension can grow.)

    [ndims,nvars,natts,unlimdimID]= netcdf.inq(ncid)
    ndims =
    
         3
    
    
    nvars =
    
         3
    
    
    natts =
    
         1
    
    
    unlimdimID =
    
         -1
    
  3. Get more information about the dimensions, variables, and global attributes in the file by using NetCDF inquiry functions. For example, to get information about the global attribute, first get the name of the attribute, using the netcdf.inqAttName function. After you get the name, 'creation_date' in this case, you can use the netcdf.inqAtt function to get information about the data type and length of the attribute.

    To get the name of an attribute, you must specify the ID of the variable the attribute is associated with and the attribute number. To access a global attribute, which isn't associated with a particular variable, use the value of the constant 'NC_GLOBAL', as returned by netcdf.getConstant, as the variable ID. The attribute number is a zero-based index that identifies the attribute. For example, the first attribute has the index value 0, and so on.

    global_att_name = netcdf.inqAttName(ncid,netcdf.getConstant('NC_GLOBAL'),0)
    
    global_att_name =
    
    creation_date
    
    [xtype, attlen] = netcdf.inqAtt(ncid,netcdf.getConstant('NC_GLOBAL'),global_att_name)
    
    xtype =
    
         2
    
    
    attlen =
    
        11
  4. Get the value of the attribute, using the netcdf.getAtt function.

    global_att_value = netcdf.getAtt(ncid,netcdf.getConstant('NC_GLOBAL'),global_att_name)
    
    global_att_value =
    
    29-Mar-2010
  5. Get information about the dimensions defined in the file through a series of calls to netcdf.inqDim. This function returns the name and length of the dimension. The netcdf.inqDim function requires the dimension ID, which is a zero-based index that identifies the dimensions. For example, the first dimension has the index value 0, and so on.

    [dimname, dimlen] = netcdf.inqDim(ncid,0)
    
    dimname =
    
    x
    
    dimlen =
    
        50
  6. Get information about the variables in the file through a series of calls to netcdf.inqVar. This function returns the name, data type, dimension IDs, and the number of attributes associated with the variable. The netcdf.inqVar function requires the variable ID, which is a zero-based index that identifies the variables. For example, the first variable has the index value 0, and so on.

    [varname, vartype, dimids, natts] = netcdf.inqVar(ncid,0)
    
    varname =
    
    avagadros_number
    
    
    vartype =
    
         6
    
    
    dimids =
    
         []
    
    
    natts =
    
         1

    The data type information returned in vartype is the numeric value of a NetCDF data type constant, such as NC_INT or NC_BYTE. For example, the value 6 corresponds to NC_DOUBLE. See the NetCDF documentation for information about these constants.
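The steps above query one attribute, one dimension, and one variable by index. Because these IDs are zero-based, you can loop from 0 to the counts that netcdf.inq returns to list everything in the file. The following is a sketch that assumes the root-group dimension IDs are contiguous from 0, as the steps above do for example.nc; files with multiple groups may number dimensions differently.

```matlab
ncid = netcdf.open('example.nc', 'NC_NOWRITE');
[numDims, numVars, numAtts] = netcdf.inq(ncid);
globalID = netcdf.getConstant('NC_GLOBAL');

for attnum = 0:numAtts-1
    attname = netcdf.inqAttName(ncid, globalID, attnum);
    fprintf('global attribute %d: %s\n', attnum, attname);
end

for dimid = 0:numDims-1
    [dimname, dimlen] = netcdf.inqDim(ncid, dimid);
    fprintf('dimension %d: %s = %d\n', dimid, dimname, dimlen);
end

for varid = 0:numVars-1
    [varname, xtype, dimids, nvatts] = netcdf.inqVar(ncid, varid);
    fprintf('variable %d: %s (xtype %d, dimids %s, %d attribute(s))\n', ...
        varid, varname, xtype, mat2str(dimids), nvatts);
end

netcdf.close(ncid);
```

The output names such as numDims avoid shadowing the MATLAB ndims function.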

Reading Data from a NetCDF File.  After you use the inquiry functions to understand the contents of a NetCDF file, you can retrieve the data from the variables and attributes in the file. To read the data associated with the variable avagadros_number in the example file, use the netcdf.getVar function. The following example uses the NetCDF file identifier returned in the previous section, Exploring the Contents of a NetCDF File. The variable ID is a zero-based index that identifies the variables; for example, the first variable has the index value 0. (To learn how to write data to a NetCDF file, see Exporting (Writing) Data to a NetCDF File.)

A_number = netcdf.getVar(ncid,0)

A_number =

  6.0221e+023

The NetCDF functions automatically choose the MATLAB class that best matches the NetCDF data type, but you can also specify the class of the return data by using an optional argument to netcdf.getVar. The following table shows the default mapping. For more information about NetCDF data types, see the NetCDF C Interface Guide.

NetCDF Data Type   MATLAB Class     Notes
NC_BYTE            int8 or uint8    NetCDF interprets byte data as either signed or unsigned.
NC_CHAR            char
NC_SHORT           int16
NC_INT             int32
NC_FLOAT           single
NC_DOUBLE          double
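The optional output-type argument mentioned above overrides this default mapping. The following sketch reads avagadros_number twice, once with the default class and once requesting single. It looks up the variable ID by name with netcdf.inqVarID instead of hard-coding 0.

```matlab
ncid  = netcdf.open('example.nc', 'NC_NOWRITE');
varid = netcdf.inqVarID(ncid, 'avagadros_number');   % look up the ID by name

A_default = netcdf.getVar(ncid, varid);              % NC_DOUBLE maps to double
A_single  = netcdf.getVar(ncid, varid, 'single');    % request single instead

netcdf.close(ncid);
fprintf('%s, %s\n', class(A_default), class(A_single));
```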

Troubleshooting OPeNDAP Connections

If you have trouble reading OPeNDAP data, consider the following:

Importing Flexible Image Transport System (FITS) Files

The FITS file format is the standard data format used in astronomy, endorsed by both NASA and the International Astronomical Union (IAU). For more information about the FITS standard, go to the FITS Web site, http://fits.gsfc.nasa.gov/.

The FITS file format is designed to store scientific data sets consisting of multidimensional arrays (1-D spectra, 2-D images, or 3-D data cubes) and two-dimensional tables containing rows and columns of data. A data file in FITS format can contain multiple components, each marked by an ASCII text header followed by binary data. The first component in a FITS file is known as the primary; in FITS terminology, any components that follow it are called extensions. For a complete list of extensions, see fitsread.

To get information about the contents of a Flexible Image Transport System (FITS) file, use the fitsinfo function. The fitsinfo function returns a structure containing the information about the file and detailed information about the data in the file.

To import data into the MATLAB workspace from a Flexible Image Transport System (FITS) file, use the fitsread function. Using this function, you can import the primary data in the file or you can import the data in any of the extensions in the file, such as the Image extension, as shown in this example.

  1. Determine which extensions the FITS file contains, using the fitsinfo function.

    info = fitsinfo('tst0012.fits')
    
    info = 
    
           Filename: 'matlabroot\tst0012.fits'
        FileModDate: '12-Mar-2001 19:37:46'
           FileSize: 109440
           Contents: {'Primary'  'Binary Table'  'Unknown'  'Image'  'ASCII Table'}
        PrimaryData: [1x1 struct]
        BinaryTable: [1x1 struct]
            Unknown: [1x1 struct]
              Image: [1x1 struct]
         AsciiTable: [1x1 struct]

    The info structure shows that the file contains several extensions including the Binary Table, ASCII Table, and Image extensions.

  2. Read data from the file.

    To read the Primary data in the file, specify the filename as the only argument:

    pdata = fitsread('tst0012.fits');

    To read any of the extensions in the file, you must specify the name of the extension as an optional parameter. This example reads the Binary Table extension from the FITS file:

    bindata = fitsread('tst0012.fits','binarytable');
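If you want every component rather than one extension, you can drive fitsread from the Contents field that fitsinfo returns. The following sketch maps each Contents label to the extension-name argument that fitsread expects and counts repeated extension types so that it can pass an index. The mapping table is an assumption based on the labels shown in the fitsinfo output above.

```matlab
filename = 'tst0012.fits';
info = fitsinfo(filename);

% Assumed mapping from fitsinfo labels to fitsread extension names.
extnames = containers.Map( ...
    {'Primary', 'Binary Table', 'ASCII Table', 'Image', 'Unknown'}, ...
    {'primary', 'binarytable', 'asciitable', 'image', 'unknown'});

seen = containers.Map('KeyType', 'char', 'ValueType', 'double');
contents = cell(size(info.Contents));
for k = 1:numel(info.Contents)
    ext = extnames(info.Contents{k});
    if isKey(seen, ext)
        seen(ext) = seen(ext) + 1;       % nth extension of this type
    else
        seen(ext) = 1;
    end
    if strcmp(ext, 'primary')
        contents{k} = fitsread(filename, 'primary');
    else
        contents{k} = fitsread(filename, ext, seen(ext));
    end
end
```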

Importing Hierarchical Data Format (HDF5) Files

Hierarchical Data Format, Version 5, (HDF5) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). HDF5 is used by a wide range of engineering and scientific fields that want a standard way to store data so that it can be shared. For more information about the HDF5 file format, read the HDF5 documentation available at the HDF Web site (http://www.hdfgroup.org).

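MATLAB provides two methods to import data from an HDF5 file: high-level functions, and low-level functions that correspond to functions in the HDF5 library. The following sections describe each method.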

Using the High-Level HDF5 Functions to Import Data

MATLAB includes several functions that you can use to examine the contents of an HDF5 file and import data from the file into the MATLAB workspace.

For details about how to use these functions, see their reference pages, which include examples. The following sections illustrate some common usage scenarios.

Determining the Contents of an HDF5 File.  HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and metadata in a hierarchical structure similar to the hierarchical structure of a UNIX® file system.

In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference objects without having to make a copy of the object.

Data types describe the data in a data set or attribute; they tell how to interpret the stored values.

To get a quick view into the contents of an HDF5 file, use the h5disp function.

h5disp('example.h5')

HDF5 example.h5 
Group '/' 
    Attributes:
        'attr1':  97 98 99 100 101 102 103 104 105 0 
        'attr2':  2x2 H5T_INTEGER
    Group '/g1' 
        Group '/g1/g1.1' 
            Dataset 'dset1.1.1' 
                Size:  10x10
                MaxSize:  10x10
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
                Attributes:
                    'attr1':  49 115 116 32 97 116 116 114 105 ... 
                    'attr2':  50 110 100 32 97 116 116 114 105 ... 
            Dataset 'dset1.1.2' 
                Size:  20
                MaxSize:  20
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
        Group '/g1/g1.2' 
            Group '/g1/g1.2/g1.2.1' 
                Link 'slink'
                    Type:  soft link
    Group '/g2' 
        Dataset 'dset2.1' 
            Size:  10
            MaxSize:  10
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
        Dataset 'dset2.2' 
            Size:  5x3
            MaxSize:  5x3
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
					.
					.
					.

To explore the hierarchical organization of an HDF5 file, use the h5info function. h5info returns a structure that contains various information about the HDF5 file, including the name of the file.

info = h5info('example.h5')
info = 

         Filename: 'matlabroot\matlab\toolbox\matlab\demos\example.h5'
          Name: '/'
        Groups: [4x1 struct]
      Datasets: []
     Datatypes: []
         Links: []
    Attributes: [2x1 struct]

By looking at the Groups and Attributes fields, you can see that the root group contains four groups and two attributes. The Datasets, Datatypes, and Links fields are all empty, indicating that the root group does not contain any data sets, data types, or links. To explore the contents of the sample HDF5 file further, examine one of the structures in Groups. The following example shows the contents of the second structure in this field.

level2 = info.Groups(2)

level2 = 

          Name: '/g2'
        Groups: []
      Datasets: [2x1 struct]
     Datatypes: []
         Links: []
    Attributes: []

In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.

To get information about a data set, such as its name, dimensions, and data type, look at either of the structures returned in the Datasets field.

dataset1 = level2.Datasets(1)

dataset1 = 
      Filename: 'matlabroot\example.h5'
          Name: '/g2/dset2.1'
          Rank: 1
      Datatype: [1x1 struct]
          Dims: 10
       MaxDims: 10
        Layout: 'contiguous'
    Attributes: []
         Links: []
     Chunksize: []
     Fillvalue: []
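Rather than stepping through Groups by hand, you can walk the structure that h5info returns and collect the full path of every data set. The following sketch uses an explicit stack so it runs as a plain script. Because the Name field of a data set may be either a full path or a name relative to its group, the code handles both cases; that branch is a defensive assumption.

```matlab
info  = h5info('example.h5');
stack = {info};          % groups still to visit, starting at '/'
paths = {};

while ~isempty(stack)
    grp = stack{end};
    stack(end) = [];
    for k = 1:numel(grp.Datasets)
        name = grp.Datasets(k).Name;
        if name(1) ~= '/'                 % relative name: prepend the group path
            if strcmp(grp.Name, '/')
                name = ['/' name];
            else
                name = [grp.Name '/' name];
            end
        end
        paths{end+1} = name;
    end
    for k = 1:numel(grp.Groups)           % queue subgroups
        stack{end+1} = grp.Groups(k);
    end
end
paths = sort(paths);
disp(paths')
```

Any entry in paths can be passed directly to h5read as the data set name.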

Importing Data from an HDF5 File.  To read data from an HDF5 data set, use the h5read function. As arguments, specify the name of the HDF5 file and the name of the data set. (To read the value of an attribute, use the h5readatt function instead.)

To illustrate, this example reads the data set /g2/dset2.1 from the HDF5 sample file example.h5.

data = h5read('example.h5','/g2/dset2.1')

data =

    1.0000
    1.1000
    1.2000
    1.3000
    1.4000
    1.5000
    1.6000
    1.7000
    1.8000
    1.9000
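h5read can also read part of a data set. The following sketch passes one-based start, count, and stride arguments to read a contiguous block and then every other element of /g2/dset2.1, and uses h5readatt for the root attribute attr2 listed by h5disp.

```matlab
% Elements 3 through 6 of the 10-element data set (one-based start).
part = h5read('example.h5', '/g2/dset2.1', 3, 4);

% Every other element: start at 1, read 5 elements, stride 2.
alternate = h5read('example.h5', '/g2/dset2.1', 1, 5, 2);

% Attribute values are read with h5readatt, not h5read.
attr2 = h5readatt('example.h5', '/', 'attr2');
```

Based on the full read shown above, part holds the values 1.2 through 1.5 and alternate holds 1.0, 1.2, 1.4, 1.6, and 1.8.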

Mapping HDF5 Datatypes to MATLAB Datatypes.  When the h5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, as shown in the table below.

HDF5 Data Type                              h5read Returns
Bit-field                                   Array of packed 8-bit integers
Float                                       MATLAB single and double types, provided that they occupy 64 bits or fewer
Integer types, signed and unsigned          Equivalent MATLAB integer types, signed and unsigned
Opaque                                      Array of uint8 values
Reference                                   Returns the actual data pointed to by the reference, not the value of the reference.
Strings, fixed-length and variable-length   Cell array of strings
Enums                                       Cell array of strings, where each enumerated value is replaced by the corresponding member name
Compound                                    1-by-1 struct array; the dimensions of the data set are expressed in the fields of the structure.
Arrays                                      Array of values using the same datatype as the HDF5 array. For example, if the array is of signed 32-bit integers, the MATLAB array will be of type int32.

The example HDF5 file included with MATLAB includes examples of all these datatypes.

For example, the data set /g3/string is a string.

h5disp('example.h5','/g3/string')
HDF5 example.h5 
Dataset 'string' 
    Size:  2
    MaxSize:  2
    Datatype:   H5T_STRING
        String Length: 3
        Padding: H5T_STR_NULLTERM
        Character Set: H5T_CSET_ASCII
        Character Type: H5T_C_S1
    ChunkSize:  []
    Filters:  none
    FillValue:  ''

Now read the data from the file. MATLAB returns it as a cell array of strings.

s = h5read('example.h5','/g3/string')

s = 

    'ab '
    'de '

whos s
  Name      Size            Bytes  Class    Attributes

  s         2x1               236  cell  

Compound data types are always returned as a 1-by-1 struct. The dimensions of the data set are expressed in the fields of the struct. For example, the data set /g3/compound2D has a compound data type.

h5disp('example.h5','/g3/compound2D')
HDF5 example.h5 
Dataset 'compound2D' 
    Size:  2x3
    MaxSize:  2x3
    Datatype:   H5T_COMPOUND
        Member 'a':  H5T_STD_I8LE (int8)
        Member 'b':  H5T_IEEE_F64LE (double)
    ChunkSize:  []
    Filters:  none
    FillValue:  H5T_COMPOUND

Now read the data from the file. MATLAB returns it as a 1-by-1 struct.

data = h5read('example.h5','/g3/compound2D')

data = 

    a: [2x3 int8]
    b: [2x3 double]
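Because each field holds an array with the dimensions of the data set, element (i,j) of the compound data set is spread across the fields. The following sketch reads one element and, optionally, rebuilds a struct array with one element per data set element, which some code finds easier to index. The variable names are illustrative.

```matlab
data = h5read('example.h5', '/g3/compound2D');

% Element (2,3) of the data set, gathered from each member field.
i = 2; j = 3;
a23 = data.a(i, j);     % int8 member 'a'
b23 = data.b(i, j);     % double member 'b'

% Optional: a 2-by-3 struct array with one element per data set element.
elements = struct('a', num2cell(data.a), 'b', num2cell(data.b));
```

Passing same-sized cell arrays to struct creates a struct array of that size, so elements(i,j).a equals data.a(i,j).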

Using the Low-Level HDF5 Functions to Import Data

MATLAB provides low-level functions that correspond directly to dozens of functions in the HDF5 library. With them, you can access features of the HDF5 library from MATLAB, such as reading and writing complex data types and using the HDF5 subsetting capabilities. For more information, see Using the MATLAB Low-Level HDF5 Functions to Export Data.
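As a sketch of what the low-level interface looks like, the following code opens example.h5, reads /g2/dset2.1 in full, and then reads four elements through an HDF5 hyperslab selection. Unlike h5read, the low-level start offset is zero-based. For multidimensional data sets, the dimension order differs from MATLAB order, which this 1-D example avoids.

```matlab
fid  = H5F.open('example.h5', 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset = H5D.open(fid, '/g2/dset2.1');

% Read the whole data set with the default memory type and dataspaces.
all_data = H5D.read(dset, 'H5ML_DEFAULT', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

% Select 4 elements starting at zero-based offset 2 (elements 3 to 6).
start = 2;
count = 4;
file_space = H5D.get_space(dset);
H5S.select_hyperslab(file_space, 'H5S_SELECT_SET', start, [], count, []);
mem_space = H5S.create_simple(1, count, []);
part = H5D.read(dset, 'H5ML_DEFAULT', mem_space, file_space, 'H5P_DEFAULT');

% Release every identifier that was opened.
H5S.close(mem_space);
H5S.close(file_space);
H5D.close(dset);
H5F.close(fid);
```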

Importing Hierarchical Data Format (HDF4) Files

Hierarchical Data Format (HDF4) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). For more information about these file formats, read the HDF documentation at the HDF Web site (www.hdfgroup.org).

HDF-EOS is an extension of HDF4 that was developed by the National Aeronautics and Space Administration (NASA) for storage of data returned from the Earth Observing System (EOS). For more information about this extension to HDF4, see the HDF-EOS documentation at the NASA Web site (www.hdfeos.org).

MATLAB includes several options for importing HDF4 files, discussed in the following sections:

Using the HDF Import Tool

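The HDF Import Tool is a graphical user interface that you can use to navigate through HDF4 or HDF-EOS files and import data from them. Importing data using the HDF Import Tool involves these steps: opening the file, selecting a data set, optionally specifying a subset of the data, importing the data and metadata, and closing the file and the tool.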

The following sections provide more detail about each of these steps.

Step 1: Opening an HDF4 File in the HDF Import Tool.  Open an HDF4 or HDF-EOS file in MATLAB using one of the following methods:

Viewing a File in the HDF Import Tool.  When you open an HDF4 or HDF-EOS file in the HDF Import Tool, the tool displays the contents of the file in the Contents pane. You can use this pane to navigate within the file to see what data sets it contains. You can view the contents of HDF-EOS files as HDF data sets or as HDF-EOS files. The icon in the contents pane indicates the view, as illustrated in the following figure. Note that these are just two views of the same data.

Step 2: Selecting a Data Set in an HDF File.  To import a data set, you must first select it in the Contents pane of the HDF Import Tool. Use the Contents pane to view the contents of the file and navigate to the data set you want to import.

For example, the following figure shows the data set Example SDS selected in the HDF file. Once you select a data set, the Metadata panel displays information about the data set, and the Importing and Subsetting pane displays the subsetting options available for this type of HDF object.

Step 3: Specifying a Subset of the Data (Optional).  When you select a data set in the Contents pane, the Importing and Subsetting pane displays the subsetting options available for that type of HDF object. These options vary by object type. For more information, see Using the HDF Import Tool Subsetting Options.

Step 4: Importing Data and Metadata.  To import the data set you have selected, click the Import button in the bottom-right corner of the Importing and Subsetting pane. Using the Importing and Subsetting pane, you can

The following figure shows how to specify these options in the HDF Import Tool.

Step 5: Closing HDF Files and the HDF Import Tool.  To close a file, select the file in the contents pane and click Close File on the HDF Import Tool File menu.

To close all the files open in the HDF Import Tool, click Close All Files on the HDF Import Tool File menu.

To close the tool, click Close HDFTool in the HDF Import Tool File menu or click the Close button in the upper right corner of the tool.

If you used the hdftool syntax that returns a handle to the tool,

h = hdftool('example.hdf')

you can use the close(h) command to close the tool from the MATLAB command line.

Using the HDF Import Tool Subsetting Options

When you select a data set, the Importing and Subsetting pane displays the subsetting options available for that type of data set. The following sections describe these subsetting options for all supported data set types. For general information about the HDF Import Tool, see Using the HDF Import Tool.

HDF Scientific Data Sets (SD).  The HDF scientific data set (SD) is a group of data structures used to store and describe multidimensional arrays of scientific data. Using the HDF Import Tool subsetting parameters, you can import a subset of an HDF scientific data set by specifying the location, range, and number of values to be read along each dimension.

The subsetting parameters are:

HDF Vdata.  HDF Vdata data sets provide a framework for storing customized tables. A Vdata table consists of a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. Each field is identified by a name. The following figure illustrates a Vdata table.

You can import a subset of an HDF Vdata data set in the following ways:

The following figure shows how you specify these subsetting parameters for Vdata.
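You can also perform the same kind of Vdata subsetting programmatically with hdfread. This sketch assumes the sample file example.hdf that ships with MATLAB, which contains one Vdata table. Reading the first five records is an arbitrary choice made for illustration.

```matlab
info = hdfinfo('example.hdf');
vdataInfo = info.Vdata(1);

% Collect the names of all fields in the Vdata table.
fieldNames = {vdataInfo.Fields.Name};

% Read at most five records, starting with the first one.
nRead = min(5, vdataInfo.NumRecords);
data = hdfread(vdataInfo, 'Fields', fieldNames, ...
               'FirstRecord', 1, 'NumRecords', nRead);

% hdfread returns one cell per requested field.
for k = 1:numel(fieldNames)
    fprintf('%s: %s\n', fieldNames{k}, mat2str(data{k}));
end
```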

HDF-EOS Grid Data.  In HDF-EOS Grid data, a rectilinear grid overlays a map. The map uses a known map projection. The HDF Import Tool supports the following mutually exclusive subsetting options for Grid data:

To access these options, click the Subsetting method menu in the Importing and Subsetting pane.

Direct Index.  

You can import a subset of an HDF-EOS Grid data set by specifying the location, range, and number of values to be read along each dimension.

Each row represents a dimension in the data set and each column represents these subsetting parameters:

Geographic Box.  

You can import a subset of an HDF-EOS Grid data set by specifying the rectangular area of the grid that you are interested in. To define this rectangular area, you must specify two points, using longitude and latitude in decimal degrees. These points are two corners of the rectangular area. Typically, Corner 1 is the upper-left corner of the box, and Corner 2 is the lower-right corner of the box.

Optionally, you can further define the subset of data you are interested in by using Time parameters (see Time) or by specifying other User-Defined subsetting parameters (see User-Defined).
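Programmatically, hdfread accepts a 'Box' parameter for Grid data that corresponds to this option. In this sketch, the file name mygrid.hdf and the field name Temperature are hypothetical placeholders, and the corner coordinates are arbitrary. Use hdfinfo to find the grids and fields in your own file.

```matlab
% Hypothetical inputs: replace with a real HDF-EOS file and field name.
fileName  = 'mygrid.hdf';
fieldName = 'Temperature';

% The 'eos' mode makes hdfinfo report Grid, Swath, and Point objects.
eosInfo  = hdfinfo(fileName, 'eos');
gridInfo = eosInfo.Grid(1);

% Longitudes and latitudes (decimal degrees) of Corner 1 and Corner 2.
lon = [-100 -90];
lat = [  50  40];

boxData = hdfread(gridInfo, 'Fields', fieldName, 'Box', {lon, lat});
```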

Interpolation.  

Interpolation is the process of estimating a pixel value at a location in between other pixels. In interpolation, the value of a particular pixel is determined by computing the weighted average of some set of pixels in the vicinity of the pixel.
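For reference, the textbook form of bilinear interpolation estimates the value at a point from the four surrounding grid values $f_{00}$, $f_{10}$, $f_{01}$, and $f_{11}$. Here $x$ and $y$ are the point's fractional offsets from $f_{00}$ along each axis, each between 0 and 1. This is the general formula only. It does not describe how the HDF Import Tool or the HDF-EOS library weights pixels internally.

$$
f(x, y) \approx f_{00}\,(1-x)(1-y) + f_{10}\,x\,(1-y) + f_{01}\,(1-x)\,y + f_{11}\,x\,y
$$

The four weights sum to 1, which is what makes the result a weighted average of the neighboring pixels.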

You define the region used for bilinear interpolation by specifying two points that are corners of the interpolation area:

Pixels.  

You can import a subset of the pixels in a Grid data set by defining a rectangular area over the grid. You define this area by specifying two points that are its corners:

Tile.  

In HDF-EOS Grid data, a rectilinear grid overlays a map. Each rectangle defined by the horizontal and vertical lines of the grid is referred to as a tile. If the HDF-EOS Grid data is stored as tiles, you can import a subset of the data by specifying the coordinates of the tile you are interested in. Tile coordinates are 1-based, with the upper-left corner of a two-dimensional data set identified as 1,1. In a three-dimensional data set, this tile would be referenced as 1,1,1.
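With hdfread, the corresponding parameter is 'Tile', which takes a vector of tile coordinates. The file and field names below are hypothetical. This call succeeds only if the grid is actually stored as tiles.

```matlab
% Hypothetical inputs: a tiled HDF-EOS Grid file and one of its fields.
fileName  = 'mygrid.hdf';
fieldName = 'Temperature';

eosInfo  = hdfinfo(fileName, 'eos');
gridInfo = eosInfo.Grid(1);

% Tile coordinates are 1-based: [1 1] is the upper-left tile of a 2-D grid.
tileData = hdfread(gridInfo, 'Fields', fieldName, 'Tile', [1 1]);
```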

Time.  

You can import a subset of the Grid data set by specifying a time period. You must specify both the start time and the stop time (the endpoint of the time span). The units (hours, minutes, seconds) used to specify the time are defined by the data set.

Along with these time parameters, you can optionally further define the subset of data to import by supplying user-defined parameters.

User-Defined.  

You can import a subset of the Grid data set by specifying user-defined subsetting parameters.

When specifying user-defined parameters, you must first specify whether you are subsetting along a dimension or by field. Select the dimension or field by name using the Dimension or Field Name menu. Dimension names are prefixed with the characters DIM:.

Once you specify the dimension or field, you use Min and Max to specify the range of values that you want to import. For dimensions, Min and Max represent a range of elements. For fields, Min and Max represent a range of values.

HDF-EOS Point Data.  HDF-EOS Point data sets are tables. You can import a subset of an HDF-EOS Point data set by specifying field names and level. Optionally, you can refine the subsetting by specifying the range of records you want to import, by defining a rectangular area, or by specifying a time period. For information about specifying a rectangular area, see Geographic Box. For information about subsetting by time, see Time.

HDF-EOS Swath Data.  HDF-EOS Swath data is produced by a satellite as it traces a path over the Earth, called its ground track. The sensor aboard the satellite takes a series of scans perpendicular to the ground track. Swath data can also include a vertical measure as a third dimension. For example, this vertical dimension can represent the height of the sensor above the Earth.

The HDF Import Tool supports the following mutually exclusive subsetting options for Swath data:

To access these options, click the Subsetting method menu in the Importing and Subsetting pane.

Direct Index.  

You can import a subset of an HDF-EOS Swath data set by specifying the location, range, and number of values to be read along each dimension.

Each row represents a dimension in the data set and each column represents these subsetting parameters:

Geographic Box.  

You can import a subset of an HDF-EOS Swath data set by specifying the rectangular geographic area you are interested in and the selection mode.

You define the rectangular area by specifying two points that specify two corners of the box:

You specify the selection mode by choosing the type of Cross Track Inclusion and the Geolocation Mode. The Cross Track Inclusion value determines how much of a cross track (one of the scans perpendicular to the ground track) must fall within the geographic box you define for that cross track to be included in the subset.

Select from these values:

The Geolocation Mode value specifies whether geolocation fields and data must be in the same swath.

Select from these values:

Time.  

You can also optionally subset swath data by specifying a time period. The units (hours, minutes, seconds) used to specify the time are defined by the data set.

User-Defined.  

You can optionally also subset a swath data set by specifying user-defined parameters.

When specifying user-defined parameters, you must first specify whether you are subsetting along a dimension or by field. Select the dimension or field by name using the Dimension or Field Name menu. Dimension names are prefixed with the characters DIM:.

Once you specify the dimension or field, you use Min and Max to specify the range of values that you want to import. For dimensions, Min and Max represent a range of elements. For fields, Min and Max represent a range of values.
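For Swath data, hdfread exposes equivalents of the Geographic Box selection mode: the third element of its 'Box' value selects the cross-track inclusion criterion, and its 'ExtMode' parameter selects the geolocation mode. This is a sketch. The file name myswath.hdf, the field name Radiance, and the coordinates are hypothetical. Check the hdfread reference page for the exact values your MATLAB release accepts.

```matlab
% Hypothetical inputs: replace with a real HDF-EOS swath file and field.
fileName  = 'myswath.hdf';
fieldName = 'Radiance';

eosInfo   = hdfinfo(fileName, 'eos');
swathInfo = eosInfo.Swath(1);

lon = [-100 -90];   % Corner 1 and Corner 2 longitudes (decimal degrees)
lat = [  50  40];   % Corner 1 and Corner 2 latitudes (decimal degrees)

% 'midpoint': keep a cross track if its midpoint lies inside the box
% ('endpoint' and 'anypoint' are the other criteria).
% 'ExtMode','internal': geolocation and data fields are in the same swath.
swathData = hdfread(swathInfo, 'Fields', fieldName, ...
                    'Box', {lon, lat, 'midpoint'}, ...
                    'ExtMode', 'internal');
```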

HDF Raster Image Data.  For 8-bit HDF raster image data, you can specify the colormap.
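Programmatically, hdfread returns the colormap of an 8-bit raster image as a second output. This self-contained sketch first writes a small indexed image to a temporary HDF4 file with imwrite, then reads the image back. It assumes that imwrite in your release supports the HDF4 raster format. The image and colormap are arbitrary test data.

```matlab
% Build a small 8-by-8 indexed image (uint8 indices 0..63) and a 64-entry colormap.
X   = uint8(magic(8) - 1);
map = jet(64);

% Write it as an HDF4 8-bit raster image to a temporary file.
fileName = [tempname '.hdf'];
imwrite(X, map, fileName);

% Locate the raster image, then read both the pixel data and its colormap.
info = hdfinfo(fileName);
[Y, Ymap] = hdfread(info.Raster8(1));

delete(fileName);
```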

Using the MATLAB HDF4 High-Level Functions

To import data from an HDF or HDF-EOS file, you can use the MATLAB HDF4 high-level function hdfread. The hdfread function provides a programmatic way to import data from an HDF4 or HDF-EOS file that still hides many of the details that you need to know if you use the low-level HDF functions, described in Using the HDF4 Low-Level Functions. You can also import HDF4 data using an interactive GUI, described in Using the HDF Import Tool.

This section describes two high-level functions: hdfinfo, which gets information about the contents of an HDF4 file, and hdfread, which imports all or part of a data set.

To export data to an HDF4 file, you must use the MATLAB HDF4 low-level functions.

Using hdfinfo to Get Information About an HDF4 File.  To get information about the contents of an HDF4 file, use the hdfinfo function. The hdfinfo function returns a structure that contains information about the file and the data in the file.

This example returns information about a sample HDF4 file included with MATLAB:

info = hdfinfo('example.hdf')

info = 

      Filename: 'matlabroot\example.hdf'
    Attributes: [1x2 struct]
        Vgroup: [1x1 struct]
           SDS: [1x1 struct]
         Vdata: [1x1 struct]

To get information about the data sets stored in the file, look at the SDS field.
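The following loop summarizes every scientific data set that hdfinfo reports. It uses the Name, Rank, DataType, and Dims fields of each SDS structure, and takes the Size field of each Dims element as the length of that dimension. It assumes the sample file example.hdf.

```matlab
info = hdfinfo('example.hdf');

% Print the name, rank, data type, and dimension lengths of each data set.
for k = 1:numel(info.SDS)
    sds = info.SDS(k);
    dimSizes = [sds.Dims.Size];
    fprintf('%s: rank %d, %s, dimensions %s\n', ...
            sds.Name, sds.Rank, sds.DataType, mat2str(dimSizes));
end
```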

Using hdfread to Import Data from an HDF4 File.   To use the hdfread function, you must specify the data set that you want to read. You can specify the filename and the data set name as arguments, or you can specify a structure returned by the hdfinfo function that contains this information. The following example shows both methods. For information about how to import a subset of the data in a data set, see Reading a Subset of the Data in a Data Set.

  1. Determine the names of data sets in the HDF4 file, using the hdfinfo function.

    info = hdfinfo('example.hdf')
    
    info = 
    
          Filename: 'matlabroot\example.hdf'
        Attributes: [1x2 struct]
            Vgroup: [1x1 struct]
               SDS: [1x1 struct]
             Vdata: [1x1 struct]

    To determine the names and other information about the data sets in the file, look at the contents of the SDS field. The Name field in the SDS structure gives the name of the data set.

    dsets = info.SDS
    
    dsets = 
    
           Filename: 'example.hdf'
               Type: 'Scientific Data Set'
               Name: 'Example SDS'
               Rank: 2
           DataType: 'int16'
         Attributes: []
               Dims: [2x1 struct]
              Label: {}
        Description: {}
              Index: 0
  2. Read the data set from the HDF4 file, using the hdfread function. Specify the name of the data set as a parameter to the function. Note that the data set name is case sensitive. This example returns a 16-by-5 array:

    dset = hdfread('example.hdf', 'Example SDS')
    
    dset =
    
          3      4      5      6      7
          4      5      6      7      8
          5      6      7      8      9
          6      7      8      9     10
          7      8      9     10     11
          8      9     10     11     12
          9     10     11     12     13
         10     11     12     13     14
         11     12     13     14     15
         12     13     14     15     16
         13     14     15     16     17
         14     15     16     17     18
         15     16     17     18     19
         16     17     18     19     20
         17     18     19     20     21
         18     19     20     21     22

    Alternatively, you can pass hdfread the field of the structure returned by hdfinfo that describes the data set. For example, to read a scientific data set, use the SDS field.

    dset = hdfread(info.SDS);
    
Reading a Subset of the Data in a Data Set.  

To read a subset of a data set, you can use the optional 'index' parameter. Its value is a cell array of three vectors. The vectors specify the location in the data set at which to start reading, the interval between values to read (for example, a stride of 2 reads every other value), and the number of values to read along each dimension. In HDF4 terminology, these parameters are called the start, stride, and edge values.
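As a sketch, the following reads two subsets of Example SDS from the sample file example.hdf. It assumes that hdfread counts start positions from 1 and that an empty stride means "read every element". Check the hdfread reference page for your release.

```matlab
% A 10-by-2 block: start at row 3, column 3; stride [] reads every element.
subset = hdfread('example.hdf', 'Example SDS', ...
                 'Index', {[3 3], [], [10 2]});

% Every other row of all five columns: 8 rows (1, 3, ..., 15) by 5 columns.
everyOther = hdfread('example.hdf', 'Example SDS', ...
                     'Index', {[1 1], [2 1], [8 5]});
```

In both calls, the edge vector determines the size of the result, so the first call returns a 10-by-2 array and the second an 8-by-5 array.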

For example, this code

Using the HDF4 Low-Level Functions

This section describes how to use MATLAB functions to access the HDF4 Application Programming Interfaces (APIs). These APIs are libraries of C routines. To import or export data, you must use the functions in the HDF4 API associated with the particular HDF4 data type you are working with. Each API has a particular programming model, that is, a prescribed way to use the routines to read and write data sets in the file. To illustrate this concept, this section describes the programming model of one particular HDF4 API: the HDF4 Scientific Data (SD) API. For a complete list of the HDF4 APIs supported by MATLAB and the functions you use to access each one, see the hdf reference page.

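This section begins with Mapping HDF4 to MATLAB Syntax, followed by the eight steps of the SD programming model, from Step 1: Opening the HDF4 File through Step 8: Closing the HDF4 File.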

Mapping HDF4 to MATLAB Syntax.  Each HDF4 API includes many individual routines that you use to read data from files, write data to files, and perform other related functions. For example, the HDF4 Scientific Data (SD) API includes separate C routines to open (SDopen), close (SDend), and read data (SDreaddata).

Instead of supporting each routine in the HDF4 APIs, MATLAB provides a single function that serves as a gateway to all the routines in a particular HDF4 API. For example, the HDF Scientific Data (SD) API includes the C routine SDend to close an HDF4 file:

status = SDend(sd_id); /* C code */

To call this routine from MATLAB, use the MATLAB function associated with the SD API, hdfsd. You must specify the name of the routine, minus the API acronym, as the first argument and pass any other required arguments to the routine in the order they are expected. For example,

status = hdfsd('end',sd_id); % MATLAB code

Some HDF4 API routines use output arguments to return data. Because MATLAB functions cannot return data through C-style output (pointer) arguments, you must specify these arguments as return values instead.

For example, the SDfileinfo routine returns data about an HDF4 file in two output arguments, ndatasets and nglobal_atts. Here is the C code:

status = SDfileinfo(sd_id, &ndatasets, &nglobal_atts); /* C code */

To call this routine from MATLAB, change the output arguments into return values:

[ndatasets, nglobal_atts, status] = hdfsd('fileinfo',sd_id);

Specify the return values in the same order as they appear as output arguments. The status value is always the last return value.

Step 1: Opening the HDF4 File.  

To import an HDF4 SD data set, you must first open the file using the SD API routine SDstart. (In HDF4 terminology, the numeric arrays stored in HDF4 files are called data sets.) In MATLAB, call the hdfsd function, specifying 'start' as the first argument, followed by the name of the file and the access mode, such as 'read'.

For example, this code opens the file mydata.hdf for read access:

sd_id = hdfsd('start','mydata.hdf','read');

If SDstart can find and open the file specified, it returns an HDF4 SD file identifier, named sd_id in the example. Otherwise, it returns -1.

Step 2: Retrieving Information About the HDF4 File.  To get information about an HDF4 file, you must use the SD API routine SDfileinfo. This routine returns the number of data sets in the file and the number of global attributes in the file, if any. (For more information about global attributes, see Exporting to Hierarchical Data Format (HDF4) Files.) In MATLAB, call the hdfsd function, specifying 'fileinfo' as the first argument and the SD file identifier returned by SDstart (sd_id) as the second.

In this example, the HDF4 file contains three data sets and one global attribute.

[ndatasets, nglobal_atts, status] = hdfsd('fileinfo',sd_id)

ndatasets =
    3

nglobal_atts =
    1

status =
    0

Step 3: Retrieving Attributes from an HDF4 File (Optional).  HDF4 files can optionally include information, called attributes, that describes the data the file contains. Attributes associated with an entire HDF4 file are called global attributes. Attributes associated with a data set are called local attributes. (You can also associate attributes with dimensions. For more information, see Step 4: Writing Metadata to an HDF4 File.)

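To retrieve attributes from an HDF4 file, use the HDF4 API routine SDreadattr. In MATLAB, call the hdfsd function, specifying 'readattr' as the first argument, followed by the identifier of the object that owns the attribute (the SD file identifier sd_id, for a global attribute) and the zero-based index of the attribute.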

For example, this code returns the contents of the first global attribute, which is the character string my global attribute:

attr_idx = 0;
[attr, status] = hdfsd('readattr', sd_id, attr_idx)

attr =
    my global attribute
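To read every global attribute rather than only the first, you can loop over the attribute indices. This sketch assumes that hdfsd exposes the SD routine SDattrinfo as 'attrinfo' and returns its outputs as [name, data_type, count, status], following the output-argument mapping described in Mapping HDF4 to MATLAB Syntax. It uses the sample file example.hdf.

```matlab
% Open the sample file that ships with MATLAB (full path via which).
sd_id = hdfsd('start', which('example.hdf'), 'read');
[ndatasets, nglobal_atts, status] = hdfsd('fileinfo', sd_id);

attrNames  = cell(1, nglobal_atts);
attrValues = cell(1, nglobal_atts);

% Global attribute indices are zero-based.
for attr_idx = 0:nglobal_atts-1
    % SDattrinfo: name, data type, and number of values of the attribute.
    [name, data_type, count, status] = hdfsd('attrinfo', sd_id, attr_idx);
    % SDreadattr: the attribute's value.
    [value, status] = hdfsd('readattr', sd_id, attr_idx);
    attrNames{attr_idx + 1}  = name;
    attrValues{attr_idx + 1} = value;
    fprintf('%s: %d value(s)\n', name, count);
end

status = hdfsd('end', sd_id);
```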

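Step 4: Selecting the Data Sets to Import.  To select a data set, use the SD API routine SDselect. In MATLAB, call the hdfsd function, specifying 'select' as the first argument, followed by the SD file identifier (sd_id) and the zero-based index of the data set in the file. The following example selects the second data set in the file (index 1).

If SDselect finds the specified data set in the file, it returns an HDF4 SD data set identifier, called sds_id in the example. If it cannot find the data set, it returns -1.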

sds_id = hdfsd('select',sd_id,1)
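If you know the name of a data set but not its index, you can look up the index first. This sketch assumes that hdfsd exposes the SD routine SDnametoindex as 'nametoindex', and that the routine returns the zero-based index, or -1 if no data set has that name. It uses the data set Example SDS in the sample file example.hdf.

```matlab
sd_id = hdfsd('start', which('example.hdf'), 'read');

% SDnametoindex: zero-based index of the named data set, or -1 if not found.
sds_index = hdfsd('nametoindex', sd_id, 'Example SDS');
if sds_index == -1
    error('Data set "Example SDS" not found.');
end
sds_id = hdfsd('select', sd_id, sds_index);

% ... get information about and read from sds_id here ...

status = hdfsd('endaccess', sds_id);
status = hdfsd('end', sd_id);
```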

Step 5: Getting Information About a Data Set.  To read a data set, you must get information about it, such as its name, size, and data type. In the HDF4 SD API, you use the SDgetinfo routine to gather this information. In MATLAB, call the hdfsd function, specifying 'getinfo' as the first argument and the data set identifier returned by SDselect (sds_id) as the second.

This code retrieves information about the data set identified by sds_id:

[dsname, dsndims, dsdims, dstype, dsatts, stat] = ...
              hdfsd('getinfo',sds_id)
dsname =
      A

dsndims =
      2

dsdims =
      5     3

dstype =
      double

dsatts =
      0

stat =
      0

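Step 6: Reading Data from the HDF4 File.  To read data from an HDF4 file, you must use the SDreaddata routine. In MATLAB, call the hdfsd function, specifying 'readdata' as the first argument, followed by the data set identifier (sds_id) and three vectors: start, stride, and edges. These vectors specify the zero-based position at which to begin reading, the interval between the elements to read, and the number of elements to read along each dimension.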

For example, to read the entire contents of a data set, use this code:

[ds_name, ds_ndims, ds_dims, ds_type, ds_atts, stat] = ...
    hdfsd('getinfo',sds_id);

ds_start = zeros(1,ds_ndims); % Creates the vector [0 0]
ds_stride = [];               % Read every element
ds_edges = ds_dims;           % Read the full extent of each dimension

[ds_data, status] = ...
    hdfsd('readdata',sds_id,ds_start,ds_stride,ds_edges);

disp(ds_data)
     1     2     3     4     5
     6     7     8     9    10
    11    12    13    14    15

To read less than the entire data set, use the start, stride, and edges vectors to specify where you want to start reading data and how much data you want to read. For example, this code reads the entire second row of the sample data set:

ds_start = [0 1]; % Start reading at the first column, second row
ds_stride = []; % Read each element
ds_edges = [5 1]; % Read a 1-by-5 vector of data 

[ds_data, status] = ...
           hdfsd('readdata',sds_id,ds_start,ds_stride,ds_edges);

Step 7: Closing the HDF4 Data Set.  After reading data from a data set in an HDF4 file, you must close access to the data set. In the HDF4 SD API, you use the SDendaccess routine to close a data set. In MATLAB, call the hdfsd function, specifying 'endaccess' as the first argument and the data set identifier (sds_id) as the second.

For example, this code closes the data set:

stat = hdfsd('endaccess',sds_id);

You must close access to all the data sets in an HDF4 file before closing it.

Step 8: Closing the HDF4 File.  After reading data and closing all the data sets, you must also close the HDF4 file. In the HDF4 SD API, you use the SDend routine. In MATLAB, call the hdfsd function, specifying 'end' as the first argument and the SD file identifier (sd_id) as the second.

For example, this code closes the file:

stat = hdfsd('end',sd_id);
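The following sketch puts Steps 1 through 8 together. It reads every data set in the sample file example.hdf into a cell array, using only the hdfsd calls shown in the preceding steps. For brevity, it checks only the identifier returned by SDstart. Production code would also check each status value, because HDF4 routines return -1 on failure.

```matlab
% Step 1: open the sample file read-only; hdfsd returns -1 on failure.
sd_id = hdfsd('start', which('example.hdf'), 'read');
if sd_id == -1
    error('Could not open example.hdf.');
end

% Step 2: how many data sets does the file contain?
[ndatasets, nglobal_atts, status] = hdfsd('fileinfo', sd_id);

names = cell(1, ndatasets);
data  = cell(1, ndatasets);
for sds_index = 0:ndatasets-1                      % data set indices are zero-based
    sds_id = hdfsd('select', sd_id, sds_index);    % Step 4
    [ds_name, ds_ndims, ds_dims, ds_type, ds_atts, status] = ...
        hdfsd('getinfo', sds_id);                  % Step 5
    % Step 6: start at the origin, read every element, cover the full extent.
    [ds_data, status] = hdfsd('readdata', sds_id, zeros(1, ds_ndims), [], ds_dims);
    names{sds_index + 1} = ds_name;
    data{sds_index + 1}  = ds_data;
    status = hdfsd('endaccess', sds_id);           % Step 7
end

status = hdfsd('end', sd_id);                      % Step 8
```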
  



© 1984-2012 The MathWorks, Inc.