Creating an HDF5 dataset

Jonathan on 25 Jul 2012
Edited: per isakson on 26 May 2015
Hi, I am trying to create a .h5 file containing an HDF5 dataset, and I am having trouble creating the dataset. I want the dataset to contain a struct called "data". "data" is a 1x1 struct that contains one 6464x2560 double, four 6464x1 doubles, one 1x4359 struct, one 1x1 struct, one 1x200 struct, one 6464x1 double, one 303x1 double, four 1x4 cells, and one 1x440 cell.

Answers (2)

per isakson on 26 Jul 2012
Edited: per isakson on 27 Jul 2012
A Matlab structure cannot be stored in an HDF5 file as one dataset (except for simple ones with a few fields whose values are basic data types; see HDF5 compound datatype).
HDF5 has compound datatypes, as described in the User Guide: "2.2.7. Creating and Defining Compound Datatypes". However, I'm not sure it is worth the trouble. (Yes, it requires Matlab's low-level HDF5 functions.)
Matlab has both high-level and low-level support for HDF5. I would recommend that you use the high-level functions, or at least try them first. I have experimented (both high and low level) with time series (length 64000). My test h5-file is 1.2GB.
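For the simple case only (a structure with a few equal-length fields of basic types), a compound dataset via the low-level functions might look like the sketch below. The file name compound.h5, the dataset /records and the field values are made up for illustration, and the pattern is my reading of the compound-datatype examples for H5T/H5D, so check it against the documentation of your release.

```matlab
% Hypothetical record data: two fields of basic types, same length.
wdata.serial_no   = int32([1153; 1184; 1027; 1313]);
wdata.temperature = [53.23; 55.12; 130.55; 1252.89];

fid = H5F.create('compound.h5', 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');

% Build the compound type: one member per field, packed back to back.
intType    = H5T.copy('H5T_NATIVE_INT');
doubleType = H5T.copy('H5T_NATIVE_DOUBLE');
sz     = [H5T.get_size(intType), H5T.get_size(doubleType)];
offset = [0, sz(1)];
cmpType = H5T.create('H5T_COMPOUND', sum(sz));
H5T.insert(cmpType, 'serial_no',   offset(1), intType);
H5T.insert(cmpType, 'temperature', offset(2), doubleType);

% One-dimensional dataspace: one compound element per record.
space = H5S.create_simple(1, numel(wdata.serial_no), []);
dset  = H5D.create(fid, '/records', cmpType, space, 'H5P_DEFAULT');
H5D.write(dset, cmpType, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', wdata);

% Release all identifiers.
H5D.close(dset); H5S.close(space); H5T.close(cmpType);
H5T.close(intType); H5T.close(doubleType); H5F.close(fid);

% The high-level reader returns the compound dataset as a struct.
rdata = h5read('compound.h5', '/records');
```

This does not extend to nested structs or cell arrays, which is why it is not a fit for the "data" struct in the question.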
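A small sketch of the high-level functions (h5create, h5write, h5read), assuming a made-up file timeseries.h5 that holds a 64000-by-4 time series. The optional start and count arguments of h5read read only a slab of the dataset instead of the whole array.

```matlab
fn = 'timeseries.h5';            % hypothetical file name
if isfile(fn), delete(fn); end   % h5create fails if the dataset already exists

ts = cumsum(randn(64000, 4));    % 64000 samples, 4 channels

% Fixed-size double dataset (no ChunkSize given, so it is not chunked).
h5create(fn, '/ts', size(ts));
h5write(fn, '/ts', ts);

% Read samples 1001..2000 of channel 3 only (start and count are 1-based).
part = h5read(fn, '/ts', [1001 3], [1000 1]);

h5disp(fn, '/ts');               % show size, datatype and layout
```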
You need some "user stories/use cases" for both reading and writing to make your decisions.
The quick way to produce an HDF5 file is
save( 'h5test.mat', 'Data', '-v7.3' )
Then you can explore how TMW (The MathWorks) stores a structure:
h5disp( 'h5test.mat' ), info = h5info( 'h5test.mat' );
That might be all you need. The function matfile provides an interface to this file that is more flexible than save/load; it offers limited means to read and write piecewise.
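A sketch of piecewise access with matfile on a -v7.3 file. As far as I know, matfile cannot index partially into a field of a structure, so this assumes the large numeric array is saved as its own top-level variable; the name bigMatrix and its size are hypothetical.

```matlab
% Hypothetical top-level variable standing in for a large numeric field.
bigMatrix = reshape(1:6464*50, 6464, 50);
save('h5test.mat', 'bigMatrix', '-v7.3');

m = matfile('h5test.mat');             % no data is loaded yet
[nRows, nCols] = size(m, 'bigMatrix'); % size without loading the array
block = m.bigMatrix(1:100, :);         % reads only the first 100 rows

m.Properties.Writable = true;          % allow in-place writes
m.bigMatrix(1, 1) = -1;                % update one element on disk
```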
.
--- Continue ---
My tentative recipe:
  • regard the structure, Data, as a tree
  • the interior nodes (fields) translate to groups in the h5-file
  • the leaves (string and numerical array data) translate to datasets
  • write each leaf with a separate h5write command (high level). Performance is OK if the leaf data arrays are reasonably large; thousands of scalar leaves will be a problem, I guess.
  • if you have "write-once and read-many" access, use fixed-size datasets, i.e. contiguous, not chunked. Contiguous datasets seem to require less Windows file cache and perform better when RAM is limited.
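The recipe above can be sketched as a recursive function. struct2h5 and writeLeaf are hypothetical names, and the encoding choices are my assumptions, not what save does internally: struct-array elements and cell elements become subgroups named by their linear index, char is stored as uint16 code units, logical as uint8, and empty leaves are skipped.

```matlab
function struct2h5(filename, s, prefix)
%STRUCT2H5 Hypothetical helper: write a nested struct to an HDF5 file.
%   Scalar-struct fields, struct-array elements and cell elements become
%   groups; numeric, logical and char leaves become fixed-size datasets.
%   h5create creates the intermediate groups along each dataset path.
if nargin < 3
    prefix = '';
end
if isstruct(s)
    if isscalar(s)
        names = fieldnames(s);
        for k = 1:numel(names)
            struct2h5(filename, s.(names{k}), [prefix '/' names{k}]);
        end
    else
        % Struct array: one subgroup per element, named by linear index.
        for k = 1:numel(s)
            struct2h5(filename, s(k), sprintf('%s/%d', prefix, k));
        end
    end
elseif iscell(s)
    % Cell array: one child per cell, named by linear index.
    for k = 1:numel(s)
        struct2h5(filename, s{k}, sprintf('%s/%d', prefix, k));
    end
elseif isnumeric(s) || islogical(s) || ischar(s)
    writeLeaf(filename, prefix, s);
else
    warning('struct2h5:skip', 'Skipping %s (class %s).', prefix, class(s));
end
end

function writeLeaf(filename, path, value)
% Empty leaves are skipped to keep the sketch simple.
if isempty(value)
    return
end
if issparse(value)
    value = full(value);
end
if ischar(value)
    value = uint16(value);   % store text as UTF-16 code units
elseif islogical(value)
    value = uint8(value);    % store logicals as 0/1 bytes
end
if ~isreal(value)
    error('struct2h5:complex', 'Complex data at %s is not handled.', path);
end
h5create(filename, path, size(value), 'Datatype', class(value));
h5write(filename, path, value);
end
```

Call it on a fresh file, e.g. struct2h5('h5test.h5', data, '/data'), and inspect the result with h5disp('h5test.h5'). Note that the 1x4359 struct array alone turns into thousands of small datasets, which is exactly the performance caveat in the list above.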
.
--- In response to comment 1 ---
Your structure, "struct contains structs within structs within structs within structs", cannot be stored in an HDF5 Compound Datatype. I shouldn't have mentioned it. It might be useful for simple structures with a few fields whose values are basic (C) data types. (I don't know whether Compound Datatypes can be nested, but to me that really "smells".)
The low-level HDF5 functions of Matlab are poorly documented and not really "matlabish". On the other hand, the high-level HDF5 functions (R2012a) are simple to use and surprisingly (read: they surprised me) powerful, in both expressiveness and performance.
I see two possibilities:
  1. save( 'h5test.mat', 'Data', '-v7.3' )
  2. my tentative recipe for structures (see above). A similar approach is needed for cell arrays.
Under the hood, save applies something similar to "my tentative recipe".
I would definitely try save first. I guess it is worth checking whether
  2 Comments
Jonathan on 26 Jul 2012
I have successfully created the datasets for the doubles. However, the trouble is the structs and the cells. The way I have been creating the datasets is by creating one for each field within data. For example,
h5create('h5test.h5','/data/field1',[size(data.field1,1), size(data.field1,2)]);
h5create('h5test.h5','/data/field2',[size(data.field2,1), size(data.field2,2)]);
etc., and then writing to them with:
h5write('h5test.h5','/data/field1',data.field1);
h5write('h5test.h5','/data/field2',data.field2);
etc. The problem comes when I get to the struct within the data struct. This struct contains structs within structs within structs within structs... I would have to code hundreds of thousands of lines! There must be an easier way to create an .h5 file with an HDF5 dataset for structs.
Also, I should explain that the reason I am making this .h5 file is so that I can make the data struct interchangeable and compatible with the C++ version of my MATLAB program. The C++ version must be able to read in the .h5 file.
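One thing to check for the C++ side: MATLAB arrays are column-major while the HDF5 C/C++ API is row-major, so MATLAB's HDF5 functions reverse the dimension list. A 6464x2560 matrix written from MATLAB should appear to a C++ reader as a 2560x6464 dataset, i.e. the transpose. The sketch below (hypothetical file dims.h5) writes a small 2x3 matrix and asks the low-level API, which reports dimensions in C order, what a C/C++ program would see.

```matlab
A  = reshape(1:6, 2, 3);         % 2x3 in MATLAB (column-major)
fn = 'dims.h5';                  % hypothetical file name
if isfile(fn), delete(fn); end

h5create(fn, '/A', size(A));
h5write(fn, '/A', A);

% Query the dataspace through the low-level API (C dimension order).
fid   = H5F.open(fn, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset  = H5D.open(fid, '/A');
space = H5D.get_space(dset);
[~, cDims] = H5S.get_simple_extent_dims(space);
H5S.close(space); H5D.close(dset); H5F.close(fid);

disp(cDims)                      % dimensions as a C/C++ reader sees them
```

In C++ you can either read the data as the transposed shape and index it as [col][row], or transpose in MATLAB before writing; which is better depends on which side reads it more often.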
per isakson on 27 Jul 2012
See answer above



John on 26 Jul 2012
Edited: per isakson on 26 May 2015
The HDF5 high-level routines H5CREATE and H5WRITE do not support compound datasets (structs), so you would need to use the low-level interface for that. Take a look at this thread from a couple of days ago.
