TMW do seem to struggle with file formats.
I agree with @Jan above that not having a switch to disable compression in both v7 and v7.3 is problematic.
As @Walter points out, v7 can be fast - but this is context-dependent. Problems arise because the information about each variable (its on-disc class, size etc. AND NAME) is compressed along with the data. The v6 and v7 file formats both store the data entries as a linked list. If you have 26 variables named A...Z stored in that order, you will need to step through each of A to Y to find Z. In v6 that is less of an issue, but in v7 you will need to decompress each preceding variable to find the one you are after. This means:
[1] data stored earlier in the file can be accessed more quickly and
[2] for v7, large data entries should be stored last.
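To make the linked-list point concrete, here is a rough sketch that walks the top-level entries of a Level 5 file by reading each 8-byte tag and skipping over the data that follows it. The file and variable names are made up for illustration, and it assumes a little-endian file written on the same machine.

```matlab
% Sketch: step through the top-level data entries of a Level 5 MAT-file.
% Assumption: little-endian file; fname and the variables are illustrative only.
fname = [tempname '.mat'];
A = rand(100); B = uint16(7); Z = 'last';
save(fname, 'A', 'B', 'Z', '-v6');   % try '-v7' to see compressed entries instead

fid = fopen(fname, 'r', 'ieee-le');
fseek(fid, 0, 'eof');
fileSize = ftell(fid);
fseek(fid, 128, 'bof');              % skip the 128-byte file header

offsets = []; types = []; sizes = [];
while ftell(fid) < fileSize
    pos = ftell(fid);
    tag = fread(fid, 2, 'uint32');   % [data type; number of bytes that follow]
    if numel(tag) < 2, break; end
    offsets(end+1) = pos;            %#ok<SAGROW>
    types(end+1)   = tag(1);         %#ok<SAGROW> 14 = miMATRIX, 15 = miCOMPRESSED
    sizes(end+1)   = tag(2);         %#ok<SAGROW>
    fseek(fid, tag(2), 'cof');       % the only way to reach the next entry
end
fclose(fid);
delete(fname);

disp([offsets(:) types(:) sizes(:)]);
```

In a -v6 file the variable name can be read a few bytes after each tag, so skipping is cheap. In a -v7 file each entry is a compressed blob, so the name is not visible until the entry has been inflated.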
For both v6 and v7, there is a problem with using -append. Both are conservative about using disc space and will squeeze the file to recover any space that is freed. So if you have a uint16 scalar called 'A' and replace its value using
save(filename, 'A', '-append');
up to 2 GB of data will be shifted upwards in the file before A is appended to the result. This squeeze always seems to be done - even when all conditions are met for the new entry simply to overwrite the old one.
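If you want to see what -append costs on your own system, something along these lines should do. It is only a sketch: the file name and array size are arbitrary, and the timing will depend on disc and MATLAB version.

```matlab
% Sketch: time the replacement of a tiny variable stored ahead of a large one.
% Assumption: fname and the sizes are illustrative; adjust to taste.
fname = [tempname '.mat'];
A   = uint16(1);
big = rand(1000);                        % about 8 MB of doubles
save(fname, 'A', 'big', '-v6');
before = dir(fname);

A = uint16(2);
tic;
save(fname, 'A', '-append', '-v6');      % keep the same format as the original save
tAppend = toc;
after = dir(fname);

fprintf('append took %.3f s, file size %d -> %d bytes\n', ...
        tAppend, before.bytes, after.bytes);

check = load(fname, 'A');                % confirm the new value is in the file
vars  = whos('-file', fname);
delete(fname);
```

Repeating the test with larger values of big gives a feel for how the cost of replacing a two-byte scalar grows with the amount of data stored after it.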
Note that the v6 and v7 formats are both "Level 5". I would not recommend it, but you can mix save -appends with -v6 and -v7 to the same file without getting an immediate error. Whether you might eventually get one is another matter.
is a whos-like function but provides information about the class of data on disc and byte offset in the file. That info can be used to do low-level I/O on a -v6 file or to memory-map it.
can be used to rename a variable in the file to prevent its data area being reclaimed with a subsequent save -append.
I am presently updating these utilities to provide some easier-to-use functions, such as getMap, which simply returns a standard memmapfile object:
map=getMap(filename, '/mydata/data/xdata');
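For anyone curious about the mechanics, the following is a minimal sketch of memory-mapping the first variable in a -v6 file with plain memmapfile. It is not the getMap code: it assumes a little-endian file holding a real, full, 2-D double matrix as its first entry, and all names in it are illustrative.

```matlab
% Sketch: find the data offset of the first variable in a -v6 file and map it.
% Assumptions: little-endian file; first entry is a real, full 2-D double matrix.
fname = [tempname '.mat'];
A = rand(3, 4);
save(fname, 'A', '-v6');

fid = fopen(fname, 'r', 'ieee-le');
fseek(fid, 128 + 8, 'bof');              % skip file header and the miMATRIX tag
for k = 1:3                              % skip array flags, dimensions and name
    w = fread(fid, 1, 'uint32');
    if floor(w / 65536) ~= 0             % small element: tag and data share 8 bytes
        fseek(fid, 4, 'cof');
    else
        n = fread(fid, 1, 'uint32');
        fseek(fid, 8 * ceil(n / 8), 'cof');   % data is padded to an 8-byte boundary
    end
end
dtype      = fread(fid, 1, 'uint32');    % on-disc type, 9 = miDOUBLE
nbytes     = fread(fid, 1, 'uint32');
dataOffset = ftell(fid);                 % first byte of the numeric data
fclose(fid);

m = memmapfile(fname, 'Offset', dataOffset, ...
               'Format', {'double', size(A), 'x'}, 'Repeat', 1);
mapped = m.Data.x;                       % copy out; the map itself reads on demand
clear m                                  % release the map before deleting the file
delete(fname);
```

The on-disc type needs checking in real use, because MATLAB may store integer-valued doubles in a smaller integer type, in which case the 'double' format above would be wrong.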
There is, however, no good way to support v7 files - their design makes them useful only when the file is always to be loaded in its entirety or is small.
I am also adding v7.3 and HDF5 support. As v7.3 is HDF5-based you can use standard HDF5 tools such as HDFView http://www.hdfgroup.org/products/hdf5_tools/ to examine the content. The problems reported above might arise from the "chunk" size being inappropriate rather than gzip compression per se. I had hoped v7.3 would improve on v7 but from what is posted above it looks as though that may not be the case. Rather disappointingly, TMW support tell me that the v7.3 format "...is not pure HDF5 however and we never claimed that these files will be readable by any HDF5 libraries".
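From within MATLAB, the built-in HDF5 functions give a quick view of how a variable was laid out in a v7.3 file. A rough sketch, with a made-up file name and an arbitrary array size:

```matlab
% Sketch: inspect chunking and filters of a variable in a v7.3 file.
% Assumption: fname and the array size are illustrative only.
fname = [tempname '.mat'];
x = rand(500);
save(fname, 'x', '-v7.3');

info  = h5info(fname);                   % top-level group of the HDF5 file
names = {info.Datasets.Name};
ds    = info.Datasets(strcmp(names, 'x'));

disp(ds.ChunkSize);                      % empty if the dataset is not chunked
disp(isempty(ds.Filters));               % false if a filter such as deflate is applied
delete(fname);
```

Comparing ChunkSize against the way the data are later read back is one way to check whether chunking, rather than gzip itself, is behind slow partial loads.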
So it looks as though the pre-2004 v6 format will often be the best option as it's
- the fastest
- documented and supported outside of MATLAB - as is v7 - e.g. in Octave, Python and R amongst others.