Possible bug in H5D.write, truncation of VLEN strings

Hello,
I have discovered a potential bug, or at least some flaky behavior, when using the low-level HDF5 write function. When I try to write a long string as a variable-length string, it seems to get truncated at 512 bytes (511 characters plus the terminating null). I can write it just fine as a fixed-length string.
The minimal script below reproduces the error. I see this in R2012a on both Linux and Mac. Am I missing a parameter or function call that controls the VLEN buffer size, or is something improperly hard-coded in the underlying MEX function?
Cheers, Souheil
-------------
% Create a long string
str = repmat('Hello from matlab. ',[1 1000]);
fprintf('Size of string = %d\n',length(str));
% Create an HDF5 file
filename = 'vlen_string_bug.h5';
fid = H5F.create(filename,'H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');
% Write to a dataset as a variable length string
VLstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(VLstr_type,'H5T_VARIABLE');
space = H5S.create_simple(1, 1, []);
dset = H5D.create(fid, 'VLstr', VLstr_type, space, 'H5P_DEFAULT');
fprintf('Size of VLEN_BUF before = %d\n',H5D.vlen_get_buf_size(dset, VLstr_type, space));
H5D.write(dset, VLstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', {str});
fprintf('Size of VLEN_BUF after = %d\n',H5D.vlen_get_buf_size(dset, VLstr_type, space));
H5T.close(VLstr_type);
H5S.close(space);
H5D.close(dset);
% Write to a dataset as a fixed length string
Fstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(Fstr_type, length(str));
space = H5S.create_simple(1, 1, []);
dset = H5D.create(fid, 'Fstr', Fstr_type, space, 'H5P_DEFAULT');
H5D.write(dset, Fstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
H5T.close(Fstr_type);
H5S.close(space);
H5D.close(dset);
% Close the file
H5F.close(fid);
% Read the strings back in using the high level read function
t = h5read(filename,'/VLstr');
vlstr = t{1};
fprintf('Size of VLEN string on disk = %d\n',length(vlstr));
t = h5read(filename,'/Fstr');
fstr = t{1};
fprintf('Size of fixed string on disk = %d\n',length(fstr));
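
The script above reads the data back with the high-level h5read, so by itself it does not show whether the truncation happens on write or on read. One way to narrow this down is to read the dataset back with the low-level H5D.read as well. The following is a minimal sketch, not a definitive test: it assumes H5D.read returns a variable-length string dataset as a cell array of strings (as the h5ex_t_vlstring example suggests), and the scratch file name and the variable names (vltype, rdata, roundTripOk) are illustrative.

```matlab
% Self-contained round trip: write one VLEN string, read it back low-level
str = repmat('Hello from matlab. ', [1 1000]);
filename = [tempname '.h5'];   % scratch file; the name is arbitrary

% Variable-length C string type, used for both writing and reading
vltype = H5T.copy('H5T_C_S1');
H5T.set_size(vltype, 'H5T_VARIABLE');
space = H5S.create_simple(1, 1, []);

% Write the string exactly as in the original script
fid = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
dset = H5D.create(fid, 'VLstr', vltype, space, 'H5P_DEFAULT');
H5D.write(dset, vltype, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', {str});
H5D.close(dset);
H5F.close(fid);

% Reopen the file and read with H5D.read instead of h5read
fid = H5F.open(filename, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset = H5D.open(fid, '/VLstr');
rdata = H5D.read(dset, vltype, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');
H5D.close(dset);
H5F.close(fid);
H5S.close(space);
H5T.close(vltype);

% Compare what was written with what came back
fprintf('Written: %d chars, read back: %d chars\n', length(str), length(rdata{1}));
roundTripOk = isequal(rdata{1}, str);
delete(filename);
```

If the low-level read also comes back short, the data was most likely truncated when it was written; if it returns the full string while h5read does not, the problem would be on the read side instead.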

Accepted Answer

Souheil Inati on 27 Sep 2012
It looks like this is a bug in the R2012a MEX files on Mac and Linux. R2012b seems to resolve it. Thanks for everyone's input.
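
For code that still has to run on R2012a, one possible workaround is to fall back to a fixed-length string on older releases, since the fixed-length write in the original script was not truncated. This is only a sketch under assumptions: it takes R2012b (MATLAB 8.0) as the first unaffected release, based solely on the observation above, and the dataset name 'str', the scratch file name and the variable names are made up for illustration.

```matlab
% Assumption: releases older than R2012b (MATLAB 8.0) may truncate VLEN strings
str = repmat('Hello from matlab. ', [1 1000]);
useFixed = verLessThan('matlab', '8.0');

strtype = H5T.copy('H5T_C_S1');
if useFixed
    % Fixed-length string, sized to the data (as in the 'Fstr' dataset)
    H5T.set_size(strtype, length(str));
    data = str;
else
    % Variable-length string, passed as a one-element cell array
    H5T.set_size(strtype, 'H5T_VARIABLE');
    data = {str};
end

filename = [tempname '.h5'];   % scratch file; the name is arbitrary
fid = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
space = H5S.create_simple(1, 1, []);
dset = H5D.create(fid, 'str', strtype, space, 'H5P_DEFAULT');
H5D.write(dset, strtype, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', data);

H5D.close(dset);
H5S.close(space);
H5T.close(strtype);
H5F.close(fid);
```

The trade-off is that readers of the file see a different datatype depending on which release wrote it, so this only makes sense where the consuming code accepts both fixed- and variable-length strings.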

More Answers (1)

per isakson on 15 Sep 2012
Edited: per isakson on 15 Sep 2012
I ran the example h5ex_t_vlstring with your long string. Yes, it is truncated as you state.
However, the HDF5 User's Guide (page 228) says:
[...] a length and data buffer must be allocated.
I don't see how.
This is not much of an answer. However, could it be that 512 is a default value that needs to be replaced by an appropriate value?
  4 Comments
Oleg Komarov on 15 Sep 2012
Edited: Oleg Komarov on 15 Sep 2012
I found a description of the fields for H5F.get_mdc_config at http://www.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-SetMdcConfig, and maybe the properties set_initial_size and initial_size are relevant to the buffer.
However, I am unsure where to set those properties: at the file, dataset or property list level (H5F, H5D, H5P)...
I think it would be faster to submit a technical support request to TMW or to The HDF Group.
Please post any solution here (I am curious as well).
per isakson on 16 Sep 2012
Edited: per isakson on 16 Sep 2012
Here is a link to hdf-forum. A few MATLAB-related questions have been answered there. I cannot really contribute more than that.
