Different runs that produce an HDF5 file with identical structure and data yield different md5sum digests. How can I prevent this?
I have a MATLAB script which produces an HDF5 dataset. The MD5 digest of this file changes across different runs, although the HDF5 content remains identical. How can I prevent that?
I suspected it has to do with a creation timestamp stored somewhere in the HDF5 file, but I couldn't find one. This is how I create the file and store values in it:
filename = "cpssm_dataset.h5";
file_path = fullfile(fileparts(mfilename('fullpath')), 'raw', 'cpssm', filename);
file_id = H5F.create(...
file_path, ... % filename
'H5F_ACC_TRUNC', ... % overwrite any existing file
'H5P_DEFAULT', ... % default file-creation properties
'H5P_DEFAULT' ... % default file-access properties
);
% Create Data/ group
H5G.create(file_id, '/Data', 'H5P_DEFAULT', 'H5P_DEFAULT', 'H5P_DEFAULT');
% ...
% ...
% ...
group_path = sprintf('/Data/%s', city_name);
H5G.create(file_id, group_path, 'H5P_DEFAULT', 'H5P_DEFAULT', 'H5P_DEFAULT');
h5writeatt(file_path, group_path, 'Name', city_name);
% ...
% ...
% ...
group_path = sprintf('/Data/%s/%s/drift_vel%d/sat%d/%s/%s', city_name, severity, eastward_drift_vel, sat_idx, constellation, freq);
% add data
amplitude = scenario.(freq).amplitude.timeseries_postprop.Var1;
h5create(file_path, group_path + "/amplitude", [1 numel(amplitude)]);
h5write(file_path, group_path + "/amplitude", amplitude.');
% ...
% ...
% ...
These are the md5sum digests for two different runs:
(iono-scint-charact) tapyu@felix-Alienware-m16-R1:~/git/iono-scint-charact/data/raw/cpssm$ md5sum cpssm_dataset.h5
66c3ea9930c1adfeb49d4e15dcfdf018 cpssm_dataset.h5
(iono-scint-charact) tapyu@felix-Alienware-m16-R1:~/git/iono-scint-charact/data/raw/cpssm$ md5sum cpssm_dataset.h5
445738bb297dd50b1f8c69646a487645 cpssm_dataset.h5
They should be the same!
The stored values are in fact identical: `h5dump` prints the same output for both files.
4 Comments
Umar
on 6 Sep 2025
Hi @Rubem,
What you’re seeing is expected: HDF5 stores some internal metadata that changes every time the file is created. MD5 is not a reliable indicator of content equality for HDF5 files. Focus on comparing the actual data arrays and attributes.
Rubem
on 8 Sep 2025
Umar
on 8 Sep 2025
Hi @Rubem,
That’s a fair observation — and it points to the subtlety of how HDF5 files are laid out internally.
- When you compute an MD5 of the raw file bytes, you are sensitive not only to dataset values and attributes, but also to low-level structural details: object header creation times, free-space manager state, alignment padding, or chunk indexing. These can differ across writes even if the logical content is the same. This is why the HDF Group themselves caution that HDF5 files are not bitwise stable.
- In your Python workflow, it may appear “reliable” because you are likely writing the file with identical settings, in the same session, and without features that introduce non-determinism (e.g. timestamps or chunk free lists). In that constrained situation, two writes may indeed yield identical byte streams — so the MD5 happens to match. But this should be seen as an implementation artifact, not a guarantee of the format.
- If your goal is to verify that two HDF5 files contain the same scientific data, the robust approach is to compare datasets and attributes programmatically (e.g. using `h5diff` on the command line, or `h5py` in Python).
So in short: yes, sometimes the MD5 will coincide; but it is not a reliable, portable guarantee of equality across environments or HDF5 versions.
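The idea of hashing the logical content rather than the raw file bytes can be sketched without any HDF5 dependency at all: digest the dataset names and their value bytes in a canonical order, so file-layout details cannot affect the result. A minimal Python sketch (the dataset names and byte payloads are made up for illustration):

```python
import hashlib

def logical_digest(datasets):
    """Hash dataset names and raw value bytes in a canonical
    (sorted) order, ignoring file-level layout details entirely."""
    h = hashlib.md5()
    for name in sorted(datasets):          # canonical ordering
        h.update(name.encode("utf-8"))
        h.update(datasets[name])           # raw value bytes
    return h.hexdigest()

# Two "files" with identical logical content, stored in different order
run1 = {"/Data/amplitude": b"\x01\x02\x03", "/Data/phase": b"\x04"}
run2 = {"/Data/phase": b"\x04", "/Data/amplitude": b"\x01\x02\x03"}
print(logical_digest(run1) == logical_digest(run2))  # → True
```

In practice `h5diff` (or walking the file with `h5py` and comparing arrays) does this comparison for you, but the principle is the same: hash or compare what the file *means*, not how it happens to be laid out on disk.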
Rubem
on 8 Sep 2025
Answers (1)
Walter Roberson
on 6 Sep 2025
HDF5 objects have unique object identifiers, but there is no requirement that two HDF5 files written the same way use the same object identifiers.
2 Comments
Rubem
on 8 Sep 2025
Walter Roberson
on 8 Sep 2025
MathWorks does not provide any way to ensure that the same object identifiers are used.
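One source of byte-level differences is the per-object creation timestamp that HDF5 object headers can track. The C API call to disable it is `H5Pset_obj_track_times`; whether MATLAB's low-level interface exposes the corresponding `H5P.set_obj_track_times` depends on the release. As a sketch of the idea, here is the equivalent in Python with `h5py`, whose `create_dataset` accepts a `track_times` flag (the file names here are illustrative, and this assumes `h5py` is installed):

```python
import hashlib
import os
import tempfile

import h5py

def write_run(path):
    """Write fixed content with object-time tracking disabled."""
    with h5py.File(path, "w") as f:
        # track_times=False omits the per-object creation timestamp
        f.create_dataset("Data/amplitude", data=[1.0, 2.0, 3.0],
                         track_times=False)

def file_md5(path):
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()

tmp = tempfile.mkdtemp()
p1 = os.path.join(tmp, "run1.h5")
p2 = os.path.join(tmp, "run2.h5")
write_run(p1)
write_run(p2)
print(file_md5(p1) == file_md5(p2))
```

Even with timestamps disabled, bitwise reproducibility is not guaranteed by the format; if two runs still differ at the byte level, `h5diff` remains the robust way to confirm that the logical content matches.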