I work with large volumes of scientific data. My original system was to write the data to a single binary file, indexed in a structure, so I could read only the portion of the data that interested me. I recently tested the HDF5 file structure (h5create/h5write, ...), and my first approach was to create many independent HDF5 files so I could work with parallel computing. In my first test, I created 20,000+ .h5 files. While it works fine, it is at least ten times slower than my previous approach, where I dumped everything into one binary file and recorded the start and end positions.
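For context, the "one binary file plus an offset index" scheme described above can be sketched roughly as follows (in Python for illustration; the file name and record sizes are made up):

```python
import numpy as np

# Sketch of the indexed-binary-file scheme: append each record to one
# file, remember its (start, end) byte positions, then later seek and
# read back only the slice of interest.
records = [np.random.rand(1000) for _ in range(3)]

index = []  # (start_byte, end_byte) for each record
with open("data.bin", "wb") as f:
    for rec in records:
        start = f.tell()
        f.write(rec.tobytes())
        index.append((start, f.tell()))

# Random access: read only record 1 using its recorded offsets.
start, end = index[1]
with open("data.bin", "rb") as f:
    f.seek(start)
    buf = f.read(end - start)
recovered = np.frombuffer(buf, dtype=np.float64)
```

Here `recovered` matches `records[1]` exactly, without reading the rest of the file.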
I would appreciate any help on the following topics:
1- Would it be faster to use the low-level HDF5 functions rather than the high-level ones?
2- Would it be faster to have one large .h5 file (>1GB) rather than many, and in that case, would I be able to access the file simultaneously from many workers (one worker per dataset, though)?
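The layout I have in mind for question 2 would look something like this (sketched in Python with h5py for illustration; dataset names and sizes are hypothetical). Each worker would open its own read-only handle on the shared file and touch only its own dataset:

```python
import h5py
import numpy as np

# Hypothetical layout: one large HDF5 file holding many datasets,
# one per worker, instead of 20,000+ separate files.
with h5py.File("big.h5", "w") as f:
    for i in range(4):
        f.create_dataset(f"worker_{i}", data=np.full(100, i, dtype=np.float64))

def read_my_dataset(worker_id):
    # Each worker opens its own read-only handle; concurrent reads of
    # a file that is not being written are generally safe.
    with h5py.File("big.h5", "r") as f:
        return f[f"worker_{worker_id}"][:]

chunk = read_my_dataset(2)
```

Note this sketch only covers concurrent *reads*; concurrent writes to a single HDF5 file require the MPI-parallel build of HDF5.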
Thanks in advance