Write Stream Binary Header with different precisions

I want to write a stream binary file that has different precisions for the header. Below is the header that I want to include with 256 Characters for the name of the file, double precision for the number of rows and columns, and then the data set is single precision.
256 Character -> "Unformatted file version=292498251"
Double -> "13"
Double -> "1000000"
Single -> "data" which is an array with 1000000 x 13 entries
Any help would be appreciated!

Answers (2)

dpb
dpb on 28 Sep 2024
Edited: dpb on 28 Sep 2024
Use fwrite with the 'precision' optional argument (note that fwrite takes the data first, then the precision string):
str='Unformatted file version=292498251'; % header string as char() array
str=pad(str,256);  % pad to exactly 256 characters
N=1000000; M=13;
data=rand(N,M);
fid=fopen('filename.bin','w');
fwrite(fid,str,'char*1');
fwrite(fid,[N M],'double');
fwrite(fid,data,'single'); % fwrite converts double to single internally
fclose(fid);
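To confirm the layout, the file can be read back with matching precisions. This is a quick sketch, not part of the original answer; it assumes 'filename.bin' was written by the code above:

fid=fopen('filename.bin','r');
hdr=fread(fid,256,'char*1=>char')'; % 256-byte header as a char row vector
dims=fread(fid,2,'double');         % [N M]
data=reshape(fread(fid,prod(dims),'single=>single'),dims(1),dims(2));
fclose(fid);

The 'source=>output' precision syntax keeps the data in the stored class instead of promoting it to double on read.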

Hi @Jaden Hoechstetter,

You mentioned, "I want to write a stream binary file that has different precisions for the header. Below is the header that I want to include with 256 characters for the name of the file, double precision for the number of rows and columns, and then the data set is single precision. 256 Character -> "Unformatted file version=292498251"; Double -> "13"; Double -> "1000000"; Single -> "data", which is an array with 1000000 x 13 entries. Any help would be appreciated!"

Please see my response to your comments below.

To write a stream binary file with different precisions for the header in MATLAB, first build the header: a string padded to 256 characters, followed by two double-precision numbers. Then generate the 1000000 x 13 single-precision dataset. Finally, use MATLAB's file I/O functions to open a binary file for writing, write the header and the dataset in the specified format, and make sure the file is properly closed afterwards. Here is the complete MATLAB code that implements these steps:

% Define the Header
headerString = 'Unformatted file version=292498251'; % Initial string
headerString = pad(headerString, 256); % Pad to 256 characters
numRows = 1000000; % Number of rows
numCols = 13; % Number of columns
% Generate the Data
data = single(rand(numRows, numCols)); % Create a 1000000 x 13 array of single precision random numbers
% Open a Binary File
fileID = fopen('output_binary_file.bin', 'wb'); % Open file for writing in binary mode
if fileID == -1
  error('Failed to open the file for writing.');
end
% Write the Header and Data
% Write the header string (256 characters)
fwrite(fileID, headerString, 'char');
% Write the number of rows and columns (double precision)
fwrite(fileID, numRows, 'double');
fwrite(fileID, numCols, 'double');
% Write the data (single precision)
fwrite(fileID, data, 'single');
% Close the File
fclose(fileID);
% Display final results
disp('Binary file written successfully with the following contents:');
disp(['Header: ', headerString]);
disp(['Number of Rows: ', num2str(numRows)]);
disp(['Number of Columns: ', num2str(numCols)]);
disp(['Data Size: ', num2str(size(data))]);


In the code example above, the header string is initialized and padded so that it is exactly 256 characters long, which is crucial for maintaining the structure of the binary file. A dataset of random numbers is created using rand, which generates values between 0 and 1, and the single function converts these values to single precision. The fopen function opens a binary file named output_binary_file.bin for writing; the 'wb' mode indicates writing in binary format, and error handling is included to ensure the file opens successfully. The fwrite function writes the header string, then the two double-precision numbers (number of rows and columns), and finally the single-precision dataset. The fclose function closes the file, ensuring that all data is flushed and the file is properly saved. The script concludes by displaying the header, the number of rows and columns, and the size of the data array to confirm that the operations were successful.
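As a quick sanity check (an illustrative addition, assuming the script above has just run), the expected file size follows directly from the layout: 256 header bytes, 2 x 8 bytes for the two doubles, and 4 bytes per single-precision element:

info = dir('output_binary_file.bin');
expectedBytes = 256 + 2*8 + 4*numRows*numCols; % header + dims + data
fprintf('File size: %d bytes (expected %d)\n', info.bytes, expectedBytes);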

For more information on file functions used in the code above, please refer to

fopen

fwrite

fclose (https://www.mathworks.com/help/matlab/ref/fclose.html)

Hope this should help resolve your problem. Please let us know if you have any further questions.

6 Comments

...
data = single(rand(numRows, numCols)); % Create a 1000000 x 13 array of single precision random numbers
...
NOTA BENE: While casting the data array to single doesn't hurt anything, fwrite will do the conversion internally anyway; unless the optional precision argument is provided, the precision is uint8, so one doesn't save anything by passing the single array instead of the default double. It would be more convenient if the default behavior were to follow the class of the input argument, but that is not the implementation chosen.
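dpb's point can be checked by comparing byte counts (a sketch; the temp filenames are illustrative): writing a double array with 'single' precision produces the same 4-byte elements as writing a pre-cast single array.

x = rand(1000,1);
fid = fopen('a.bin','w'); fwrite(fid, x, 'single');         fclose(fid); % double input
fid = fopen('b.bin','w'); fwrite(fid, single(x), 'single'); fclose(fid); % pre-cast input
d1 = dir('a.bin'); d2 = dir('b.bin');
[d1.bytes d2.bytes] % both 4000 bytes: fwrite converts internally either way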
Hi @dpb,
Thanks for pointing out the precision consideration. So you are saying that while casting to single precision does not hurt performance, it may not be necessary, since fwrite handles the conversion internally and, by default, uses uint8 unless a precision is specified otherwise.
Hi @Jaden Hoechstetter,
Please let us know if you need any further assistance from us.
I haven't specifically tried to time whether the JIT compiler is smart enough to avoid doing the same thing (casting the data from double to single) twice, @Umar; I was simply pointing out that the explicit cast is, at least, superfluous, because you must specify the desired precision explicitly to fwrite anyway, and it will do the conversion internally based on that input. Hence, there is no advantage in writing the explicit cast first in order to pass a single instead of a double array to fwrite. Whether there's a measurable performance difference would probably be difficult to measure, but certainly the most efficient code is that not executed if not needed, so skipping the explicit conversion here would seem the better route simply for the question of writing the specific file as requested.
Of course, that one could cut the memory footprint in half might be important for other pieces of the overall problem of which the specific question might be a small piece; we don't have any information on the context of the question so I was only addressing the specifics of fwrite.
Memory bandwidth and cache hit/miss rates can differ as well, and while the number of bytes is half for single, the data organization in the algorithm may kill that performance if not processed sequentially. All in all, it's simply so complicated with modern CPUs as to be unanswerable except by comparing the same problem on the same machine with the same version of MATLAB.
In general, trying to preoptimize is a bad idea; a basic tenet is to use the defaults where you can and, only if that turns out to be unsatisfactory, begin to try things to improve performance. Even with very large datasets, using the tools available in MATLAB, such as tall arrays, may well alleviate the need for more exotic solutions.
In the olden days, with much smaller memory footprints, simpler CPUs, and limited or nonexistent GPUs, it was a lot easier to answer; now it's an extremely complicated problem to analyze.
Hi @dpb,
First, let me tell you as a friend that your advice on certain posts in the past has always been helpful. Again, thank you for your detailed feedback regarding the use of data casting with the fwrite function. I appreciate your insights into the complexities of performance optimization, especially in relation to modern CPU architectures and memory management.
Your point about the internal conversion that fwrite performs based on specified precision is well taken. It is indeed prudent to avoid unnecessary explicit casts, particularly when they may not yield significant performance benefits. As you rightly noted, optimizing code without a clear understanding of its context can lead to complications rather than improvements.
I also agree that leveraging MATLAB's built-in tools, such as tall arrays, can often provide effective solutions for handling large datasets without resorting to more intricate optimizations prematurely.
Your perspective on pre-optimization resonates strongly with me; focusing on default settings until a performance issue arises is a sensible approach.
Thank you once again for your thoughtful analysis. Your expertise adds considerable value to our discussions. I hope OP accepts your answer.
"...you must specify the desired precision explicitly to fwrite anyway, and it will do the conversion internally based on that input. "
which -all fwrite
built-in (/MATLAB/toolbox/matlab/iofun/fwrite)
/MATLAB/toolbox/matlab/serial/@serial/fwrite.m          % serial method
/MATLAB/toolbox/instrument/instrument/@i2c/fwrite.m     % i2c method
/MATLAB/toolbox/shared/instrument/@icinterface/fwrite.m % icinterface method
The iofun base version is built in, so one can't easily tell whether it has sufficient preprocessing up front to know it doesn't need a cast when the input class matches the precision argument, or whether it forgoes the upfront testing overhead and "does its thang" regardless. Either way, it's generally quite a fast operation, so in reality the cost penalty would probably never be noticed; it's just a case of trying to minimize code that isn't specifically required, as an exercise in parsimony...


Asked: 28 Sep 2024
Last commented: dpb on 29 Sep 2024
