Write Stream Binary Header with different precisions

I want to write a stream binary file that has different precisions for the header. Below is the header that I want to include with 256 Characters for the name of the file, double precision for the number of rows and columns, and then the data set is single precision.
256 Character -> "Unformatted file version=292498251"
Double -> "13"
Double -> "1000000"
Single -> "data" which is an array with 1000000 x 13 entries
Any help would be appreciated!

Answers (2)

dpb
dpb on 28 Sep 2024
Edited: dpb on 28 Sep 2024
Use fwrite with the 'precision' optional argument (note that fwrite takes the data first, then the precision string):
str='Unformatted file version=292498251'; % header string as char() array
str=pad(str,256);  % pad to exactly 256 characters
N=1000000; M=13;
data=rand(N,M);
fid=fopen('filename.bin','w');
fwrite(fid,str,'char*1');
fwrite(fid,[N M],'double');
fwrite(fid,data,'single'); % fwrite converts double to single internally
fclose(fid);
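To confirm the layout, the file can be read back with matching precisions. This is a quick sketch, not part of the original answer; it assumes 'filename.bin' was written by the code above:

fid=fopen('filename.bin','r');
hdr=fread(fid,256,'char*1=>char')'; % 256-byte header as a char row vector
dims=fread(fid,2,'double');         % [N M]
data=reshape(fread(fid,prod(dims),'single=>single'),dims(1),dims(2));
fclose(fid);

The 'source=>output' precision syntax keeps the data in the stored class instead of promoting it to double on read.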

Hi @Jaden Hoechstetter,

You mentioned, "I want to write a stream binary file that has different precisions for the header. Below is the header that I want to include with 256 characters for the name of the file, double precision for the number of rows and columns, and then the data set is single precision. 256 Character -> "Unformatted file version=292498251"; Double -> "13"; Double -> "1000000"; Single -> "data", which is an array with 1000000 x 13 entries. Any help would be appreciated!"

Please see my response to your comments below.

To write a stream binary file with different precisions for the header in MATLAB, first build the header: a string padded to 256 characters, followed by two double-precision numbers. Then generate the 1000000 x 13 single-precision dataset. Finally, use MATLAB's file I/O functions to open a binary file for writing, write the header and the dataset in the specified format, and make sure the file is properly closed afterwards. Here is the complete MATLAB code that implements these steps:

% Define the Header
headerString = 'Unformatted file version=292498251'; % Initial string
headerString = pad(headerString, 256); % Pad to 256 characters
numRows = 1000000; % Number of rows
numCols = 13; % Number of columns
% Generate the Data
data = single(rand(numRows, numCols)); % Create a 1000000 x 13 array of single precision random numbers
% Open a Binary File
fileID = fopen('output_binary_file.bin', 'wb'); % Open file for writing in binary mode
if fileID == -1
  error('Failed to open the file for writing.');
end
% Write the Header and Data
% Write the header string (256 characters)
fwrite(fileID, headerString, 'char');
% Write the number of rows and columns (double precision)
fwrite(fileID, numRows, 'double');
fwrite(fileID, numCols, 'double');
% Write the data (single precision)
fwrite(fileID, data, 'single');
% Close the File
fclose(fileID);
% Display final results
disp('Binary file written successfully with the following contents:');
disp(['Header: ', headerString]);
disp(['Number of Rows: ', num2str(numRows)]);
disp(['Number of Columns: ', num2str(numCols)]);
disp(['Data Size: ', num2str(size(data))]);


In the code example above, the header string is initialized and padded so that it is exactly 256 characters long, which is crucial for maintaining the structure of the binary file. A dataset of random numbers is created using rand, which generates values between 0 and 1, and the single function converts these values to single precision. The fopen function opens a binary file named output_binary_file.bin for writing; the 'wb' mode indicates writing in binary format, and error handling is included to ensure the file opens successfully. The fwrite function writes the header string, then the two double-precision numbers (number of rows and columns), and finally the single-precision dataset. The fclose function closes the file, ensuring that all data is flushed and the file is properly saved. The script concludes by displaying the header, the number of rows and columns, and the size of the data array to confirm that the operations were successful.
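As a quick sanity check (an illustrative addition, assuming the script above has just run), the expected file size follows directly from the layout: 256 header bytes, 2 x 8 bytes for the two doubles, and 4 bytes per single-precision element:

info = dir('output_binary_file.bin');
expectedBytes = 256 + 2*8 + 4*numRows*numCols; % header + dims + data
fprintf('File size: %d bytes (expected %d)\n', info.bytes, expectedBytes);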

For more information on file functions used in the code above, please refer to

fopen

fwrite

fclose (https://www.mathworks.com/help/matlab/ref/fclose.html)

Hope this should help resolve your problem. Please let us know if you have any further questions.

6 Comments

...
data = single(rand(numRows, numCols)); % Create a 1000000 x 13 array of single precision random numbers
...
NOTA BENE: While casting the data array to single doesn't hurt anything, fwrite will do the conversion internally anyway; unless the optional precision argument is provided, the precision is uint8, so one doesn't save anything by passing the single array instead of the default double. It would be more convenient if the default behavior were to follow the class of the input argument, but that is not the implementation chosen.
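dpb's point can be checked by comparing byte counts (a sketch; the temp filenames are illustrative): writing a double array with 'single' precision produces the same 4-byte elements as writing a pre-cast single array.

x = rand(1000,1);
fid = fopen('a.bin','w'); fwrite(fid, x, 'single');         fclose(fid); % double input
fid = fopen('b.bin','w'); fwrite(fid, single(x), 'single'); fclose(fid); % pre-cast input
d1 = dir('a.bin'); d2 = dir('b.bin');
[d1.bytes d2.bytes] % both 4000 bytes: fwrite converts internally either way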
Hi @dpb,
Thanks for pointing out the precision consideration. So you are saying that while casting to single precision does not hurt performance, it may not be necessary, since fwrite handles the conversion internally and, by default, uses uint8 unless a precision is specified otherwise.
Hi @Jaden Hoechstetter,
Please let us know if you need any further assistance from us.
I haven't specifically tried to time whether the JIT compiler is smart enough to avoid doing the same thing (casting the data from double to single) twice, @Umar; I was simply pointing out that the explicit cast is, at least, superfluous, because you must specify the desired precision explicitly to fwrite anyway, and it will do the conversion internally based on that input. Hence, there is no advantage in writing the explicit cast first in order to pass a single instead of a double array to fwrite. Whether there's a measurable performance difference would probably be difficult to measure, but certainly the most efficient code is that not executed if not needed, so skipping the explicit conversion here would seem the better route simply for the question of writing the specific file as requested.
Of course, that one could cut the memory footprint in half might be important for other pieces of the overall problem of which the specific question might be a small piece; we don't have any information on the context of the question so I was only addressing the specifics of fwrite.
Memory bandwidth and cache hit/miss rates can differ as well, and while the number of bytes is half for single, the data organization in the algorithm may kill that performance if not processed sequentially. All in all, it's simply so complicated with modern CPUs as to be unanswerable except by comparing the same problem on the same machine with the same version of MATLAB.
In general, trying to preoptimize is a bad idea; a basic tenet is to use the defaults where you can and, only if that turns out to be unsatisfactory, begin to try things to improve performance. Even with very large datasets, using the tools available in MATLAB, such as tall arrays, may well alleviate the need for more exotic solutions.
In the olden days, with much smaller memory footprints, simpler CPUs, and limited or nonexistent GPUs, it was a lot easier to answer; now it's an extremely complicated problem to analyze.
Hi @dpb,
First, let me tell you as a friend that your advice on certain posts in the past has always been helpful. Again, thank you for your detailed feedback regarding the use of data casting with the fwrite function. I appreciate your insights into the complexities of performance optimization, especially in relation to modern CPU architectures and memory management.
Your point about the internal conversion that fwrite performs based on specified precision is well taken. It is indeed prudent to avoid unnecessary explicit casts, particularly when they may not yield significant performance benefits. As you rightly noted, optimizing code without a clear understanding of its context can lead to complications rather than improvements.
I also agree that leveraging MATLAB's built-in tools, such as tall arrays, can often provide effective solutions for handling large datasets without resorting to more intricate optimizations prematurely.
Your perspective on pre-optimization resonates strongly with me; focusing on default settings until a performance issue arises is a sensible approach.
Thank you once again for your thoughtful analysis. Your expertise adds considerable value to our discussions. I hope OP accepts your answer.
"...you must specify the desired precision explicitly to fwrite anyway, and it will do the conversion internally based on that input. "
which -all fwrite
built-in (/MATLAB/toolbox/matlab/iofun/fwrite)
/MATLAB/toolbox/matlab/serial/@serial/fwrite.m          % serial method
/MATLAB/toolbox/instrument/instrument/@i2c/fwrite.m     % i2c method
/MATLAB/toolbox/shared/instrument/@icinterface/fwrite.m % icinterface method
The iofun base version is built in, so one can't easily tell whether it has sufficient preprocessing up front to know it doesn't need a cast when the input class matches the precision argument, or whether it forgoes the upfront testing overhead and "does its thang" regardless. Either way, it's generally quite a fast operation, so in reality the cost penalty would probably never be noticed; it's just a case of trying to minimize code that isn't specifically required, as an exercise in parsimony...


Asked: 28 Sep 2024
Last commented: dpb on 29 Sep 2024
