Saving data as binary

Basically, I have, for example, k = [0 5 4]. I want it to be saved as [0 101 100] instead of [00000000 00000101 00000100] so that it takes the least size possible. How can I do that?

Answers (2)

k = [0 5 4];
arrayfun(@(x)dec2bin(x,max(1,ceil(log2(x)))),k,'UniformOutput',false)
ans = 1×3 cell array
{'0'} {'101'} {'100'}

11 Comments

But when saving the file, the file size of the new cell array is still bigger than that of the original array.
OK, so? Are you implying that is a problem or is unexpected?
In essence you have 3 strings with 7 characters in total, which at 2 bytes per character is 14 bytes. Plus there is some overhead for using a cell array, so it will be more than that.
k has 3 eight-byte double numbers, or 24 bytes.
Why do you want strings anyway? What's wrong with the numbers in their original form?
For example, if I have a variable x = 1, I want it to be saved as 1 bit instead of as the byte 00000001, since I'm trying to implement some image compression algorithms.
Maybe use uint8 (or uint16, uint32, uint64, depending on the range of your data, or possibly their signed counterparts int8, etc.) instead of character arrays.
Exploring the amount of storage used for various data types:
x = '00000001'; % 1-by-8 character array
whos x % 16 bytes (but could be made 8 using a different encoding)
  Name      Size            Bytes  Class      Attributes
  x         1x8                16  char
x = '1'; % scalar character
whos x % 2 bytes (but could be 1 with different encoding)
  Name      Size            Bytes  Class      Attributes
  x         1x1                 2  char
x = false(1,8); % 1-by-8 logical array. you might think this
x(end) = true; % would be 8 bits, but in fact it's 8 bytes
x
x = 1×8 logical array
0 0 0 0 0 0 0 1
whos x % 8 bytes
  Name      Size            Bytes  Class      Attributes
  x         1x8                 8  logical
x = true % scalar logical
x = logical
1
whos x % 1 byte
  Name      Size            Bytes  Class      Attributes
  x         1x1                 1  logical
x = 1; % double-precision floating point number (8 bytes)
whos x % 8 bytes
  Name      Size            Bytes  Class      Attributes
  x         1x1                 8  double
x = uint8(1); % unsigned 8-bit integer (1 byte)
whos x % 1 byte
  Name      Size            Bytes  Class      Attributes
  x         1x1                 1  uint8
x = [0 5 4]; % 3 doubles
whos x % 24 bytes
  Name      Size            Bytes  Class      Attributes
  x         1x3                24  double
x = uint8(x); % 3 uint8's
whos x % 3 bytes
  Name      Size            Bytes  Class      Attributes
  x         1x3                 3  uint8
By the way, trying to get down to less than one byte, e.g., storing 1 as 1 bit and storing 4 = 100 as 3 bits will make the resulting file impossible to decode. For instance, if your file contains the sequence of bits 1100 somewhere, you would not know whether that should be interpreted as:
  • 1100 (i.e., decimal 12), or
  • 110, 0 (i.e., decimal 6, 0), or
  • 11, 0, 0 (i.e., decimal 3, 0, 0), or
  • 1, 100 (i.e., decimal 1, 4), or
  • 1, 10, 0 (i.e., decimal 1, 2, 0), or
  • 1, 1, 0, 0 (i.e., decimal 1, 1, 0, 0)
All six of those interpretations use the minimum number of bits required for each decimal number (i.e., no leading zeros).
[ The other two possible interpretations:
  • 11, 00 (i.e., decimal 3, 0), and
  • 1, 1, 00 (i.e., decimal 1, 1, 0)
do not meet the requirement that every number is encoded with the minimum number of bits (i.e., they have leading zeros: decimal 0 is bits 00 instead of bit 0), so they could be ruled out. ]
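As a quick check of the count above, here is a minimal sketch that counts the ways a bit string can be split into numbers written without leading zeros. The variable names (bits, ways) are my own, not from the thread, and the only rule assumed is the one stated above: a token that starts with 0 must be exactly '0'.

```matlab
% Count the splits of a bit string into binary numbers without leading zeros.
bits = '1100';
n = numel(bits);

% ways(i) = number of valid splits of bits(i:end)
ways = zeros(1, n + 1);
ways(n + 1) = 1;                 % nothing left to split: exactly one way

for i = n:-1:1
    if bits(i) == '0'
        % A token starting with 0 can only be the single bit '0'
        ways(i) = ways(i + 1);
    else
        % A token starting with 1 can have any length up to the end
        ways(i) = sum(ways(i + 1:n + 1));
    end
end

fprintf('%s can be split in %d ways\n', bits, ways(1));
```

For '1100' this reports 6 ways, matching the six interpretations listed above.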
It's an interesting problem to think about:
But when coding Huffman, for example, let's say I had an array like [245 2 6 78] and after encoding it I get [01 10 11 00], meaning there won't be an issue distinguishing between them. As long as MATLAB saves each number without preceding zeros, it will be perfect (I can save each number in its own array and then save them all in a cell array, so that won't be a problem either).
Voss on 15 May 2022
If 245, 2, 6, and 78 are the only possible numbers you need to encode, then sure, you could encode them like that. I think you'd have to write a MATLAB function to do the encoding yourself, but that wouldn't be too difficult. Is that correct, that you'll only ever have those four numbers?
In any case, I don't think there is a way to write less than 1 byte (e.g., write two bits at a time) to file. You'd have to combine your two-bit symbols in groups of four symbols. So you'd encode [245 2 6 78] to [01 10 11 00], then write to file the concatenation of those 4 two-bit symbols, which is the byte 01101100 (decimal 108, hex 6C).
That way, you could do four symbols per byte. If you have more symbols/numbers to encode, then you'd have to do with fewer symbols per byte. For instance,
  • more than 4, up to 16 symbols -> 4 bits per symbol -> 2 symbols per byte
  • more than 16, up to 256 symbols -> 8 bits per symbol -> 1 symbol per byte -> use built-in type uint8 (no custom encoding function required)
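To make the packing concrete, here is a small sketch of the four-symbols-per-byte case. It assumes the two-bit codes 01, 10, 11, 00 from the example above, held as the integers 1, 2, 3, 0; the variable names are illustrative only.

```matlab
% Pack four 2-bit codes into one byte, then unpack them again.
codes = uint8([1 2 3 0]);        % the two-bit codes 01 10 11 00

packed = uint8(0);
for c = codes
    % Shift the bits gathered so far left by 2 and append the next code
    packed = bitor(bitshift(packed, 2), c);
end
fprintf('packed byte: %d (hex %s)\n', packed, dec2hex(packed));

% Unpack: shift each 2-bit field down to the low bits and mask with 3 (binary 11)
unpacked = zeros(1, 4, 'uint8');
for k = 1:4
    unpacked(k) = bitand(bitshift(packed, -2 * (4 - k)), 3);
end
disp(unpacked)
```

The packed value comes out as 108 (hex 6C), the byte 01101100 mentioned above, and unpacking returns the original four codes.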
That was suggested to me, but the issue is the decoding part: how can I distinguish between symbols? In Huffman, for example, symbols get varying word lengths, so one could be 0 while another could be 1101. So if one of my bytes becomes 0110100, how would the decoder know what each symbol is?
You can fwrite with 'ubit1' (unsigned 1-bit; the signed 'bit1' can only represent -1 and 0). All of the values that you fwrite() in a single call will be packed into consecutive bits, but at the end of the call, if you do not happen to be positioned at the end of a byte, then enough 0s will be added to reach the byte boundary.
You would typically use the Huffman decoding function to decode the stream of bits, and that decoding function needs to be passed the dictionary.
Can you please go into more detail about how exactly to use fwrite?
Here is more explanation of what I want to do:
Okay, so I have 750x750 JPEG pictures with values ranging from 0 to 255, and I'm supposed to apply lossless image compression algorithms to reduce their size. Lossless image compression algorithms such as Huffman work by reducing the code length of frequently occurring symbols. For example, if 150 were my most frequent value, my Huffman algorithm might give it the code 0, so I would be saving 7 bits times the frequency of that value, which means compression. The problem is that MATLAB automatically turns that 1-bit 0 into 00000000, so essentially my algorithm is pointless, since MATLAB makes all the data 8 bits long again. I want a way to save data at exactly the size I want, whether 1 bit, 2 bits, 3 bits, etc., instead of forcing all data to be 8 bits.
Here is an example of how the algorithm changes the symbols.
The picture I used isn't the best example of compression, but you can get the idea.
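bits = {[1] [0 0] [1] [0 1 1]};     % one codeword (vector of 0/1) per symbol
Bitstream = [bits{:}];              % concatenate into a single row of bits
fid = fopen('test.bin','w');
fwrite(fid, Bitstream, 'ubit1');    % one bit per element; last byte is zero-padded
fclose(fid);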
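On the decoding question, a rough sketch of a full round trip might look like the following. Everything specific here is an assumption for illustration: the symbol values, the prefix-free codewords, and the choice to store the number of valid bits in a uint32 header so that the zero padding in the last byte is not mistaken for data.

```matlab
% Hypothetical prefix-free dictionary (no codeword is a prefix of another)
symbols = [150 2 78];
codes   = {'1', '00', '011'};
msg     = [150 2 150 78];

% Encode: look up each symbol's codeword and concatenate into 0/1 values
[~, idx]  = ismember(msg, symbols);
bitstream = [codes{idx}] - '0';
nBits     = numel(bitstream);

% Write a header holding the bit count, then the bits themselves
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, nBits, 'uint32');
fwrite(fid, bitstream, 'ubit1');
fclose(fid);

% Read back only the valid bits, ignoring the padding
fid = fopen(fname, 'r');
nBits    = fread(fid, 1, 'uint32');
readBits = fread(fid, nBits, 'ubit1')';
fclose(fid);
delete(fname);

% Decode: grow a buffer bit by bit until it matches a codeword
decoded = [];
buf = '';
for b = readBits
    buf(end + 1) = char(b + '0');
    k = find(strcmp(buf, codes), 1);
    if ~isempty(k)
        decoded(end + 1) = symbols(k);
        buf = '';
    end
end
disp(decoded)
```

Because the codewords are prefix-free, the decoder never needs separators between symbols: the first match is always the right one. A real Huffman decoder would normally walk a tree or use the toolbox decoding function with the dictionary instead of comparing strings, but the idea is the same.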

Ilya Dikariev on 20 May 2022

k_new = str2num(dec2bin(k))' would do. But if you still want to reduce the size, just use dec2bin, which keeps the data in char type, which is 8 times smaller.

1 Comment

Walter Roberson on 20 May 2022
Edited: Walter Roberson on 20 May 2022
Only 4 times smaller: each character needs 16 bits.
If you use uint8(k_new), then that would need only one byte per value.

Asked:

on 14 May 2022

Commented:

on 20 May 2022
