Saving data as binary
Basically, I have, for example, k = [0 5 4]. I want it to be saved as [0 101 100] instead of [00000000 00000101 00000100] so that it takes the least space possible. How can I do that?
Answers (2)
k = [0 5 4];
% dec2bin already returns the fewest digits needed, so no width argument is required
arrayfun(@dec2bin, k, 'UniformOutput', false)
11 Comments
Adel Hafri
on 14 May 2022
Image Analyst
on 15 May 2022
OK, so? Are you implying that is a problem or is unexpected?
In essence you have 3 strings with 7 characters in total. MATLAB stores each character in 2 bytes, so that is 14 bytes. Plus there is some overhead for using a cell array, so it could be more than that.
k has 3 eight-byte double numbers, or 24 bytes.
Why do you want strings anyway? What's wrong with the numbers in their original form?
Adel Hafri
on 15 May 2022
Maybe use uint8 (or uint16, uint32, uint64, depending on the range of your data, or possibly their signed counterparts int8, etc.) instead of character arrays.
Exploring the amount of storage used for various data types:
x = '00000001'; % 1-by-8 character array
whos x % 16 bytes (but could be made 8 using a different encoding)
x = '1'; % scalar character
whos x % 2 bytes (but could be 1 with different encoding)
x = false(1,8); % 1-by-8 logical array. you might think this
x(end) = true; % would be 8 bits, but in fact it's 8 bytes
x
whos x % 8 bytes
x = true % scalar logical
whos x % 1 byte
x = 1; % double-precision floating point number (8 bytes)
whos x % 8 bytes
x = uint8(1); % unsigned 8-bit integer (1 byte)
whos x % 1 byte
x = [0 5 4]; % 3 doubles
whos x % 24 bytes
x = uint8(x); % 3 uint8's
whos x % 3 bytes
By the way, trying to get down to less than one byte per value, e.g., storing 1 as 1 bit and storing 4 = 100 as 3 bits, will make the resulting file impossible to decode unambiguously. For instance, if your file contains the sequence of bits 1100 somewhere, you would not know whether that should be interpreted as:
- 1100 (i.e., decimal 12), or
- 110, 0 (i.e., decimal 6, 0), or
- 11, 0, 0 (i.e., decimal 3, 0, 0), or
- 1, 100 (i.e., decimal 1, 4), or
- 1, 10, 0 (i.e., decimal 1, 2, 0), or
- 1, 1, 0, 0 (i.e., decimal 1, 1, 0, 0)
All six of those interpretations use the minimum number of bits required for each decimal number (i.e., no leading zeros).
[ The other two possible interpretations:
- 11, 00 (i.e., decimal 3, 0), and
- 1, 1, 00 (i.e., decimal 1, 1, 0)
do not meet the requirement that every number is encoded with the minimum number of bits (i.e., they have leading zeros: decimal 0 is bits 00 instead of bit 0), so they could be ruled out. ]
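The count above can be checked by brute force. The following is only a sketch: it tries every way of cutting the bit string into pieces and keeps the cuts in which no piece has a leading zero. The variable names (bits, valid, mask) are made up for this illustration.

```matlab
% Enumerate every way to cut '1100' into numbers, keeping only the
% cuts in which every number uses the minimum number of bits.
bits = '1100';
n = numel(bits);
valid = {};
for mask = 0:2^(n-1)-1
    % bit j of mask set -> place a cut after position j
    cuts = find(bitget(mask, 1:n-1));
    starts = [1, cuts+1];
    stops  = [cuts, n];
    parts = arrayfun(@(a,b) bits(a:b), starts, stops, 'UniformOutput', false);
    % a part is minimal if it is a single digit or starts with '1'
    ok = all(cellfun(@(p) numel(p) == 1 || p(1) == '1', parts));
    if ok
        valid{end+1} = cellfun(@(p) bin2dec(p), parts); %#ok<AGROW>
    end
end
fprintf('%d valid interpretations\n', numel(valid));
for i = 1:numel(valid)
    disp(valid{i})
end
```

Every entry of valid is a different list of decimal numbers that produces exactly the same bits, which is why the lengths have to be stored or implied somehow.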
It's an interesting problem to think about:
Adel Hafri
on 15 May 2022
Voss
on 15 May 2022
If 245, 2, 6, and 78 are the only possible numbers you need to encode, then sure, you could encode them like that. I think you'd have to write a MATLAB function to do the encoding yourself, but that wouldn't be too difficult. Is that correct, that you'll only ever have those four numbers?
In any case, I don't think there is a way to write less than 1 byte (e.g., write two bits at a time) to file. You'd have to combine your two-bit symbols in groups of four symbols. So you'd encode [245 2 6 78] to [01 10 11 00], then write to file the concatenation of those 4 two-bit symbols, which is the byte 01101100 (decimal 108, hex 6C).
That way, you could do four symbols per byte. If you have more symbols/numbers to encode, then you'd have to make do with fewer symbols per byte. For instance,
- more than 4, up to 16 symbols -> 4 bits per symbol -> 2 symbols per byte
- more than 16, up to 256 symbols -> 8 bits per symbol -> 1 symbol per byte -> use built-in type uint8 (no custom encoding function required)
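A minimal sketch of the four-symbols-per-byte packing described above, assuming the alphabet is exactly [245 2 6 78] and the message length is a multiple of 4. The names alphabet, codes, packed and decoded are made up for this illustration, and the code assignment is just the one from the example.

```matlab
% Hypothetical alphabet and 2-bit codes, taken from the example above.
alphabet = [245 2 6 78];            % the only symbols that can occur
codes    = uint8([1 2 3 0]);        % 01, 10, 11, 00

data = [245 2 6 78];                % message; length assumed to be a multiple of 4
[~, idx] = ismember(data, alphabet);
sym = reshape(codes(idx), 4, []);   % one column per output byte

% Shift each 2-bit code into place; the first symbol lands in the top bits.
packed = bitor(bitor(bitshift(sym(1,:), 6), bitshift(sym(2,:), 4)), ...
               bitor(bitshift(sym(3,:), 2), sym(4,:)));
disp(packed)                        % 108 for this message, i.e. hex 6C

% Unpacking reverses the shifts and masks off two bits at a time.
unpacked = [bitshift(packed, -6); ...
            bitand(bitshift(packed, -4), 3); ...
            bitand(bitshift(packed, -2), 3); ...
            bitand(packed, 3)];
[~, j] = ismember(unpacked(:)', codes);
decoded = alphabet(j);
```

The packed vector is uint8, so it can be written with fwrite using the 'uint8' precision, one byte per four symbols.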
Adel Hafri
on 15 May 2022
Walter Roberson
on 15 May 2022
You can fwrite with 'ubit1', the unsigned 1-bit precision, which is the one that holds the values 0 and 1. All of the values that you fwrite() in a single call will be packed into consecutive bits, but at the end of the call, if you do not happen to be positioned at the end of a byte, then enough 0s will be added to reach the byte boundary.
Walter Roberson
on 15 May 2022
You would typically use the Huffman decoding function to decode the stream of bits, and that decoding function needs to be passed the dictionary.
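To illustrate why the decoder needs the dictionary, here is a hand-written prefix-code decoder. It is only a sketch and is not the toolbox Huffman decoder: the dictionary (symbols, codes) and the bit stream are invented for this example, and the bits are held as a char row for readability.

```matlab
% Hypothetical prefix-code dictionary: no code is a prefix of another.
symbols = [245 2 6 78];
codes   = {'0', '10', '110', '111'};
dict = containers.Map(codes, num2cell(symbols));

bitstream = '0101100111';      % 245, 2, 6, 245, 78 encoded back to back

decoded = [];
buffer = '';
for b = bitstream
    buffer(end+1) = b;         %#ok<AGROW> collect bits until they match a code
    if isKey(dict, buffer)
        decoded(end+1) = dict(buffer); %#ok<AGROW>
        buffer = '';           % start the next code word
    end
end
disp(decoded)
```

Because no code is a prefix of another, the first match is always the right one. Without the dictionary the same bits cannot be split into code words at all.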
Adel Hafri
on 15 May 2022
Walter Roberson
on 20 May 2022
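bits = {[1] [0 0] [1] [0 1 1]};   % one variable-length code per symbol
Bitstream = [bits{:}];            % concatenate into a single row of 0s and 1s
fid = fopen('test.bin','w');
fwrite(fid, Bitstream, 'ubit1');  % unsigned 1-bit values, packed 8 per byte
fclose(fid);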
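Reading the file back shows the effect of the padding. This is a sketch that assumes the unsigned 'ubit1' precision for both writing and reading; the scratch file name comes from tempname and is not part of the original example.

```matlab
bits = [1 0 0 1 0 1 1];              % 7 bits, the same stream as above
fname = [tempname '.bin'];           % scratch file for this illustration
fid = fopen(fname, 'w');
fwrite(fid, bits, 'ubit1');
fclose(fid);

info = dir(fname);
disp(info.bytes)                     % size on disk, in whole bytes

fid = fopen(fname, 'r');
readback = fread(fid, Inf, 'ubit1')';  % includes any padding bits at the end
fclose(fid);
delete(fname);

disp(readback(1:numel(bits)))        % the original bits
```

Since the padding zeros cannot be told apart from real data, the number of valid bits (or the number of symbols) has to be stored somewhere as well, for example in a small header.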
Ilya Dikariev
on 20 May 2022
k_new=str2num(dec2bin(k))' would do. But if you still want to reduce the size, just use dec2bin, which keeps the data in char type, which is 8 times smaller.
1 Comment
Walter Roberson
on 20 May 2022
Edited: Walter Roberson
on 20 May 2022
Only 4 times smaller. Each character needs 16 bits.
If you use uint8(k_new), then that would need only one byte per value.
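For the original k = [0 5 4], a sketch of writing one byte per value to a binary file and reading it back could look like this. The scratch file name is taken from tempname and is only for illustration.

```matlab
k = [0 5 4];
fname = [tempname '.bin'];           % scratch file for this illustration
fid = fopen(fname, 'w');
fwrite(fid, k, 'uint8');             % one byte per value
fclose(fid);

info = dir(fname);
disp(info.bytes)                     % 3

fid = fopen(fname, 'r');
k2 = fread(fid, Inf, 'uint8=>uint8')';  % read back, keeping the uint8 class
fclose(fid);
delete(fname);
```

This only works while every value fits in the range 0 to 255; larger values would need uint16 or a wider type.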