How can I do audio compression using Huffman encoding?

Question

0 votes

I know the concept of Huffman encoding and how to implement it on MATLAB. My question here is what will i be using as symbols in this case? Will it be the samples that i get after reading the audio(audioread())? Do i need to process my audio first( quantization / downsample / fft )?

7 Comments
Show 5 older comments Hide 5 older comments

Mittal on 16 Oct 2023

can you provide the code

Walter Roberson on 16 Oct 2023

@Mittal

Yes, I could provide the code... but I am not going to write people's homework assignments for them. I already provided a somewhat detailed outline of the steps that need to be taken.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Walter Roberson on 30 Apr 2018

0 votes

When you read from the audio file, use the 'native' option. The result is quite likely to be integer valued. You can use the integers as the symbols.

You will likely find, though, that you can get better compression by using the difference between adjacent integers as the symbols.

4 Comments
Show 2 older comments Hide 2 older comments

Walter Roberson on 3 May 2018

Edited: Walter Roberson on 4 May 2018

You deleted your posted attempt while I was working on explaining the details. The explanation was:

original.wav is built from the mono version of the entire original sound.

Then the first 200 entries are taken from that, and used to construct a stream of 0's and 1's that encode the 200 entries using huffman encoding. In the test I did, that was a stream of 918 values each double(0) or double(1)

You then audiowrite() that stream of 0's and 1's as independent values, thus making each bit of output into a sample by itself.

Because the values are double, and the default bits per sample is 16, audiowrite is defined as converting the double to int16 for writing to wav file, so two bytes per sample would be used.

audiowrite() is permitted (but not required) to use whatever lossless compression system that is supported by the audio file type, so the file has potentially been compressed (but probably not.)

audiowrite() adds on headers such as information about the sampling frequency, so the size of the file on disk reflects more than just the audio data itself.

Then you decode the huffman encoding, which gets you back to 200 entries, each a double precision number that are going to be floating point and not just 0 or 1. You then audiowrite() the 200 double precision numbers, again with the default 16 bits per entry, so again it will write out 2 bytes per entry for the 200 values, and might or might not compress the data as it does so (but probably will not.)

You then compare the size of the file on disk, which as indicated in the previous discussion, includes headers. This is going to be smaller than the other file because you are only writing out 200 values, where-as you were writing out about 900-ish values.

What you should be writing to disk is a binary file that is not a .wav file. The binary file should include information about the sampling frequency, and should include the encoding dictionary (along with information about the size of the encoding dictionary). After that you should write out one bit for each value in the huffman encoding: use fwrite() with 'ubit1' for that (you need to write all of the values out at the same time.)

You also need to decide how you are going to handle end of file, because huffman encoding will often not happen to produce an exact multiple of 8 bits. There are two common ways of handling this:

put in a length field in the headers that you write out before the binary data; or
when you are preparing the huffman dictionary, include an symbol that does not otherwise occur, and give it low probability (1/number of elements being encoded.) Then when doing the encoding, stick that symbol on the end of the input stream, and do the encoding. Later when you do the decoding, as soon as you recognize that you have decoded that symbol, stop reading and return the bit vector up to that point -- the extra symbol becomes an "end of stream" marker.

Reminder, this is all to a binary file, not to a .wav file.

The decoding process involves reading the headers out of the binary file, including the dictionary. Then read the bit vector from the binary file and pass it through the decoding process with the dictionary you read in.

You must include the dictionary with the binary file in order to properly calculate the size of the stored data. The binary file by itself must be enough to recreate the sound, as if it had been transmitted to another system that did not happen to have a copy of the dictionary on hand.

Walter Roberson on 4 May 2018

Saurabh Parkar had used huffmandict() on the samples to build the encoding tables, and then used huffmanenco() to perform the encoding to a stream of 0 and 1 values.

Saurabh Parkar on 7 May 2018

Thank you so much Walter! I was stuck on this since a long time.

This was really very helpful.

Sign in to comment.

How can I do audio compression using Huffman encoding?

7 Comments
Show 5 older comments Hide 5 older comments

Answers (1)

4 Comments
Show 2 older comments Hide 2 older comments

Categories

Tags

Community Treasure Hunt

How can I do audio compression using Huffman encoding?

7 Comments Show 5 older comments Hide 5 older comments

Answers (1)

4 Comments Show 2 older comments Hide 2 older comments

Categories

Tags

See Also

Community Treasure Hunt

7 Comments
Show 5 older comments Hide 5 older comments

4 Comments
Show 2 older comments Hide 2 older comments