How can I do audio compression using Huffman encoding?

I know the concept of Huffman encoding and how to implement it on MATLAB. My question here is what will i be using as symbols in this case? Will it be the samples that i get after reading the audio(audioread())? Do i need to process my audio first( quantization / downsample / fft )?

7 Comments

Hey, do you still have your implementation around? That would really help me right now.
can you send the implementation please?
I could, but I will not do so, as this is a common university assignment, and if I sent code for it to a student, they would not be able to say that they wrote the program themselves.
please share your solution my friend .
How do you fill the probability vector ?
Thank you.
When you read from the audio file, use the 'native' option. Those will be integer data type. double() to convert to floating point. Now create a vector which is the first sample, together with diff() of the double precision values. Call that DS. Create a huffman dictionary, call it HD, from the result, DS: that will figure out which values are used, and what the probabilities are.
Now take that dictionary HD and do a huffman encoding of DS, getting a huffman encoding, call it HE
You now need to store the huffman dictionary, HD, and the huffman encoding HE. You need to store both to reconstruct the data.
can you provide the code
Yes, I could provide the code... but I am not going to write people's homework assignments for them. I already provided a somewhat detailed outline of the steps that need to be taken.

Sign in to comment.

Answers (1)

When you read from the audio file, use the 'native' option. The result is quite likely to be integer valued. You can use the integers as the symbols.
You will likely find, though, that you can get better compression by using the difference between adjacent integers as the symbols.

4 Comments

You deleted your posted attempt while I was working on explaining the details. The explanation was:
original.wav is built from the mono version of the entire original sound.
Then the first 200 entries are taken from that, and used to construct a stream of 0's and 1's that encode the 200 entries using huffman encoding. In the test I did, that was a stream of 918 values each double(0) or double(1)
You then audiowrite() that stream of 0's and 1's as independent values, thus making each bit of output into a sample by itself.
Because the values are double, and the default bits per sample is 16, audiowrite is defined as converting the double to int16 for writing to wav file, so two bytes per sample would be used.
audiowrite() is permitted (but not required) to use whatever lossless compression system that is supported by the audio file type, so the file has potentially been compressed (but probably not.)
audiowrite() adds on headers such as information about the sampling frequency, so the size of the file on disk reflects more than just the audio data itself.
Then you decode the huffman encoding, which gets you back to 200 entries, each a double precision number that are going to be floating point and not just 0 or 1. You then audiowrite() the 200 double precision numbers, again with the default 16 bits per entry, so again it will write out 2 bytes per entry for the 200 values, and might or might not compress the data as it does so (but probably will not.)
You then compare the size of the file on disk, which as indicated in the previous discussion, includes headers. This is going to be smaller than the other file because you are only writing out 200 values, where-as you were writing out about 900-ish values.
What you should be writing to disk is a binary file that is not a .wav file. The binary file should include information about the sampling frequency, and should include the encoding dictionary (along with information about the size of the encoding dictionary). After that you should write out one bit for each value in the huffman encoding: use fwrite() with 'ubit1' for that (you need to write all of the values out at the same time.)
You also need to decide how you are going to handle end of file, because huffman encoding will often not happen to produce an exact multiple of 8 bits. There are two common ways of handling this:
  1. put in a length field in the headers that you write out before the binary data; or
  2. when you are preparing the huffman dictionary, include an symbol that does not otherwise occur, and give it low probability (1/number of elements being encoded.) Then when doing the encoding, stick that symbol on the end of the input stream, and do the encoding. Later when you do the decoding, as soon as you recognize that you have decoded that symbol, stop reading and return the bit vector up to that point -- the extra symbol becomes an "end of stream" marker.
Reminder, this is all to a binary file, not to a .wav file.
The decoding process involves reading the headers out of the binary file, including the dictionary. Then read the bit vector from the binary file and pass it through the decoding process with the dictionary you read in.
You must include the dictionary with the binary file in order to properly calculate the size of the stored data. The binary file by itself must be enough to recreate the sound, as if it had been transmitted to another system that did not happen to have a copy of the dictionary on hand.
How did you construct a stream of 0's and 1's that encode the 200 entries using huffman encoding ?
Saurabh Parkar had used huffmandict() on the samples to build the encoding tables, and then used huffmanenco() to perform the encoding to a stream of 0 and 1 values.
Thank you so much Walter! I was stuck on this since a long time.
This was really very helpful.

Sign in to comment.

Tags

Asked:

on 30 Apr 2018

Commented:

on 16 Oct 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!