Huffman Encoding Example
The goal of this program is to demonstrate the construction of a huffman encoding tree. The tree is represented as a binary tree using MATLAB's built in treeplot commands. Contruction of the tree as well as the huffman code book will be described in later sections.
Contents
- Initial Setup
- User-Defined Input
- Probability Calculation Flag
- Encoding Bit Calculation
- Console Output - Initial Characters and Their Respective Probabilties
- Initialize The Encoding Array
- Huffman Encoding Process
- Form Huffman Tree Data
- Calculate Tree Parameters
- Plot the Huffman Tree
- Console Output - Tree Symbols and Their Probabilities
- Tree Parameter Extraction
- Label Tree Nodes
- Label Tree Edges
- Huffman Codebook Calculation
- Console Output - Huffman Codebook
Initial Setup
The first piece of code in this program initializes the MATLAB console by clearing all existing state (variables, figures, etc.)
% ************************ % Hufmman Coding Example % % By Jason Agron % ************************ % Setup the MATLAB console clc; clear all; close all;
User-Defined Input
The user is able to define the string that they wish to huffman encode. The probability distribution for individual characters can either be automatically calculated based on the character's ASCII values, or can be manually inputted by the user
% Define the character string my_str = 'jason';
Probability Calculation Flag
Flag used to enable auto-calculation of probabilities * 1 = Auto-calculate * 0 = User-defines probabilities manually
auto_prob = 1; if (auto_prob == 1) % Automatically calculate the probability distribution % Get ASCII version of each character % Each ASCII value represents the probability of finding the character prob_dist = double(my_str); else % Manually define the probability distribution prob_dist = [10 19 30 40 50]; end
Encoding Bit Calculation
Calculate the number of bits to encode with. In a binary system it takes n-bits to encode 2^n items, so the number of bits to use in the huffman encoder is simply the log-base2 of the number of items to encode.
num_bits = ceil(log2(length(prob_dist)))
num_bits =
3
Console Output - Initial Characters and Their Respective Probabilties
Display character vs. probability
disp('Character Probability:'); for i = 1:length(prob_dist) display(strcat(my_str(i),' --> ',num2str(prob_dist(i)))); end total = sum(prob_dist)
Character Probability: j -->106 a -->97 s -->115 o -->111 n -->110 total = 539
Initialize The Encoding Array
A cell array is used to hold the groups of symbols for encoding. The process of huffman encoding combines characters into larger symbols, and a cell is the ideal type of container for managing this type of data in MATLAB. Initialize a cell array (where each cell is a symbol)
for i = 1:length(my_str) sorted_str{i} = my_str(i); end % Save initial set of symbols and probabilities for later use init_str = sorted_str; init_prob = prob_dist;
Huffman Encoding Process
Iteratively sort and combine symbols based on their probabilities. Each iteration sorts the available symbols and then takes the two symbols with the "worst" probabilties and combines them into a single symbol for the next iteration. The loop halts when only a single symbol remains.
sorted_prob = prob_dist; rear = 1; while (length(sorted_prob) > 1) % Sort probs [sorted_prob,indeces] = sort(sorted_prob,'ascend'); % Sort string based on indeces sorted_str = sorted_str(indeces); % Create new symbol new_node = strcat(sorted_str(2),sorted_str(1)); new_prob = sum(sorted_prob(1:2)); % Dequeue used symbols from "old" queue sorted_str = sorted_str(3:length(sorted_str)); sorted_prob = sorted_prob(3:length(sorted_prob)); % Add new symbol back to "old" queue sorted_str = [sorted_str, new_node]; sorted_prob = [sorted_prob, new_prob]; % Add new symbol to "new" queue newq_str(rear) = new_node; newq_prob(rear) = new_prob; rear = rear + 1; end
Form Huffman Tree Data
Get all elements for the tree (symbols and probabilities) by concatenating the original set of symbols with the new combined symbols and probabilities
tree = [newq_str,init_str]; tree_prob = [newq_prob, init_prob]; % Sort all tree elements [sorted_tree_prob,indeces] = sort(tree_prob,'descend'); sorted_tree = tree(indeces);
Calculate Tree Parameters
Calculate parent relationships for all tree elements. This will allow the treeplot command to realize the correct tree structure.
parent(1) = 0; num_children = 2; for i = 2:length(sorted_tree) % Extract my symbol me = sorted_tree{i}; % Find my parent's symbol (search until shortest match is found) count = 1; parent_maybe = sorted_tree{i-count}; diff = strfind(parent_maybe,me); while (isempty(diff)) count = count + 1; parent_maybe = sorted_tree{i-count}; diff = strfind(parent_maybe,me); end parent(i) = i - count; end
Plot the Huffman Tree
This is easily done using the treeplot command whose argument is the array that contains the parent relationships for each node.
treeplot(parent); title(strcat('Huffman Coding Tree - "',my_str,'"'));
Console Output - Tree Symbols and Their Probabilities
Print out tree in text form
display(sorted_tree) display(sorted_tree_prob)
sorted_tree =
'jason' 'jas' 'on' 'ja' 's' 'o' 'n' 'j' 'a'
sorted_tree_prob =
539 318 221 203 115 111 110 106 97
Tree Parameter Extraction
Extract binary tree parameters. These parameters include the (x,y) coordinates of each node so that we can accurately place label on the tree.
[xs,ys,h,s] = treelayout(parent);
Label Tree Nodes
Place labels on each tree node
text(xs,ys,sorted_tree);
Label Tree Edges
Put weights on the tree. The slope of each edge indicates whether it is a left-branch or a right-branch. Left-branches have positive slope and are labelled with a '1', while right-branches have negative slope and are labelled with a '0'. The labels for the weights are placed at the midpoint of each edge. Calculate mid-points for each node
for i = 2:length(sorted_tree) % Get my coordinate my_x = xs(i); my_y = ys(i); % Get parent coordinate parent_x = xs(parent(i)); parent_y = ys(parent(i)); % Calculate weight coordinate (midpoint) mid_x = (my_x + parent_x)/2; mid_y = (my_y + parent_y)/2; % Calculate weight (positive slope = 1, negative = 0) slope = (parent_y - my_y)/(parent_x - my_x); if (slope > 0) weight(i) = 1; else weight(i) = 0; end text(mid_x,mid_y,num2str(weight(i))); end
Huffman Codebook Calculation
A huffman codebook can be calculated by traversing the huffman tree while recording the weights encountered while travelling to each node. Calculate codebook
for i = 1:length(sorted_tree) % Initialize code code{i} = ''; % Loop until root is found index = i; p = parent(index); while(p ~= 0) % Turn weight into code symbol w = num2str(weight(index)); % Concatenate code symbol code{i} = strcat(w,code{i}); % Continue towards root index = parent(index); p = parent(index); end end
Console Output - Huffman Codebook
Display code book in textual form
codeBook = [sorted_tree', code']
codeBook =
'jason' ''
'jas' '1'
'on' '0'
'ja' '11'
's' '10'
'o' '01'
'n' '00'
'j' '111'
'a' '110'
