Introduction to the Huffman algorithm By : Subaita Ahmed, 2006566
Definition David A. Huffman, while he was a Sc.D. student at MIT invented a greedy algorithm that constructs an optimal prefix code for the purpose of lossless data compression, this specific algorithm is called the Huffman code. -Introduction to Algorithms, CLRS
Terminology Greedy Algorithm : A solution which always takes the locally optimal choice in the hope that this choice will lead to a globally optimal solution. Data compression : The process of encoding, restructuring or otherwise modifying data in order to reduce its size. Fundamentally, it involves re-encoding information using fewer bits than the original representation.
Applications Huffman encoding is widely used in compression formats like WINZIP Multimedia codecs like JPEG, PNG and MP3 use Huffman encoding Cloud systems and servers use Huffman code to compress data and save valuable storage space Huffman encoding still dominates the compression industry since newer schemes are avoided due to their patent issues.
Encoding 1. Create a leaf node for each symbol and add it to the priority queue. 2. While there is more than one node in the queue: 2.1 : Remove the two nodes of highest priority from the queue. 2.2 : Create a new internal node with these two nodes as children and with frequency equal to the sum of the two nodes' frequency. 2.3 : Add the new node to the queue. 3. The remaining node is the root node and the Huffman tree is complete.
Formula The total number of bits required to encode a file of characters is given as the summation of the product of each character’s frequency with its depth in the Huffman tree.
Decoding The process of decompression is simply a matter of translating the stream of prefix codes to individual byte value, usually by traversing the Huffman tree node by node as each bit is read from the input stream. Reaching a leaf node necessarily terminates the search for that particular byte value. The leaf value represents the desired character.
Example & Explanation By : Om Shree, 2006077 Niraj Kumar Sah , 2006 562 Dhruvam Dhruvil, 2006213
Problem statement For a given frequency table of characters determine :- 1. The Huffman code for each character 2. Average code length 3. Total length of Huffman encoded message (in bits)
The given frequency table :-
Step 1 Leaf node corresponding to each character is created :- These nodes are stored in a queue(sorted from least to greatest)
Step 2 2 nodes with max priority are removed from the queue and added to our binary tree
Step 3 : A new node is created whose frequency is the sum of the frequency of our root node and that of the max priority node
Step 4 : Steps 2 and 3 are being repeated
Recurring steps 2 and 3 :
Recurring s teps 2 and 3 :
Recurring steps 2 and 3 : We’ve reached our final Huffman tree
Decoding phase 1 :-
Finding out the average code length :-
Finding out the total length of code :-
THANK YOU ! The presentation on Huffman code has been concluded, we hope you enjoyed.