A complete slide deck that presents Huffman coding and the Huffman tree, with examples explained in a very easy-to-understand way.
Size: 1.4 MB
Language: en
Added: May 14, 2018
Slides: 57 pages
Slide Content
DATA STRUCTURE & ALGORITHM: HUFFMAN CODING / TREE
Presented By: M. Saqib Rehan 503 Presented to: Sir Naveed Hassan & the whole class
OUTLINE
History of Huffman Coding
Huffman coding is an algorithm developed by David A. Huffman while he was a Sc.D. (Doctor of Science) student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.
What is Huffman Coding?
An application of binary trees and priority queues. Huffman coding is a method of shortening messages sent from one computer to another so that they can be sent more quickly. Each code is a binary string that is used for transmission of the corresponding message. For example, the plain text BAGGAGE is encoded as 100 11 0 0 11 0 101.
Huffman Codes
Huffman codes are an effective technique of lossless data compression, which means no information is lost. The idea is to assign variable-length codes to input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding characters: the most frequent character gets the smallest code, and the least frequent character gets the largest code.
How Huffman Coding is Greedy
Huffman coding is greedy because it repeatedly makes the locally best choice: it chooses and merges the two smallest nodes at a time (nodes are weighted by occurrence/frequency, so characters that occur most frequently have the largest weights and those that occur least frequently have the lowest), until no unmerged nodes are left and a binary tree has been built.
Applications
Used to compress files for transmission and to decrease storage cost. Huffman coding is widely used for image compression, since it reduces coding redundancy and results in lossless data at the receiver end. It uses statistical coding: more frequently used symbols have shorter code words. It works well for text and fax transmissions.
Fixed & Variable Length Codes
Fixed-length code: if every word in the code has the same length, the code is called a fixed-length code, or a block code.
Advantage: access is fast because the computer knows where each word starts.
Disadvantage: with fixed-length records, the records are usually larger and therefore need more storage space and are slower to transfer.
Variable-length codes: the advantage of variable-length codes over fixed-length codes is that short codes can be given to characters that occur frequently, so on average the length of the encoded message is less than with fixed-length encoding. Potential problem: how do we know where one character ends and another begins? This is not a problem if the number of bits per character is fixed. For example, with A = 00, B = 01, C = 10, D = 11, the stream 00 10 11 01 00 11 11 11 11 11 decodes unambiguously as A C D B A D D D D D.
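To illustrate why a fixed-length code is trivial to decode, here is a minimal Python sketch (the variable names are my own, not from the slides) that simply slices the bit stream into 2-bit chunks:

```python
# Fixed-length (2-bit) code: decoding just splits the stream into equal chunks.
codes = {"A": "00", "B": "01", "C": "10", "D": "11"}
decode = {bits: ch for ch, bits in codes.items()}

stream = "00101101001111111111"   # A C D B A D D D D D, 2 bits per character
width = 2
message = "".join(decode[stream[i:i + width]] for i in range(0, len(stream), width))
print(message)                    # -> ACDBADDDDD
```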
Prefix Property
A code has the prefix property if no character's code is the prefix (start) of another character's code. Example: P = 000, Q = 11, R = 01, S = 001, T = 10. Here 000 is not a prefix of 11, 01, 001, or 10; 11 is not a prefix of 000, 01, 001, or 10; and so on. The stream 01001101100010 decodes as R S T Q P T.
Code Without the Prefix Property
The following code does not have the prefix property: P = 0, Q = 1, R = 01, S = 10, T = 11. The pattern 1110 can be decoded as QQQP, QTP, QQS, or TS.
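A simple way to test the prefix property is to compare every pair of code words. Here is a small sketch (the helper name check_prefix_property is mine, not from the slides), applied to the two tables above:

```python
def check_prefix_property(codes):
    """Return True if no code word is a prefix of another code word."""
    words = list(codes.values())
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(w):
                return False
    return True

good = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
bad  = {"P": "0",   "Q": "1",  "R": "01", "S": "10",  "T": "11"}
print(check_prefix_property(good))  # True  -> uniquely decodable
print(check_prefix_property(bad))   # False -> 1110 is ambiguous (QQQP, QTP, QQS, TS)
```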
Problem Analysis
Consider a file of 100 characters from a to f, with these frequencies: a = 45, b = 13, c = 12, d = 16, e = 9, f = 5.
Fixed-Length Code
Typically each character in a file is stored as a single byte (8 bits). If we know we only have six characters, we can use a 3-bit code for the characters instead: a = 000, b = 001, c = 010, d = 011, e = 100, f = 101. This is called a fixed-length code. With this scheme, we can encode the whole file with 300 bits (45*3 + 13*3 + 12*3 + 16*3 + 9*3 + 5*3). We can do better: better compression and more flexibility.
Variable-Length Code
Variable-length codes can perform significantly better: frequent characters are given short code words, while infrequent characters get longer code words. Consider this scheme: a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100. How many bits are now required to encode our file? 45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4 = 224 bits. This is in fact an optimal character code for this file.
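The totals on the last two slides can be re-derived mechanically from the frequencies and the two code tables; a minimal Python sketch:

```python
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

fixed    = {ch: 3 for ch in freqs}                 # every symbol uses 3 bits
variable = {"a": "0", "b": "101", "c": "100",
            "d": "111", "e": "1101", "f": "1100"}  # the scheme from this slide

fixed_bits    = sum(freqs[ch] * fixed[ch] for ch in freqs)
variable_bits = sum(freqs[ch] * len(variable[ch]) for ch in freqs)
print(fixed_bits, variable_bits)   # 300 224
```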
Techniques of Huffman Coding
Compression: in Huffman coding, data compression shrinks a string so that it takes up less space. This is desirable for data storage and data communication: storage space on disk is expensive, so a file that occupies less disk space is "cheaper" than an uncompressed string.
Decompression: in decompression we convert the binary codes back into the original string.
The (Real) Basic Algorithm
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize the characters based on the number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create the new file using the Huffman codes.
EXAMPLE
Consider the following short text: "Huffman coding ,the greedy algorithm". Its size is 36 characters. Scan the text and prepare a frequency table.
Frequency table for "Huffman coding ,the greedy algorithm":
H: 1, u: 1, f: 2, m: 2, a: 2, n: 2, c: 1, o: 2, d: 2, i: 2, g: 3, ',': 1, t: 2, h: 2, e: 3, r: 2, y: 1, l: 1, space: 4
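The same frequency table can be produced with Python's collections.Counter; a minimal sketch:

```python
from collections import Counter

text = "Huffman coding ,the greedy algorithm"
freq = Counter(text)

print(len(text))             # 36 characters in total
for ch, n in freq.most_common():
    label = "space" if ch == " " else ch
    print(f"{label}: {n}")   # e.g. space: 4, g: 3, e: 3, f: 2, ...
```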
Building a Tree
Prioritize the characters: create binary tree nodes with the character and frequency of each character, and place the nodes in a priority queue. The lower the occurrence, the higher the priority in the queue.
Building a Tree (diagram slides)
These slides show the priority queue after each step. First all leaf nodes are enqueued (H:1, u:1, c:1, ',':1, y:1, l:1, f:2, m:2, a:2, n:2, o:2, d:2, i:2, t:2, h:2, r:2, g:3, e:3, space:4). Then the lowest-weight nodes are repeatedly dequeued in pairs and replaced by a parent whose weight is the sum of its children: H+u = 2, c+',' = 2, y+l = 2, f+m = 4, a+n = 4, o+d = 4, i+t = 4, h+r = 4, (H,u)+(c,',') = 4, (y,l)+(f,m) = 6, g+e = 6, space+(a,n) = 8, (o,d)+(i,t) = 8, (h,r)+(H,u,c,',') = 8, (y,l,f,m)+(g,e) = 12, 8+8 = 16, 8+12 = 20, and finally 16+20 = 36. After this last merge only one node is left in the queue.
Building a Tree (final step)
Dequeue the single node left in the queue. This tree contains the new code words for each character. The frequency of the root node should equal the number of characters in the text: "Huffman coding ,the greedy algorithm" = 36 characters.
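Here is a minimal Python sketch of the build loop described above, using heapq as the priority queue (the tuple node representation and the tie-breaking counter are my own choices, not taken from the slides):

```python
import heapq
from collections import Counter

def build_huffman_tree(text):
    """Merge the two lowest-frequency nodes until a single root remains."""
    # Leaf = (frequency, tie-breaker, character); internal node = (frequency, tie-breaker, left, right)
    heap = [(n, i, ch) for i, (ch, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)                          # tie-breaker keeps heap tuples comparable
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # the two smallest nodes...
        hi = heapq.heappop(heap)
        heapq.heappush(heap, (lo[0] + hi[0], tie, lo, hi))   # ...merged into one parent
        tie += 1
    return heap[0]                           # the root node

root = build_huffman_tree("Huffman coding ,the greedy algorithm")
print(root[0])   # 36, the root weight equals the number of characters in the text
```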
Coding
Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch. A traversal of the tree from the root to a leaf gives the Huffman code for that particular leaf's character. Note that no code is the prefix of another code.
New Codes After Traversing
space: 000, a: 0010, n: 0011, o: 0100, d: 0101, i: 0110, t: 0111, h: 1000, r: 1001, g: 1110, e: 1111, H: 10100, u: 10101, c: 10110, ',': 10111, y: 11000, l: 11001, f: 11010, m: 11011.
A frequent letter gets fewer bits; a less frequent letter gets more bits. The tree could have been built in a number of ways; each would have yielded different codes, but the code would still be minimal.
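A sketch of this traversal in Python, reusing the same tree representation as the build sketch above (0 appended on the left branch, 1 on the right). The exact bit patterns may differ from the table above because ties in the priority queue can be broken differently, but the total encoded length is the same minimum:

```python
import heapq
from collections import Counter

def build_huffman_tree(text):
    heap = [(n, i, ch) for i, (ch, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, (lo[0] + hi[0], tie, lo, hi))
        tie += 1
    return heap[0]

def assign_codes(node, prefix="", table=None):
    """Walk the tree: append '0' on the left branch, '1' on the right branch."""
    if table is None:
        table = {}
    if len(node) == 3:                    # leaf: (frequency, tie-breaker, character)
        table[node[2]] = prefix or "0"    # edge case: a one-symbol alphabet
    else:                                 # internal node: (frequency, tie-breaker, left, right)
        assign_codes(node[2], prefix + "0", table)
        assign_codes(node[3], prefix + "1", table)
    return table

codes = assign_codes(build_huffman_tree("Huffman coding ,the greedy algorithm"))
for ch, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print(repr(ch), code)   # frequent characters get short codes, rare ones long codes
```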
Huffman Encoding
Original text: "Huffman coding ,the greedy algorithm". Using fixed-length coding, the length is characters * 8 = 36 * 8 = 288 bits. Using the variable-length code above, the encoded form is:
101001010111010110101101100100011000101100100010101100011111000010111011110001111000111010011111111101011100000000101100111100100100101100111100011011
The compressed form contains 150 bits (the sum of each character's frequency times its code length), saving about 48% of the memory.
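Encoding with a prefix-free code table is just string concatenation; the sketch below (code table transcribed from the previous slide) reproduces the two counts above:

```python
codes = {                                   # the code table from the previous slide
    " ": "000", "a": "0010", "n": "0011", "o": "0100", "d": "0101",
    "i": "0110", "t": "0111", "h": "1000", "r": "1001", "g": "1110",
    "e": "1111", "H": "10100", "u": "10101", "c": "10110", ",": "10111",
    "y": "11000", "l": "11001", "f": "11010", "m": "11011",
}

text = "Huffman coding ,the greedy algorithm"
encoded = "".join(codes[ch] for ch in text)

print(len(text) * 8)   # 288 bits with a fixed 8-bit code
print(len(encoded))    # 150 bits with the Huffman code, roughly a 48% saving
```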
Decoding the File
Once the receiver has the tree, it scans the incoming bit stream: 0 ⇒ go left, 1 ⇒ go right.
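Decoding walks the tree bit by bit and restarts at the root after every leaf. A minimal sketch (here the receiver's tree is rebuilt from a subset of the code table as a nested dict, a representation of my own choosing; the '0' key plays the role of "go left" and the '1' key "go right"):

```python
codes = {" ": "000", "a": "0010", "n": "0011", "g": "1110", "e": "1111",
         "H": "10100", "u": "10101", "f": "11010", "m": "11011"}  # subset, for brevity

# Rebuild the tree from the code table: each node is a dict with '0'/'1' children,
# and a 'leaf' entry marking the decoded character.
root = {}
for ch, code in codes.items():
    node = root
    for bit in code:
        node = node.setdefault(bit, {})
    node["leaf"] = ch

def decode(bits, root):
    out, node = [], root
    for b in bits:
        node = node[b]               # 0 -> go left, 1 -> go right
        if "leaf" in node:
            out.append(node["leaf"])
            node = root              # restart at the root for the next character
    return "".join(out)

encoded = "".join(codes[ch] for ch in "Huffman")
print(decode(encoded, root))         # -> Huffman
```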