Huffman Coding

SaqibRehan2 864 views 57 slides May 14, 2018

About This Presentation

A complete slide deck presenting Huffman coding and Huffman trees, with examples explained, and very easy to understand


Slide Content

DATA STRUCTURE & ALGORITHM: HUFFMAN CODING / TREE

Presented By: M. Saqib Rehan 503 Presented to: Sir Naveed Hassan & the whole class

OUTLINE

History of Huffman Coding: an algorithm developed by David A. Huffman while he was a Ph.D./Sc.D. (Doctor of Science) student at MIT, published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". A minimum-redundancy code is one constructed in such a way that the average number of coding digits per message is minimized.

What is Huffman Coding: an application of binary trees and priority queues. Huffman coding is a method of shortening messages sent from one computer to another so that they can be sent more quickly. Each code is a binary string used for transmission of the corresponding message. For example, the plain text BAGGAGE becomes the code 100 11 0 0 11 0 101.

Huffman Codes: Huffman codes are an effective technique of lossless data compression, which means no information is lost. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the smallest code, and the least frequent character gets the largest code.

How Huffman Coding is Greedy: Huffman coding is greedy because at each step it locally chooses and merges the two smallest nodes (nodes are weighted by occurrence/frequency: those that occur most often have the largest values, and those that occur least have the lowest), until no nodes are left over and a binary tree has been built.

Applications: used to compress files for transmission and to decrease storage cost. Huffman coding is widely used for image compression, since it reduces coding redundancy and results in lossless data at the receiver end. It uses statistical coding: more frequently used symbols have shorter code words. It works well for text and fax transmissions.

Fixed & Variable Length Codes
Fixed-length code: if every word in the code has the same length, the code is called a fixed-length code, or a block code.
Advantage: access is fast because the computer knows where each word starts.
Disadvantage: with fixed-length records, the records are usually larger, and therefore need more storage space and are slower to transfer.

Variable-length codes: the advantage of variable-length codes over fixed-length codes is that short codes can be given to characters that occur frequently, so on average the length of the encoded message is less than with fixed-length encoding. Potential problem: how do we know where one character ends and another begins? This is not a problem if the number of bits per character is fixed.
Example: A = 00, B = 01, C = 10, D = 11
00 10 11 01 11 00 11 11 11 11 11 decodes as A C D B D A D D D D D

Prefix property: a code has the prefix property if no character's code is the prefix (start) of another character's code.
Example: 000 is not a prefix of 11, 01, 001, or 10; 11 is not a prefix of 000, 01, 001, or 10; and so on.
Symbol codes: P = 000, Q = 11, R = 01, S = 001, T = 10
01001101100010 decodes as R S T Q P T
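Because of the prefix property, the bit stream can be decoded greedily, emitting a symbol as soon as the buffered bits match a code word. A minimal sketch using the P–T table above:

```python
# Decode a prefix code left to right: the prefix property
# guarantees the first code word we match is the only possible one.
codes = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
by_bits = {v: k for k, v in codes.items()}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in by_bits:        # unambiguous thanks to the prefix property
            out.append(by_bits[buf])
            buf = ""
    return "".join(out)

print(decode("01001101100010"))   # -> RSTQPT
```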

Code without prefix property: the following code does not have the prefix property, because the pattern 1110 can be decoded as QQQP, QTP, QQS, or TS.
Symbol codes: P = 0, Q = 1, R = 01, S = 10, T = 11

Problem Analysis: consider a file of 100 characters from a–f, with these frequencies: a = 45, b = 13, c = 12, d = 16, e = 9, f = 5.

Fixed-Length Code: typically each character in a file is stored as a single byte (8 bits). If we know we only have six characters, we can use a 3-bit code instead: a = 000, b = 001, c = 010, d = 011, e = 100, f = 101. This is called a fixed-length code. With this scheme, we can encode the whole file in 300 bits (45·3 + 13·3 + 12·3 + 16·3 + 9·3 + 5·3). We can do better: better compression, more flexibility.

Variable-Length Code: variable-length codes can perform significantly better. Frequent characters are given short code words, while infrequent characters get longer code words. Consider this scheme: a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100. How many bits are now required to encode our file? 45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4 = 224 bits. This is in fact an optimal character code for this file.
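The two bit counts can be checked directly. A quick sketch of the arithmetic, using the frequencies and codes given above:

```python
# Compare the 3-bit fixed-length code with the variable-length
# code for the 100-character a-f example file.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = {"a": "0", "b": "101", "c": "100", "d": "111",
        "e": "1101", "f": "1100"}

fixed_bits = sum(freq.values()) * 3       # 3 bits for every character
variable_bits = sum(f * len(code[ch]) for ch, f in freq.items())
print(fixed_bits, variable_bits)          # -> 300 224
```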

Techniques of Huffman coding:
Compression: data compression in Huffman coding shrinks a string so that it takes up less space. This is desirable for data storage and data communication. Storage space on disks is expensive, so a file that occupies less disk space is "cheaper" than an uncompressed string.
Decompression: in decompression we convert the binary codes back into the original string.

The (Real) Basic Algorithm
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize the characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create the new file using the Huffman codes.
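The five steps can be sketched with Python's heapq as the priority queue. This is a minimal illustration, not the code behind the slides; the function name and tuple-based tree representation are assumptions:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Steps 1-2: tally occurrences and prioritize by frequency.
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, node); a node is either
    # a character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    # Step 3: repeatedly merge the two lowest-weight nodes.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        tick += 1
        heapq.heappush(heap, (f1 + f2, tick, (left, right)))
    # Step 4: traverse the tree; 0 = left branch, 1 = right branch.
    codes = {}
    def walk(node, path):
        if isinstance(node, str):
            codes[node] = path or "0"   # lone-character edge case
        else:
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(heap[0][2], "")
    return codes

# Step 5: encode the text with the derived code words.
codes = huffman_codes("BAGGAGE")
encoded = "".join(codes[ch] for ch in "BAGGAGE")
print(len(encoded))   # -> 13, matching the BAGGAGE example earlier
```

Tie-breaking may change the individual code words, but the total encoded length of a Huffman code is the same for any valid tree.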

Example: consider the following short text: "Huffman coding ,the greedy algorithm". Its size is 36 characters. Scan the text and prepare a frequency table.

Frequency table for "Huffman coding ,the greedy algorithm":
H = 1, u = 1, f = 2, m = 2, a = 2, n = 2, t = 2, c = 1, o = 2, d = 2, i = 2, g = 3, , = 1, h = 2, r = 2, y = 1, l = 1, e = 3, space = 4
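Step 1 of the algorithm, tallying the example text, can be reproduced with collections.Counter (a small sketch; variable names are illustrative):

```python
from collections import Counter

# Tally every character of the 36-character example text.
text = "Huffman coding ,the greedy algorithm"
freq = Counter(text)
print(len(text), freq["g"], freq["e"], freq[" "])   # -> 36 3 3 4
```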

Building a Tree
1. Prioritize the characters.
2. Create binary tree nodes with each character and its frequency.
3. Place the nodes in a priority queue: the lower the occurrence, the higher the priority in the queue.

(Slides 21–47 show the tree being built step by step: the weighted character nodes are enqueued, and at each step the two lowest-weight nodes are dequeued and merged into a parent node whose weight is their sum: H1 and u1 into 2; c1 and ,1 into 2; y1 and l1 into 2; f2 and m2 into 4; a2 and n2 into 4; o2 and d2 into 4; i2 and t2 into 4; h2 and r2 into 4; and so on through subtrees of weight 6, 8, 12, 16, and 20, until a single tree of weight 36 remains.) After enqueuing, only one node is left.

Building a Tree (final step): dequeue the single node left in the queue. This tree contains the new code words for each character. The weight of the root node should equal the number of characters in the text: "Huffman coding ,the greedy algorithm" = 36 characters.

(Slide 49: diagram of the completed Huffman tree, with root weight 36.)

Coding: now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch. A traversal of the tree from root to leaf gives the Huffman code for that particular leaf's character. Note that no code is the prefix of another code.
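The traversal just described can be sketched as a small recursion. The tiny five-leaf tree here is an assumed example, not the tree from the slides:

```python
# Assigning codes by traversal: 0 on each left branch, 1 on each
# right branch; the path from root to a leaf is that leaf's code.
# Leaves are characters; internal nodes are (left, right) pairs.
tree = (("a", "b"), ("c", ("d", "e")))

def assign(node, prefix="", codes=None):
    codes = {} if codes is None else codes
    if isinstance(node, str):              # leaf: record path as code
        codes[node] = prefix
    else:
        assign(node[0], prefix + "0", codes)
        assign(node[1], prefix + "1", codes)
    return codes

print(assign(tree))
# -> {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
```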

(Slide 51: the Huffman tree with a 0 placed on every left branch and a 1 on every right branch.)

New codes after traversing:
space = 000, e = 1111, g = 1110, a = 0010, t = 0111, i = 0110, d = 0101, o = 0100, n = 0011, r = 1001, h = 1000, u = 10101, H = 10100, c = 10110, , = 10111, y = 11000, l = 11001, f = 11010, m = 11011
Frequent letter = fewer bits; less frequent letter = more bits. The tree could have been built in a number of ways; each would have yielded different codes, but the code would still be minimal.

Huffman encoding: original text "Huffman coding ,the greedy algorithm". Using fixed-length coding, the length is characters × 8: 36 × 8 = 288 bits. Using the variable-length code, the encoded form is:
10100101011101011010110110010001100010110010001010110001111100001011101111000111100011101001111111110101110000000010110011110100101100111100011011
The compressed form needs 150 bits, saving about 48% of the memory.
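A sketch of the comparison, using the frequency table and the code table from the "New codes after traversing" slide (note the deck originally quoted 146 bits; summing frequency × code length over its own table gives 150):

```python
# Bit counts for the 36-character example text: 8-bit fixed-length
# encoding versus the Huffman code table from the slides.
freq = {"H": 1, "u": 1, "f": 2, "m": 2, "a": 2, "n": 2, " ": 4,
        "c": 1, "o": 2, "d": 2, "i": 2, "g": 3, ",": 1, "t": 2,
        "h": 2, "e": 3, "r": 2, "y": 1, "l": 1}
code = {" ": "000", "e": "1111", "g": "1110", "a": "0010",
        "t": "0111", "i": "0110", "d": "0101", "o": "0100",
        "n": "0011", "r": "1001", "u": "10101", "h": "1000",
        "m": "11011", "f": "11010", "y": "11000", ",": "10111",
        "c": "10110", "H": "10100", "l": "11001"}

fixed_bits = sum(freq.values()) * 8          # 36 * 8 = 288
huffman_bits = sum(f * len(code[ch]) for ch, f in freq.items())
print(fixed_bits, huffman_bits)              # -> 288 150
```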

Decoding the File: once the receiver has the tree, it scans the incoming bit stream: 0 ⇒ go left, 1 ⇒ go right.
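That tree walk can be sketched in a few lines. The tiny tree here (codes A = 0, B = 10, C = 11) is an assumed example, not the tree from the slides:

```python
# Decoding by walking the tree: start at the root, go left on 0,
# go right on 1, and emit a character on reaching a leaf.
# Leaves are characters; internal nodes are (left, right) pairs.
tree = ("A", ("B", "C"))

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[int(b)]          # 0 -> left branch, 1 -> right branch
        if isinstance(node, str):    # leaf: emit and restart at the root
            out.append(node)
            node = tree
    return "".join(out)

print(decode("0101100", tree))   # -> ABCAA
```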

(Slide 55: the encoded bit stream being decoded step by step against the Huffman tree.)