Submitted by: Ashutosh Kawadkar Mohammed Juned D ata Compression
Contents: Introduction Need for Data Compression Types of Compression Lossless Compression Lossy Compression Huffman Coding Applications Summary
Introduction What is Data compression? Data compression is a technique in which information in the file is encoded in such a way that it takes up less space.
Need for Data Compression Compression reduces the size of file: To save space when storing it. To save time when transmitting it. Most files have lot of redundancy .
Continued…. Who needs Compression? Parkinson’s Law : Data expands to fill available space. Moore’s Law : # Transistors on a chip doubles every 18-24 months. Text , images ,sounds ,videos ,etc.
Types of Compression Lossless Compression Lossy Compression Original Data Decompressor (decoder) Compressor (encoder) Compressed data
Lossless Compression A lossless compression algorithm eliminates only redundant information, so one can easily recover the data upon decompression of the file. Lossless data compression is compression without any loss of data quality. The decompressed is replica of original file. It is used when it is important that original and decompressed data be identical. Some image formats, notably PNG, use only loseless compression, while those likes TIFF may use either lossy or lossless methods.
Lossy Compression A Lossy compression method is one where that compressing data and then decompressing it retrieves data that may well be different from the original, but close enough to be useful in some way. The algorithm eliminates irrelevant data as well, and permits only an approximate reconstruction of original file. Lossy compression is dangerously attractive because it can provide compression ratios of 100:1 and 200:1, depending on the type of information being compressed. But the cost is loss of data.
Examples of LOSSY Methods are: PCM JPEG MPEG JPEG-Compression (left to right: decreasing quality setting results in a 8x8 block generation of pixels)
Huffman Coding Huffman coding compression technique involves preliminary analysis of the frequency of occurrence of symbols. Huffman technique creates, for each symbol, a binary data code, the length of which is inversely related to the frequency of occurrence. In this technique we just add symbols at leaf node and root of this leaf node given values of addition of two leafs, and tree is generated. It gives us the best way for encoding any data string using minimum bits.
Example: Consider a string ABRA KA DABRA. OCCURRENCE- A=5 Let A=010 B=2 B=11 R=2 K=00 D=1 D=10 K=1 R=011
Therefore, if we write the given string in Binary form then, i.e. ABRAKADABRA BINARY FORM…. 01011011010000101001011011010 29 bits Now apply Huffman coding logic: Here, the weight of D and K is 1. As occurrence of both the characters is 1 time. Therefore, the weight of parent node is 1+1=2 i.e. Addition of weight oh two leaf nodes. Similarly for other characters…… D K 2 1 Weight => 1 1
11 6 4 2 R B D K A 1 1 1 1 Here, LHS nodes have value = 0 and RHS nodes have value= 1 So, from fig. the values will be- A=0 B=101 D=110 K=111 R=100 Now the representation of string is- 01011000111011001011000 Of 23 bits. Note : No. in red shows weight of node. Tree diagram for given string by Huffman coding 5 2 2 1 1
Summary: Data Compression: Is the substitution of frequently occurring data items, or symbols with short code that requires fewer bits for storage than the original symbols. Saves space, but requires time to save and extract. Success varies with types of data. Works best on data with low spatial variability and limited possible values. Works poorly on high spatial variability data or continuous surface. Exploits inherent redundancy and irrelevancy by transforming a data file into a smaller one.
Multiple Choice Questions: Which one of the image format uses only Lossless Data Compression? GIF PNG TIFF JPEG On which type of data, compression is not suitable… Matrix Highly spatial Vector None of these
Continued… Which one of the following technique is irreversible in nature. Run-length PCM Both a and b None of these MPEG ______ type of compression. Lossy compression Lossless compression Both None of these