Data Representation
Data refers to the symbols that represent people, events, things, and ideas. Data can be a name, a number, the colors in a photograph, or the notes in a musical composition. Data representation refers to the form in which data is stored, processed, and transmitted. Devices such as smartphones, iPods, and computers store data in digital formats that can be handled by electronic circuitry.
The 0s and 1s used to represent digital data are referred to as binary digits; the word bit is a contraction of binary digit. A bit is a 0 or 1 used in the digital representation of data. A digital file, usually referred to simply as a file, is a named collection of data that exists on a storage medium, such as a hard disk, CD, DVD, or flash drive.
Digital Number System
A number system represents the value of a number with respect to a given base. The base of a number system is the number of different symbols available in that system. A number has a unique representation in a given base, and different number systems represent the same number differently. For example, the binary, octal, decimal, and hexadecimal number systems are all used in microprocessor programming.
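As a small illustration of one number having different representations, the sketch below writes a single value in the four bases named above. The value 21 and the dictionary name `representations` are chosen here for illustration only; `format` is Python's built-in function.

```python
# One value, four representations. The value 21 is an arbitrary example.
n = 21
representations = {
    "binary": format(n, "b"),        # base 2
    "octal": format(n, "o"),         # base 8
    "decimal": format(n, "d"),       # base 10
    "hexadecimal": format(n, "X"),   # base 16, uppercase letters
}
for name, text in representations.items():
    print(f"{name:12} {text}")
```

The four strings look different, but each one denotes the same quantity.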
Decimal Number System
If the base of a number system is 10, it is called the decimal number system. It has 10 symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The position of every digit has a weight (positional value) that is a power of 10, so each position is 10 times more significant than the position to its right. The numeric value of a decimal number is therefore found by multiplying each digit by the weight of the position in which it appears and then adding the products.
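The number 6784 is interpreted as:
$6784 = 6 \times 10^{3} + 7 \times 10^{2} + 8 \times 10^{1} + 4 \times 10^{0} = 6000 + 700 + 80 + 4 = 6784$

The number 123.45 is interpreted as:
$123.45 = 1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0} + 4 \times 10^{-1} + 5 \times 10^{-2} = 100 + 20 + 3 + 0.4 + 0.05 = 123.45$

Positional values (weights):
Weight:  10^2   10^1   10^0   10^-1   10^-2
Digit:   1      2      3      4       5
The leftmost digit (1) is the Most Significant Digit (MSD) and the rightmost digit (5) is the Least Significant Digit (LSD).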
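The expansion above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function names `positional_terms` and `positional_value` are hypothetical, and `Fraction` is used only so that the fractional weights (10^-1, 10^-2) stay exact.

```python
from fractions import Fraction

def positional_terms(text, base=10):
    """Return (digit, weight) pairs for a numeral such as '123.45'."""
    whole, _, frac = text.partition(".")
    terms = []
    # Digits left of the point get weights base^(n-1) ... base^0.
    for i, ch in enumerate(whole):
        terms.append((int(ch), Fraction(base) ** (len(whole) - 1 - i)))
    # Digits right of the point get weights base^-1, base^-2, ...
    for i, ch in enumerate(frac, start=1):
        terms.append((int(ch), Fraction(base) ** -i))
    return terms

def positional_value(text, base=10):
    """Multiply each digit by its weight and add the products."""
    return sum(digit * weight for digit, weight in positional_terms(text, base))

print(positional_value("6784"))            # 6784
print(float(positional_value("123.45")))   # 123.45
```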
Binary Number System
Uses two digits, 0 and 1. The base is 2. The position of every bit has a weight (positional value) that is a power of 2.
$10101_{2} = 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} = 16 + 0 + 4 + 0 + 1 = 21_{10}$
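A short sketch of the same weighted sum in code; the function name `binary_to_decimal` is hypothetical, and the result is compared against Python's built-in `int(text, 2)` as a cross-check.

```python
def binary_to_decimal(bits):
    """Add up bit * 2^position, counting positions from the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("10101"))  # 21
print(int("10101", 2))             # 21, using the built-in parser
```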
Octal Number System
Uses eight digits: 0, 1, 2, 3, 4, 5, 6, 7. The base is 8.
$12570_{8} = 1 \times 8^{4} + 2 \times 8^{3} + 5 \times 8^{2} + 7 \times 8^{1} + 0 \times 8^{0} = 4096 + 1024 + 320 + 56 + 0 = 5496_{10}$
Hexadecimal Number System
Uses 10 digits and 6 letters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. The base is 16.
$\mathrm{19FDE}_{16} = 1 \times 16^{4} + 9 \times 16^{3} + \mathrm{F} \times 16^{2} + \mathrm{D} \times 16^{1} + \mathrm{E} \times 16^{0}$
$= 1 \times 16^{4} + 9 \times 16^{3} + 15 \times 16^{2} + 13 \times 16^{1} + 14 \times 16^{0}$
$= 65536 + 36864 + 3840 + 208 + 14 = 106462_{10}$
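The octal and hexadecimal examples follow the same pattern, so one routine can handle any base up to 16. This is a sketch under stated assumptions: `DIGITS` and `to_decimal` are hypothetical names, and the loop uses Horner's rule (multiply the running value by the base, then add the next digit), which gives the same result as summing digit times weight.

```python
DIGITS = "0123456789ABCDEF"  # symbol at index i has value i

def to_decimal(numeral, base):
    """Convert a numeral written in the given base (2 to 16) to an int."""
    value = 0
    for symbol in numeral.upper():
        digit = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a valid base-{base} digit")
        value = value * base + digit  # Horner's rule
    return value

print(to_decimal("12570", 8))    # 5496
print(to_decimal("19FDE", 16))   # 106462
```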
1’s Complement
Obtained by changing every 0 bit to 1 and every 1 bit to 0. For example, the 1’s complement of 10100110 is 01011001.

2’s Complement
Obtained by first finding the 1’s complement of the binary number and then adding 1. For example, to find the 2’s complement of $11100010_{2}$:
1’s complement:   00011101
Add 1:           +       1
2’s complement:   00011110
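Both operations can be sketched on bit strings as follows. The function names are hypothetical, and keeping the result at the same width as the input (discarding any carry out of the leftmost bit) is an assumption made here for fixed-width arithmetic.

```python
def ones_complement(bits):
    """Flip every bit: 0 becomes 1 and 1 becomes 0."""
    return "".join("1" if b == "0" else "0" for b in bits)

def twos_complement(bits):
    """1's complement plus 1, kept at the original width."""
    width = len(bits)
    value = (int(ones_complement(bits), 2) + 1) % (2 ** width)
    return format(value, f"0{width}b")

print(ones_complement("10100110"))  # 01011001
print(twos_complement("11100010"))  # 00011110
```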
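Representing Integers Using Sign and Magnitude
The leftmost bit is the sign bit (0: + sign, 1: - sign) and the remaining bits hold the magnitude. In an 8-bit representation:
$+9_{10} = +0001001_{2}$ is stored as 0 0001001 (sign bit 0, magnitude 0001001)
$-5_{10} = -101_{2}$ is stored as 1 0000101 (sign bit 1, magnitude 0000101)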
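A minimal sketch of sign-and-magnitude encoding; the function name `sign_magnitude` and the default width of 8 bits are assumptions for illustration.

```python
def sign_magnitude(n, width=8):
    """Return the sign bit followed by (width - 1) magnitude bits."""
    magnitude = abs(n)
    if magnitude >= 2 ** (width - 1):
        raise ValueError(f"{n} does not fit in {width} bits")
    sign = "1" if n < 0 else "0"  # 0: + sign, 1: - sign
    return sign + format(magnitude, f"0{width - 1}b")

print(sign_magnitude(9))    # 00001001
print(sign_magnitude(-5))   # 10000101
```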
Representing Real Numbers Using Mantissa and Exponent Form
$-1111010.1_{2} = -1.1110101_{2} \times 2^{6}$
Sign bit: 1 (negative). Mantissa: 1.1110101. Exponent: 6 ($110_{2}$).
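The normalization step can be sketched with `math.frexp` from the standard library, which returns a fraction in [0.5, 1) and a power of two; doubling the fraction and lowering the exponent by one gives the 1.xxx form. The names `normalize` and `fraction_bits` are hypothetical, and the decimal value -122.5 is simply $-1111010.1_{2}$ written in base 10. This shows the idea only, not the bit layout of any particular floating-point standard.

```python
import math

def normalize(x):
    """Return (sign_bit, mantissa, exponent) with 1 <= mantissa < 2."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    fraction, exponent = math.frexp(abs(x))  # abs(x) == fraction * 2**exponent
    sign_bit = 1 if x < 0 else 0
    return sign_bit, fraction * 2, exponent - 1

def fraction_bits(mantissa, count):
    """First `count` binary digits after the point of a mantissa in [1, 2)."""
    frac = mantissa - 1
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)

sign_bit, mantissa, exponent = normalize(-122.5)
print(sign_bit, exponent)          # 1 6
print(fraction_bits(mantissa, 7))  # 1110101
```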
Character Representation in Memory
ASCII
ISCII
Unicode
UTF-8
UTF-32
ASCII
ASCII (American Standard Code for Information Interchange) is a character encoding standard for electronic communication. ASCII codes represent text in computers. It is a 7-bit code, so $2^{7} = 128$ characters can be represented.
A = 65 (1000001)    a = 97
B = 66 (1000010)    b = 98
….
Z = 90              z = 122
0 = 48
1 = 49
…
9 = 57
An extended ASCII code has 8 bits and can represent $2^{8} = 256$ characters.
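The codes listed above can be reproduced with Python's built-in `ord` and `chr`. The helper name `ascii_bits` is hypothetical; it pads the code to 7 bits to match the 7-bit ASCII width.

```python
def ascii_bits(ch):
    """Return (code, 7-bit binary string) for an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return code, format(code, "07b")

for ch in "ABZabz019":
    code, bits = ascii_bits(ch)
    print(ch, code, bits)

print(chr(65))  # A
```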
ISCII
ISCII stands for Indian Script Code for Information Interchange. It is an encoding scheme for the various languages written and spoken in India. It uses an 8-bit encoding and can therefore hold up to 256 characters. The first 128 characters (0-127) are the same as in ASCII. The remaining characters (128-255) represent characters from the Indian scripts. Most Indian scripts derive from the ancient Brahmi script and closely resemble one another because they share a similar phonetic structure. Hence, a common character set was possible.
Characters Before Unicode
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different systems, called character encodings, for assigning these numbers. These early character encodings were limited and could not contain enough characters to cover all the world's languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. Early character encodings also conflicted with one another. That is, two encodings could use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) would need to support many different encodings. However, when data is passed between different computers or between different encodings, that data runs the risk of corruption.
Unicode Characters
The Unicode Standard provides a unique number for every character, no matter what platform, device, application, or language. It has been adopted by all modern software providers and now allows data to be transported through many different platforms, devices, and applications without corruption. Support for Unicode forms the foundation for the representation of languages and symbols in all major operating systems, search engines, browsers, laptops, and smartphones, as well as the Internet and World Wide Web.
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
UTF-8
UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. UTF-8 can encode all 1,112,064 valid Unicode code points (U+0000, U+0001, ….) using one to four one-byte code units.
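The variable width is easy to observe with Python's built-in `str.encode`. The four sample characters below are chosen here for illustration (one for each possible length) and are not from the original slides; the helper name `utf8_bytes` is hypothetical.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character as bytes."""
    return ch.encode("utf-8")

# Illustrative samples: Latin A, e with acute, Devanagari letter A, an emoji.
samples = ["A", "\u00e9", "\u0905", "\U0001F600"]
for ch in samples:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}", len(encoded), "byte(s):", encoded.hex(" "))
```

Code points in the ASCII range take one byte, which is why ASCII text is also valid UTF-8.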
UTF-32
UTF-32 (Unicode Transformation Format, 32-bit) is a fixed-length encoding that uses exactly 32 bits (four bytes) for every Unicode code point.
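A brief sketch of the fixed width, again using `str.encode`. The big-endian codec name `"utf-32-be"` is used so that no byte order mark is added; the choice of byte order and the sample characters are assumptions made here, not details from the slides. The helper name `utf32_bytes` is hypothetical.

```python
def utf32_bytes(ch):
    """Return the big-endian UTF-32 encoding of a single character."""
    return ch.encode("utf-32-be")  # explicit byte order, no BOM

# Illustrative samples: Latin A, Devanagari letter A, an emoji.
for ch in ["A", "\u0905", "\U0001F600"]:
    encoded = utf32_bytes(ch)
    print(f"U+{ord(ch):04X}", len(encoded), "bytes:", encoded.hex(" "))
```

Every character occupies four bytes, and the bytes are simply the code point value written as a 32-bit number.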