IEEE Floating Point Number with Single and Double Precision (Analog and Digital Electronics).pptx

atirathpal007 · 12 slides · Aug 11, 2024

About This Presentation

College assignment to build a PowerPoint presentation.
Subject: Analog and Digital Electronics
Topic: IEEE Floating Point Number Representation with Single and Double Precision; IEEE Standard 754; Concept of Sign Bit, Mantissa, Biased Exponent.


Slide Content

GOVERNMENT COLLEGE OF ENGINEERING AND TEXTILE TECHNOLOGY, SERAMPORE
Continuous Assessment: 1
Name of the Topic: IEEE floating point number with single and double precision
Name: Atirath Pal
University Registration No: 231100110042 (2023-24)
University Roll No: 11000123007
Department: Computer Science and Engineering
Year: 2nd Year
Semester: 3rd Sem
Paper Name: Analog and Digital Electronics
Paper Code: ESC-301

Topics Discussed
Introduction to IEEE floating point numbers
IEEE Standard 754
3 basic components
Single Precision
Double Precision
Examples of Single and Double Representation
Important Points
Conclusion

IEEE Floating Point Number: Introduction
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations of the time, which made them difficult to use reliably and reduced their portability. IEEE 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macs, and most Unix platforms.
Floating point numbers are a method for representing real numbers on a computer. They approximate real numbers and are essential for scientific computing, graphics, and many other applications.
IEEE Standard 754: There are several ways to represent floating point numbers, but IEEE 754 is the most efficient in most cases.

IEEE 754 has 3 basic components:
1. The Sign of the Mantissa – This is as simple as the name: 0 represents a positive number, while 1 represents a negative number.
2. The Biased Exponent – The exponent field needs to represent both positive and negative exponents. A bias is added to the actual exponent to get the stored exponent.
3. The Normalised Mantissa – The mantissa is the part of a number in scientific notation or a floating-point number consisting of its significant digits. In binary we have only two digits, 0 and 1, so a normalised mantissa is one with exactly one 1 to the left of the decimal point.
IEEE 754 numbers are divided into two formats based on the above components: single precision and double precision.
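The three components above can be pulled out of a stored number directly. The slides contain no code, so the following is a minimal sketch using Python's standard struct module; the helper name decompose_binary32 is ours, not from the presentation:

```python
import struct

def decompose_binary32(x):
    """Split a float (stored as binary32) into sign, biased exponent, mantissa bits."""
    # Re-interpret the 4 bytes of the 32-bit float as an unsigned integer
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31               # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF      # 23 explicitly stored fraction bits
    return sign, exponent, mantissa

# -1.0 = -1.0 x 2^0, so sign=1, biased exponent = 0 + 127 = 127, mantissa = 0
print(decompose_binary32(-1.0))  # (1, 127, 0)
```

The leading 1 of the normalised mantissa is not among the 23 stored bits, which is why -1.0 shows a mantissa field of 0.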

Single Precision
A floating-point variable can represent a wider range of numbers than a fixed-point variable of the same bit width, at the cost of precision. A signed 32-bit integer variable has a maximum value of 2^31 − 1 = 2,147,483,647, whereas an IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2^−23) × 2^127 ≈ 3.4028235 × 10^38.
In the IEEE 754 standard, the 32-bit base-2 format is officially referred to as binary32; it was called single in IEEE 754-1985. IEEE 754 specifies additional floating-point types, such as 64-bit base-2 double precision and, more recently, base-10 representations.
IEEE 754 standard: binary32
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
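The maximum value quoted above can be checked numerically. A small sketch (ours, not from the slides) confirms that (2 − 2^−23) × 2^127 survives a round trip through a 32-bit float, i.e. it is exactly representable as the largest finite binary32 value:

```python
import struct

# Largest finite binary32 value per the formula on the slide
max_f32 = (2 - 2**-23) * 2.0**127

# Packing to 4 bytes and unpacking leaves it unchanged: it is exactly representable
roundtrip = struct.unpack('>f', struct.pack('>f', max_f32))[0]
assert roundtrip == max_f32
print(max_f32)  # approximately 3.4028235e+38
```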

Double Precision
Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. Double precision may be chosen when the range or precision of single precision would be insufficient.
In the IEEE 754 standard, the 64-bit base-2 format is officially referred to as binary64; it was called double in IEEE 754-1985. IEEE 754 specifies additional floating-point formats, including 32-bit base-2 single precision and, more recently, base-10 representations (decimal floating point).
Before standardisation, the representation and properties of floating-point data types depended on the computer manufacturer and computer model, and on decisions made by programming-language implementers. E.g., GW-BASIC's double-precision data type was the 64-bit MBF floating-point format.

Double Precision: IEEE 754 double-precision binary floating-point format (binary64)
Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. It is commonly known simply as double. The IEEE 754 standard specifies a binary64 as having:
Sign bit: 1 bit
Exponent: 11 bits
Significand precision: 53 bits (52 explicitly stored)
The 53-bit significand precision gives 15 to 17 significant decimal digits of precision (2^−53 ≈ 1.11 × 10^−16). With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits).
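The 2^−53 figure above is exactly the rounding threshold of binary64. Since Python floats are binary64 on all common platforms, this can be demonstrated directly (a sketch of ours, not from the slides):

```python
# Machine epsilon of binary64: the gap between 1.0 and the next representable double
eps = 2**-52
assert 1.0 + eps > 1.0       # a full step up from 1.0 is representable

# Half a step (2^-53) is exactly halfway to the next double;
# round-to-nearest-even sends it back down to 1.0
assert 1.0 + 2**-53 == 1.0

print(eps)  # 2.220446049250313e-16, i.e. roughly 16 significant decimal digits
```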

Example of single-precision floating-point representation
The value of a float type is represented using single precision. The value is (−27.625) base 10.
Solution: 27 = (11011) base 2, 0.625 = (0.101) base 2
27.625 = (11011.101) base 2 = 1.1011101 × 2^4 (after normalisation)
The bias is 127, so the biased exponent = 4 + 127 = 131 = (10000011) base 2
Mantissa = 1011101 followed by zeros (23 bits total)
For a negative value, the sign bit S = 1.
Sign: 1 | Exponent: 1 0 0 0 0 0 1 1 | Mantissa: 1 0 1 1 1 0 1 0 0 …… 0
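The hand-worked encoding above can be verified by packing −27.625 into 4 bytes and checking each field; the bit pattern 1 | 10000011 | 10111010…0 corresponds to the hex string c1dd0000 (a verification sketch of ours, not from the slides):

```python
import struct

# Pack -27.625 as a big-endian binary32 and inspect the raw bytes
raw = struct.pack('>f', -27.625).hex()
print(raw)  # 'c1dd0000'

# 0xC1DD0000 = 1 | 10000011 | 10111010000000000000000
bits = int(raw, 16)
assert bits >> 31 == 1                # sign bit: negative
assert (bits >> 23) & 0xFF == 131     # biased exponent: 4 + 127
assert bits & 0x7FFFFF == 0b10111010000000000000000  # mantissa field
```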

Example of double-precision floating-point representation
The value of a float type is represented using double precision. The value is (−0.001) base 2.
Solution: (0.001) base 2 = 1.00 × 2^−3 (after normalisation)
The bias is 1023 (for double precision), so the biased exponent = −3 + 1023 = 1020 = (01111111100) base 2
Mantissa = 0000 …… 0 (52 zeros)
For a negative value, the sign bit S = 1.
Sign: 1 | Exponent: 0 1 1 1 1 1 1 1 1 0 0 | Mantissa: 0 0 0 0 0 0 0 ……..0 (52 0’s)
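Since (0.001) base 2 is −0.125 in decimal, the same round-trip check works for this binary64 example; the pattern 1 | 01111111100 | 000…0 corresponds to the hex string bfc0000000000000 (again a verification sketch of ours):

```python
import struct

# (0.001) base 2 = 0.125, so the value encoded is -0.125
raw = struct.pack('>d', -0.125).hex()
print(raw)  # 'bfc0000000000000'

bits = int(raw, 16)
assert bits >> 63 == 1                 # sign bit set: negative
assert (bits >> 52) & 0x7FF == 1020    # biased exponent: -3 + 1023
assert bits & ((1 << 52) - 1) == 0     # mantissa field: all 52 bits zero
```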

Important points
Key points regarding single- and double-precision floating-point representation:
Single Precision (32-bit)
Size: 32 bits.
Components: 1 sign bit, 8 exponent bits, 23 mantissa bits.
Precision: approximately 7 decimal digits.
Range: ~1.4 × 10^−45 to 3.4 × 10^38.
Memory usage: lower; faster computation.
Applications: graphics processing, machine learning, mobile apps.
Double Precision (64-bit)
Size: 64 bits.
Components: 1 sign bit, 11 exponent bits, 52 mantissa bits.
Precision: approximately 15–17 decimal digits.
Range: ~4.9 × 10^−324 to 1.8 × 10^308.
Memory usage: higher; slower computation.
Applications: scientific computing, financial calculations, high-precision tasks.
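The ~7 versus ~16 decimal digits difference is easy to see by forcing a well-known constant through each format. A short comparison sketch (ours, not from the slides), using Python's binary64 floats and struct for the 32-bit squeeze:

```python
import math
import struct

# Squeeze pi through a 32-bit float: only about 7 decimal digits survive
pi_f32 = struct.unpack('>f', struct.pack('>f', math.pi))[0]
print(pi_f32)   # 3.1415927410125732 -- diverges from pi in the 8th digit
print(math.pi)  # 3.141592653589793  -- binary64 keeps ~16 digits

assert pi_f32 != math.pi              # precision was lost in the 32-bit format
assert abs(pi_f32 - math.pi) < 1e-7   # but the error is below ~10^-7
```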

Conclusion
To conclude the topic, here are some key takeaways:
· The IEEE 754 standard is a cornerstone of reliable numerical computation in digital systems.
· Proper understanding and application of floating-point representation are essential for designing accurate and efficient electronic systems.
· Awareness of precision and rounding issues helps mitigate potential errors in critical applications.
· IEEE 754 Standard: the foundation for floating-point arithmetic, including single and double precision formats.
· Applications in Electronics: floating-point representation is crucial in digital signal processing, control systems, and other electronic applications where precision is key.