1.1 Introduction.pptx: design thinking for engineering students

HrushikeshDandu 7 views 22 slides Mar 05, 2025

About This Presentation

Full details about design thinking.


Slide Content

UNIT 1 - Introduction

AUTOMATING PARALLEL PROGRAMMING
When writing code, we typically do not need to understand the details of the target system, because the compiler handles them. Developers usually think in terms of a single CPU and sequential processing during coding and debugging. Implementing algorithms on parallel systems in software and implementing them in hardware are more closely related tasks than they might seem: parallelism in software and hardware shares common challenges and approaches.


AUTOMATING PARALLEL PROGRAMMING: Layers of Implementation
Layer 5 - Application Layer: Defines the application or problem to be implemented on a parallel computing platform. Specifies the inputs and outputs, including data storage and timing requirements.
Layer 4 - Algorithm Development: Focuses on defining the tasks and their interdependencies. Parallelism may not be evident in this layer, as tasks are usually developed for linear (sequential) execution. The result is a dependence graph, directed graph (DG), or adjacency matrix summarizing the task dependencies.
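As a minimal illustration of the Layer 4 output (not part of the original slides), the C sketch below encodes a hypothetical four-task dependence graph as an adjacency matrix, where entry [i][j] = 1 means task j cannot start until task i has finished.

```c
#include <stdio.h>

#define NTASKS 4

int main(void) {
    /* Adjacency matrix for a hypothetical 4-task dependence graph:
       dep[i][j] == 1 means task j cannot start until task i finishes.
       Here: T0 -> T1, T0 -> T2, T1 -> T3, T2 -> T3 (a diamond). */
    int dep[NTASKS][NTASKS] = {
        {0, 1, 1, 0},
        {0, 0, 0, 1},
        {0, 0, 0, 1},
        {0, 0, 0, 0}
    };

    /* Print each task's predecessors, i.e. the dependencies that
       the parallelization layer (Layer 3) would have to respect. */
    for (int j = 0; j < NTASKS; j++) {
        printf("Task %d depends on:", j);
        for (int i = 0; i < NTASKS; i++) {
            if (dep[i][j]) printf(" T%d", i);
        }
        printf("\n");
    }
    return 0;
}
```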

AUTOMATING PARALLEL PROGRAMMING
Layer 3 - Parallelization Layer: Extracts parallelism from the algorithm developed in Layer 4. It generates thread timing and processor assignments for software or hardware implementations. This layer is crucial for optimizing the algorithm for parallel execution.
Layer 2 - Coding Layer: Involves writing the parallel algorithm in a high-level language. The language depends on the target parallel computing platform. For general-purpose platforms, languages such as Cilk++, OpenMP, or CUDA (Compute Unified Device Architecture) are used. For custom platforms, hardware description languages (HDLs) such as Verilog or VHDL are used.
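As a hedged sketch of the Coding Layer on a general-purpose platform, the C/OpenMP fragment below parallelizes one simple loop; the array size, the computation, and the compile flag shown in the comment are illustrative assumptions rather than material from the slides.

```c
#include <stdio.h>
#include <omp.h>   /* compile with: gcc -fopenmp scale.c */

#define N 1000000

int main(void) {
    static double a[N], b[N];

    /* Layer 2 (coding): express the parallelism extracted in Layer 3
       with an OpenMP work-sharing directive. Each iteration is
       independent, so the loop can be split across threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        b[i] = 2.0 * a[i] + 1.0;
    }

    printf("done with up to %d threads\n", omp_get_max_threads());
    return 0;
}
```

The same loop could equally be written in Cilk++ or CUDA; the point is that Layer 2 maps the parallelism found in Layer 3 onto whatever constructs the chosen language offers.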

AUTOMATING PARALLEL PROGRAMMING
Layer 1 - Realization Layer: The algorithm is realized on a parallel computer platform, using methods like multithreading or custom parallel processors, e.g., ASICs (application-specific integrated circuits) or FPGAs (field-programmable gate arrays).
Automatic Programming in Parallel Computing:
Automatic serial programming: The programmer writes code in a high-level language (C, Java, FORTRAN), and the code is compiled automatically.
Parallel computing: This is more complex, as programmers need to manage how tasks are distributed and executed across multiple processors. Parallelizing compilers can handle simple loops and embarrassingly parallel algorithms (tasks that can be easily parallelized). For more complex tasks, the programmer needs intimate knowledge of processor interactions and task execution timing.

Parallel Algorithms and Parallel Architectures
Parallel algorithms and parallel hardware are interconnected; the development of one often depends on the other. Parallelism can be implemented at different levels in a computing system through hardware and software techniques:
Data-Level Parallelism: Operates on multiple bits of a datum, or on multiple data items, simultaneously. Examples: bit-parallel addition, multiplication, and division; vector processor arrays; and systolic arrays.
Instruction-Level Parallelism (ILP): Executes multiple instructions simultaneously within a processor. Example: instruction pipelining.

Parallel Algorithms and Parallel Architectures
Thread-Level Parallelism (TLP): Executes multiple threads (lightweight processes sharing processor resources) simultaneously. Threads can run on one or multiple processors.
Process-Level Parallelism: Manages multiple independent processes, each with dedicated resources such as memory and registers. Example: classic multitasking and time-sharing across single or multiple machines.
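To make thread-level parallelism concrete, here is a minimal POSIX threads sketch in C; the two worker threads, the range they sum, and the per-thread result slots are illustrative assumptions. Both threads share the process address space, which is what distinguishes threads from the heavier processes of process-level parallelism.

```c
#include <stdio.h>
#include <pthread.h>   /* compile with: gcc -pthread threads.c */

/* Each worker computes a partial sum over its own half of the range.
   The threads share the process address space but write to separate
   result slots, so no locking is needed here. */
static long partial[2];

static void *worker(void *arg) {
    int id = *(int *)arg;
    long sum = 0;
    long start = id * 500000L, end = start + 500000L;
    for (long i = start; i < end; i++) sum += i;
    partial[id] = sum;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total = %ld\n", partial[0] + partial[1]);
    return 0;
}
```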

Measuring the Benefits of Parallel Computing: Speedup Factor
The benefit of parallel computing is measured by comparing the time taken to complete a task on a single processor with the time taken on N parallel processors. The speedup, S(N), is defined as

S(N) = Tp(1) / Tp(N),

where Tp(1) is the algorithm processing time on a single processor and Tp(N) is the processing time on the N parallel processors. In an ideal situation, for a fully parallelizable algorithm, and when the communication time between processors and memory is neglected, we have Tp(N) = Tp(1)/N, and the above equation gives S(N) = N.
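The small C helper below simply evaluates the speedup definition above; the timing numbers are made up for illustration.

```c
#include <stdio.h>

/* Speedup factor S(N) = Tp(1) / Tp(N), as defined above. */
static double speedup(double t1, double tN) {
    return t1 / tN;
}

int main(void) {
    double t1 = 80.0;  /* hypothetical single-processor time (s) */
    double t8 = 12.0;  /* hypothetical time on 8 processors (s)  */

    /* Ideal speedup would be 8; a measured value is lower because
       of communication overhead and any serial fraction. */
    printf("S(8) = %.2f (ideal: 8.00)\n", speedup(t1, t8));
    return 0;
}
```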

Communication Overhead
Both single and parallel computing systems require data transfer between processors and memory. Communication delays occur due to a speed mismatch between the processor and memory. Parallel systems additionally need processors to exchange data via interconnection networks, adding complexity.
Issues Affecting Communication Efficiency:
Interconnection Network Delay: Delays arise from factors such as bit propagation, message transmission, and queuing within the network. These delays depend on network topology, data size, and network speed.

Communication Overhead
Memory Bandwidth: With a single-port memory, data transfer is restricted to one word per memory cycle.
Memory Collisions: Occur when multiple processors try to access the same memory module simultaneously; arbitration mechanisms are required to resolve access conflicts.
Memory Wall: Memory transfer speeds lag behind processor speeds. This problem is mitigated using a memory hierarchy, such as register → cache → RAM → electronic disk → magnetic disk → optical disk.


Estimating Speedup Factor and Communication Overhead
Let us assume we have a parallel algorithm consisting of N independent tasks that can be executed either on a single processor or on N processors. Under these ideal circumstances, the single processor completes the N tasks in Tp(1) = Nτ, where τ is the time to process one task, while the N processors complete them in Tp(N) = τ, so the speedup is S(N) = Tp(1)/Tp(N) = N.

Amdahl's Law Amdahl's Law is a fundamental principle used to estimate the potential speedup that can be achieved by parallelizing a computation. It describes the maximum expected improvement in the execution time of a program when part of the computation is parallelized.

Amdahl's Law
In the limiting case, as the enhanced portion is sped up indefinitely, Overall Speedup(max) = 1 / (1 - Fraction Enhanced). Likewise, we can also consider the case where f = 1, i.e., the entire program can be enhanced; the speedup is then limited only by the enhancement itself (for parallelization, by the number of processors).
Amdahl's law states that the maximum potential improvement to the performance of a system is limited by the portion of the system that cannot be improved. In other words, the performance improvement of a system as a whole is limited by its bottlenecks. The law is often used to predict the potential performance improvement of a system when adding more processors or improving the speed of individual processors. It is named after Gene Amdahl, who first proposed it in 1967.

Amdahl's Law
The formula for Amdahl's law is:
S = 1 / (1 - P + (P / N))
where:
S is the speedup of the system,
P is the proportion of the execution that can be improved (parallelized), and
N is the number of processors in the system.
For example, if only 20% of the total execution time can be parallelized (P = 0.2) and we add 4 more processors for a total of 5, the speedup would be:
S = 1 / (1 - 0.2 + (0.2 / 5))
S = 1 / (0.8 + 0.04)
S = 1 / 0.84
S ≈ 1.19
This means that the overall performance of the system would improve by about 19% with the addition of the 4 processors.
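A short C sketch of the same calculation, reproducing the worked example above (the second call, with 1000 processors, is an added illustration of how the serial fraction caps the achievable speedup):

```c
#include <stdio.h>

/* Amdahl's law: S = 1 / ((1 - p) + p / n), where p is the
   parallelizable fraction and n is the number of processors. */
static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* Reproduce the worked example: p = 0.2, n = 5 processors. */
    printf("S(p=0.2, n=5)    = %.2f\n", amdahl(0.2, 5));    /* ~1.19 */

    /* Even with many processors, the 80%% serial part caps the gain. */
    printf("S(p=0.2, n=1000) = %.2f\n", amdahl(0.2, 1000)); /* ~1.25 */
    return 0;
}
```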

APPLICATIONS OF PARALLEL COMPUTING
Scientific Research and Simulation:
Weather Forecasting: Running complex models to predict weather patterns and climate changes.
Astrophysics and Cosmology: Simulating celestial bodies, the evolution of the universe, etc.
Molecular Dynamics: Studying molecular interactions, protein folding, drug discovery, etc.
Big Data Analytics and Data Processing:
Data Mining: Analyzing vast datasets to extract patterns, trends, and insights.
Machine Learning and AI: Training deep neural networks, processing large datasets in real time.
Web Search Engines: Indexing and retrieving information from enormous web databases.

APPLICATIONS OF PARALLEL COMPUTING
High-Performance Computing (HPC):
Financial Modeling: Performing risk analysis, option pricing, and portfolio optimization.
Fluid Dynamics and Computational Chemistry: Simulating fluid flows, chemical reactions, etc.
Finite Element Analysis: Solving complex engineering problems in aerospace, automotive, and other industries.
Parallel Databases and Search Algorithms:
Parallel Database Systems: Handling concurrent queries and transactions in large-scale databases.
Parallel Search Algorithms: Speeding up searches in large datasets, such as in cryptography and pattern matching.

APPLICATIONS OF PARALLEL COMPUTING
Image and Signal Processing:
Medical Imaging: Processing MRI and CT scans for diagnostics and treatment planning.
Video Processing: Real-time video encoding, decoding, and analysis.
Distributed Systems and Networking:
Distributed Computing: Handling distributed tasks efficiently in cloud computing environments.
Network Routing and Traffic Analysis: Optimizing routing algorithms, analyzing network traffic.
Real-Time Systems and Simulation:
Robotics and Automation: Controlling multiple robots simultaneously for complex tasks.
Virtual Reality and Gaming: Rendering complex scenes and simulations in real time.

SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])
Shared-memory processors are popular due to their simple and general programming model, enabling easy development of parallel software. Another term for shared-memory processors is the Parallel Random Access Machine (PRAM). A shared address space is used for communication between processors, with all processors accessing a common memory space.
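As a hedged illustration of the shared-memory (UMA/PRAM) programming model, the C/OpenMP sketch below lets every thread read a common array through the shared address space and combines the per-thread partial sums with a reduction; the array contents and size are assumptions for demonstration.

```c
#include <stdio.h>
#include <omp.h>   /* compile with: gcc -fopenmp shared_sum.c */

#define N 1000000

int main(void) {
    static double data[N];
    double sum = 0.0;

    /* Initialize the shared array (visible to every thread, as in the
       UMA/PRAM model where all processors see one common memory). */
    for (int i = 0; i < N; i++) data[i] = 1.0;

    /* Each thread sums a slice of the shared array; the reduction
       clause combines the per-thread partial sums into one result. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += data[i];
    }

    printf("sum = %.1f (threads available: %d)\n",
           sum, omp_get_max_threads());
    return 0;
}
```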
