Spark architecture

GauravBiswas9 2,388 views 21 slides Apr 19, 2019


About This Presentation

APACHE Spark architecture


Slide Content

APACHE SPARK ARCHITECTURE
Gaurav Biswas, BIT Mesra
16-04-2019

OUTLINE
- Spark & its features
- Spark architecture
- Resilient Distributed Datasets (RDDs)
- Directed Acyclic Graph (DAG)
- Advantages & drawbacks
- Conclusion

INTRODUCTION
- Apache Spark is an open-source cluster computing framework for real-time data processing.
- According to Spark Certified Experts, Spark's performance is up to 100 times faster in memory and 10 times faster on disk when compared to Hadoop.
- The main feature of Apache Spark is its in-memory cluster computing, which increases the processing speed of an application.

FEATURES OF APACHE SPARK
- Speed: Spark runs up to 100 times faster than Hadoop MapReduce for large-scale data processing.
- Powerful caching: A simple programming layer provides powerful caching and disk persistence capabilities.
- Deployment: It can be deployed through Mesos, Hadoop via YARN, or Spark's own cluster manager.
- Real-time: It offers real-time computation and low latency because of in-memory computation.
- Polyglot: Spark provides high-level APIs in Java, Scala, Python, and R, so Spark code can be written in any of these four languages. It also provides a shell in Scala and Python.
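The benefit of caching can be sketched with a small pure-Python model. This is not Spark's API: `ToyDataset`, `cache`, `collect`, and `expensive_squares` are hypothetical names chosen for illustration. The sketch only assumes that, without caching, a dataset is recomputed every time it is used.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that is recomputed on every use
    unless caching has been requested."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the data
        self._use_cache = False   # set to True by cache()
        self._cached = None       # holds the data after the first computation
        self.compute_calls = 0    # counts how often the data was recomputed

    def cache(self):
        """Mark the dataset so its result is kept in memory after first use."""
        self._use_cache = True
        return self

    def collect(self):
        """Return the data, reusing the in-memory copy when one exists."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_calls += 1
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result


def expensive_squares():
    """Placeholder for a costly computation."""
    return [x * x for x in range(5)]


uncached = ToyDataset(expensive_squares)
uncached.collect()
uncached.collect()

cached = ToyDataset(expensive_squares).cache()
cached.collect()
cached.collect()

print(uncached.compute_calls)  # 2: recomputed on each use
print(cached.compute_calls)    # 1: computed once, then served from memory
```

The second use of the cached dataset skips the computation entirely, which is the effect the caching feature above is aiming for.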

SPARK ARCHITECTURE
Figure: Apache Spark architecture

CORE CONCEPTS

SPARK ARCHITECTURE
SPARK DRIVER:
- Separate process that executes the user application
- Creates the SparkContext to schedule job execution and negotiate with the cluster manager
EXECUTORS:
- Run tasks scheduled by the driver
- Store computation results in memory, on disk, or off-heap
- Interact with storage systems
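The division of work between the driver and the executors can be sketched in plain Python. This is a toy model, not Spark code: a thread pool stands in for the executors, and `split_into_partitions`, `task`, and `run_job` are hypothetical names. There is no cluster manager in this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions chunks, one chunk per task."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def task(partition):
    """Work done by one executor on one partition: here, a partial sum."""
    return sum(partition)


def run_job(data, num_executors=3):
    """Driver-side logic: create tasks, hand them to executors, combine results."""
    partitions = split_into_partitions(data, num_executors)
    # Worker threads play the role of executors in this sketch.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_results = list(executors.map(task, partitions))
    # The driver merges the partial results returned by the executors.
    return sum(partial_results)


total = run_job(list(range(1, 101)))
print(total)  # 5050
```

The driver never processes the data itself here. It only decides how the work is split and combines what the executors send back.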

CLUSTER MANAGER:
- The SparkContext works with the cluster manager to manage the various jobs
- The driver program and the SparkContext take care of job execution within the cluster

Apache Spark architecture is based on two main abstractions:
- Resilient Distributed Dataset (RDD)
- Directed Acyclic Graph (DAG)

RESILIENT DISTRIBUTED DATASET (RDD)

OPERATIONS ON RDDs:
RDDs support two types of operations:
- Transformations: operations applied to an RDD to create a new RDD.
- Actions: operations applied to an RDD that instruct Apache Spark to carry out the computation and pass the result back to the driver.
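The difference between the two kinds of operations can be sketched with a toy class in plain Python. This is not Spark's RDD implementation: `ToyRDD` is a hypothetical name, and it only assumes the behaviour described above, namely that transformations describe a new dataset while actions trigger the computation and return a result.

```python
class ToyRDD:
    """Hypothetical, single-machine model of lazy transformations and actions."""

    def __init__(self, source, steps=()):
        self._source = source        # the original data (a list)
        self._steps = tuple(steps)   # recorded transformations, not yet run

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        """Apply every recorded step to every element of the source."""
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):   # a filter step rejected the item
                    keep = False
                    break
            if keep:
                yield item

    # Actions: run the recorded steps and hand back a result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []


def double(x):
    calls.append(x)  # record that the function really ran
    return x * 2


numbers = ToyRDD([1, 2, 3, 4, 5])
big = numbers.map(double).filter(lambda x: x > 4)  # transformations only
calls_before_action = len(calls)                   # still 0: nothing has run

result = big.collect()                             # action: computation happens
print(calls_before_action, result)                 # 0 [6, 8, 10]
```

Each transformation returns a new object and leaves `numbers` unchanged. Only `collect` or `count` causes `double` to be called.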

DIRECTED ACYCLIC GRAPH (DAG)
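As a rough illustration of why the graph must be directed and acyclic, the sketch below models the dependencies between datasets with Python's standard `graphlib` module (Python 3.9 or later). The dataset names (`lines`, `words`, `pairs`, `counts`) are hypothetical, loosely following a word-count job, and the code does not reflect how Spark's scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a dataset; its value is the set of datasets it is derived from.
lineage = {
    "lines": set(),
    "words": {"lines"},
    "pairs": {"words"},
    "counts": {"pairs"},
}

# A topological order runs every dataset after the ones it depends on.
execution_order = list(TopologicalSorter(lineage).static_order())
print(execution_order)  # ['lines', 'words', 'pairs', 'counts']

# If "lines" also depended on "counts", the graph would contain a cycle
# and no valid execution order would exist.
cyclic = dict(lineage, lines={"counts"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True

print(has_cycle)  # True
```

Because the edges point from each dataset to the ones it was derived from and never loop back, an execution order always exists.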

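ADVANTAGES & DRAWBACKS
ADVANTAGES:
- Integration with Hadoop
- Faster processing
- Near-real-time stream processing
DRAWBACKS:
- No file management system of its own (Spark relies on external storage such as HDFS)
- No support for true real-time processing (streaming data is handled in micro-batches)
- Not cost-effective (in-memory processing requires large amounts of RAM)
- Manual optimization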

CONCLUSION
- Spark makes it easy to write and run complicated data processing.
- It enables computation of tasks at a very large scale.
- Although Spark has many limitations, it is still trending in the big data world.
- Because of these drawbacks, other technologies are overtaking Spark. For example, Flink offers more complete real-time processing than Spark.
- In this way, other technologies are overcoming the drawbacks of Spark.

THANK YOU
