Introducing Spark RDDs: Resilient Distributed Datasets

GravenGuan · Apr 14, 2024

About This Presentation

At a high level, every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster. The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.


Slide Content

INTRODUCING
RDDs
Frank Kane

RDD
■Resilient
■Distributed
■Dataset

The SparkContext
■Created by your driver program
■Is responsible for making RDDs resilient and distributed!
■Creates RDDs
■The Spark shell creates a "sc" object for you
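A minimal sketch of creating a SparkContext yourself in a standalone driver program (outside the shell, where no sc is made for you); the master URL and app name here are placeholder choices, not anything from the slides:

from pyspark import SparkConf, SparkContext

# "local[*]" runs on all local cores; both values below are illustrative placeholders
conf = SparkConf().setMaster("local[*]").setAppName("RDDIntro")
sc = SparkContext(conf=conf)

nums = sc.parallelize([1, 2, 3, 4])   # the context is what creates RDDs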

Creating RDDs
■nums = sc.parallelize([1, 2, 3, 4])
■sc.textFile("file:///c:/users/frank/gobs-o-text.txt")
–or s3n:// , hdfs://
■hiveCtx = HiveContext(sc) rows = hiveCtx.sql("SELECT name, age FROM users")
■Can also create from:
–JDBC
–Cassandra
–HBase
–Elasticsearch
–JSON, CSV, sequence files, object files, various compressed formats
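A minimal sketch pulling the creation methods above together; it assumes the Spark shell (so sc already exists), and the file path and the users table are just the examples from the slide:

nums = sc.parallelize([1, 2, 3, 4])                              # RDD from an in-memory list
lines = sc.textFile("file:///c:/users/frank/gobs-o-text.txt")    # one element per line of the file

from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
rows = hiveCtx.sql("SELECT name, age FROM users")                # query results you can operate on like an RDD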

Transforming RDDs
■map
■flatMap
■filter
■distinct
■sample
■union, intersection, subtract, cartesian
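A minimal sketch chaining a few of these transformations, assuming sc exists (e.g. in the Spark shell); the sample sentences are made up for illustration:

lines = sc.parallelize(["to be or not to be", "that is the question"])
words = lines.flatMap(lambda line: line.split())   # flatMap: one output element per word
longWords = words.filter(lambda w: len(w) > 3)     # filter: keep only words longer than 3 letters
uniqueWords = longWords.distinct()                 # distinct: drop duplicate words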

map example
■rdd = sc.parallelize([1, 2, 3, 4])
■squaredRDD = rdd.map(lambda x: x*x)
■This yields 1, 4, 9, 16

What’s that lambda thing?
Many RDD methods accept a function as a parameter
rdd.map(lambda x: x*x)
is the same thing as
def squareIt(x):
    return x*x

rdd.map(squareIt)
There, you now understand functional programming.

RDD actions
■collect
■count
■countByValue
■take
■top
■reduce
■… and more ...
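A minimal sketch of these actions on a small RDD, assuming sc exists; the numbers are made up for illustration:

rdd = sc.parallelize([3, 1, 2, 2, 4])
rdd.collect()                      # [3, 1, 2, 2, 4] -- brings every element back to the driver
rdd.count()                        # 5
rdd.countByValue()                 # {3: 1, 1: 1, 2: 2, 4: 1}
rdd.take(2)                        # [3, 1] -- first two elements
rdd.top(2)                         # [4, 3] -- two largest elements
rdd.reduce(lambda a, b: a + b)     # 12 -- combines elements pairwise into a sum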

Lazy evaluation
■Nothing actually happens in your driver program until an action is called!
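A minimal sketch of what that means in practice, assuming sc exists:

rdd = sc.parallelize(range(1, 1001))
squares = rdd.map(lambda x: x * x)   # no work happens here -- Spark just records the transformation
squares.take(5)                      # the action triggers the computation: [1, 4, 9, 16, 25]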