Live Demo: Introducing the Spark Connector for MongoDB

mongodb | Sep 15, 2016 | 48 slides

About This Presentation

In this session we will walk through the principles of MongoDB and Spark and work through examples using the new MongoDB Connector for Spark.


Slide Content

MongoDB Connector For Spark

HDFS: distributed data

Distributed resources: Spark Standalone, YARN, Mesos (over HDFS)

Distributed processing: Hadoop and Spark

Domain-specific languages on Hadoop: Hive, Pig

Spark components: Spark SQL, Spark Shell, Spark Streaming

Spark can also run without HDFS and Hadoop, on Standalone, YARN, or Mesos

Driver Application (with Spark Connector) → Master → Worker Nodes (executors)

Parallelize → Transform (RDD pipeline diagram: data parallelized across four partitions, each transformed in parallel)

Transformations: map(func), filter(func), union(otherDataset), intersection(otherDataset), distinct([numPartitions])
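
A minimal sketch of these transformations in the spark-shell (Scala); the sample numbers are illustrative, not from the deck, and `sc` is the shell's predefined SparkContext:

  val ratings = sc.parallelize(Seq(1, 3, 4, 5, 2, 4))
  val more    = sc.parallelize(Seq(5, 5, 2))

  // Transformations are lazy: each returns a new RDD and records lineage,
  // but nothing executes until an action is called.
  val doubled  = ratings.map(r => r * 2)       // map(func)
  val high     = ratings.filter(_ >= 4)        // filter(func)
  val combined = ratings.union(more)           // union(otherDataset)
  val common   = ratings.intersection(more)    // intersection(otherDataset)
  val unique   = combined.distinct()           // distinct([numPartitions])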

Parallelize → Transform → Transform → Action (the diagram extended with a second transform stage and an action)

Actions: collect(), count(), first(), take(n), reduce(func)
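
Actions force evaluation of the recorded lineage. A minimal sketch, continuing the illustrative `ratings` RDD from the previous example:

  val all    = ratings.collect()                 // materialize every element
  val n      = ratings.count()                   // number of elements
  val head   = ratings.first()                   // first element
  val firstN = ratings.take(3)                   // first three elements
  val total  = ratings.reduce((a, b) => a + b)   // fold with a binary function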

Parallelize → Transform → Transform → Action → Result (each partition yields a result)

Lineage: Spark records the chain of transformations behind each RDD, so a lost partition can be recomputed from its source

Using the Connector

https://github.com/mongodb/mongo-spark

http://spark.apache.org/docs/latest/

{ "_id" : ObjectId("578be1fe1fe699f2deb80807"), "user_id" : 196, "movie_id" : 242, "rating" : 3, "timestamp" : 881250949 }

./bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/movies.movie_ratings" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/movies.user_recommendations" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0

import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document

// Load the configured input collection (movies.movie_ratings) as an RDD of Documents.
val rdd = sc.loadFromMongoDB()

// Print the first ten documents.
for (doc <- rdd.take(10)) println(doc)

ReadConfig / WriteConfig
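
A sketch of per-operation configuration, assuming the movies database from the spark-shell invocation above; the collection names and option values are illustrative:

  import com.mongodb.spark._
  import com.mongodb.spark.config.{ReadConfig, WriteConfig}

  // Override the input collection and read preference for this read only,
  // falling back to the session defaults for everything else.
  val readConfig = ReadConfig(
    Map("collection" -> "movie_ratings", "readPreference.name" -> "secondaryPreferred"),
    Some(ReadConfig(sc)))
  val ratingsRdd = sc.loadFromMongoDB(readConfig)

  // Write to a different collection with a majority write concern.
  val writeConfig = WriteConfig(
    Map("collection" -> "user_recommendations", "writeConcern.w" -> "majority"),
    Some(WriteConfig(sc)))
  MongoSpark.save(ratingsRdd, writeConfig)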

Aggregation filters: $match | $project | $group

(diagram: a stream of JSON documents, filtered by an aggregation pipeline before reaching Spark)

val aggRdd = rdd.withPipeline(Seq(Document.parse("{ $match: { Country: \"USA\" } }")))
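
The pipeline runs inside MongoDB, so only matching documents cross the wire to Spark. A sketch combining $match and $group over the movie_ratings fields shown earlier (the rating threshold is illustrative):

  val pipeline = Seq(
    Document.parse("{ $match: { rating: { $gte: 4 } } }"),
    Document.parse("{ $group: { _id: \"$movie_id\", avgRating: { $avg: \"$rating\" } } }"))
  val avgByMovie = rdd.withPipeline(pipeline)
  avgByMovie.take(5).foreach(println)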

Spark SQL + Dataframes

RDD + Schema = DataFrame

(diagram: the connector infers the DataFrame schema by sampling documents with $sample)
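
A minimal DataFrame sketch, assuming the same spark-shell session (connector 1.0 on Spark 1.x, where `sqlContext` is predefined):

  import com.mongodb.spark._

  // Load movie_ratings as a DataFrame; the schema is inferred by sampling documents.
  val df = MongoSpark.load(sqlContext)
  df.printSchema()

  // With a schema in place, Spark SQL queries work as usual.
  df.filter(df("rating") > 3).show()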

Data locality with mongos

Courses and Resources

https://university.mongodb.com/courses/M233/about

THANKS! @blimpyacht