Introduction to GraphX | Big Data Hadoop Spark Tutorial | CloudxLab

CloudxLab 283 views 13 slides May 14, 2018

About This Presentation

Big Data with Hadoop & Spark Training: http://bit.ly/2IYeuvF

This CloudxLab Introduction to GraphX tutorial helps you to understand GraphX in detail. Below are the topics covered in this tutorial:

1) Introduction to GraphX
2) What is a Graph?
3) Examples of Graph Computation
4) Pagerank using Gr...


Slide Content

GraphX

What is a graph

Examples of graph computations
●Finding common friends
●Finding the PageRank
●And many more…

Examples of graph computations
Finding common friends
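
One way to picture the "common friends" computation is as an intersection of neighbor sets. The sketch below uses plain Scala collections rather than GraphX, and the names and friendships are made up purely for illustration.

```scala
// Undirected friendship edges (hypothetical sample data, not from the slides).
val friendships = Seq(
  ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
  ("bob", "carol"), ("bob", "erin"), ("dave", "erin")
)

// Build an adjacency map: person -> set of friends, storing each edge in both directions.
val friendsOf: Map[String, Set[String]] =
  friendships
    .flatMap { case (a, b) => Seq(a -> b, b -> a) }
    .groupBy(_._1)
    .map { case (person, pairs) => person -> pairs.map(_._2).toSet }

// Common friends of two people = intersection of their neighbor sets.
def commonFriends(a: String, b: String): Set[String] =
  friendsOf.getOrElse(a, Set.empty[String])
    .intersect(friendsOf.getOrElse(b, Set.empty[String]))

println(commonFriends("alice", "bob"))  // a set containing only carol
println(commonFriends("alice", "erin")) // a set containing bob and dave
```

On a real social graph the adjacency sets would be distributed, which is where a graph engine becomes useful; the underlying idea stays the same.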

Examples of graph computations
Finding PageRank

GraphX
●Unifies graph computation
○ETL
○Exploratory analysis
○Iterative graph computation
●View the same data as both graphs and collections
●Transform and join graphs with RDDs efficiently
●Extends the Spark RDD by introducing a new Graph abstraction

GraphX
Has a library of algorithms
○PageRank
■If important pages link to you, you are more important
○Connected Components
■Clusters among your Facebook friends
○Triangle Counting
■Triangles passing through each vertex => a measure of clustering
○Label propagation
○SVD++
○Strongly connected components
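
To make "triangles passing through each vertex" concrete, here is a small sketch in plain Scala collections. It is not the GraphX implementation (GraphX exposes this as `triangleCount()` on a graph); the four-vertex graph and the helper names are assumptions chosen for illustration.

```scala
// Undirected edges of a tiny made-up graph: a triangle 1-2-3 plus a tail 3-4.
val triEdges = Seq((1, 2), (2, 3), (1, 3), (3, 4))

// Adjacency map: vertex -> set of neighbors, storing each edge in both directions.
val neighbors: Map[Int, Set[Int]] =
  triEdges
    .flatMap { case (a, b) => Seq(a -> b, b -> a) }
    .groupBy(_._1)
    .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

// Triangles through v = number of pairs of v's neighbors that are themselves connected.
def trianglesThrough(v: Int): Int = {
  val ns = neighbors.getOrElse(v, Set.empty[Int]).toSeq
  ns.combinations(2).count(pair => neighbors(pair(0)).contains(pair(1)))
}

val triangleCounts: Map[Int, Int] =
  neighbors.keys.map(v => v -> trianglesThrough(v)).toMap

triangleCounts.toSeq.sortBy(_._1).foreach { case (v, n) => println(s"vertex $v: $n") }
```

Vertices 1, 2 and 3 each sit on the single triangle, while vertex 4 sits on none, so a higher count indicates a more tightly clustered neighborhood.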

GraphX
Provides a set of fundamental operations
●subgraph
●joinVertices
●aggregateMessages
●And more…
https://spark.apache.org/docs/latest/graphx-programming-guide.html

GraphX - PageRank
1. BarackObama
2. Lady Gaga
3. John Resig
4. Justin Bieber
6. Matei Zaharia
7. Martin Odersky
8. anonsys
PR(A) = 0.15 + 0.85 * sum over each page T linking to A of ( PR(T) / number of outgoing links of T )

GraphX - PageRank
$ hadoop fs -cat /data/spark/graphx/followers.txt
2 1
4 1
1 2
6 3
7 3
7 6
6 7
3 7
https://github.com/cloudxlab/bigdata/blob/master/spark/examples/graphx/pagerank.scala
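
The PageRank update can be traced by hand on this edge list. The following is a simplified sketch in plain Scala collections, not GraphX's own implementation: it assumes each line is a "source destination" pair, starts every vertex at 1.0, and runs a fixed 100 iterations (an arbitrary choice) instead of a convergence tolerance.

```scala
// Edges copied from followers.txt: (source, destination).
val followerEdges = Seq(
  (2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L),
  (7L, 3L), (7L, 6L), (6L, 7L), (3L, 7L)
)

val vertices: Set[Long] = followerEdges.flatMap { case (s, d) => Seq(s, d) }.toSet
val outDegree: Map[Long, Int] =
  followerEdges.groupBy(_._1).map { case (s, es) => s -> es.size }

// One synchronous update:
// PR(v) = 0.15 + 0.85 * sum over edges (u -> v) of PR(u) / outDegree(u)
def step(ranks: Map[Long, Double]): Map[Long, Double] = {
  val incoming: Map[Long, Double] =
    followerEdges
      .map { case (s, d) => d -> (ranks(s) / outDegree(s)) }
      .groupBy(_._1)
      .map { case (d, contribs) => d -> contribs.map(_._2).sum }
  vertices.map(v => v -> (0.15 + 0.85 * incoming.getOrElse(v, 0.0))).toMap
}

// Start every vertex at 1.0 and apply the update repeatedly.
val initial: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap
val finalRanks: Map[Long, Double] = (1 to 100).foldLeft(initial)((r, _) => step(r))

finalRanks.toSeq.sortBy(_._1).foreach { case (id, rank) => println(f"$id%d -> $rank%.2f") }
```

Vertex 4 has no incoming edges, so its rank stays at the 0.15 floor, while vertices 1 and 2 reinforce each other through their mutual links.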

GraphX - PageRank
import org.apache.spark.graphx.GraphLoader

// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "/data/spark/graphx/followers.txt")
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("/data/spark/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
  case (id, (username, rank)) => (username, rank)
}
// Print the result
println(ranksByUsername.collect().mkString("\n"))


GraphX - PageRank
Resulting ranks:
1. BarackObama: 1.46
2. Lady Gaga: 1.39
3. John Resig: 1.0
4. Justin Bieber: 0.15
6. Matei Zaharia: 0.70
7. Martin Odersky: 1.3
8. anonsys: not ranked (does not appear in followers.txt)

Thank you!