GraphSage vs PinSage #InsideArangoDB

arangodb 2,250 views 6 slides Jul 15, 2021

About This Presentation

The ArangoML Group held a detailed discussion on the topic "GraphSage vs PinSage", sharing their thoughts on how the working principles of these two popular Graph ML algorithms differ. The following slide deck collects their thoughts on the comparison between ...


Slide Content

GraphSage vs PinSage
A comparison of two popular Graph ML algorithms
ArangoDB ML Reading Group

PinSage
●PinSage is an inductive Graph Convolutional Network (GCN) for web-scale recommender systems.
●It is a random-walk-based GCN algorithm that learns embeddings for billions of nodes in web-scale graphs.
●Because it is inductive, the model is highly scalable and generic.
●Altogether, the Pinterest graph (the dataset) contains 2 billion pins, 1 billion boards, and over 18 billion edges (i.e. memberships of pins in their corresponding boards).
●Once the embeddings (aka pin embeddings) are learned, they can be used for classification, clustering or reranking.
●Pinterest mainly uses it for visual recommendations (pins are visual bookmarks, e.g. for buying clothes or other products).
●It avoids having to operate on the entire graph Laplacian during training.

PinSage
●Pinterest is a platform on which users share and organise images.
●Images are referred to as pins.
●Users group similar images into albums (boards).
●PinSage simplifies the graph by forming a bipartite graph of pins and boards.
Fig 1: PinSage (Reference: PinSage)
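
The pin/board bipartite graph described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`build_bipartite`, `co_board_pins`) and the toy pin and board ids are hypothetical and do not come from the slides or from Pinterest's implementation.

```python
from collections import defaultdict

def build_bipartite(memberships):
    """Build pin->boards and board->pins maps from (pin, board) pairs."""
    pin_to_boards = defaultdict(set)
    board_to_pins = defaultdict(set)
    for pin, board in memberships:
        pin_to_boards[pin].add(board)
        board_to_pins[board].add(pin)
    return pin_to_boards, board_to_pins

def co_board_pins(pin, pin_to_boards, board_to_pins):
    """Pins two hops away: those sharing at least one board with `pin`."""
    related = set()
    for board in pin_to_boards[pin]:
        related |= board_to_pins[board]
    related.discard(pin)
    return related

# Toy data (hypothetical): each edge is one pin's membership in one board.
memberships = [("p1", "b1"), ("p2", "b1"), ("p2", "b2"), ("p3", "b2")]
pin_to_boards, board_to_pins = build_bipartite(memberships)
related_to_p2 = co_board_pins("p2", pin_to_boards, board_to_pins)
```

Edges only ever connect a pin to a board, so two pins are related through the boards they share.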

GraphSage
●An inductive variant of GCNs
●Can be supervised, unsupervised or semi-supervised
●The aggregator gathers the sampled neighbourhood information into a 1-D vector representation
●Does not perform on-the-fly convolutions
●The whole graph needs to be stored in GPU memory
●Does not support MapReduce inference
●Performs uniform random sampling to get the neighbourhood of a node u
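
As a rough sketch of the sampling and aggregation bullets above, the snippet below draws a fixed-size uniform sample of a node's neighbours and averages their feature vectors into one 1-D vector. It assumes a mean aggregator (one of several possible aggregators) and leaves out the learned weight matrices and the combination with the node's own features; the helper names and the toy graph are hypothetical.

```python
import random

def sample_neighbors(adj, u, k, rng):
    """Uniformly sample k neighbours of u (with replacement if u has fewer than k)."""
    neighbors = adj[u]
    if len(neighbors) >= k:
        return rng.sample(neighbors, k)
    return [rng.choice(neighbors) for _ in range(k)]

def mean_aggregate(features, nodes):
    """Element-wise mean of the feature vectors of `nodes` (a single 1-D vector)."""
    dim = len(features[nodes[0]])
    return [sum(features[n][i] for n in nodes) / len(nodes) for i in range(dim)]

# Toy graph and features (hypothetical).
adj = {"u": ["a", "b", "c"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

rng = random.Random(0)
sampled = sample_neighbors(adj, "u", 2, rng)
h_neigh = mean_aggregate(features, sampled)
```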
PinSage
●Built on top of GraphSage
●Supervised
●Its convolutions play the same role as the aggregators in GraphSage
●Performs on-the-fly convolutions (sampling the neighbourhood of nodes on demand)
●The whole graph does not need to be in GPU memory (it uses a producer-consumer architecture)
●Supports efficient MapReduce inference (naive aggregation would compute the same convolutions repeatedly; MapReduce minimises this repeated computation)
●Uses random walks for neighbourhood sampling (the neighbourhood of a node u is defined as the T nodes that exert the most influence on u)
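
One way to read "the T nodes that exert the most influence on u" is: run short random walks from u, count how often each other node is visited, and keep the T most-visited nodes. The sketch below follows that reading. The function name, the parameters `num_walks` and `walk_length`, and the toy graph are assumptions for illustration, and every node is assumed to have at least one neighbour.

```python
import random
from collections import Counter

def random_walk_neighborhood(adj, u, T, num_walks, walk_length, rng):
    """Return up to T (node, normalised visit count) pairs, most visited first."""
    visits = Counter()
    for _ in range(num_walks):
        node = u
        for _ in range(walk_length):
            node = rng.choice(adj[node])  # step to a uniformly chosen neighbour
            if node != u:
                visits[node] += 1
    total = sum(visits.values())
    return [(n, c / total) for n, c in visits.most_common(T)]

# Toy undirected graph (hypothetical).
adj = {
    "u": ["a", "b"],
    "a": ["u", "b"],
    "b": ["u", "a", "c"],
    "c": ["b"],
}
neighborhood = random_walk_neighborhood(
    adj, "u", T=2, num_walks=200, walk_length=3, rng=random.Random(42)
)
```

The normalised visit counts double as importance scores for the selected neighbours.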

GraphSage
●Does not support importance pooling
●Does not support curriculum training
PinSage
●Supports importance pooling, in which every neighbour is scored using random walks and weighted accordingly during aggregation
●Supports curriculum training, in which the algorithm is fed harder and harder examples during training (12% performance gain)
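
The two PinSage features above can be sketched as follows. `importance_pool` takes neighbours paired with importance scores (for example, random-walk visit counts) and returns their weighted mean instead of a plain mean. `hard_negatives_for_epoch` is one possible curriculum schedule, assumed here for illustration: no hard negatives in the first epoch, then one more per epoch. Both function names and the toy values are hypothetical.

```python
def importance_pool(features, weighted_neighbors):
    """Weighted mean of neighbour features, with weights normalised to sum to 1."""
    total = sum(w for _, w in weighted_neighbors)
    dim = len(features[weighted_neighbors[0][0]])
    pooled = [0.0] * dim
    for node, w in weighted_neighbors:
        for i in range(dim):
            pooled[i] += (w / total) * features[node][i]
    return pooled

def hard_negatives_for_epoch(epoch):
    """Assumed schedule: 0 hard negatives in epoch 1, then one more each epoch."""
    return max(0, epoch - 1)

# Toy features and visit counts (hypothetical).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
pooled = importance_pool(features, [("a", 3), ("b", 1)])  # -> [0.75, 0.25]
schedule = [hard_negatives_for_epoch(e) for e in range(1, 5)]  # -> [0, 1, 2, 3]
```

Neighbour "a" was visited three times as often as "b", so it contributes three times as much to the pooled vector.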