240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Graph Representations].pptx


About This Presentation

Simplifying and Empowering Transformers for Large-Graph Representations


Slide Content

Simplifying and Empowering Transformers for Large-Graph Representations (NeurIPS 2023). Van Thuy Hoang, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]. 2024-04-08

BACKGROUND: Message Passing GNNs vs. Graph Transformers. Message passing GNNs generate node embeddings based on local network neighborhoods: every node has an embedding at each layer, and each layer repeatedly combines messages from a node's neighbors using neural networks.
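A minimal sketch of one such message passing layer (mean aggregation over neighbors), assuming a dense adjacency matrix `adj` and node feature matrix `x`; the class name and weighting are illustrative, not the exact layer used in the paper.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of neighbor aggregation: h_v = relu(W_self x_v + W_nbr mean_{u in N(v)} x_u)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_nbr = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: [N, in_dim] node features, adj: [N, N] binary adjacency (no self-loops)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero for isolated nodes
        nbr_mean = adj @ x / deg                         # mean of neighbor features
        return torch.relu(self.w_self(x) + self.w_nbr(nbr_mean))
```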

Message Passing GNNs vs. Graph Transformers. In message passing GNNs, a node's update is a function of its neighbors; in Graph Transformers (GTs), a node's update is a function of all nodes in the graph, thanks to the self-attention mechanism in the Transformer layer.
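For contrast with the layer above, here is a minimal sketch of vanilla global self-attention over all nodes, where every node's update mixes information from every other node; the class name is illustrative, and the [N, N] attention matrix is what later makes the quadratic cost explicit.

```python
import torch
import torch.nn as nn

class GlobalNodeAttention(nn.Module):
    """Every node attends to every other node (vanilla softmax attention, O(N^2))."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x):
        # x: [N, dim] node embeddings for the whole graph (or batch)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(Q @ K.T / K.shape[-1] ** 0.5, dim=-1)  # [N, N] attention matrix
        return attn @ V                                             # each update is a function of all nodes
```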

Graph Transformers: Challenges. How to build GTs for large-graph representations? The quadratic cost of global attention hinders scalability to large graphs, and deep attention stacks are prone to over-fitting.

Deep attention layers: do we need many attention layers? Existing Transformers often require multiple attention layers to reach the desired capacity.

The power of one-layer attention: mini-batch sampling randomly partitions the input graph into smaller mini-batches, which are fed into the SGFormer model, implemented as a single-layer global attention combined with a GNN.
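A minimal sketch of this random node partitioning and mini-batch training loop, assuming dense tensors and a hypothetical `model(sub_x, sub_adj)` interface standing in for the SGFormer-style network; the batch size and training details are illustrative.

```python
import torch

def random_node_minibatches(num_nodes, batch_size):
    """Randomly partition node indices into mini-batches of smaller size."""
    perm = torch.randperm(num_nodes)
    return [perm[i:i + batch_size] for i in range(0, num_nodes, batch_size)]

def train_epoch(model, x, adj, labels, optimizer, batch_size=10_000):
    """Hypothetical training step: feed each batch's induced subgraph to the model."""
    for idx in random_node_minibatches(x.shape[0], batch_size):
        sub_x, sub_adj, sub_y = x[idx], adj[idx][:, idx], labels[idx]
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(sub_x, sub_adj), sub_y)
        loss.backward()
        optimizer.step()
```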

Simple Global Attention: a single layer of global attention is sufficient. Through one layer of propagation over the densely connected attention graph, the information of each node can be adaptively propagated to any other node within the batch. The attention in Eq. (3) can be computed in O(N) time, which is far more efficient than the Softmax attention of the original Transformer.
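The O(N) cost comes from associating the matrix products as Q(KᵀV) rather than (QKᵀ)V, so the N×N attention matrix is never materialized. The sketch below illustrates this linearization idea; the normalization is one simple choice and is not claimed to reproduce the paper's Eq. (3) exactly.

```python
import torch

def simple_linear_attention(Q, K, V, eps=1e-6):
    """Single-layer linear global attention sketch in the spirit of Eq. (3):
    compute K^T V first (a d x d matrix), so cost scales linearly in N."""
    N = Q.shape[0]
    Qn = Q / (Q.norm(dim=-1, keepdim=True) + eps)           # row-normalized queries
    Kn = K / (K.norm(dim=-1, keepdim=True) + eps)           # row-normalized keys
    kv = Kn.transpose(-2, -1) @ V                           # [d, d], linear in N
    k_sum = Kn.sum(dim=0, keepdim=True).transpose(-2, -1)   # [d, 1]
    denom = 1.0 + (Qn @ k_sum) / N                          # [N, 1] per-node normalizer
    return (V + (Qn @ kv) / N) / denom                      # never builds the N x N matrix
```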

Incorporation of Structural Information: a simple yet effective scheme combines the attention output Z with the embeddings propagated by a GNN at the output layer.
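A minimal sketch of such an output-layer combination, assuming the global-attention embedding `z_attn` and GNN embedding `z_gnn` are fused by a weighted sum controlled by an illustrative hyperparameter `alpha` (the exact weighting scheme in the paper may differ).

```python
import torch.nn as nn

class OutputCombiner(nn.Module):
    """Fuse global-attention embeddings with GNN-propagated embeddings at the output layer."""
    def __init__(self, hidden_dim, num_classes, alpha=0.5):
        super().__init__()
        self.alpha = alpha                                   # illustrative mixing weight
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, z_attn, z_gnn):
        z = (1 - self.alpha) * z_attn + self.alpha * z_gnn   # combine the two views
        return self.classifier(z)
```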

Empirical Evaluation. Datasets: Cora, CiteSeer, PubMed, Actor, Squirrel, Chameleon, Deezer.

Empirical Evaluation: results on node property prediction benchmarks.

Empirical Evaluation: scalability test of training time per epoch on the Amazon2M dataset, randomly sampling node subsets ranging from 10K to 100K nodes.
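A rough sketch of how such a per-epoch timing experiment could be set up on random node subsets of increasing size; dataset loading is omitted and the model interface is the same hypothetical `model(sub_x, sub_adj)` as above, not the paper's actual benchmarking harness.

```python
import time
import torch

def time_per_epoch(model, x, adj, labels, optimizer, sizes=(10_000, 25_000, 50_000, 100_000)):
    """Illustrative timing loop: train one step per random node subset of growing size."""
    results = {}
    for n in sizes:
        idx = torch.randperm(x.shape[0])[:n]                 # random subset of n nodes
        sub_x, sub_adj, sub_y = x[idx], adj[idx][:, idx], labels[idx]
        start = time.time()
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(sub_x, sub_adj), sub_y)
        loss.backward()
        optimizer.step()
        results[n] = time.time() - start                     # seconds per training step
    return results
```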

SUMMARY: The paper demonstrates the potential of simple Transformer-style architectures for learning large-graph representations, where scalability is the main bottleneck. A one-layer attention model combined with a vanilla GCN can, surprisingly, produce highly competitive performance. Out-of-distribution learning remains a challenge.