250303_JW_labseminar[Self-Supervised Graph Transformer on Large-Scale Molecular Data].pptx


About This Presentation

Self-Supervised Graph Transformer on Large-Scale Molecular Data


Slide Content

Self-Supervised Graph Transformer on Large-Scale Molecular Data
Jin-Woo Jeong, Network Science Lab, Dept. of Mathematics, The Catholic University of Korea
E-mail: [email protected]
Rong, Yu, et al. NeurIPS (2020)

Introduction
Transformer & GNNs
The GROVER Pre-training Framework
Experiments
Conclusions
Q/A

Introduction
Molecular Property Prediction with GNNs: GNNs have been effectively applied to predict molecular properties by modeling molecules as graphs.
Existing Challenges:
Limited Labeled Data: Wet-lab experiments for obtaining labels are expensive, leading to scarce supervised data.
Poor Generalization: Traditional GNNs struggle to generalize to novel molecules in the vast chemical space.
Our Approach: We propose a novel framework that overcomes these issues by leveraging self-supervised pre-training on massive unlabeled molecular data, integrating dynamic message passing with Transformer-style architectures: GROVER (Graph Representation frOm self-superVised mEssage passing tRansformer).

Transformer & GNNs: Transformer and the Attention Mechanism
The attention mechanism is the main building block of the Transformer:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^{\top}/\sqrt{d_k}\right)V$, where $d_k$ is the dimension of $Q$ and $K$.
Multi-head attention: $\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})$, where $W_i^{Q}, W_i^{K}, W_i^{V}$ are the projection matrices of head $i$.
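Below is a minimal NumPy sketch of the scaled dot-product attention and a single attention head described above; the matrix shapes, the toy inputs, and the helper names (scaled_dot_product_attention, attention_head) are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of scaled dot-product attention and one attention head.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (n_q, d_v)

def attention_head(X, W_Q, W_K, W_V):
    """head_i = Attention(X W_Q, X W_K, X W_V) for one set of projections."""
    return scaled_dot_product_attention(X @ W_Q, X @ W_K, X @ W_V)

# Toy usage: 5 tokens with hidden size 8, projected to d_k = d_v = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_Q, W_K, W_V = (rng.normal(size=(8, 4)) for _ in range(3))
out = attention_head(X, W_Q, W_K, W_V)    # shape (5, 4)
```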

Transformer & GNNs: GNNs
The key operation of GNNs is the message passing process (also called neighborhood aggregation) between the nodes in the graph.
In general, the message passing process involves several iterations, and each iteration can be further partitioned into several hops.
Suppose there are $L$ iterations and iteration $l$ contains $K_l$ hops. Formally, in iteration $l$, the $k$-th hop can be formulated as
$m_v^{(l,k)} = \mathrm{AGGREGATE}\big(\{ h_u^{(l,k-1)} : u \in \mathcal{N}(v) \}\big), \qquad h_v^{(l,k)} = \sigma\big(W^{(l)} m_v^{(l,k)}\big)$,
where $m_v^{(l,k)}$ is the aggregated message and $\sigma$ is some activation function. The graph-level representation is obtained as $h_G = \mathrm{READOUT}\big(\{ h_v : v \in \mathcal{V} \}\big)$.
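As a concrete illustration of the generic message passing scheme above, here is a small NumPy sketch with $L$ iterations, $K_l$ hops per iteration, and a mean READOUT; the sum aggregation, tanh update, and toy graph are assumed choices, not the paper's exact operators.

```python
# Generic message passing: L iterations, K_l hops per iteration, mean READOUT.
import numpy as np

def message_passing_hop(h, adj, W, sigma=np.tanh):
    """One hop: aggregate neighbor states, then update each node state."""
    m = adj @ h                       # m_v = sum of neighbor embeddings
    return sigma(m @ W)               # h_v <- sigma(W * m_v)

def gnn_forward(h, adj, hops_per_iteration, weights):
    for l, K_l in enumerate(hops_per_iteration):
        for _ in range(K_l):
            h = message_passing_hop(h, adj, weights[l])
    return h.mean(axis=0)             # READOUT: mean over node embeddings

# Toy usage: a 4-node cycle graph with 8-dimensional node features.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 8))
weights = [rng.normal(size=(8, 8)) * 0.1 for _ in range(2)]
graph_embedding = gnn_forward(h, adj, hops_per_iteration=[2, 3], weights=weights)
```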

The GROVER Pre-training Framework: Details of Model Architecture
GNN Transformer (GTransformer)
A vanilla attention block, such as that in the original Transformer, requires vectorized inputs. However, graph inputs are naturally structural data that are not vectorized. We therefore design a tailored GNN (dyMPN) to extract vectors as queries, keys, and values from the nodes of the graph, and then feed them into the attention block.
- Alleviating the vanishing gradient problem
- Alleviating the over-smoothing problem
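A rough sketch of the GTransformer idea described above, assuming a single-hop GNN step as a stand-in for dyMPN: node states are first vectorized by neighborhood aggregation, then used as queries, keys, and values in a standard attention block, with a long-range residual connection back to the inputs. All projection matrices and sizes are illustrative.

```python
# Sketch: GNN step produces node vectors, attention runs on top of them.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gtransformer_layer(h, adj, W_msg, W_Q, W_K, W_V):
    # 1) GNN step (stand-in for dyMPN): aggregate neighbors into node vectors.
    h_gnn = np.tanh((adj @ h) @ W_msg)
    # 2) Attention over the GNN outputs, treated as queries, keys, and values.
    Q, K, V = h_gnn @ W_Q, h_gnn @ W_K, h_gnn @ W_V
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V
    # 3) Long-range residual connection from the raw inputs.
    return attn + h

# Toy usage: 4 nodes, hidden size 8 kept constant so the residual adds up.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 8))
W_msg, W_Q, W_K, W_V = (rng.normal(size=(8, 8)) * 0.1 for _ in range(4))
out = gtransformer_layer(h, adj, W_msg, W_Q, W_K, W_V)   # shape (4, 8)
```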

The GROVER Pre-training Framework: Details of Model Architecture
Dynamic Message Passing Network (dyMPN)
The general message passing process has two hyperparameters: the number of iterations/layers $L$ and the number of hops $K_l$ within each iteration.
Instead of a pre-specified $K_l$, we develop a randomized strategy for choosing the number of message passing hops during training: at each epoch, we choose $K_l$ from some random distribution for layer $l$.
Two choices of randomization work well (see the sketch below):
- $K_l$ drawn from a uniform distribution
- $K_l$ drawn from a truncated normal distribution
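A minimal sketch of the randomized hop schedule just described, where the per-layer hop count $K_l$ is re-sampled at every epoch from either a uniform or a truncated normal distribution; the bounds, mean, and standard deviation below are placeholder values, not the settings reported in the paper.

```python
# Re-sample the per-layer hop counts K_l at the start of every training epoch.
import numpy as np

def sample_hops(num_layers, low=2, high=6, mode="uniform",
                mu=4.0, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng()
    if mode == "uniform":
        # K_l ~ U(low, high), one integer draw per layer
        return rng.integers(low, high + 1, size=num_layers)
    # Truncated normal: redraw until the sample falls inside [low, high].
    hops = []
    for _ in range(num_layers):
        k = rng.normal(mu, sigma)
        while not (low <= k <= high):
            k = rng.normal(mu, sigma)
        hops.append(int(round(k)))
    return np.array(hops)

for epoch in range(3):
    hops_per_layer = sample_hops(num_layers=2, mode="truncated_normal",
                                 rng=np.random.default_rng(epoch))
    print(epoch, hops_per_layer)   # e.g. 0 [4 3]
```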

The GROVER Pre-training Framework Details of Model Architecture

The GROVER Pre-training Framework: Self-supervised Task Construction for Pre-training
Contextual Property Prediction: given a target node, predict the statistical properties of its local subgraph (e.g., the counts of neighboring atom-bond types), so that node embeddings encode contextual structural information.
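As a hedged illustration of how such a contextual property label could be constructed, the sketch below builds a key from an atom's symbol and the counts of its (neighbor atom, bond type) pairs using RDKit; the 1-hop radius and the exact key format are assumptions for illustration, not the paper's specification.

```python
# Build a contextual property key for one atom from its 1-hop neighborhood.
from collections import Counter
from rdkit import Chem

def contextual_property(mol, atom_idx):
    atom = mol.GetAtomWithIdx(atom_idx)
    # Count (neighbor atom symbol, bond type) pairs around the target atom.
    counts = Counter()
    for bond in atom.GetBonds():
        nbr = bond.GetOtherAtom(atom)
        counts[f"{nbr.GetSymbol()}-{bond.GetBondType()}"] += 1
    terms = sorted(f"{key}{n}" for key, n in counts.items())
    return "_".join([atom.GetSymbol()] + terms)

mol = Chem.MolFromSmiles("CC(=O)O")          # acetic acid
print(contextual_property(mol, 1))           # e.g. "C_C-SINGLE1_O-DOUBLE1_O-SINGLE1"
```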

The GROVER Pre-training Framework: Self-supervised Task Construction for Pre-training
Graph-level Motif Prediction
One important class of motifs in molecules is functional groups, which encode rich domain knowledge about molecules and can be easily detected by professional software such as RDKit.
Formally, the motif prediction task can be formulated as a multi-label classification problem, where each motif corresponds to one label (see the sketch below).
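A small sketch of how multi-label motif targets could be built with RDKit's functional-group counters (the fr_* descriptors in rdkit.Chem.Fragments); the particular fragment set chosen below is an assumption for illustration, since the slide does not list which motifs the paper uses.

```python
# Multi-label motif targets from a handful of RDKit functional-group counters.
from rdkit import Chem
from rdkit.Chem import Fragments

FRAGMENT_FNS = {
    "benzene": Fragments.fr_benzene,
    "carboxylic_acid": Fragments.fr_COO,
    "ester": Fragments.fr_ester,
    "ether": Fragments.fr_ether,
    "halogen": Fragments.fr_halogen,
}

def motif_labels(smiles):
    """Binary multi-label target: 1 if the functional group occurs in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return {name: int(fn(mol) > 0) for name, fn in FRAGMENT_FNS.items()}

print(motif_labels("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
```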

The GROVER Pre-training Framework: Fine-tuning for Downstream Tasks
The pre-trained node embeddings are passed through a READOUT function and an MLP head for each downstream prediction task.
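A minimal sketch of this fine-tuning head, assuming mean pooling as the READOUT and a small two-layer MLP; the layer sizes and the regression-style output are illustrative assumptions.

```python
# READOUT (mean pooling) followed by a small MLP prediction head.
import numpy as np

def readout(node_embeddings):
    return node_embeddings.mean(axis=0)          # graph-level embedding

def mlp_head(g, W1, b1, W2, b2):
    hidden = np.maximum(0.0, g @ W1 + b1)        # ReLU hidden layer
    return hidden @ W2 + b2                      # downstream prediction

# Toy usage: 6 atoms with 16-dimensional pre-trained embeddings.
rng = np.random.default_rng(0)
node_embeddings = rng.normal(size=(6, 16))
W1, b1 = rng.normal(size=(16, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
prediction = mlp_head(readout(node_embeddings), W1, b1, W2, b2)
```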

Experiments: Datasets
Pre-training Data Collection: We collect 11 million unlabelled molecules sampled from the ZINC15 and ChEMBL datasets to pre-train GROVER. We randomly split off 10% of the unlabelled molecules as the validation set for model selection.
Fine-tuning Tasks and Datasets: 11 benchmark datasets from MoleculeNet.

Experiments: Baselines
10 popular baselines from MoleculeNet and several state-of-the-art (SOTA) approaches: TF_Robust, GraphConv, Weave, SchNet, MPNN, DMPNN, MGCN, AttentiveFP, N-GRAM, Hu et al.

Experiments Results on Downstream Tasks

Experiments Ablation Study How Useful is the Self-supervised Pre-training?

Experiments Ablation Study How Powerful is the GTransformer Backbone?

Experiments Ablation Study Effect of the Proposed dyMPN and GTransformer.

Conclusion & Future Work
With well-designed self-supervised tasks and a highly expressive architecture, our model GROVER can learn rich implicit information from enormous amounts of unlabelled graphs.
More importantly, by fine-tuning GROVER, we achieve large improvements (more than 6% on average) over the current SOTA on 11 challenging molecular property prediction benchmarks, which for the first time verifies the power of self-supervised pre-training approaches in the graph learning area.
Despite these successes, there is still room to improve GNN pre-training in the following aspects:
- More self-supervised tasks
- More downstream tasks
- Wider and deeper models

Q & A