240610_Thuy_Labseminar[Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules].pptx
thanhdowork
About This Presentation
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
Size: 2.29 MB
Language: en
Added: Jun 24, 2024
Slides: 14 pages
Slide Content
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
Van Thuy Hoang, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: [email protected]
2024-06-10
BACKGROUND: Graph Convolutional Networks (GCNs)
Key idea: each node aggregates its neighbors' information and applies a neural transformation to obtain a contextualized node embedding.
Limitation: most GNNs focus on homogeneous graphs.
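A minimal sketch of this aggregate-then-transform step in PyTorch (a hypothetical mean-aggregation layer for illustration, not the exact layer from the slides):

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN-style layer: mean-aggregate neighbors, then apply a neural transformation."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   [num_nodes, in_dim] node features
        # adj: [num_nodes, num_nodes] adjacency matrix with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees
        agg = (adj @ x) / deg                            # mean over each neighborhood
        return torch.relu(self.linear(agg))              # neural transformation
```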
BACKGROUND: Motivation
A large part of deep learning revolves around finding rich representations of unstructured data such as images, text, and graphs. Self-supervised learning (SSL) on graphs pursues the same goal without labels.
BACKGROUND: Motivation
The architecture of a Variational Graph AutoEncoder (VGAE): a graph encoder maps the input graph to a latent distribution, and a decoder reconstructs the adjacency structure from sampled latents.
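For concreteness, a minimal VGAE sketch (illustrative only; the original VGAE of Kipf and Welling uses two GCN layers, while this sketch compresses the encoder into one aggregate-then-transform step):

```python
import torch
import torch.nn as nn

class MiniVGAE(nn.Module):
    """Minimal VGAE sketch: graph encoder -> latent z -> inner-product edge decoder."""
    def __init__(self, in_dim: int, hid_dim: int, z_dim: int):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.mu = nn.Linear(hid_dim, z_dim)
        self.logvar = nn.Linear(hid_dim, z_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.enc((adj @ x) / deg))             # one graph-convolution step
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        adj_hat = torch.sigmoid(z @ z.t())                    # predicted edge probabilities
        return adj_hat, mu, logvar
```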
Graph tokenizer
Given a graph G, the graph tokenizer employs a graph fragmentation function to break G into smaller subgraphs, such as single nodes and motifs. These fragments are then mapped to fixed-length tokens that serve as the targets to be reconstructed later. The granularity of the graph tokens thus determines the abstraction level of the representations learned in masked modeling.
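A schematic tokenizer in PyTorch, using the simplest possible fragmentation (every node is its own fragment) and an embedding table as the token mapper; both choices are illustrative assumptions, not the paper's tokenizers:

```python
import torch
import torch.nn as nn

class NodeLevelTokenizer(nn.Module):
    """tok(g): fragment the graph, then map each fragment to a fixed-length token."""
    def __init__(self, num_atom_types: int, token_dim: int):
        super().__init__()
        self.token_map = nn.Embedding(num_atom_types, token_dim)  # the mapper m(.)

    def fragment(self, atom_types: torch.Tensor) -> torch.Tensor:
        # Simplest fragmentation f(g): every node is its own fragment.
        return atom_types  # [num_nodes] integer atom types

    def forward(self, atom_types: torch.Tensor) -> torch.Tensor:
        fragments = self.fragment(atom_types)
        return self.token_map(fragments)  # tokens y_t in R^token_dim
```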
Preliminary: Masked Graph Modeling
Three key steps: graph tokenizer, graph masking, and graph autoencoder.
Graph tokenizer: tok(g) = { y_t = m(t) ∈ ℝ^d | t ∈ f(g) } generates the graph tokens used as reconstruction targets. The tokenizer tok(·) is composed of a fragmentation function f that breaks g into a set of subgraphs, and a mapping m that embeds each fragment as a token.
Graph masking: re-mask decoding masks the hidden representations of the masked nodes V_m again with a special token m1.
Graph autoencoder: an encoder-decoder pair that reconstructs the tokens of the masked parts from the corrupted graph.
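Written out, pre-training minimizes a reconstruction loss over the masked fragments; a generic form (shown here as a squared error for illustration, though implementations often use a cosine-style error instead) is:

$$\mathcal{L}(g) = \frac{1}{|V_m|} \sum_{t \in V_m} \left\lVert \mathrm{dec}\!\left(\mathrm{enc}(\tilde{g})\right)_t - y_t \right\rVert_2^2, \qquad y_t = m(t),$$

where $\tilde{g}$ is the masked graph and $V_m$ is the set of masked fragments.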
Preliminary: Revisiting Molecule Tokenizers
The paper summarizes existing molecule tokenizers into four distinct categories.
Pretrained GNN-based tokenizer
(b) A motif-based tokenizer that applies fragmentation functions for cycles and the remaining nodes.
(c) A two-layer GIN-based tokenizer that extracts the 2-hop rooted subtree of every node in the graph.
Overview of the SimSGT framework. It applies the GTS architecture (GINE and GraphTrans) for both its encoder and decoder. SimSGT features a Simple GNN-based Tokenizer (SGT) and employs a new re-mask strategy to decouple the encoder and decoder of the GTS architecture.
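A minimal sketch of the re-mask idea (the decoder and mask token here are placeholders, not SimSGT's actual GTS modules):

```python
import torch

def remask_decode(h: torch.Tensor,
                  mask_idx: torch.Tensor,
                  remask_token: torch.Tensor,
                  decoder) -> torch.Tensor:
    """Re-mask decoding: overwrite the encoder's hidden states at the masked
    positions with a fresh mask token before decoding, so the decoder must
    reconstruct tokens from context rather than from the encoder's guesses."""
    h = h.clone()
    h[mask_idx] = remask_token   # re-mask hidden states of the masked nodes
    return decoder(h)            # predict the target tokens
```

During pre-training, the reconstruction loss is then computed only at the re-masked positions against the tokenizer's target tokens, which is what decouples the encoder from the decoder.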
Simple GNN-based Tokenizer
SGT simplifies existing aggregation-based GNNs by removing the nonlinear update functions in GNN layers. It is inspired by studies showing that carefully designed graph operators can generate effective node representations.
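A sketch of what such a simplified tokenizer could look like, assuming a GIN-style sum aggregation with the nonlinear update MLP removed; the exact operator and layer count are assumptions, not the paper's definition:

```python
import torch

def sgt_tokens(x: torch.Tensor, adj: torch.Tensor,
               num_layers: int = 2, eps: float = 0.0) -> torch.Tensor:
    """Simplified GNN tokenizer: stacked aggregation steps with no nonlinear
    update function, so the whole tokenizer stays linear in the inputs."""
    h = x
    for _ in range(num_layers):
        h = (1.0 + eps) * h + adj @ h   # neighbor sum + scaled self term
    return h                            # node tokens used as reconstruction targets
```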
Experiments: Molecular property prediction (transfer learning)
Experiments: Molecular property prediction
Transfer learning performance for molecular property prediction (regression)
Conclusion and Future Works
Examined the roles of the tokenizer and decoder in MGM for molecules.
Studied a comprehensive range of molecule fragmentation functions as molecule tokenizers.
The results reveal that a subgraph-level tokenizer improves molecular representation learning (MRL) performance.
For future work: the potential application of molecule tokenizers to joint molecule-text modeling.