240909_Thuy_Labseminar[Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery].pptx

About This Presentation

Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery


Slide Content

Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery
Van Thuy Hoang, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: [email protected]
2024-09-02, AAAI ’24

BACKGROUND: Graph Convolutional Networks (GCNs)
Key Idea: Each node aggregates information from its neighborhood (neural transformation plus aggregation of neighbors' information) to obtain a contextualized node embedding.
Limitation: Most GNNs focus on homogeneous graphs.
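
To make the aggregation idea concrete, here is a minimal sketch of one GCN layer in PyTorch, using the standard propagation rule (symmetrically normalized adjacency with self-loops). It is illustrative only, not code from the slides.

import torch

def gcn_layer(x, adj, weight):
    """One GCN layer: aggregate neighbor features, then apply a linear
    transformation: H' = relu(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + torch.eye(adj.size(0))        # add self-loops
    deg = a_hat.sum(dim=1)                      # node degrees
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
    return torch.relu(a_norm @ x @ weight)      # aggregate + transform

# toy graph: 4 nodes, 3 input features, 8 hidden dimensions
x = torch.randn(4, 3)
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]], dtype=torch.float)
w = torch.randn(3, 8)
h = gcn_layer(x, adj, w)   # contextualized node embeddings, shape (4, 8)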

Molecular property prediction with GNNs
Learning molecular structures through GNNs.
Inputs: molecules; Outputs: a score for a specific prediction task.
Pipeline: Molecules → Graph Neural Networks → Pooling Function → Task Prediction.
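
A minimal sketch of this pipeline, assuming PyTorch Geometric; the two-layer GCN encoder, mean-pooling readout, and module names are illustrative choices, not the slides' exact model.

import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class MolPropertyGNN(torch.nn.Module):
    """Encode a molecular graph with GNN layers, pool node embeddings into a
    single graph embedding, and predict task-specific scores."""
    def __init__(self, in_dim, hid_dim, num_tasks):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.head = torch.nn.Linear(hid_dim, num_tasks)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()    # node embeddings
        g = global_mean_pool(h, batch)          # pooling / readout per molecule
        return self.head(g)                     # task prediction scores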

Problem
Most existing graph pretraining endeavors restrict themselves to simple prediction tasks rooted in close neighborhood structures.
Toy example: interactive dual-level graph pretraining, where the node level and the subgraph level are trained jointly.

Incompetent Multi-Level Interactions
For effective pretraining, graph self-supervised learning models should capture knowledge from both node attributes and structural topology.
DGPM introduces a dual-level pretraining architecture.

Contributions
A novel dual-level graph pretraining architecture, DGPM, that addresses the challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions in graph self-supervised pretraining.
A novel and challenging pretext task for subgraph-level pretraining.
The motif auto-discovery module autonomously uncovers crucial patterns in the graph structure, enhancing the generalization of the pretraining model and improving interpretability by visualizing the discovered motifs during pretraining.

DGPM framework
Two components: dual-level pretraining and cross-level matching. The dual-level pretraining encompasses a node-level feature reconstruction task and a subgraph-level motif auto-discovery task.
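
A rough skeleton of how the two levels and the matching objective might compose, assuming PyTorch; the branch modules, their interfaces, and the loss weights are placeholders, not the paper's implementation.

import torch

class DGPMSketch(torch.nn.Module):
    """Dual-level pretraining sketch: a node-level branch (feature
    reconstruction), a subgraph-level branch (motif auto-discovery), and a
    cross-level matching objective tying the two together."""
    def __init__(self, node_branch, motif_branch, matcher,
                 w_node=1.0, w_motif=1.0, w_match=1.0):
        super().__init__()
        self.node_branch = node_branch      # e.g. a graph auto-encoder
        self.motif_branch = motif_branch    # e.g. stacked EdgePool layers
        self.matcher = matcher              # node-motif matching head
        self.w = (w_node, w_motif, w_match)

    def forward(self, graph):
        node_emb, loss_node = self.node_branch(graph)
        motif_emb, loss_motif = self.motif_branch(graph)
        loss_match = self.matcher(node_emb, motif_emb, graph)
        w_node, w_motif, w_match = self.w
        return w_node * loss_node + w_motif * loss_motif + w_match * loss_match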

Dual-Level Pretraining: Node Feature Reconstruction Task
For the node-level learning component, an encoder is designed primarily to capture local node information; a graph auto-encoder is employed.
The goal of the node-level learning task is to reconstruct the input node features.
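
A minimal sketch of the node-level pretext task as a graph auto-encoder, assuming PyTorch Geometric; the single-layer GCN encoder/decoder and the MSE reconstruction loss are simplifying assumptions, not the paper's exact architecture.

import torch
from torch_geometric.nn import GCNConv

class NodeFeatureAutoEncoder(torch.nn.Module):
    """Node-level pretext: encode nodes with a GNN, then decode and
    reconstruct the input node features."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = GCNConv(in_dim, hid_dim)
        self.decoder = GCNConv(hid_dim, in_dim)

    def forward(self, x, edge_index):
        z = self.encoder(x, edge_index).relu()          # node embeddings
        x_hat = self.decoder(z, edge_index)             # reconstructed features
        loss = torch.nn.functional.mse_loss(x_hat, x)   # reconstruction objective
        return z, loss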

Subgraph-Level Motif Auto-Discovery Task
EdgePool layers: collapse pairs of nodes connected by an edge into a single node to obtain a coarsened graph.
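
A minimal sketch of the coarsening step, assuming PyTorch Geometric's EdgePooling layer (which contracts edge-connected node pairs); the layer count and the GCN embedding in front of it are illustrative assumptions.

import torch
from torch_geometric.nn import EdgePooling, GCNConv

class MotifCoarsener(torch.nn.Module):
    """Coarsen a graph by repeatedly merging edge-connected node pairs with
    EdgePooling; nodes of the coarsened graph act as candidate motifs."""
    def __init__(self, in_dim, hid_dim, num_pool_layers=2):
        super().__init__()
        self.embed = GCNConv(in_dim, hid_dim)
        self.pools = torch.nn.ModuleList(
            [EdgePooling(hid_dim) for _ in range(num_pool_layers)])

    def forward(self, x, edge_index, batch):
        h = self.embed(x, edge_index).relu()
        for pool in self.pools:
            # each call contracts selected edges, shrinking the node set
            h, edge_index, batch, _ = pool(h, edge_index, batch)
        return h, edge_index, batch   # coarsened (motif-level) graph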

Subgraph-Level Motif Auto-Discovery Task
Graph similarity loss: with EdgePool layers, the input graph is pooled into a coarsened graph whose nodes denote motifs from the original graph.
The Wasserstein Weisfeiler-Lehman (WWL) graph kernel is employed to measure motif similarity as ground truth, guiding EdgePool layer training.
The WWL graph kernel jointly models structural similarity and node feature agreement on graphs, effectively supervising graph topology properties.
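
A simplified stand-in for the WWL similarity target, assuming the POT (Python Optimal Transport) package: run a few steps of continuous Weisfeiler-Lehman propagation, then compute the Wasserstein distance between the two graphs' node-embedding sets. The exact kernel construction and normalization in the paper may differ.

import torch
import ot  # POT: Python Optimal Transport

def wl_embed(x, adj, iters=2):
    """Continuous Weisfeiler-Lehman propagation: repeatedly average each
    node's features with its neighbors' and concatenate the iterations."""
    a_hat = adj + torch.eye(adj.size(0))
    a_norm = a_hat / a_hat.sum(dim=1, keepdim=True)
    feats, h = [x], x
    for _ in range(iters):
        h = a_norm @ h
        feats.append(h)
    return torch.cat(feats, dim=1)

def wwl_distance(x1, adj1, x2, adj2):
    """Wasserstein distance between the two graphs' WL node-embedding sets,
    used here as a simplified stand-in for the WWL similarity target."""
    z1 = wl_embed(x1, adj1).numpy()
    z2 = wl_embed(x2, adj2).numpy()
    cost = ot.dist(z1, z2)                               # pairwise ground costs
    a, b = ot.unif(z1.shape[0]), ot.unif(z2.shape[0])    # uniform node weights
    return ot.emd2(a, b, cost)                           # exact OT cost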

Cross-Level Matching Task
After training on node reconstruction and motif discovery, we obtain a node-level encoder and a subgraph-level encoder with corresponding representations.
To exploit the inherent inter-relationship between nodes and motifs, a node-motif matching task is established, connecting node-level and subgraph-level training. Given a permutation P, the corresponding matching labels are defined accordingly.
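
A plausible sketch of node-motif matching, assuming a dot-product score and a cross-entropy objective over motifs; the paper's permutation-based labels may be formulated differently, and `node2motif` is a hypothetical helper mapping each node to the motif it was pooled into.

import torch
import torch.nn.functional as F

def node_motif_matching_loss(node_emb, motif_emb, node2motif):
    """Score every node-motif pair and train the scores to be high for a
    node's own motif and low for the others (cross-entropy over motifs)."""
    logits = node_emb @ motif_emb.t()            # (num_nodes, num_motifs)
    return F.cross_entropy(logits, node2motif)   # match each node to its motif

# toy usage: 6 nodes, 3 motifs, 16-dimensional embeddings
node_emb = torch.randn(6, 16)
motif_emb = torch.randn(3, 16)
node2motif = torch.tensor([0, 0, 1, 1, 2, 2])
loss = node_motif_matching_loss(node_emb, motif_emb, node2motif)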

Experiments
Two typical scenarios: (1) unsupervised representation learning, the direct utilization of trained representations for graph classification; (2) transfer learning, applying a pretrain-finetune approach for molecular property prediction.
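
A sketch of the usual unsupervised evaluation protocol, fitting a simple classifier on frozen graph embeddings with scikit-learn; the arrays below are random placeholders, not the paper's data or results.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# graph_emb: frozen graph-level embeddings from the pretrained model
# labels: graph classification labels (placeholders here)
graph_emb = np.random.randn(100, 64)
labels = np.random.randint(0, 2, size=100)

# standard protocol: fit a simple classifier on the fixed representations
acc = cross_val_score(SVC(C=1.0), graph_emb, labels, cv=10, scoring="accuracy")
print(f"10-fold accuracy: {acc.mean():.3f} ± {acc.std():.3f}")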

Experiments: Unsupervised Representation Learning
DGPM outperforms all unsupervised baselines across all datasets.

Experiments: Transfer Learning
The model is first pretrained on ZINC15 and then finetuned on downstream molecular property datasets.
The consistent outcomes across these two task settings demonstrate DGPM's effectiveness and generalizability for a wide range of applications in various domains.

CONCLUSION
DGPM: a dual-level graph self-supervised pretraining framework with motif discovery.
DGPM introduces a motif auto-discovery task to effectively learn subgraph-level topological information.
A cross-level matching module is proposed for better dual-level feature fusion.