[NS][Lab_Seminar_240626]GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification.pptx

Slide Content

GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification
Tien-Bach-Thanh Do
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/06/26
Paper: Bin Wang et al., WACV 2024

Challenges for Chest X-ray Classification
- Chest X-rays have limited soft-tissue contrast and contain a variety of complex anatomical structures that overlap in the planar (2D) view
- Many tissues, such as organs, blood vessels, and muscles, have similar intensity values in chest X-ray images
- This confuses deep-learning models when localizing abnormalities

Motivation
- Eye-tracking technology collects eye-gaze data from radiologists
- Eye gaze contains the locations that radiologists fixated on during diagnosis => potential abnormalities or important regions (cues that challenging X-ray images alone cannot provide)
- => Supplements the deep-learning model with human attention
- => Lets the model learn in an interpretable way

Current Solutions: Attention Consistency Architecture
- Considers eye gaze as a supervision source
- Drawbacks:
  - No eye-gaze data at inference => not robust
  - Performance drops when testing on a dataset with a distribution shift

Current Solutions: Two-stream Architecture
- Two branches process the image and the eye-gaze information separately
- Embeds the eye-gaze feature into the training loop
- Pre-generates all visual attention maps (VAMs)
- Drawbacks:
  - Must convert eye gaze into a VAM for each case at inference
  - VAM generation is time-consuming (~10 seconds)
  - Not practical for real-world clinical diagnosis

Current Solutions: Proposed Method
- Still embeds eye gaze at inference => robust
- Removes visual attention map generation and replaces it with a gaze embedding => time-efficient

Method: GazeGNN

Method: Patch Embedding
- Input image size is 224×224
- The image is divided into multiple 15×15 patches, and each patch is treated as a node
- Given an image I, split it into N patches
- From each patch, a feature vector encoding local image information is extracted
- The overlapping patch embedding method [43] is adopted to extract the feature vectors from image patches

[43] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3):415–424, 2022.
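Below is a minimal PyTorch sketch of an overlapping patch embedding in the style of PVT v2 [43]: a strided convolution whose kernel is larger than its stride, so adjacent patches share pixels. The kernel size, stride, and embedding dimension are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Overlapping patch embedding: kernel_size > stride makes patches overlap."""
    def __init__(self, in_chans=1, embed_dim=256, kernel_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H', W')
        B, D, Hp, Wp = x.shape
        x = x.flatten(2).transpose(1, 2)       # (B, N, D): one feature vector per patch node
        return self.norm(x), (Hp, Wp)

# Usage: a 224x224 gray-scale chest X-ray -> per-patch feature vectors
tokens, grid = OverlapPatchEmbed()(torch.randn(1, 1, 224, 224))
```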

Method: Gaze Embedding
- Eye-gaze data consists of many scattered points; each point means the radiologist's eyes concentrated on that location for a moment while reading the image
- Eye gaze provides not only location information but also the fixation duration of each point
- Time aggregation is performed to obtain a fixation time per patch, keeping the gaze embedding consistent with the patch feature vectors
- The attention feature of a patch is the sum of the fixation times of all eye-gaze points falling inside it; for patch i, the gaze embedding is g_i = Σ_{s ∈ patch i} t_s, where t_s is the fixation time of gaze point s
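A minimal sketch of this time aggregation, assuming gaze points arrive as (x, y, duration) triples in pixel coordinates; the 14×14 grid and the function name are illustrative.

```python
import numpy as np

def gaze_embedding(gaze_points, image_size=224, grid=14):
    """Sum fixation durations of all gaze points falling inside each patch.

    gaze_points: array of shape (M, 3) with columns (x, y, duration_seconds).
    Returns a (grid * grid,) vector: one aggregated fixation time per patch node.
    """
    patch = image_size / grid
    emb = np.zeros(grid * grid)
    for x, y, t in gaze_points:
        col = min(int(x // patch), grid - 1)   # clamp points on the right/bottom edge
        row = min(int(y // patch), grid - 1)
        emb[row * grid + col] += t             # time aggregation within the patch
    return emb

# Example: three fixations, the first two landing in the same patch
pts = np.array([[10.0, 12.0, 0.3], [11.0, 14.0, 0.5], [200.0, 100.0, 0.2]])
g = gaze_embedding(pts)                        # g.sum() == 1.0 total seconds
```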

Method: Position Embedding
- During graph processing in a GNN, features are treated as unordered nodes
- To keep the positional information of the original image, the position embedding method of [11] is adopted, which has two steps:
  1. Add a learnable absolute positional encoding vector to each node's feature vector
  2. Compute the relative positional distance between nodes to determine the neighbors of a given node via k-nearest neighbors

[11] Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, and Enhua Wu. Vision GNN: An image is worth graph of nodes. arXiv preprint arXiv:2206.00272, 2022.
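A minimal sketch of the two steps, assuming N nodes of dimension D; the learnable table and the pairwise-distance kNN below illustrate the idea rather than reproduce [11]'s exact implementation.

```python
import torch
import torch.nn as nn

class PositionEmbedding(nn.Module):
    def __init__(self, num_nodes, dim):
        super().__init__()
        # Step 1: learnable absolute positional encoding, one vector per node
        self.abs_pos = nn.Parameter(torch.zeros(num_nodes, dim))

    def forward(self, x):                       # x: (B, N, D)
        return x + self.abs_pos                 # inject positional information

def knn_neighbors(x, k):
    """Step 2: pairwise distances between nodes pick the k nearest neighbors."""
    dist = torch.cdist(x, x)                    # (B, N, N) relative distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self (distance 0)
```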

Method: Graph Construction
- With the patch, gaze, and position embeddings, each graph node's feature vector is formed by combining the three embeddings
- The k-nearest neighbors of each node are computed, and the edge set connects every node to its k nearest neighbors: E = {(v_i, v_j) : v_j ∈ kNN(v_i)}
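A minimal sketch of graph construction, assuming the patch and gaze embeddings are fused by concatenation before the positional encoding is added (the fusion operator and k = 9 are assumptions, not the paper's confirmed choices):

```python
import torch

def build_graph(patch_feat, gaze_emb, pos_table, k=9):
    """Fuse per-node embeddings and connect each node to its k nearest neighbors.

    patch_feat: (B, N, D) patch embeddings; gaze_emb: (B, N, 1) fixation times;
    pos_table: (N, D + 1) learnable absolute positional encodings.
    """
    nodes = torch.cat([patch_feat, gaze_emb], dim=-1)  # fuse image + gaze per node
    nodes = nodes + pos_table                          # add absolute positions
    dist = torch.cdist(nodes, nodes)                   # (B, N, N) pairwise distances
    edges = dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self-loop
    return nodes, edges

# Usage (shapes only): 196 nodes with 256-dim patch features + 1-dim gaze
B, N, D = 1, 196, 256
pos = torch.zeros(N, D + 1, requires_grad=True)
nodes, edges = build_graph(torch.randn(B, N, D), torch.rand(B, N, 1), pos)
```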

Method: GNN
- The GNN consists of L graph processing blocks, an average pooling layer, and a graph classification head
- Each graph processing block consists of multiple fully-connected (FC) layers and a graph convolutional layer
- The graph convolutional layer updates each node by applying an aggregation function over its neighbors' features and transforming the result with a learnable weight matrix; the FC layers transform node features before and after the convolution
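A minimal sketch of one graph processing block and the pooling/classification head, using max-relative aggregation in the spirit of Vision GNN [11] (the aggregation choice, block count L = 4, and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class GraphBlock(nn.Module):
    """One graph processing block: FC -> graph convolution -> FC, with a residual."""
    def __init__(self, dim):
        super().__init__()
        self.fc_in = nn.Linear(dim, dim)
        self.conv = nn.Linear(2 * dim, dim)      # weight matrix applied after aggregation
        self.fc_out = nn.Linear(dim, dim)

    def forward(self, x, edges):                 # x: (B, N, D); edges: (B, N, k)
        h = self.fc_in(x)
        # Aggregation function: max-relative graph conv, max over neighbor differences
        idx = edges.unsqueeze(-1).expand(-1, -1, -1, h.size(-1))           # (B, N, k, D)
        nbrs = torch.gather(h.unsqueeze(1).expand(-1, h.size(1), -1, -1), 2, idx)
        agg = (nbrs - h.unsqueeze(2)).max(dim=2).values                    # (B, N, D)
        h = self.conv(torch.cat([h, agg], dim=-1))
        return x + self.fc_out(torch.relu(h))    # residual connection

class GazeGNNHead(nn.Module):
    """L blocks, average pooling over nodes, then a classification head."""
    def __init__(self, dim, num_classes=3, L=4):
        super().__init__()
        self.blocks = nn.ModuleList([GraphBlock(dim) for _ in range(L)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, edges):
        for blk in self.blocks:
            x = blk(x, edges)
        return self.head(x.mean(dim=1))          # average pooling -> class logits

# Usage with random node features and neighbor indices
x = torch.randn(1, 196, 64)
e = torch.randint(0, 196, (1, 196, 9))
logits = GazeGNNHead(dim=64)(x, e)               # (1, 3) class scores
```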

Experiments: Datasets
- A public chest X-ray dataset containing 1,083 cases from the MIMIC-CXR dataset
- Each case provides a gray-scale X-ray image of size 3000×3000, eye-gaze data, and a ground-truth classification label
- 3 classes: Normal, Congestive Heart Failure, and Pneumonia
- Static VAMs are generated from the eye-gaze data using a data post-processing method
- Metrics: area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score
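A minimal sketch of computing the four reported metrics with scikit-learn, assuming one-vs-rest macro averaging over the three classes (the averaging scheme is an assumption):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

def evaluate(y_true, y_prob):
    """y_true: (M,) class indices; y_prob: (M, 3) predicted class probabilities."""
    y_pred = y_prob.argmax(axis=1)
    auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"AUC": auc, "Precision": prec, "Recall": rec, "F1": f1}

# Example with random predictions over the 3 classes
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=30)
print(evaluate(rng.integers(0, 3, size=30), probs))
```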

Experiments: Improving Disease Classification Accuracy

Experiments: Improving Inference Speed

Experiments: Improving Model Robustness

Experiments: Effectiveness of GNN

Results: Ablation Study of Gaze Usage

Conclusion
- Proposed GazeGNN, a novel gaze-guided network for the disease classification task
- GazeGNN uses raw eye-gaze information directly by embedding it, together with the image patch and position information, into graph nodes
- Avoids generating the VAMs required by mainstream gaze-guided methods
- Develops a real-time, end-to-end disease classification algorithm that needs no pre-prepared visual attention maps