GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification
Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/06/26
Paper: Bin Wang et al., WACV 2024
Challenges for Chest X-ray Classification
- Chest X-rays have limited soft-tissue contrast and contain a variety of complex anatomical structures that overlap in the planar (2D) view.
- Many tissues, such as organs, blood vessels, and muscles, have similar intensity values on chest X-ray images.
- These factors confuse deep-learning models when localizing abnormalities.
Motivation
- Eye-tracking technology collects eye-gaze data from radiologists.
- Eye gaze contains the locations where radiologists fixate during diagnosis => potential abnormalities or important regions (cues that challenging X-ray cases alone cannot provide).
- => Supplements human attention into the deep-learning model.
- => Lets the model learn in an interpretable way.
Current Solutions: Attention Consistency Architecture
- Considers eye gaze as a supervision source.
- Drawbacks:
  - No eye-gaze data is available at inference => not robust.
  - Performance drops when testing on a dataset with a distribution shift.
Current Solutions: Two-Stream Architecture
- Two branches process the image and the eye-gaze information separately.
- Embeds the eye-gaze feature into the training loop.
- Pre-generates all visual attention maps (VAMs).
- Drawbacks:
  - Must transfer the eye gaze into a VAM for each case at inference.
  - Generating a VAM is time-consuming (~10 seconds) => not practical for real-world clinical diagnosis.
Current Solutions: Proposed Method
- Still embeds eye gaze at inference => robust.
- Gets rid of visual attention map generation, replacing it with a gaze embedding => time-efficient.
Method: GazeGNN
Method: Patch Embedding
- The image input size is 224×224.
- Divide the image into multiple 15×15 patches, with each patch treated as a node.
- Given an image I, split it into N patches.
- Each patch is mapped to a feature vector that encodes local image information.
- Adopt the overlapping patch embedding method [43] to extract feature vectors from the image patches (see the sketch below).

[43] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3):415-424, 2022.
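A minimal sketch of PVTv2-style overlapping patch embedding [43], assuming PyTorch; the kernel size, stride, and embedding dimension are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Extract per-patch feature vectors with overlapping windows.

    A strided convolution whose kernel is larger than its stride makes
    neighboring patches share pixels, unlike the non-overlapping split
    used in vanilla ViT.
    """
    def __init__(self, in_chans=3, embed_dim=96, kernel_size=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=kernel_size, stride=stride,
                              padding=kernel_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                  # x: (B, C, H, W)
        x = self.proj(x)                   # (B, D, H', W')
        B, D, Hp, Wp = x.shape
        x = x.flatten(2).transpose(1, 2)   # (B, N, D), one row per patch/node
        return self.norm(x), (Hp, Wp)

feats, grid = OverlapPatchEmbed()(torch.randn(1, 3, 224, 224))
print(feats.shape, grid)                   # torch.Size([1, 3136, 96]) (56, 56)
```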
Method: Gaze Embedding
- Eye-gaze data consists of many scattered points; each point means the radiologist's eyes concentrated on that location for a moment while reading the image.
- Eye gaze provides not only location information but also the time duration of each point.
- Perform time aggregation to get the fixation time for each patch, to maintain consistency with the patch feature vectors.
- Sum up the fixation times of all eye-gaze points falling in a patch to represent the attention feature of that patch; the gaze embedding of patch $i$ is $g_i = \sum_{p \in P_i} t_p$, where $P_i$ is the set of gaze points inside patch $i$ and $t_p$ is the fixation duration of point $p$.
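A minimal sketch of this time aggregation, assuming gaze points come as (x, y, duration) triples in image coordinates; the image size and 14×14 grid are illustrative:

```python
import numpy as np

def gaze_embedding(points, img_size=224, grid=14):
    """points: array of (x, y, t) fixations -> (grid*grid,) gaze vector."""
    patch = img_size // grid                # pixels per patch side
    g = np.zeros(grid * grid)
    for x, y, t in points:
        col = min(int(x) // patch, grid - 1)
        row = min(int(y) // patch, grid - 1)
        g[row * grid + col] += t            # sum fixation time per patch
    return g

pts = np.array([[10.0, 12.0, 0.3], [11.0, 14.0, 0.2], [200.0, 30.0, 0.5]])
print(gaze_embedding(pts)[:3])              # first two points share a patch
```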
Method: Position Embedding
- During graph processing in a GNN, features are treated as unordered nodes.
- To keep the positional information of the original image, adopt the position embedding method of [11], which has two steps:
  - Add a learnable absolute positional encoding vector to each feature vector.
  - Calculate the relative positional distance between nodes to help determine the k-nearest neighbors of a given node.

[11] Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, and Enhua Wu. Vision GNN: An image is worth graph of nodes. arXiv preprint arXiv:2206.00272, 2022.
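A minimal sketch of the two-step positional scheme, assuming PyTorch; the 14×14 grid and feature dimension are illustrative, and how the relative distance is mixed into the k-NN criterion follows [11] in spirit only:

```python
import torch
import torch.nn as nn

N, D = 196, 96                              # 14x14 patch grid, feature dim
# step 1: learnable absolute positional encoding added to node features
abs_pos = nn.Parameter(torch.zeros(N, D))
feats = torch.randn(N, D) + abs_pos

# step 2: relative positional distance between patch centers on the grid,
# later combined with the feature distance when picking k-nearest neighbors
rows = torch.arange(N) // 14
cols = torch.arange(N) % 14
coords = torch.stack([rows, cols], dim=1).float()
rel_dist = torch.cdist(coords, coords)      # (N, N) grid distances
print(rel_dist.shape)
```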
Method: Graph Construction
- With the patch, gaze, and position embeddings, each graph node carries a fused feature vector (e.g., $v_i = x_i + g_i + p_i$; the element-wise sum is an assumed fusion).
- Compute the k-nearest neighbors of each node; edges are defined as $\mathcal{E} = \{(v_i, v_j) \mid v_j \in \mathrm{KNN}(v_i)\}$ (a sketch follows).
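A minimal sketch of the k-NN graph construction, assuming PyTorch; the fused node feature and k=9 are illustrative assumptions:

```python
import torch

def knn_edges(v, k=9):
    """v: (N, D) node features -> (2, N*k) edge index (node -> neighbors)."""
    dist = torch.cdist(v, v)                     # pairwise feature distances
    dist.fill_diagonal_(float("inf"))            # exclude self-loops
    nbrs = dist.topk(k, largest=False).indices   # (N, k) nearest nodes
    src = torch.arange(v.size(0)).repeat_interleave(k)
    return torch.stack([src, nbrs.reshape(-1)])

v = torch.randn(196, 96)                         # assumed fused v_i features
edges = knn_edges(v)
print(edges.shape)                               # torch.Size([2, 1764])
```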
Method: GNN
- The GNN consists of L graph processing blocks, an average pooling layer, and a graph classification head.
- Each graph processing block consists of multiple fully-connected (FC) layers and a graph convolutional layer.
- The graph convolution updates each node from its neighbors through an aggregation function and learnable weight matrices, with FC layers around it; in ViG-style notation, $\mathcal{G}' = \mathrm{Update}(\mathrm{Aggregate}(\mathcal{G}, W_{\mathrm{agg}}), W_{\mathrm{update}})$ (a sketch of one block follows).
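A hedged sketch of one graph processing block using a ViG-style max-relative graph convolution; GazeGNN's exact operator, layer count, and dimensions may differ:

```python
import torch
import torch.nn as nn

class GraphBlock(nn.Module):
    """FC -> max-relative graph convolution -> FC, with a residual."""
    def __init__(self, dim=96):
        super().__init__()
        self.fc_in = nn.Linear(dim, dim)
        self.fc_out = nn.Linear(2 * dim, dim)   # [x_i, aggregated] -> dim

    def forward(self, x, nbrs):                 # x: (N, D); nbrs: (N, k)
        h = self.fc_in(x)
        rel = h[nbrs] - h.unsqueeze(1)          # (N, k, D) relative features
        agg = rel.max(dim=1).values             # max aggregation over neighbors
        h = self.fc_out(torch.cat([h, agg], dim=-1))
        return x + h                            # residual connection

x = torch.randn(196, 96)
nbrs = torch.randint(0, 196, (196, 9))          # placeholder k-NN indices
print(GraphBlock()(x, nbrs).shape)              # torch.Size([196, 96])
```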
Experiments: Datasets and Metrics
- Public chest X-ray dataset containing 1,083 cases from the MIMIC-CXR dataset.
- For each case, a gray-scale X-ray image of size 3000×3000, eye-gaze data, and ground-truth classification labels are provided.
- 3 classification categories: Normal, Congestive Heart Failure, and Pneumonia.
- Static VAMs are generated from the eye-gaze data using a data post-processing method.
- Metrics: area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score (a computation sketch follows).
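A small sketch of how the reported metrics can be computed with scikit-learn for the 3-class task; the labels and probabilities below are dummy data, and macro averaging is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

y_true = np.array([0, 1, 2, 1, 0, 2])                    # Normal / CHF / Pneumonia
y_prob = np.random.default_rng(0).dirichlet(np.ones(3), size=6)
y_pred = y_prob.argmax(axis=1)

auc = roc_auc_score(y_true, y_prob, multi_class="ovr")   # one-vs-rest AUC
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"AUC={auc:.3f} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")
```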
Conclusion
- Proposes a novel gaze-guided GazeGNN to perform the disease classification task.
- GazeGNN utilizes the raw eye-gaze information directly by embedding it, together with the image patch and position information, into graph nodes.
- Avoids generating the VAMs required by mainstream gaze-guided methods.
- Develops a real-time, end-to-end disease classification algorithm that does not require preparing visual attention maps.