[NS][Lab_Seminar_240612]Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation.pptx

thanhdowork · 13 slides · Jul 04, 2024

About This Presentation

Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation


Slide Content

Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
Tien-Bach-Thanh Do
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/06/12
Rongjie Li et al., CVPR 2021

Introduction
Figure 1: Illustration of biased scene graph generation and an empirical study on Visual Genome. As shown in (D), the baseline (MSDN [23]) performance is dominated by the head categories due to the imbalanced data. We estimate an upper-bound performance by ignoring negative predicate-entity connections during message propagation, as shown in (B). Its performance (shown in (D)) indicates considerable room for improvement in context modeling.

Introduction
Scene graph generation aims to detect visual objects and their relationships (<subject, predicate, object>) in an image.
Challenges:
- Imbalanced object and relation distributions lead to biased relationship prediction
- Lack of sufficient annotations for many categories
Main contributions:
- Introduce a bipartite GNN with adaptive message propagation to alleviate error propagation and achieve effective context modeling
- Propose bi-level data resampling to achieve a better trade-off between head and tail categories

Method: Problem
Given an image I, the task of SGG is to parse I into a scene graph G = {V, E}, where V is the node set encoding object entities and E is the edge set representing the predicates between ordered pairs of entities.
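A minimal sketch of the scene-graph structure this formulation implies; the class and field names below are illustrative, not from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Entity:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) bounding box
    label: str                              # object category

@dataclass
class SceneGraph:
    entities: List[Entity] = field(default_factory=list)                 # node set V
    relations: List[Tuple[int, str, int]] = field(default_factory=list)  # edge set E: (subject_idx, predicate, object_idx)

# Example: a graph holding one <person, riding, horse> triple
g = SceneGraph(
    entities=[Entity((10, 20, 120, 300), "person"), Entity((80, 150, 400, 380), "horse")],
    relations=[(0, "riding", 1)],
)
```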

Method: Overview
Adopt a hypothesize-and-classify strategy for unbiased scene graph generation (a pipeline sketch follows this list):
- First generate a set of entity and predicate proposals, then compute their context-aware representations for predicting their categories
- A bipartite graph network explicitly models the interactions between entities and their predicates in order to cope with unreliable contextual information from the noisy proposals
- Develop an adaptive message passing scheme capable of actively controlling the information flow, reducing the impact of noise in the graph and generating a robust context-aware representation for each relation proposal
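A high-level sketch of this hypothesize-and-classify pipeline. The module names (proposal_network, bgnn, predictor) and the number of stages are placeholders standing in for the components described on the following slides.

```python
def generate_scene_graph(image, proposal_network, bgnn, predictor, num_stages=3):
    """Hypothesize-and-classify sketch: propose, refine context, classify."""
    # 1. Hypothesize: entity proposals from the detector, predicate proposals
    #    from all ordered pairs of entities.
    entities, predicates = proposal_network(image)

    # 2. Context modeling: multi-stage adaptive message passing on the bipartite
    #    entity/predicate graph refines both sets of representations.
    for _ in range(num_stages):
        entities, predicates = bgnn(entities, predicates)

    # 3. Classify: predict entity and predicate categories from the refined,
    #    context-aware representations.
    return predictor(entities, predicates)
```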

Method: Proposal Generation Network
- Utilize an object detector such as Faster R-CNN to generate a set of entity and relationship proposals
- Entity proposals also carry their categories and classification scores
- Relationship proposals are generated by forming ordered pairs of all the entity proposals
- Given the relationship proposals, compute initial representations for both entities and predicates (see the sketch below):
  - Entity: an FC layer over its geometric feature (based on its bounding box) and its semantic feature (based on the word embedding of its class)
  - Relationship proposal from entity i to j: an FC layer over the convolutional feature of their union region
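A minimal sketch of how these initial representations could be assembled; the feature dimensions and the exact way the geometric and semantic features are combined are assumptions, not taken from the slides.

```python
import torch
import torch.nn as nn

class InitialRepresentations(nn.Module):
    """Sketch: FC layers build initial entity and predicate representations."""
    def __init__(self, geo_dim=8, emb_dim=200, union_dim=4096, hidden=512):
        super().__init__()
        self.entity_fc = nn.Linear(geo_dim + emb_dim, hidden)  # geometric + semantic feature
        self.predicate_fc = nn.Linear(union_dim, hidden)       # union-region convolutional feature

    def forward(self, geo_feat, class_emb, union_feat):
        # geo_feat:   (num_entities, geo_dim)  box-based geometric feature
        # class_emb:  (num_entities, emb_dim)  word embedding of the predicted class
        # union_feat: (num_pairs, union_dim)   conv feature of the union region of pair (i, j)
        entity_repr = torch.relu(self.entity_fc(torch.cat([geo_feat, class_emb], dim=-1)))
        predicate_repr = torch.relu(self.predicate_fc(union_feat))
        return entity_repr, predicate_repr
```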

Method: Bipartite Graph Neural Network
Figure 2: Illustration of the overall pipeline of our BGNN model. RCE denotes the relationship confidence estimation module. CMP denotes the confidence-aware message propagation module. SG Predictor is the scene graph predictor for the final prediction.

Method: Bipartite Graph Neural Network
- The graph consists of 2 groups of nodes, corresponding to entity and predicate representations
- The 2 groups are connected by 2 sets of directed edges representing the information flows: G_b = {V_e, V_p, E_e→p, E_p→e}
- BGNN operates on this graph with multi-stage message propagation, where each stage consists of:
  - A relationship confidence estimation (RCE) module that provides a confidence estimate for each relationship
  - A confidence-aware message propagation (CMP) module that incorporates scene context and semantic cues into the entity/predicate proposals
- To reduce noise in context modeling, the RCE module predicts a confidence score for each relationship proposal to control the information flow in the message propagation; it is built on a multilayer FC network together with the entity class probabilities from the detector, and the scores are fused into a global confidence score for the predicate node through a sigmoid activation with learnable fusion parameters (a sketch follows)
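A rough sketch of a relationship confidence estimation module along these lines. How the entity class probabilities enter and how the scores are fused are assumptions; only the multilayer FC scorer, the sigmoid, and the learnable fusion parameters come from the slide.

```python
import torch
import torch.nn as nn

class RelationConfidenceEstimation(nn.Module):
    """Sketch: score each relationship proposal and fuse with entity class evidence."""
    def __init__(self, pred_dim=512):
        super().__init__()
        # Multilayer FC network scoring the predicate representation
        self.scorer = nn.Sequential(nn.Linear(pred_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        # Learnable fusion parameters (assumed form: a weighted sum before the sigmoid)
        self.w_rel = nn.Parameter(torch.tensor(1.0))
        self.w_ent = nn.Parameter(torch.tensor(1.0))

    def forward(self, predicate_repr, subj_probs, obj_probs):
        # predicate_repr: (num_pairs, pred_dim)
        # subj_probs / obj_probs: (num_pairs, num_classes) detector class probabilities
        rel_score = self.scorer(predicate_repr).squeeze(-1)               # per-proposal score
        ent_score = subj_probs.max(-1).values * obj_probs.max(-1).values  # entity evidence (assumption)
        # Fuse into a global confidence in (0, 1) for each predicate node
        return torch.sigmoid(self.w_rel * rel_score + self.w_ent * ent_score)
```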

Method: Bipartite Graph Neural Network (Confidence-aware Message Propagation)
Design 2 types of message passing updates based on edge directions: entity-to-predicate messages and predicate-to-entity messages.
Figure 3: Two kinds of message propagation with confidence gating in the bipartite graph neural network. The dashed arrows mean the information flow is blocked when the source is uncertain. The different α values stand for different aggregation weights for the remaining connections.

Method: Bipartite Graph Neural Network (Confidence-aware Message Propagation)
Entity-to-predicate message propagation: update the representation of each predicate node by fusing its neighboring entity (subject and object) nodes. The update applies a ReLU non-linearity over a linear transformation matrix, with learnable affinity functions of the entity and predicate representations providing the aggregation weights (a hedged sketch follows).
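A hedged sketch of such an entity-to-predicate update, assuming a bilinear learnable affinity and residual fusion; the exact functional form in the paper may differ.

```python
import torch
import torch.nn as nn

class EntityToPredicate(nn.Module):
    """Sketch: refine each predicate node from its subject and object entity nodes."""
    def __init__(self, dim=512):
        super().__init__()
        self.W = nn.Linear(dim, dim)              # linear transformation matrix
        self.affinity = nn.Bilinear(dim, dim, 1)  # learnable affinity of (entity, predicate)

    def forward(self, entity_repr, predicate_repr, subj_idx, obj_idx):
        # entity_repr: (num_entities, dim); predicate_repr: (num_pairs, dim)
        # subj_idx / obj_idx: (num_pairs,) indices of each pair's subject / object entity
        subj, obj = entity_repr[subj_idx], entity_repr[obj_idx]
        a_s = torch.sigmoid(self.affinity(subj, predicate_repr))  # subject-to-predicate weight
        a_o = torch.sigmoid(self.affinity(obj, predicate_repr))   # object-to-predicate weight
        message = a_s * subj + a_o * obj                          # weighted fusion of entity neighbors
        return predicate_repr + torch.relu(self.W(message))       # ReLU over a linear transform
```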

Method: Bipartite Graph Neural Network (Confidence-aware Message Propagation)
Predicate-to-entity message propagation: entity nodes are updated from their neighboring predicate nodes. To alleviate the noise effect of unreliable relationship proposals, a confidence gating function with learnable threshold parameters controls the information flow from an entity's neighbors. For each entity node, its neighboring predicates are divided into 2 sets, those where the entity acts as subject and those where it acts as object, and the entity representation is updated by aggregating its neighbors' messages (a hedged sketch follows).
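A hedged sketch of the confidence-gated predicate-to-entity update. The gate below is a sigmoid around a learnable threshold, which is one plausible reading of "confidence gating with learnable threshold parameters"; the aggregation over the subject and object predicate sets is likewise an assumed form.

```python
import torch
import torch.nn as nn

class PredicateToEntity(nn.Module):
    """Sketch: update entity nodes from gated messages of their neighboring predicates."""
    def __init__(self, dim=512, temperature=10.0):
        super().__init__()
        self.W_s = nn.Linear(dim, dim)  # transform for predicates where the entity is the subject
        self.W_o = nn.Linear(dim, dim)  # transform for predicates where the entity is the object
        self.threshold = nn.Parameter(torch.tensor(0.5))  # learnable gating threshold
        self.temperature = temperature

    def gate(self, confidence):
        # Soft gate: close to 0 when confidence falls below the learned threshold,
        # which blocks information flow from uncertain relationship proposals.
        return torch.sigmoid(self.temperature * (confidence - self.threshold))

    def forward(self, entity_repr, predicate_repr, confidence, subj_idx, obj_idx):
        # confidence: (num_pairs,) global confidence from the RCE module
        g = self.gate(confidence).unsqueeze(-1)  # (num_pairs, 1)
        # Aggregate gated messages from the "as subject" and "as object" predicate sets
        updated = entity_repr.index_add(0, subj_idx, g * torch.relu(self.W_s(predicate_repr)))
        updated = updated.index_add(0, obj_idx, g * torch.relu(self.W_o(predicate_repr)))
        return updated
```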

Method: Scene Graph Prediction
To generate the scene graph of the given image, introduce two linear classifiers to predict the classes of the entities and predicates based on their refined representations. Concretely, for each relationship proposal, the predicate classifier takes the final predicate representation output by BGNN. For each entity, a learnable weight in [0, 1] fuses the initial visual feature with the enhanced feature output by BGNN before the entity classifier makes the final entity classification (a sketch follows).
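A minimal sketch of the final prediction step. The two linear classifiers and the learnable fusion weight in [0, 1] mirror the slide; the dimensions and the clamping are assumptions.

```python
import torch
import torch.nn as nn

class SceneGraphPredictor(nn.Module):
    """Sketch: two linear classifiers plus a learnable fusion weight for entities."""
    def __init__(self, dim=512, num_entity_classes=151, num_predicate_classes=51):
        super().__init__()
        self.entity_cls = nn.Linear(dim, num_entity_classes)
        self.predicate_cls = nn.Linear(dim, num_predicate_classes)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # fusion weight, kept in [0, 1]

    def forward(self, initial_entity_feat, refined_entity_feat, refined_predicate_feat):
        a = self.alpha.clamp(0.0, 1.0)
        # Fuse the initial visual feature with the BGNN-enhanced feature
        entity_feat = a * refined_entity_feat + (1.0 - a) * initial_entity_feat
        return self.entity_cls(entity_feat), self.predicate_cls(refined_predicate_feat)
```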

Method: Learning with Bi-level Data Sampling
Design a two-level data sampling strategy that integrates 2 rebalancing ideas (a repeat-factor sketch follows this slide):
- Image-level over-sampling
- Instance-level under-sampling
Figure 4: Illustration of bi-level data sampling for one image. The top row shows the instance frequency of head (H), body (B), and tail (T) categories in the image. The middle row shows image-level over-sampling with repeat factor N. The bottom row shows instance-level under-sampling for instances of different categories.
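The slide's "repeat factor N" suggests repeat-factor style image-level over-sampling. Below is a sketch of that standard recipe (as in Gupta et al.'s LVIS sampler), not necessarily the paper's exact variant; the threshold t is an assumed value.

```python
import math
from collections import Counter
from typing import Dict, List

def image_repeat_factors(image_categories: List[List[str]], t: float = 0.07) -> List[float]:
    """Repeat-factor sketch: over-sample images that contain rare predicate categories.

    image_categories[i] lists the predicate categories annotated in image i;
    t is the frequency threshold below which a category triggers over-sampling.
    """
    num_images = len(image_categories)
    # Category frequency f(c): fraction of images containing category c
    image_count: Counter = Counter()
    for cats in image_categories:
        image_count.update(set(cats))
    freq: Dict[str, float] = {c: n / num_images for c, n in image_count.items()}

    # Category-level repeat factor r(c) = max(1, sqrt(t / f(c)))
    r_cat = {c: max(1.0, math.sqrt(t / f)) for c, f in freq.items()}

    # Image-level repeat factor: the largest r(c) over the categories in the image
    return [max((r_cat[c] for c in set(cats)), default=1.0) for cats in image_categories]

# Toy example: with t=0.5 the images containing the rarer "has"/"riding"
# categories receive a repeat factor above 1 and are sampled more often.
print(image_repeat_factors([["on", "has"], ["on"], ["riding", "on"]], t=0.5))
```

Instance-level under-sampling would then drop a fraction of head-category relationship instances within each repeated image; that second level is not shown in the sketch.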