[NS][Lab_Seminar_240902]MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection.pptx

thanhdowork 18 slides Sep 02, 2024


About This Presentation

MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection

Size: 1.81 MB

Language: en

Added: Sep 02, 2024

Slides: 18 pages

Slide Content

MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection
Tien-Bach-Thanh Do
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/09/02
Guiping Cao et al., IJCAI 2024

Introduction
Background: Object detection is essential in computer vision (CV) for identifying the bounding boxes and categories of objects in images. Traditional detectors rely on convolutional networks and complex pipelines. Transformer-based detectors such as DETR simplify these pipelines but face issues with query distribution and box-sensitive category predictions.
Problem: DETR-like models suffer from box-sensitive category predictions and an imbalanced spatial distribution of queries.

Introduction: Problems
Figure 1: AP of different models with respect to training epochs and number of parameters on COCO val2017. MLP-DINO achieves the best performance under different training-epoch settings.

Introduction: Problems
Figure 2: Models with (-w-) QICS achieve both higher category-prediction scores and better detection performance than models without (-wo-) QICS on COCO val2017. DINO and MLP-DINO use ResNet50 and Strip-MLP-TSWMP as their backbones, respectively.

Introduction: Problems
Figure 3: Example spatial distributions of 20 query points on 5 objects (4 classes, marked with different shapes) within one image. (a) When queries are selected only by predicted confidence score, many query points (red) cluster on a single object (e.g., the one indicated by the rectangle). (b) Representing query points as nodes in a graph incorporates spatial information into query selection, yielding an updated distribution of query points (black) and a higher hit-rate of queries on objects, as presented in Table 5.
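Table 5 reports a query hit-rate, but the slides do not define it. A minimal sketch, assuming hit-rate means the fraction of ground-truth boxes that contain at least one selected query point; the names `query_hit_rate` and `point_in_box` are hypothetical, and the paper's exact definition may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y) query reference point
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) ground-truth box


def point_in_box(p: Point, box: Box) -> bool:
    """True if the query point lies inside (or on the border of) the box."""
    x, y = p
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def query_hit_rate(points: List[Point], boxes: List[Box]) -> float:
    """Hypothetical metric: fraction of objects hit by at least one query point."""
    if not boxes:
        return 0.0
    hit = sum(1 for b in boxes if any(point_in_box(p, b) for p in points))
    return hit / len(boxes)


# Two objects and four queries.
boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
clustered = [(1, 1), (2, 2), (3, 3), (4, 4)]  # every query lands on object 1
spread = [(1, 1), (2, 2), (25, 25), (4, 4)]   # one query moved onto object 2
print(query_hit_rate(clustered, boxes))  # 0.5
print(query_hit_rate(spread, boxes))     # 1.0
```

Under this definition, piling extra queries onto an already-covered object adds nothing, which matches the intuition in panel (a) of Figure 3.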

Proposed Model: Key Innovations
Query-Independent Category Supervision (QICS): decouples category prediction from bounding-box regression.
Deep MLP Backbone: integrates an MLP-based model into a transformer framework to handle both long-range and short-range information.
Graph-based Query Selection (GQS): uses a graph representation to balance the spatial distribution of queries, improving the query hit-rate.

Method: Overall
Figure 4: The overall architecture of the proposed MLP-DINO. b1 ∼ b4 and e1 ∼ e4 represent the output features from the deep MLP backbone and the transformer encoder, respectively. These features are at different levels and resolutions.

Method: Query-Independent Category Supervision (QICS)
Motivation: Category prediction in DETR-like models is often influenced by the accuracy of bounding-box predictions.
QICS approach: A global classification module (GCM) predicts categories independently of the queries. It applies global average pooling to backbone features, decoupling category prediction from the box regression process.
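A minimal sketch of the query-independent branch, assuming a single feature map reduced by global average pooling, a linear classifier producing one logit per category, and multi-label binary cross-entropy against the set of categories present in the image. The classifier, the loss, and all function names are illustrative assumptions rather than the paper's implementation, and how GCM outputs are combined with query-level scores is not shown. Pure Python keeps it dependency-free.

```python
import math
from typing import List, Sequence, Set

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (one value per channel)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def gcm_logits(pooled: Sequence[float], weight: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """Hypothetical linear classifier: one logit per category, no queries involved."""
    return [sum(w * x for w, x in zip(w_row, pooled)) + b
            for w_row, b in zip(weight, bias)]


def multilabel_bce(logits: Sequence[float], present: Set[int]) -> float:
    """Assumed image-level supervision: mean binary cross-entropy per category."""
    loss = 0.0
    for c, z in enumerate(logits):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        t = 1.0 if c in present else 0.0
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss / len(logits)


# Toy backbone feature map: 2 channels on a 2x2 spatial grid.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 2.0], [4.0, 6.0]]]
pooled = global_average_pool(fmap)  # [1.0, 3.0]
logits = gcm_logits(pooled, weight=[[1, 0], [0, 1], [-1, -1]], bias=[0, 0, 0])
print(logits)  # [1.0, 3.0, -4.0]
print(multilabel_bce(logits, present={0, 1}) < multilabel_bce(logits, present={2}))  # True
```

Because the pooled vector summarizes the whole image, the category logits never see a box prediction, which is the decoupling the slide describes.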

Method: Graph-based Query Selection (GQS)
Motivation: Standard query selection often produces clustered queries, reducing the hit-rate on objects.
GQS approach: Represent query points as nodes in a graph and incorporate spatial information through a Coefficient of Variation (CV) based metric to distribute queries more evenly. This improves the query hit-rate by ensuring that queries cover a broader range of potential objects.
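The slides do not give the exact GQS rule or its CV-based metric, so the sketch below swaps in a simpler stand-in: take a candidate pool of the top λ·k queries by score (λ is the pool size ablated in Table 7, here the hypothetical `pool_factor`), link candidates closer than a radius into a graph, then select greedily while decaying the scores of each chosen node's neighbours, similar in spirit to soft-NMS. `radius`, `decay`, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def build_radius_graph(points: Sequence[Point], radius: float) -> Dict[int, Set[int]]:
    """Nodes are query points; an edge joins any two points closer than `radius`."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def graph_query_selection(points: Sequence[Point], scores: Sequence[float], k: int,
                          pool_factor: int = 2, radius: float = 2.0,
                          decay: float = 0.3) -> List[int]:
    """Simplified stand-in for GQS (not the paper's CV-based rule)."""
    # 1) Candidate pool: the top pool_factor * k queries by confidence score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pool = order[: pool_factor * k]
    graph = build_radius_graph([points[i] for i in pool], radius)
    # 2) Greedy pick; each pick decays its graph neighbours' scores,
    #    discouraging several queries from landing on the same object.
    eff = {n: scores[pool[n]] for n in range(len(pool))}
    selected: List[int] = []
    while eff and len(selected) < k:
        best = max(eff, key=lambda n: eff[n])
        selected.append(pool[best])
        del eff[best]
        for nb in graph[best]:
            if nb in eff:
                eff[nb] *= decay
    return selected


points = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (0, 10), (10, 10)]
scores = [0.95, 0.90, 0.85, 0.60, 0.50, 0.40]
print(graph_query_selection(points, scores, k=3))                 # [0, 3, 4]
print(graph_query_selection(points, scores, k=3, pool_factor=1))  # [0, 1, 2]
```

Plain top-3 selection by score would return the three clustered queries [0, 1, 2]. With `pool_factor=1` the pool contains only those three, so the graph step has nothing else to choose from; this is one intuition for why the pool size λ matters.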

Method: SWMP for Deep MLP
Challenge: MLP models are sensitive to input image size because their fully-connected layers expect a fixed input dimension.
SWMP solution: The Sharing Weight on Mini-Patch (SWMP) method allows MLP models to handle images of arbitrary size: crop the image into mini-patches, apply MLP layers with shared weights, and reassemble the image.
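A minimal sketch of the SWMP idea on a single-channel image, assuming zero-padding to a multiple of the mini-patch size p and one p×p fully-connected layer that mixes values along the width of each mini-patch row. Because the weight shape depends only on p, the same weights apply to images of any size. The padding scheme, the width-only mixing, and all names are assumptions; the paper's SWMP operates on multi-channel features inside Strip-MLP and may differ in detail.

```python
from typing import List

Matrix = List[List[float]]


def pad_to_multiple(img: Matrix, p: int) -> Matrix:
    """Zero-pad bottom/right so height and width become multiples of p."""
    h, w = len(img), len(img[0])
    hp, wp = -(-h // p) * p, -(-w // p) * p  # ceil to a multiple of p
    return [[img[i][j] if i < h and j < w else 0.0 for j in range(wp)]
            for i in range(hp)]


def mix_rows(patch: Matrix, weight: Matrix) -> Matrix:
    """Fully-connected mixing along width: out[r][j] = sum_k weight[j][k] * patch[r][k]."""
    return [[sum(wjk * x for wjk, x in zip(w_row, row)) for w_row in weight]
            for row in patch]


def swmp(img: Matrix, weight: Matrix) -> Matrix:
    """Crop into p x p mini-patches, apply shared weights, reassemble, crop back."""
    p = len(weight)
    h, w = len(img), len(img[0])
    padded = pad_to_multiple(img, p)
    out = [row[:] for row in padded]
    for top in range(0, len(padded), p):
        for left in range(0, len(padded[0]), p):
            patch = [r[left:left + p] for r in padded[top:top + p]]
            mixed = mix_rows(patch, weight)  # same weights for every mini-patch
            for di in range(p):
                out[top + di][left:left + p] = mixed[di]
    return [row[:w] for row in out[:h]]  # drop the padding


swap = [[0, 1], [1, 0]]  # shared 2x2 weight: swaps neighbouring columns per mini-patch
print(swmp([[1, 2, 3, 4], [5, 6, 7, 8]], swap))  # [[2, 1, 4, 3], [6, 5, 8, 7]]
odd = [[float(i * 5 + j) for j in range(5)] for i in range(3)]  # 3 x 5 image
print(swmp(odd, [[1, 0], [0, 1]]) == odd)  # True: same 2x2 weights, arbitrary size
```

A plain fully-connected layer over the whole width would need a weight matrix sized to that width; here only p is fixed, which is what lets the backbone accept varying detection input sizes.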

Experiments: Experimental Settings
Dataset: COCO 2017 (118k training images, 5k validation images)
Optimizer: AdamW, weight decay 1 × 10⁻⁴
Batch size: 8
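These settings fix the batch per step, so the number of optimizer steps per epoch follows directly. A small arithmetic sketch, assuming the batch size of 8 is the total (not per-GPU) batch, COCO train2017's 118,287 images, no image filtering, and that the last incomplete batch is kept; the learning rate and schedule are not given on the slide and are not filled in here.

```python
import math

COCO_TRAIN2017_IMAGES = 118_287  # size of COCO train2017 (the "118k" above)
BATCH_SIZE = 8                   # from the slide; assumed to be the total batch size


def iterations(epochs: int, num_images: int = COCO_TRAIN2017_IMAGES,
               batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps for `epochs` passes, keeping the final partial batch."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return epochs * steps_per_epoch


print(iterations(1))   # 14786
print(iterations(12))  # 177432
```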

Experiments: Main Results
Table 1: Comparison of MLP-DINO with other popular detection models under different backbones and training epochs on COCO val2017. DINO and MLP-DINO use 4 scales of feature maps from the backbone network.

Experiments: Ablation Study - Ablation on All Components of MLP-DINO
Table 2: Ablation results on all components of MLP-DINO

Experiments: Ablation Study - Ablation on Different Backbones in DINO
Table 4: Ablation comparison of different backbones in DINO. S and L denote short-range and long-range information captured by the model, respectively. Strip-MLP-T has fewer parameters and FLOPs than ResNet50 and Swin-T.

Experiments: Ablation Study - Effectiveness of GQS on Hit-Rate
Table 5: Ablations of GQS on different backbone models

Experiments: Ablation Study - Ablation of QICS on Different Features
Table 6: Ablation results of applying QICS to different parts of the backbone features and encoder features of MLP-DINO

Experiments: Ablation Study - Ablation of the Query Pool Size for GQS
Table 7: Ablation results for different query candidate pool sizes λ in GQS. The backbone model is Strip-MLP-SWMP.

Conclusion
MLP-DINO integrates MLPs into transformer-based detection models, enhancing category prediction and query distribution.
The QICS and GQS methods significantly improve object detection performance.
Future work: explore other tasks such as segmentation, and further research on improving transformer-based models with MLPs.
