[NS][Lab_Seminar_240609]Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud.pptx


About This Presentation

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud


Slide Content

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud Tien-Bach-Thanh Do Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: osfa19730@catholic.ac.kr 2024/06/09 Weijing Shi et al. CVPR 2020

Introduction Downstream task: 3D object detection in a point cloud. A point cloud is a set of points in space and a widely-used output format for 3D sensors such as LiDAR. Problem: point clouds are unstructured, which makes traditional grid-based convolution methods inefficient for processing them directly. Point-GNN is a solution.

Method

Method Graph Construction A point cloud is a collection of points in 3D space, where each point has coordinates (x, y, z) and potentially additional attributes such as color or intensity. Given a point cloud P, construct a graph G = (P, E) by treating the points as nodes and connecting them via a fixed-radius near-neighbors search. Apply voxel downsampling to reduce the density of the point cloud. To preserve the information within the original point cloud => encode the dense point cloud in the initial state value of each node => search the raw points within a radius of each node and use a neural network to extract their features. Following [10][23], embed the LiDAR reflection intensity and the relative coordinates using an MLP, then aggregate them with a Max function.
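A minimal NumPy/SciPy sketch of this construction step, for illustration only: `voxel_size` and `radius` are assumed values, not the paper's exact settings, and the feature-embedding MLP for the initial node states is omitted.

```python
# Sketch of Point-GNN's graph construction: voxel downsampling followed by
# a fixed-radius near-neighbors search. Hyperparameters are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def voxel_downsample(points, voxel_size=0.8):
    """Keep one point per voxel to reduce point-cloud density."""
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # np.unique returns the index of the first point in each occupied voxel
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(keep)]

def build_graph(points, radius=4.0):
    """Connect each downsampled point to its neighbors within a fixed radius."""
    nodes = voxel_downsample(points)
    tree = cKDTree(nodes[:, :3])
    pairs = tree.query_pairs(r=radius, output_type='ndarray')  # (M, 2) pairs
    # Symmetrize the edge list so the graph G = (P, E) is undirected
    edges = np.vstack([pairs, pairs[:, ::-1]])
    return nodes, edges
```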

Method GNN with Auto-Registration A general GNN iteration: $e_{ij}^t = f^t(v_i^t, v_j^t)$, $v_i^{t+1} = g^t(\rho(\{e_{ij}^t \mid (i,j) \in E\}), v_i^t)$, where $e^t$ and $v^t$ are the edge and node features from the $t$-th iteration, $f^t(\cdot)$ computes the edge feature between 2 nodes, $\rho(\cdot)$ is a set function that aggregates the edge features for each node, and $g^t(\cdot)$ takes the aggregated edge features to update the node features; the output is the node features, or the process repeats. A GNN thus redefines a node's state using its neighbors' states. Use the relative coordinates of the neighbors as input to $f^t(\cdot)$ => induces translation invariance against a global shift of the point cloud. Within a local structure, propose aligning the neighbors' coordinates by their structural features instead of the center node's coordinates: since the center node already contains some structural features from the previous iteration, use it to predict an alignment offset, i.e., the proposed auto-registration mechanism.

Method GNN with Auto-Registration Auto-registration: $\Delta x_i^t = h^t(s_i^t)$, $e_{ij}^t = f^t(x_j - x_i + \Delta x_i^t, s_j^t)$, where $\Delta x$ is the coordinate offset with which nodes register their coordinates, and $h^t(\cdot)$ calculates the offset from the center node's state value of the previous iteration. A single iteration in the graph: $s_i^{t+1} = g^t(\mathrm{Max}(\{e_{ij}^t \mid (i,j) \in E\}), s_i^t)$. After T iterations of the GNN, use the node state values to predict both the category and the bounding box of the object the node belongs to: a classification branch MLP_cls computes a multi-class probability, and a localization branch MLP_loc computes a bounding box for each class.
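A minimal PyTorch sketch of one such iteration, assuming a recent PyTorch with `scatter_reduce`; the MLP widths, the residual form of the state update, and the scatter-based Max aggregation are implementation assumptions, not the authors' exact configuration.

```python
# One Point-GNN iteration with auto-registration: f, g, h are MLPs,
# and edge features are Max-aggregated per destination node.
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # final layer stays linear

class PointGNNLayer(nn.Module):
    def __init__(self, state_dim=300):
        super().__init__()
        self.h = mlp([state_dim, 64, 3])               # offset dx = h(s_i)
        self.f = mlp([state_dim + 3, 300, state_dim])  # edge feature
        self.g = mlp([state_dim, 300, state_dim])      # state update

    def forward(self, x, s, edges):
        # x: (N, 3) coordinates, s: (N, C) states,
        # edges: (M, 2) index pairs (i, j) with j a neighbor of i
        i, j = edges[:, 0], edges[:, 1]
        dx = self.h(s)                             # dx_i = h(s_i)
        rel = x[j] - x[i] + dx[i]                  # registered relative coords
        e = self.f(torch.cat([rel, s[j]], dim=-1)) # e_ij = f(x_j - x_i + dx_i, s_j)
        # Max-aggregate edge features per destination node i
        agg = torch.full((s.size(0), e.size(1)), float('-inf'), device=s.device)
        agg = agg.scatter_reduce(0, i.unsqueeze(-1).expand_as(e), e, reduce='amax')
        agg = torch.where(torch.isinf(agg), torch.zeros_like(agg), agg)
        # Residual update is an implementation assumption
        return s + self.g(agg)
```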

Method Loss For the object category, the classification branch computes a multi-class probability distribution for each node, where M is the total number of object classes. If a node is within the bounding box of an object, assign that object's class to the node; if a node is outside every bounding box, assign the background class. Use the average cross-entropy loss as the classification loss. For the object bounding box, predict it in a 7 degree-of-freedom format b = (x, y, z, l, h, w, θ), where (x, y, z) represents the center position of the bounding box, (l, h, w) represent the box length, height, and width, and θ is the yaw angle; encode the bounding box relative to the node coordinates. If a node is within a bounding box, compute the Huber loss; if a node is outside every bounding box, or belongs to a class that does not need to be localized, set its localization loss to 0.

Method Loss To prevent over-fitting, add L1 regularization to each MLP. Total loss: $\ell_{total} = \alpha \ell_{cls} + \beta \ell_{loc} + \gamma \ell_{reg}$, where α, β, γ are constant weights that balance each loss term.
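A minimal PyTorch sketch of this total loss; the weight values and the smooth-L1 form of the Huber loss are illustrative assumptions.

```python
# Total loss = alpha * classification + beta * localization + gamma * L1 reg.
import torch
import torch.nn.functional as F

def point_gnn_loss(cls_logits, cls_targets, box_preds, box_targets,
                   fg_mask, model, alpha=0.1, beta=10.0, gamma=5e-7):
    # Classification: average cross-entropy over all nodes
    l_cls = F.cross_entropy(cls_logits, cls_targets)
    # Localization: Huber (smooth L1) loss on the 7-DOF encoded boxes,
    # computed only for nodes inside a ground-truth box (fg_mask)
    if fg_mask.any():
        l_loc = F.smooth_l1_loss(box_preds[fg_mask], box_targets[fg_mask])
    else:
        l_loc = box_preds.sum() * 0.0  # zero loss when no foreground nodes
    # L1 regularization on all MLP parameters to prevent over-fitting
    l_reg = sum(p.abs().sum() for p in model.parameters())
    return alpha * l_cls + beta * l_loc + gamma * l_reg
```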

Method Box Merging and Scoring Multiple nodes can lie on the same object => these bounding boxes need to be merged into one and assigned a confidence score. Classical non-maximum suppression (NMS) selects the box with the highest classification score and suppresses the other overlapping boxes. However, the classification score does not always reflect localization quality: a partially occluded object can show a strong clue indicating its type while lacking enough shape information. Instead, propose computing the merged box from the entire overlapped box cluster, taking the median position and size of the overlapped bounding boxes. Compute the confidence score as the sum of the classification scores weighted by an IoU factor and an occlusion factor. The occlusion factor represents the occupied volume ratio: given a box $b_i$, let $l_i, w_i, h_i$ be its length, width, and height, $v_i^l, v_i^w, v_i^h$ be unit vectors along those directions, and $x_j$ the coordinates of point $p_j$; then the occlusion factor is $o_i = \frac{1}{l_i w_i h_i} \prod_{v \in \{v_i^l, v_i^w, v_i^h\}} \left[\max_{p_j \in b_i} v^T x_j - \min_{p_j \in b_i} v^T x_j\right]$.
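A minimal NumPy sketch of this occlusion factor under the definition above; the input shapes are assumptions.

```python
# Occlusion factor o_i: project the points inside box b_i onto its three
# axis directions and take the product of the extents, normalized by the
# box volume, giving the occupied volume ratio.
import numpy as np

def occlusion_factor(points, dims, axes):
    # points: (K, 3) coordinates x_j of the points inside box b_i
    # dims:   (l_i, w_i, h_i) box length, width, height
    # axes:   (3, 3) rows are the unit vectors v_i^l, v_i^w, v_i^h
    proj = points @ axes.T                      # (K, 3) projections v^T x_j
    extents = proj.max(axis=0) - proj.min(axis=0)
    return np.prod(extents) / np.prod(dims)     # ratio in [0, 1]
```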

Method Box Merging and Scoring
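This slide presumably shows the merging algorithm itself; below is a minimal NumPy sketch of the merge-and-score NMS variant described above. The rotated-box `iou_fn` helper is assumed, not implemented here, and taking the coordinate-wise median of the cluster (including the yaw angle) is a simplification.

```python
# NMS variant: instead of keeping the single top-scoring box, merge the
# whole overlapped cluster by its median and score it with the sum of
# classification scores weighted by IoU and occlusion factors.
import numpy as np

def merge_and_score_nms(boxes, scores, occl, iou_fn, iou_thr=0.5):
    # boxes: (N, 7) in (x, y, z, l, h, w, theta); occl: (N,) occlusion factors
    order = np.argsort(-scores)
    merged_boxes, merged_scores = [], []
    while order.size > 0:
        top = order[0]
        ious = np.array([iou_fn(boxes[top], boxes[k]) for k in order])
        mask = ious > iou_thr
        cluster = order[mask]                       # overlapped box cluster
        # Median position and size of the cluster, not just the top box
        merged_boxes.append(np.median(boxes[cluster], axis=0))
        # Confidence: IoU- and occlusion-weighted sum of cluster scores
        merged_scores.append(np.sum(scores[cluster] * ious[mask] * occl[cluster]))
        order = order[~mask]                        # suppress the cluster
    return np.array(merged_boxes), np.array(merged_scores)
```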

Experiments Dataset KITTI object detection benchmark: 7481 training samples and 7518 testing samples. Each sample provides both a point cloud and a camera image. Evaluate the average precision of 3 object types: car, pedestrian, and cyclist. Follow [10][23][19][21].

Experiments Results

Experiments Results

Experiments Ablation study

Experiments Ablation study

Experiments Ablation study

Experiments Ablation study

Conclusion Point-GNN detects 3D objects from a graph representation of the point cloud. Using the graph representation, it encodes the point cloud compactly without mapping it to a grid or sampling and grouping repeatedly. It achieves leading accuracy in both 3D and Bird's Eye View object detection on the KITTI benchmark. The proposed auto-registration mechanism reduces translation variance, and the box merging and scoring operation improves detection accuracy.