[NS][Lab_Seminar_240819]Re_PolyWorld.pptx

thanhdowork 60 views 18 slides Aug 20, 2024

About This Presentation

Re:PolyWorld - A Graph Neural Network for Polygonal Scene Parsing


Slide Content

Re:PolyWorld - A Graph Neural Network for Polygonal Scene Parsing
Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: osfa19730@catholic.ac.kr. 2024/08/19.
Stefano Zorzi et al., ICCV 2023

Introduction
- Re:PolyWorld is a neural network designed for polygonal scene parsing.
- It improves upon the original PolyWorld model by incorporating edge-aware features and a novel graph representation.
- Applications include building extraction, floorplan reconstruction, and wireframe parsing.
Figure 1: PolyWorld Remastered (Re:PolyWorld) extracts local vertex and edge features from an intensity image and embeds global information about the scene using an Edge-Aware GNN. The connections between vertices are generated by solving a differentiable optimal transport problem. By redefining the representation of the polygonal scene, the method becomes a generalized approach that can be applied to a variety of tasks and problem settings.

Problem Statement
Challenges:
- Traditional segmentation methods produce pixel-based masks, which are not suitable for applications needing precise vector polygons.
- The original PolyWorld model has limitations, particularly in handling complex scenes where polygons share vertices.
- A more generalized and flexible method for polygon extraction is needed.

Method Overview
- Generalized polygon representation: allows vertices to have multiple connections, essential for complex scenes.
- Edge-Aware Graph Neural Network (GNN): integrates both vertex and edge information to improve polygon accuracy.
- Optimal Connection Network (OCN): determines the best connections between vertices using a permutation matrix.

Method: Graph Construction
- Vertex detection: identify key points (vertices) in the image using a CNN.
- Feature extraction: extract features for both vertices and edges.
- Graph representation: vertices become nodes; edges represent possible connections.
- GNN processing: update vertex and edge features using message passing and attention mechanisms.
- Optimal connection: use the OCN to determine the most likely connections between vertices.

Method: PolyWorld Remastered
Figure 2: Example of a polygonal representation in Re:PolyWorld. Each vertex in the scene has K instances that allow it to have multiple connections (in this example K = 2). The connections between vertices can therefore be represented by an adjacency matrix A whose rows and columns sum to K (left side), since unconnected instances can be assigned to the diagonal. The adjacency matrix can be expanded to a permutation matrix that encodes the connection of each vertex instance (right side). In this example, the two instances of each vertex are indicated by the symbols a and b. Note that the permutation matrix is not unique, i.e. multiple equivalent representations of the polygons [v1 → v5 → v4] and [v1 → v2 → v5] are possible by using different vertex instance combinations.

Method: PolyWorld Remastered
- The polygons in an image are represented as a set of vertices together with a matrix that encodes the connections between them.
- This matrix is a permutation matrix, which allows the network to be trained by minimizing an optimal transport loss; the model is thereby enforced to predict strong edges, enabling the generation of precise polygons.
- Problem: if the image contains polygons with shared vertices, the edges can no longer be represented by a single permutation.
- Proposal: in the new polygonal scene representation, each vertex is represented as a set of K vertex instances, allowing a vertex to have at most K different edges.
- The edge-aware network is composed of 3 blocks:
  - Graph Encoder Network (GEN): detects salient vertices in the image and generates visual descriptors for nodes and edges.
  - Edge-Aware GNN (EA-GNN): reasons over the graph and updates node and edge features.
  - Optimal Connection Network (OCN): performs vertex matching in an optimal way.
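The K-instance representation described above can be made concrete with a small numpy sketch. The triangle scene, the instance layout, and the index convention below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Toy scene: 3 vertices forming one polygon v0 -> v1 -> v2 -> v0.
# Each vertex gets K = 2 instances (a, b), so the permutation matrix P
# is (N*K) x (N*K). Instance k of vertex i maps to row/column i*K + k.
N, K = 3, 2
P = np.zeros((N * K, N * K))

def inst(i, k):
    return i * K + k

# Use instance 'a' (k = 0) of each vertex for the polygon's edges:
# a clockwise connection v_i -> v_j is a 1 at (inst(i, 0), inst(j, 0)).
for i, j in [(0, 1), (1, 2), (2, 0)]:
    P[inst(i, 0), inst(j, 0)] = 1.0

# Unused instances (k = 1) are parked on the diagonal.
for i in range(N):
    P[inst(i, 1), inst(i, 1)] = 1.0

# P is a valid permutation matrix: every row and column sums to 1.
assert P.sum(axis=0).tolist() == [1.0] * (N * K)
assert P.sum(axis=1).tolist() == [1.0] * (N * K)

# Collapsing the K instances of each vertex recovers the adjacency
# matrix A whose rows and columns sum to K, as in Figure 2.
A = P.reshape(N, K, N, K).sum(axis=(1, 3))
print(A.sum(axis=1))  # each row sums to K = 2
```

With K = 2 a shared vertex can sit on two polygon boundaries at once: its second instance simply encodes the extra connection instead of idling on the diagonal.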

Method: Graph Encoder Network
Figure 3: Overview of Re:PolyWorld. The Graph Encoder Network extracts a set of local vertex and edge features from the intensity image. An edge-aware Graph Neural Network embeds global information of the scene by analyzing the extracted vertex and edge representations. The Optimal Connection Network generates connections between vertices, encoded in a permutation matrix, by solving a linear sum assignment problem.

Method: Graph Encoder Network
- The image I ∈ ℝ^(3×H×W) is taken as input, and a feature map F ∈ ℝ^(D×H×W) is computed using a CNN backbone.
- A vertex detection map Y ∈ ℝ^(H×W) is predicted by propagating F through a pixel-wise projection; a set of N 2D positions p_i is generated from the top-N detected peaks of Y after a Non-Maximum Suppression layer.
- Vertex and edge encodings: edge descriptors are obtained by sampling the feature map F from position p_i to p_j, followed by a linear projection.
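The peak-picking step can be sketched in pure numpy. The window size, the strict-equality NMS rule, and the function name are assumptions for illustration; the paper's NMS layer is not specified at this level of detail:

```python
import numpy as np

def detect_vertices(Y, top_n=256, nms=1):
    """Minimal NMS sketch: keep strict local maxima of the vertex map Y
    within a (2*nms+1) window, then return the top-N peak positions."""
    H, W = Y.shape
    pad = np.pad(Y, nms, constant_values=-np.inf)
    local_max = np.full_like(Y, -np.inf)
    for dy in range(2 * nms + 1):           # max over the NMS window
        for dx in range(2 * nms + 1):
            local_max = np.maximum(local_max, pad[dy:dy + H, dx:dx + W])
    peaks = (Y == local_max) & (Y > 0)      # a peak must dominate its window
    ys, xs = np.nonzero(peaks)
    order = np.argsort(-Y[ys, xs])[:top_n]  # strongest responses first
    return np.stack([ys[order], xs[order]], axis=1)  # (N, 2) positions p_i

Y = np.zeros((8, 8))
Y[2, 3], Y[5, 6] = 0.9, 0.7
print(detect_vertices(Y, top_n=2))  # [[2 3], [5 6]]
```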

Method: Graph Neural Network
- Local vertex and edge features (visual appearance, pose, vertex angles, edge discontinuities) are encoded by the CNN.
- Edge-Aware Self-Attention Network: information is propagated along the edges via message passing; the state of the i-th vertex is updated with the messages aggregated from its neighbors, and the edge features are updated using global information.
- Attentional aggregation: the attention weights are scaled by a factor related to the contribution of each edge, computed from the edge feature e_(i→j).
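The edge-aware attentional aggregation can be sketched as plain dot-product attention whose logits are biased by a scalar score derived from each edge feature. The projection matrices and the edge-score vector `w_e` below are illustrative stand-ins for the model's learned parameters, not the paper's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 8                          # 4 vertices with D-dim features
V = rng.normal(size=(N, D))          # vertex features
E_feat = rng.normal(size=(N, N, D))  # edge features e_{i->j}
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
w_e = rng.normal(size=D)             # projects an edge feature to a scalar

def edge_aware_attention(V, E_feat, Wq, Wk, Wv, w_e):
    """One edge-aware self-attention round (sketch): the dot-product
    attention logit between vertices i and j is shifted by a scalar
    score from the edge feature e_{i->j}, so strong edges contribute
    more to the aggregated message."""
    Q, Kmat, Val = V @ Wq, V @ Wk, V @ Wv
    logits = Q @ Kmat.T / np.sqrt(V.shape[1])
    logits = logits + E_feat @ w_e             # edge-dependent bias
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    return V + attn @ Val                      # residual vertex update

V_new = edge_aware_attention(V, E_feat, Wq, Wk, Wv, w_e)
print(V_new.shape)  # (4, 8)
```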

Method: Graph Neural Network
- Positional refinement: a positional offset is estimated for every vertex, bounded by a maximum radius r, to obtain the refined vertex positions.
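The radius bound can be sketched as a simple clamp on the predicted offset vectors; the function name and the value of r here are illustrative assumptions:

```python
import numpy as np

def refine_positions(p, offsets, r=3.0):
    """Apply predicted per-vertex offsets, rescaling any offset whose
    norm exceeds the maximum radius r before adding it (sketch)."""
    norm = np.linalg.norm(offsets, axis=1, keepdims=True)
    scale = np.minimum(1.0, r / np.maximum(norm, 1e-8))
    return p + offsets * scale

p = np.array([[10.0, 10.0], [20.0, 20.0]])
offsets = np.array([[1.0, 0.0], [6.0, 8.0]])  # second offset has norm 10 > r
print(refine_positions(p, offsets, r=3.0))    # [[11. 10.], [21.8 22.4]]
```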

Method: Optimal Connection Network
- The detected vertices are connected by generating a permutation matrix.
- The module evaluates the connection strength between each pair of vertex instances by computing a score matrix: given the matching descriptors, it evaluates whether the i-th vertex embeds instances clockwise-connected to the j-th vertex, and intra-vertex scores are calculated as well. These blocks are assembled into the full score matrix.
- The Sinkhorn algorithm is applied to solve the resulting linear assignment problem in a differentiable way.
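The Sinkhorn step can be sketched as iterative row/column normalization of the score matrix in log space; the toy score matrix and the iteration count below are illustrative:

```python
import numpy as np

def sinkhorn(scores, iters=100):
    """Log-space Sinkhorn normalization: alternately normalize the rows
    and columns of exp(scores) so the result approaches a doubly
    stochastic (soft permutation) matrix. Every step is differentiable,
    so gradients can flow back into the score matrix during training."""
    log_P = scores.astype(float).copy()
    for _ in range(iters):
        log_P = log_P - np.log(np.exp(log_P).sum(axis=1, keepdims=True))
        log_P = log_P - np.log(np.exp(log_P).sum(axis=0, keepdims=True))
    return np.exp(log_P)

S = np.array([[4.0, 0.0, 0.0],    # toy scores favoring the identity
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 4.0]])
P = sinkhorn(S)
print(P.round(2))  # mass concentrates on the diagonal
```

At inference, a hard assignment can then be read off the soft matrix (e.g. row-wise argmax or a linear sum assignment solver).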

Experiments: Building extraction
Table 1: MS COCO [11] results on the CrowdAI test dataset [13] for building detection. The results of PolyWorld and the Remastered version are calculated using the refinement offset for the vertex positions. FFL refers to the Frame Field Learning [6] method, "simple poly" to the Douglas–Peucker polygon simplification [4], and "ACM poly" to the Active Contour Model [6] polygonization method.
Table 2: Intersection over union (IoU), max tangent angle error (MTA) [6], and complexity-aware IoU (C-IoU) [30] results on the test set of the CrowdAI dataset [13].

Experiments: Building extraction
Figure 4: Examples of building extraction and polygonization on the CrowdAI [13] test dataset using Frame Field Learning [6] with ACM polygonization, and PolyWorld Remastered.

Experiments: Floorplan & outdoor reconstruction
Figure 5: Floorplan reconstruction results on the S3D dataset [27] by HEAT [9] and Re:PolyWorld. "Offset off" refers to the result without using the positional refinement offset; "offset on" refers to the full method.
Figure 6: Outdoor building reconstruction task [9]. Results obtained by HEAT [9] and Re:PolyWorld.
Table 3: Floorplan reconstruction results on the Structured3D dataset [27].

Experiments: Wireframe parsing
Figure 7: Qualitative evaluation of wireframe detection on the Wireframe Parsing dataset [29] by HAWPv2 [25] and Re:PolyWorld. The wireframes visualized for the HAWP method have score > 0.9; the wireframes detected by Re:PolyWorld are obtained by solving the linear sum assignment problem.
Table 4: Results on the Wireframe Parsing dataset [29]. The average number of ground-truth lines is listed in the last column. The number of proposals of Re:PolyWorld is the average number of lines generated by solving the linear sum assignment problem.

Limitation
- Relies on high-quality annotations for training; noisy or incomplete annotations can lead to missing vertices or incomplete polygons.
- Requires careful tuning of parameters for different applications.

Conclusion
- Re:PolyWorld is a remastered and improved version of PolyWorld, a neural network that generates polygons by extracting object vertices from an image and connecting them optimally by solving an optimal transport problem.
- It extracts local visual descriptors of both edges and nodes from the image and incorporates global cues and priors by using a GNN.
- The proposed edge-aware attention mechanism, the central part of the model, leads to more precise polygons.
- Re:PolyWorld not only outperforms PolyWorld in extracting buildings from aerial images but also reaches state of the art in diverse other tasks: floorplan reconstruction, wireframe parsing, and architecture reconstruction.