Panoptic Segmentation @CVPR2019

683 views 39 slides Jul 22, 2019

About This Presentation

Panoptic segmentation was introduced at ECCV 2018 by FAIR (Facebook AI Research). It is gaining popularity, and several papers on it were presented at CVPR 2019, one of the biggest computer vision conferences. This slide deck describes the panoptic segmentation networks presented at CVP...


Slide Content

Panoptic Segmentation
@CVPR2019


22nd July 2019

AI System Group
Kosuke Kuzuoka

Copyright © DeNA Co.,Ltd. All Rights Reserved.
Who I am
●Profile
○Kosuke Kuzuoka
○22 years old
●Experience
○June 2018 - Present: AI Research Engineer at DeNA Co., Ltd.
○March 2017 - June 2018: R&D manager at CONCORE’S, inc.
●Interests
○Self Driving Cars
○Computer Vision
Facebook GitHub LinkedIn


Panoptic Segmentation
Semantic segmentation can:
-Classify pixels by class,
without separating instances
-Segment every pixel in the
input image
Instance segmentation can:
-Separate individual instances
of each class
-Segment objects inside each RoI
(Region of Interest)
A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. Panoptic segmentation


Panoptic Segmentation
Every instance that belongs to a thing class (people,
cars, etc.) needs to be identified individually (instance
segmentation), while every pixel that belongs to a
stuff class (sky, road, etc.) needs to be correctly
classified (semantic segmentation)
A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. Panoptic segmentation
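The thing/stuff split above can be illustrated with a minimal sketch. The class lists, the `to_panoptic` helper and the `(class, instance_id)` pair encoding are assumptions made for illustration; they are not the label format of any particular dataset.

```python
# Hypothetical class lists, chosen only for this illustration.
THING_CLASSES = {"person", "car"}
STUFF_CLASSES = {"sky", "road"}


def to_panoptic(semantic, instance_ids):
    """Combine per-pixel class names and per-pixel instance ids.

    Thing pixels keep their instance id, stuff pixels get None,
    because stuff regions are not split into instances.
    """
    panoptic = []
    for cls, inst in zip(semantic, instance_ids):
        if cls in THING_CLASSES:
            panoptic.append((cls, inst))
        elif cls in STUFF_CLASSES:
            panoptic.append((cls, None))
        else:
            raise ValueError(f"unknown class: {cls}")
    return panoptic


# One image row with four pixels: sky, two different cars, road.
row_classes = ["sky", "car", "car", "road"]
row_instances = [0, 1, 2, 0]
panoptic_row = to_panoptic(row_classes, row_instances)
print(panoptic_row)
```

The two car pixels end up with different instance ids, while sky and road carry only a class label.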

What’s the challenge in this task?


Panoptic Segmentation
●FCN (Fully Convolutional Network) and DC
(Dilated Convolution) are widely used in high-accuracy
semantic segmentation networks
●Each pixel is classified by producing an output
feature map with the same height and width as the input image; only
the depth (channel) dimension differs
●In instance segmentation networks such as Mask R-CNN, the first part
of the network produces class-agnostic boxes (RoIs), which are then
classified by the second part of the network
●Box refinement and pixel classification are
applied to each RoI produced by the RPN
(Region Proposal Network)
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional
networks for semantic segmentation
K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN

A panoptic segmentation network is difficult
to design because the two architectures differ!

Is there any dataset for this task?

Panoptic Segmentation
●Cityscapes
○5000 images (2975 train, 500 val and 1525 test)
○19 classes (8 thing classes and 11 stuff classes)
●ADE20K
○25k images (20k train, 2k val and 3k test)
○150 classes (100 thing classes and 50 stuff classes)
●Mapillary Vistas
○25k images (18k train, 2k val and 5k test)
○65 classes (37 thing classes and 28 stuff classes)

New task, new evaluation metric!


Panoptic Feature Pyramid Network
Any predicted segment whose IoU
with a GT segment is
greater than 0.5 is
considered a TP
(true positive)
The predicted class must
match the GT class;
otherwise the prediction is an FP (false positive)
PQ (Panoptic Quality) is the product of
SQ and RQ. RQ (Recognition Quality) is the F1
score of the segment matching,
while SQ (Segmentation
Quality) is the mean IoU of the TP
segments.
A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár. Panoptic segmentation
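As a rough illustration of the metric, the sketch below matches predicted and GT segments with the IoU > 0.5 rule and then computes SQ, RQ and PQ = SQ x RQ. It is a simplification under stated assumptions: segments are plain sets of pixel coordinates, all classes are pooled into one computation (the published metric is computed per class and then averaged over classes), and the helper names `iou` and `panoptic_quality` are made up for this example.

```python
def iou(a, b):
    """IoU of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds, gts):
    """Return (PQ, SQ, RQ) for lists of (class_name, pixel_set) segments.

    Simplification: all classes are pooled instead of being
    evaluated per class and averaged.
    """
    matched_gt = set()
    tp_ious = []
    for p_cls, p_pix in preds:
        for g_idx, (g_cls, g_pix) in enumerate(gts):
            # A match needs the same class and an unmatched GT segment.
            if g_idx in matched_gt or p_cls != g_cls:
                continue
            overlap = iou(p_pix, g_pix)
            if overlap > 0.5:
                matched_gt.add(g_idx)
                tp_ious.append(overlap)
                break
    tp = len(tp_ious)
    fp = len(preds) - tp  # predictions without a matching GT segment
    fn = len(gts) - tp    # GT segments without a matching prediction
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(tp_ious) / tp if tp else 0.0
    rq = tp / denom
    return sq * rq, sq, rq


gts = [
    ("car", {(0, 0), (0, 1), (0, 2), (0, 3)}),
    ("person", {(1, 0), (1, 1)}),
]
preds = [
    ("car", {(0, 0), (0, 1), (0, 2)}),  # IoU 3/4 with the GT car: TP
    ("car", {(1, 0), (1, 1)}),          # right pixels, wrong class: FP
]
pq, sq, rq = panoptic_quality(preds, gts)
print(pq, sq, rq)
```

In this toy case the second prediction covers the person exactly but with the wrong class, so it counts as an FP and the person segment as an FN.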

Panoptic Feature Pyramid Network
●An end-to-end panoptic segmentation network proposed by FAIR (Facebook AI
Research)
●Extends Mask R-CNN to semantic segmentation by attaching a newly
proposed branch, the semantic branch
●Achieved competitive accuracy compared with other single-network panoptic
segmentation models, while using less memory

What are the motivations?

Panoptic Feature Pyramid Network
●Most panoptic segmentation networks rely on separate backbone networks
because the two architectures differ (not end-to-end)
●Because the backbone networks are separate, they don’t share weights, so
inference takes too much time

Solve semantic segmentation with an instance
segmentation network plus simple modifications!
SOLVED
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks

Panoptic Feature Pyramid Network
Each RoI is used for
classification and pixel
segmentation by the instance
segmentation branch
Feature maps from FPN are
used for pixel level
classification by the semantic
segmentation branch
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks

Panoptic Feature Pyramid Network
●ResNet FPN as the backbone network
●Backbone network is pre-trained on ImageNet dataset
●Output strides are set to 32, 16, 8 and 4
●Feature maps are used for both the instance segmentation
branch and the semantic segmentation branch
Feature map labels from the figure: 256 channels at 1/32, 1/16 and 1/8 scale
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks
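To make the output strides concrete, here is a small sketch that computes the feature map shape at each FPN level for a given input size. The 256 channels come from the figure labels on this slide; the function name `fpn_feature_shapes`, the rounding up of non-divisible sizes and the 512 x 1024 input are assumptions for illustration.

```python
import math


def fpn_feature_shapes(height, width, strides=(4, 8, 16, 32), channels=256):
    """Map each output stride to a (channels, height, width) shape.

    Rounding up with ceil is an assumption; real implementations
    usually pad the input so that it divides evenly.
    """
    return {
        s: (channels, math.ceil(height / s), math.ceil(width / s))
        for s in strides
    }


# Hypothetical 512 x 1024 input image.
shapes = fpn_feature_shapes(512, 1024)
for stride, shape in shapes.items():
    print(f"stride {stride}: {shape}")
```

Every level has the same depth, which is what lets both branches consume the same set of feature maps.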

Panoptic Feature Pyramid Network
●In the instance segmentation branch, RoI pooling, box refinement and
pixel-level segmentation are applied to each RoI from the RPN
●This branch has the same design as the original Mask R-CNN
●The goal of the semantic segmentation branch is to produce a single
feature map by merging feature maps of
different sizes
●3x3 conv, GN (Group Normalization), ReLU and bilinear
interpolation are used to bring the feature maps
to the same size and depth
●The feature maps are merged by
element-wise addition, and finally 1x1 conv,
bilinear interpolation and softmax are
applied for pixel-level classification
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks
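The merging step of the semantic branch can be sketched with plain arithmetic. This example assumes that every FPN level is brought to the 1/4 scale by repeated 2x upsampling before the element-wise addition, which is how I read the paper rather than something stated on the slide. The helpers `upsampling_stages` and `merge` are hypothetical names, and the feature maps are flattened to short lists instead of real tensors.

```python
def upsampling_stages(stride, target_stride=4):
    """Count the 2x upsampling stages needed to go from stride to target_stride."""
    stages = 0
    while stride > target_stride:
        if stride % 2:
            raise ValueError("stride must be target_stride times a power of two")
        stride //= 2
        stages += 1
    if stride != target_stride:
        raise ValueError("stride must be target_stride times a power of two")
    return stages


def merge(feature_maps):
    """Element-wise addition of equally sized (flattened) feature maps."""
    return [sum(values) for values in zip(*feature_maps)]


# Stages per FPN level, assuming a common 1/4 target scale.
plan = {s: upsampling_stages(s) for s in (32, 16, 8, 4)}
print(plan)

# Toy feature maps that already share the same size.
merged = merge([[1, 2], [10, 20], [100, 200]])
print(merged)
```

Coarser levels need more stages, and only after all levels share one size and depth can they be added element by element.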

Panoptic Feature Pyramid Network
●Evaluated mIoU for semantic segmentation tasks using Mask R-CNN +
semantic branch (Semantic FPN)
●Evaluated mIoU and AP for semantic and instance segmentation tasks using
Mask R-CNN + semantic / instance branch
●Evaluated PQ and compared to other single panoptic segmentation networks

Experiments?

Panoptic Feature Pyramid Network
●Semantic FPN achieved competitive results on the Cityscapes and MS COCO datasets with less memory usage
●The results suggest that an instance segmentation network architecture can be turned into a semantic
segmentation network with a relatively small change
●Because Semantic FPN doesn’t use DC (Dilated Conv), it is more efficient than networks that do
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks

Panoptic Feature Pyramid Network
●Balancing the loss weight λ is important for the end-to-end network to perform well on both the instance and
semantic segmentation tasks
●The results suggest that if λ is set properly, instance segmentation benefits from the semantic
segmentation branch and vice versa
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks
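A minimal sketch of how such a weighting could look is given below. The slide only mentions λ; the two-weight form with `lambda_i` for the instance losses and `lambda_s` for the semantic loss follows my recollection of the paper, and the default values and loss numbers are placeholders rather than reported settings.

```python
def panoptic_fpn_loss(l_cls, l_box, l_mask, l_sem, lambda_i=1.0, lambda_s=0.5):
    """Weighted sum of the instance losses and the semantic loss.

    lambda_i and lambda_s are placeholder defaults, not values
    taken from the paper.
    """
    instance_loss = l_cls + l_box + l_mask
    return lambda_i * instance_loss + lambda_s * l_sem


# Placeholder loss values for one training step.
total = panoptic_fpn_loss(l_cls=0.2, l_box=0.3, l_mask=0.5, l_sem=2.0)
print(total)
```

Raising `lambda_s` pushes the shared backbone towards the semantic task and lowering it favours the instance task, which is why the balance matters for a single network.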

Panoptic Feature Pyramid Network
●The proposed network outperformed other single-network panoptic segmentation models by a large margin on both
Cityscapes and MS COCO
●The margin is larger for thing classes than for stuff classes, because Panoptic FPN is
essentially Mask R-CNN with a semantic branch
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks

Panoptic Feature Pyramid Network
Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár. Panoptic Feature Pyramid Networks

One more panoptic segmentation network!

UPSNet: A Unified Panoptic Segmentation Network
●An end-to-end panoptic segmentation network from Uber ATG
●Mask R-CNN is used for both instance and semantic segmentation by attaching a
semantic segmentation head to it
●Outputs of the instance segmentation head and the semantic segmentation
head are merged by a newly proposed head, called the panoptic head
●Achieved higher PQ than other panoptic segmentation networks
on the COCO and Cityscapes datasets

UPSNet: A Unified Panoptic Segmentation Network
●Like Panoptic FPN, UPSNet uses a single
backbone network
●ResNet50 FPN is used as the backbone
network
●The output strides of the FPN are 4, 8, 16 and 32
●These feature maps are used by the instance
head and the semantic head
Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network

UPSNet: A Unified Panoptic Segmentation Network
●In the semantic head, deformable conv is applied to the output of the FPN
●Upsampling is used to bring all feature maps
to the same size after the deformable conv
●The feature maps are concatenated, and a 1x1 conv
classifies each pixel
●The goal of the semantic head is to classify
every pixel in the image without hurting
thing class predictions
●Cross entropy loss and RoI loss are used for
the semantic head
●In the instance head, thing classes are classified and detected
just as in Mask R-CNN
●The goal of the instance head is to classify thing
classes by extracting instance-aware features
Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network

What’s new in this network?

UPSNet: A Unified Panoptic Segmentation Network
●Xstuff from the semantic branch is used to classify stuff classes, and is directly
mapped to the panoptic logits, Z
●Xmask is obtained by cropping Xthing from the semantic branch with the GT
bounding box region
●The output of the instance branch (Yi) is added to Xmask to get the pixel-level
classification logits for each instance
●The class of each pixel is determined by taking the argmax over the channel axis
Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network
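The panoptic head steps above can be sketched on a toy one-dimensional "image" of four pixels. This is an illustration under assumptions, not the authors' implementation: logits are plain Python lists, a box is a `(start, end)` pixel range, Yi is assumed to be already resized and zero-padded to the image size, and the function names are made up. Any extra channel the real network may use (for example for unknown pixels) is left out.

```python
def panoptic_logits(x_stuff, x_thing, instances):
    """Build panoptic logits Z: stuff channels first, then one channel per instance.

    x_stuff: list of per-pixel logit rows, one row per stuff class.
    x_thing: dict mapping a thing class name to its per-pixel logit row.
    instances: list of dicts with 'cls', 'box' as (start, end) and 'y',
               the mask logits Yi over the whole image.
    """
    z = [list(row) for row in x_stuff]  # Xstuff maps directly to Z
    for inst in instances:
        start, end = inst["box"]
        thing_row = x_thing[inst["cls"]]
        # Xmask: Xthing cropped to the box, zero outside of it.
        x_mask = [v if start <= i < end else 0.0 for i, v in enumerate(thing_row)]
        z.append([m + y for m, y in zip(x_mask, inst["y"])])
    return z


def argmax_per_pixel(z):
    """Index of the winning channel for every pixel."""
    n_pixels = len(z[0])
    return [max(range(len(z)), key=lambda c: z[c][p]) for p in range(n_pixels)]


x_stuff = [[2.0, 0.1, 0.1, 1.5]]          # one stuff class, e.g. road
x_thing = {"car": [0.0, 1.0, 1.0, 0.2]}   # one thing class
instances = [{"cls": "car", "box": (1, 3), "y": [0.0, 2.0, 2.0, 0.0]}]

z = panoptic_logits(x_stuff, x_thing, instances)
labels = argmax_per_pixel(z)
print(z)
print(labels)
```

Channel 0 is the stuff class and channel 1 is the first instance, so the two middle pixels are assigned to the car instance and the outer pixels to the stuff class.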

UPSNet: A Unified Panoptic Segmentation Network
●Evaluated on the MS COCO and Cityscapes datasets using PQ
●Compared with other panoptic segmentation networks on the Cityscapes dataset
●Compared with ensembled panoptic segmentation models on the MS
COCO dataset

UPSNet: A Unified Panoptic Segmentation Network
●UPSNet achieved competitive results on
the COCO dataset (figure above)
with significantly fewer parameters
(almost half, as mentioned in the paper)
●Even though the other networks use
ensembling, UPSNet achieved a
competitive PQ on the MS COCO dataset
(figure below)
●The thing classes especially benefited from
the Mask R-CNN architecture
Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network

UPSNet: A Unified Panoptic Segmentation Network
Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network

Let’s summarise!

Summary
●Panoptic segmentation is a relatively new task, and it is steadily gaining
popularity
●Panoptic segmentation networks will be available in PyTorch in a future
release
●Large-scale datasets are publicly available for panoptic segmentation
(MS COCO, Cityscapes etc.)

Thanks!