Ego2HandPose: A Dataset for Egocentric Two-Hand 3D Global Pose Estimation
Ego2HandPose: A Dataset for Egocentric Two-Hand 3D Global Pose Estimation
Paper's authors: Fanqing Lin, Tony Martinez
Presenter: Trương Thị Ngọc Phượng
Contents
- Main Contributions
- Ego2HandPose Dataset
- ManoFit Algorithm
- Quantitative Improvements and Cross-Dataset Evaluation
Main Contributions
In this paper, the authors:
1) Ego2HandPose Dataset
- Extend the Ego2Hands dataset by including 3D hand pose annotations, addressing a significant gap in available data for non-laboratory environments.
- Specifically designed for egocentric two-hand 3D global pose estimation using an RGB camera in real-world settings.
2) Development of the ManoFit algorithm
- Enables 3D hand pose annotation from a single image.
- Allows automatic conversion of 2D hand poses to 3D.
- Supports accurate, temporally consistent two-hand tracking.
3) Quantitative Improvements and Cross-Dataset Evaluation
- Demonstrate significant improvements in hand pose estimation accuracy when models are trained on the Ego2HandPose dataset compared to existing datasets.
- Introduce a synthetic dataset, MANO3DHand.
Link: https://github.com/AlextheEngineer/Ego2Hands
3D Two-Hand Pose Estimation Datasets
Hand pose estimation datasets
RGB-based 3D hand pose datasets:
1) Single-hand 3D pose datasets
- STB: Stereo Tracking Benchmark
- RHD: Rendered Hand Pose Dataset
- PanHand3D: Panoptic Hand
- FreiHAND
2) Two-hand 3D pose datasets
Hand pose estimation datasets
Two-hand 3D pose datasets:
1) Tzionas Dataset: RGB-D data with manually annotated fingertips; limited two-hand interaction.
2) RGB2Hands: RGB-D data labeled with a depth-based two-hand tracker; annotations can be erroneous.
3) ContactPose: 7 RGB cameras, 3 RGB-D cameras and 1 thermal camera; uses estimated 2D keypoints together with extracted object pose and contact locations for 3D hand pose annotation.
4) InterHand2.6M: focuses on close two-hand interaction; 80-140 cameras from multiple views; annotation via a two-stage pipeline.
5) H2O: 5 RGB-D cameras, emphasizing the importance of egocentric data.
6) Ego2Hands: composites two-hand instances at training time with excellent generalization to unseen environments; egocentric data, but no hand pose annotations.
Most of these datasets are captured from a third-person viewpoint with multi-camera setups and have limited accuracy. Methods trained on them cannot generalize to the real-world domain, and some applications need egocentric data. This motivates Ego2HandPose.
Ego2Hands
The training set consists of:
* 188,362 annotated frames for the right hand.
* Flipped images serving as left hands.
* Segmentation masks obtained by automatically removing the background in a green-screen setting.
* 22 participants with diverse skin colors and hand features, instructed to perform free hand motion covering a wide range of locations/poses while recording.
Data composition for training (see the sketch below):
* Randomly selected pairs of right-hand images (with one flipped as the left hand).
* Uses 19,216 images [1] together with 14,997 background images [2][3]; hand/scene combinations are used for data augmentation.
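The compositing described above could look roughly like the following sketch; the function name and the way masks are applied are assumptions for illustration, not the Ego2Hands implementation.

```python
import cv2
import numpy as np

def composite_two_hands(right_img, right_mask, bg_img):
    """Composite a right-hand image and its horizontally flipped copy
    (used as the left hand) onto a background image.
    right_img: (H, W, 3), right_mask: (H, W), bg_img: any size."""
    left_img = cv2.flip(right_img, 1)    # mirror the right hand to get a left hand
    left_mask = cv2.flip(right_mask, 1)

    h, w = right_img.shape[:2]
    out = cv2.resize(bg_img, (w, h))

    # paste the left hand first, then the right hand on top
    for img, mask in ((left_img, left_mask), (right_img, right_mask)):
        m = (mask > 0)[..., None]        # (H, W, 1) boolean mask
        out = np.where(m, img, out)
    return out
```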
Ego2Hands
For testing, Ego2Hands provides an evaluation set of 8 sequences collected with diverse scenes, lighting and skin tones.
Limitation: Ego2Hands does not provide any hand pose annotation, so an annotation tool is needed.
Main focus of Ego2Hands: two-hand segmentation and detection, without pose annotations.
Ego2HandPose
1. Egocentric and real-world focus.
2. Extension of the Ego2Hands dataset:
- Extends Ego2Hands by including 3D hand pose annotations.
- Makes it the first dataset to support comprehensive two-hand 3D tracking using a single RGB camera.
3. Dataset composition and annotation process:
- From the Ego2Hands dataset, around 9,000 frames were selected, about 7,000 from the training set and 2,000 from the test set, ensuring a diversity of hand poses.
- Annotation was facilitated by the ManoFit parametric fitting algorithm, allowing conversion from 2D to 3D poses and enabling accurate hand tracking with minimal manual input.
Two-Hand 3D Global Pose Estimation Pipeline
1) Segmentation & detection: estimate hand bounding boxes and activation energy. This step ensures accurate localization of the hands in the image, which is essential for effective pose estimation.
2) 2D & 3D canonical hand pose estimation: heatmaps are generated from the cropped input to assist this estimation. The ManoFit algorithm uses these heatmaps to compute an initial 3D pose estimate, leveraging the predefined and adjusted parameters of the MANO hand model.
3) ManoFit optimization for temporal consistency: the estimated 2D and 3D canonical joint locations guide the continuous adjustment of the hand model across sequential frames.
A sketch of how these stages chain together follows below.
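A minimal sketch of a single-frame pass through the three stages; `seg_net`, `pose_net`, and `mano_fit` are hypothetical callables standing in for the paper's networks and fitting step, not its actual API.

```python
def estimate_two_hand_poses(frame, seg_net, pose_net, mano_fit, prev_params=None):
    """Hypothetical single-frame pass of the two-hand 3D global pose pipeline."""
    # 1) Segmentation & detection: hand masks, bounding boxes, activation energy
    masks, boxes, energy = seg_net(frame)

    poses = {}
    for hand, (x0, y0, x1, y1) in boxes.items():     # e.g. {"left": box, "right": box}
        crop = frame[y0:y1, x0:x1]
        # 2) 2D heatmaps and 3D canonical joints estimated from the crop
        heatmaps_2d, joints_3d_canonical = pose_net(crop)
        # 3) ManoFit: fit MANO parameters to the estimates, initializing from the
        #    previous frame's parameters for temporal consistency
        init = prev_params.get(hand) if prev_params else None
        poses[hand] = mano_fit(heatmaps_2d, joints_3d_canonical, init=init)
    return poses
```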
MANOFIT ALGORITHM
ManoFit Algorithm
A tool developed to fit the MANO hand model to 2D joint locations with minimal manual input. It enables 3D hand pose annotation from a single RGB image.
Parametric Fitting Process
1. The algorithm starts with an initial parameter guess (denoted P0), which is refined to minimize the difference between the predicted and the observed 2D joint locations.
2. It uses a supervised approach to adjust the model parameters (α, β, γ), representing shape, articulation and global orientation respectively, towards an optimal fit.
3. The loss functions include L_2d, the sum of squared errors between observed and projected 2D keypoints, and L_reg, a regularization term enforcing physical plausibility of joint rotations.
A sketch of such a fitting loop follows below.
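A minimal PyTorch-style sketch of this kind of fitting loop, assuming a `mano_layer` forward function and a `project` camera-projection function are available; parameter names and dimensionalities follow the common MANO convention and are not taken from the authors' code.

```python
import torch

def fit_mano(target_2d, mano_layer, project, steps=500, lr=1e-2, w_reg=1e-3):
    """Refine shape, articulation and global orientation (the alpha, beta, gamma
    above) so that projected MANO joints match the observed 2D keypoints."""
    shape = torch.zeros(10, requires_grad=True)         # hand shape coefficients
    articulation = torch.zeros(45, requires_grad=True)  # per-joint rotations (15 x 3)
    orient = torch.zeros(3, requires_grad=True)         # global orientation
    opt = torch.optim.Adam([shape, articulation, orient], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        joints_3d = mano_layer(shape, articulation, orient)        # (21, 3) joints
        loss_2d = ((project(joints_3d) - target_2d) ** 2).sum()    # L_2d term
        loss_reg = (articulation ** 2).sum()                       # L_reg: keep rotations plausible
        loss = loss_2d + w_reg * loss_reg
        loss.backward()
        opt.step()
    return shape.detach(), articulation.detach(), orient.detach()
```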
2D Pose Estimation Results
PCK: Percentage of Correct Keypoints
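PCK is the fraction of predicted keypoints that fall within a distance threshold of the ground truth; a minimal sketch of how it can be computed:

```python
import numpy as np

def pck(pred, gt, threshold):
    """pred, gt: (N, K, 2) arrays of 2D keypoints.
    Returns the fraction of keypoints within `threshold` of the ground truth."""
    dist = np.linalg.norm(pred - gt, axis=-1)   # (N, K) per-keypoint distances
    return float((dist <= threshold).mean())
```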
3D Canonical Hand Pose Estimation Results
References
[1] F. Lin, C. Wilhelm, and T. Martinez. Two-Hand Global 3D Pose Estimation Using Monocular RGB. arXiv preprint arXiv:2006.01320, 2020.
[2] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. In CVPR, 2016.
[3] J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool. The 2017 DAVIS Challenge on Video Object Segmentation. arXiv:1704.00675, 2017.