Ground truth for monocular localization


MLGTT: An Open-Source Tool to Generate Camera-Relative Ground Truth for Monocular Localization

Jorge Enriquez¹, Philip Bailey¹, Tu-Hoa Pham¹, and Kyle Dewey²

¹ NASA Jet Propulsion Laboratory, California Institute of Technology
² California State University, Northridge

The decision to implement Mars Sample Return will not be finalized until NASA's completion of the National Environmental Policy Act (NEPA) process. This document is being made available for information purposes only. Copyright 2025 California Institute of Technology. U.S. Government sponsorship acknowledged.
Introduction
• We have developed algorithms for autonomous vision-based object localization on Mars.
• Goal: given a single image as input, estimate the pose of a given object with respect to the camera.
• To evaluate the accuracy of such algorithms, we need, for a given image:
  • The pose estimated by the localization algorithm
  • The ground-truth pose of the object (defined or obtained independently)
• Problem: obtaining ground truth is difficult.
  • Fiducial-based methods are the most common (Vicon, AprilTags, laser-based metrology…) but are cumbersome to use.
  • Fiducials can also distract the localization algorithm, since they appear in the camera's field of view.
• We propose a Monocular Localization Ground Truth Tool (MLGTT) that makes it easy to compute a ground-truth pose by manually annotating images with selected feature points.

Acronyms [1]
• PnP: Perspective-n-Point
  • An algorithm that takes a set of 2D points in the image, the corresponding 3D points in the world, and a camera model, and calculates the best-fit camera pose that makes the corresponding rays coincide (see the sketch after this list).
• RANSAC: Random Sample Consensus
  • A sampling method that randomly selects a subset of 2D points just large enough to run PnP (about 3 points).
  • Creates a model and checks how many inliers fit the estimated model.
  • Iterates a parameterized number of times (e.g., 10,000).
• Inliers / Outliers
  • After calculating a camera pose, all 3D test coordinates are reprojected into the image and compared to their corresponding 2D coordinates.
  • The pixel error between each projected 2D point and the original input 2D point is compared against a parameterizable reprojection-error threshold, defined in pixels.
  • Points whose error is below the threshold are considered inliers; the rest are outliers.
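For concreteness, here is a minimal sketch of this PnP + RANSAC + inlier/outlier flow using OpenCV's solvePnPRansac. The function name estimate_pose and the default parameter values are illustrative assumptions, not MLGTT's actual implementation.

```python
import numpy as np
import cv2

def estimate_pose(points_3d, points_2d, camera_matrix, dist_coeffs,
                  reprojection_error_px=3.0, iterations=10000):
    """Fit a camera-to-object pose from 2D-3D correspondences (a sketch)."""
    pts3 = np.asarray(points_3d, dtype=np.float64)  # Nx3 CAD-model coordinates
    pts2 = np.asarray(points_2d, dtype=np.float64)  # Nx2 clicked pixel coordinates
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts3, pts2, camera_matrix, dist_coeffs,
        iterationsCount=iterations,               # number of RANSAC iterations
        reprojectionError=reprojection_error_px)  # inlier threshold in pixels
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    # Reproject all 3D points and measure the per-point pixel error,
    # mirroring the inlier/outlier check described above.
    projected, _ = cv2.projectPoints(pts3, rvec, tvec, camera_matrix, dist_coeffs)
    errors = np.linalg.norm(projected.reshape(-1, 2) - pts2, axis=1)
    inliers = errors < reprojection_error_px
    return rvec, tvec, inliers
```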

Acronyms [2]
• TSM: Template Synthesis (and) Matching
  • The current primary localization algorithm for SRL's STS Vision System. It takes a 3D model, renders an image of that model in the expected current pose relative to the camera, then locally searches for predefined patches (templates) from the rendered image in a larger area of the test image (see the sketch below).
  • Template 2D coordinates on the rendered image have a corresponding 3D depth value provided by the renderer. These 2D (test image) to 3D (rendered image) correspondences are fed to PnP to generate a pose.
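A minimal sketch of the local template search step, using normalized cross-correlation via OpenCV's matchTemplate; the search-window logic, names, and default radius are hypothetical assumptions, not SRL's implementation.

```python
import cv2

def match_template_locally(test_image, template, expected_xy, search_radius=40):
    """Search for a rendered template patch in a window of the test image
    centered on its expected location (a sketch of TSM's local search)."""
    x, y = expected_xy
    h, w = template.shape[:2]
    # Clip the search window to the test image bounds.
    x0, y0 = max(0, x - search_radius), max(0, y - search_radius)
    x1 = min(test_image.shape[1], x + w + search_radius)
    y1 = min(test_image.shape[0], y + h + search_radius)
    window = test_image[y0:y1, x0:x1]
    # Normalized cross-correlation between the template and the window.
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    # Return the matched location in full-image coordinates.
    return (x0 + best_loc[0], y0 + best_loc[1]), best_score
```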

Motivation [1]
• In simulation, we have perfect ground-truth poses associated with rendered images.
• Running TSM and comparing estimated poses to this perfect ground truth results in extremely low errors:
  • 0.009 mm normal translation
  • 0.09 mm lateral translation
  • 0.1 mrad in-plane rotation
  • 1 mrad out-of-plane rotation

Motivation [2]
• We also collected real images on a physical testbed, with ground truth provided by AprilTags.
• Running TSM on these images produced estimated poses that qualitatively look correct (good overlap between real and rendered images) but quantitatively compare poorly to the simulated results:
  • 2 mm normal translation
  • 5.3 mm lateral translation
  • 2.2 mrad in-plane rotation
  • 9.3 mrad out-of-plane rotation
• Question: could this stem from poor ground truth rather than poor localization accuracy?

MLGTT: reference and test image loading
Prerequisite: define a set of reference points on the object to localize (e.g., 3D coordinates in the CAD model).
Then load:
• A reference image of the object to localize, annotated with the 3D points to select
• The test image to localize

MLGTT: reference point selection
In the test image, manually click the reference points defined previously (a sketch of such click-based annotation follows).
We now have:
• 3D reference coordinates from the CAD model
• 2D pixel coordinates of these points in the image
This lets us calculate the camera-to-object pose using PnP.
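A minimal sketch of how click-based point annotation could be collected with OpenCV's mouse callback; the function name and window handling are hypothetical, not MLGTT's UI code.

```python
import cv2

def collect_clicks(image, num_points, window_name="annotate"):
    """Let the user click num_points pixel locations on an image and
    return them in click order (a sketch of manual point selection)."""
    clicks = []

    def on_mouse(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN and len(clicks) < num_points:
            clicks.append((x, y))
            cv2.circle(image, (x, y), 4, (0, 255, 0), -1)  # mark the click

    cv2.namedWindow(window_name)
    cv2.setMouseCallback(window_name, on_mouse)
    while len(clicks) < num_points:
        cv2.imshow(window_name, image)
        if cv2.waitKey(20) == 27:  # Esc aborts early
            break
    cv2.destroyWindow(window_name)
    return clicks
```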

MLGTT: pose estimation
By solving PnP, we have:
• A pose estimate between camera and object
• The set of points that contributed to this calculation (inliers, shown in green)
• The set of points that reprojected outside a chosen threshold (outliers)

Visualization features [1]: render images
• If a rasterizer tool is present, the GTT can send commands to generate a rendered image.
• The rendered image can be toggled with the original image, displaying the two like a GIF.
• The rendered image is automatically saved alongside the original image.

Visualization features [2]: slider tool
• Takes two images and overlays them so that a slider can blend between them seamlessly (see the sketch below).
• Needs the rasterizer to produce a rendered image if used within the GTT.
• Can also run independently.
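A minimal sketch of such a blend slider using OpenCV's trackbar and addWeighted; the names and window setup are illustrative assumptions, not the tool's actual implementation.

```python
import cv2

def blend_slider(image_a, image_b, window_name="slider"):
    """Interactively blend two images (same size and dtype) with a trackbar,
    similar in spirit to the slider tool described above."""
    def redraw(position):
        alpha = position / 100.0
        # Weighted sum of the two images: 0 shows image_a, 100 shows image_b.
        blended = cv2.addWeighted(image_a, 1.0 - alpha, image_b, alpha, 0.0)
        cv2.imshow(window_name, blended)

    cv2.namedWindow(window_name)
    cv2.createTrackbar("blend", window_name, 50, 100, redraw)
    redraw(50)                      # initial 50/50 blend
    while cv2.waitKey(50) != 27:    # Esc to quit
        pass
    cv2.destroyWindow(window_name)
```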

Pose and metrics export: save files
• Saves the points (reference, test, projected)
• Saves the transformation matrix generated by PnP
• Saves the current canvas on display
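A minimal sketch of what such an export could look like as JSON; the schema and field names are assumptions, not MLGTT's actual output format.

```python
import json
import numpy as np
import cv2

def save_results(path, reference_pts_3d, test_pts_2d, projected_pts_2d, rvec, tvec):
    """Write the annotated points and the PnP pose to a JSON file."""
    rotation, _ = cv2.Rodrigues(rvec)           # 3x3 rotation from Rodrigues vector
    transform = np.eye(4)                       # 4x4 homogeneous transform
    transform[:3, :3] = rotation
    transform[:3, 3] = np.asarray(tvec).ravel()
    payload = {
        "reference_points_3d": np.asarray(reference_pts_3d).tolist(),
        "test_points_2d": np.asarray(test_pts_2d).tolist(),
        "projected_points_2d": np.asarray(projected_pts_2d).tolist(),
        "camera_T_object": transform.tolist(),
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
```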

Results [1]
• Errors are now in family with the simulated results:
  • 0.3 mm normal error; 2 mm lateral; 28 mrad in/out of plane
• We now have vision-based ground truth, whereas until now we only had pose information from encoders and forward kinematics.
(Figures: Douglass (low fidelity), Mars 2020)

Results [2]
Hi-fidelity Douglass: all results in family and within requirements.

Conclusion
• We introduced a new tool for manual ground-truthing of vision-based localization that is easy to use and easy to extend.
• The open-sourcing process is currently ongoing.
• Reach out to [email protected] for questions and updates.


Prerequisites
Perspective-n-Point
• Camera matrix: given by the pinhole camera model
• 3D points (w.r.t. the world): given by the CAD model (3D reference points)
• 2D points (w.r.t. the frame): given by the GTT (2D test points)
• Rotation / translation matrix: the goal
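The slide's equations did not survive extraction; the standard pinhole projection model that PnP solves, consistent with the definitions above, is:

```latex
% Standard pinhole projection solved by PnP (a reconstruction; the original
% slide's equations did not survive extraction). K is the camera matrix,
% (X, Y, Z) a 3D reference point, (u, v) its 2D image projection, and
% [R | t] the rotation/translation to solve for.
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \, [\, R \mid t \,]
    \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```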

Requirements

Other Images