Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

SafeSoftware 144 views 92 slides May 08, 2024

About This Presentation

Following the popularity of "Cloud Revolution: Exploring the New Wave of Serverless Spatial Data," we're thrilled to announce this much-anticipated encore webinar.

In this sequel, we'll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support fo...


Slide Content

Cloud Frontiers: A Deep
Dive into Serverless
Spatial Data and FME

Dean
Hintz

Technical Support Team
Lead, Strategic Solutions
Safe Software
Kailin
Opaleychuk

Technical Support Specialist,
FME Form
Safe Software


Agenda
1 Introduction
2 What is Cloud Native?
3 Perspectives from Radiant Earth, Planet
4 STAC & COGs
5 FlatGeoBuf
6 COPC & Zarr
7 GeoParquet
8 Lessons learned
9 Q&A

Poll:

What’s your leading motive for
considering the use of cloud native?

1
Introduction

●Cloud native formats = cloud-optimized
●Specifically designed to optimize the storage, access, and processing of geospatial data in cloud computing environments
●Supports data chunking, indexing, tiling and targeted metadata to minimize response footprint
●Optimizes access by thin web clients, whether browser or mobile based
●Partial & parallel reads
●Read just what you need
What does Cloud-Native Mean?
Introduction
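
To make "partial reads" concrete, here is a minimal sketch of the idea behind a ranged read. It assumes the object is laid out as a header followed by fixed-size chunks; the layout, the helper names `range_header` and `read_range`, and the in-memory object are all hypothetical stand-ins for a real HTTP Range request against cloud storage.

```python
import io


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def read_range(blob: io.BytesIO, offset: int, length: int) -> bytes:
    """Stand-in for a ranged GET: return only the requested slice of the object."""
    blob.seek(offset)
    return blob.read(length)


# Hypothetical object: a 16-byte header followed by four 8-byte chunks.
HEADER_SIZE, CHUNK_SIZE = 16, 8
obj = io.BytesIO(b"H" * HEADER_SIZE + b"A" * 8 + b"B" * 8 + b"C" * 8 + b"D" * 8)

# Fetch only the third chunk instead of the whole object.
offset = HEADER_SIZE + 2 * CHUNK_SIZE
chunk_c = read_range(obj, offset, CHUNK_SIZE)
print(range_header(offset, CHUNK_SIZE), chunk_c)
```

A client that knows the chunk layout from the metadata can issue several such requests in parallel, which is the "partial & parallel reads" point above.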

●Lazy access and intelligent subsetting
●Integrates well with high level analysis and distributed systems
●Scalable performance - increases the applicability of cloud-scale tools
●Decreases the burden and costs for data providers
●Tailored to leverage the scalability, flexibility, and processing power of cloud
infrastructure, enabling efficient handling of large data volumes.


Article: Cloud native data formats


Benefits of Cloud-Optimized Data
Introduction

Data Inspector COG Example: Canada DEM
COG Canada DEM on S3: full width, lowest zoom by default
Search envelope & CRS needed to limit request:
●CRS of dataset on server
●Search envelope CRS - same units as extent, can be different from the source dataset
●Pyramid level to read. In this case level 1 = 30m resolution

Data Inspector COG Example: Canada DEM
2 seconds to access DEM for all of Canada from 31GB COG source dataset
level 10 = 16 km x 16 km grid cells
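
The two pyramid levels quoted on these slides are consistent with each overview level doubling the cell size. The sketch below works through that arithmetic; the 1-based level numbering and the factor of 2 are assumptions (they are common for COG overviews but are not stated in the slides), and `overview_resolution` is a hypothetical helper.

```python
def overview_resolution(base_resolution_m: float, level: int, factor: int = 2) -> float:
    """Cell size at a pyramid level, assuming level 1 is full resolution
    and every further level multiplies the cell size by `factor`."""
    if level < 1:
        raise ValueError("level numbering is assumed to start at 1")
    return base_resolution_m * factor ** (level - 1)


def cell_count_reduction(level: int, factor: int = 2) -> int:
    """How many full-resolution cells one cell at `level` covers."""
    return (factor ** (level - 1)) ** 2


level_10_m = overview_resolution(30, 10)
print(level_10_m / 1000, "km per cell at level 10")
print(cell_count_reduction(10), "full-resolution cells per level-10 cell")
```

Under these assumptions level 10 comes out at 15.36 km, which matches the roughly 16 km grid cells quoted above and explains why the whole-country read is so fast.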

Data Inspector COG Example: Canada DEM
4 seconds to access 30m DEM for all Fraser Valley from 31GB COG source dataset

Part 1
Recap
Chris Holmes

VP of Product, Strategy, Partnerships
Planet

Michelle Roby

Developer Advocate
Radiant Earth

Cape Town, South Africa • March 19, 2017
Planet / Cloud Native Geo Foundation / Taylor Geospatial Engine
Cloud Native Geospatial Origins

Chris Holmes

About Radiant Earth
About:
●An incubator of data-driven initiatives, services, and 21st century institutions needed to
foster shared understanding of our world
Initiatives:
●Cloud-Native Geospatial Foundation → Aim to increase adoption of highly efficient
approaches to working with geospatial data on the Internet.
●Source Cooperative → Data publishing utility for easy data sharing over the web.

What makes cloud-optimized challenging?
From Task 51 Study:
“There is no
one-size-fits-all
packaging for data, as
the optimal packaging is
highly use-case
dependent.”

Authors: Chris Durbin, Patrick Quinn, Dana
Shum

New Cloud-Native Format Support
Format Support Version Available
Cloud Optimized Geotiff R / W 2023.0
Cloud Optimized Point Cloud R / W 2023.1 / 2023.2 (2024.0)
FlatGeoBuf R / W 2023.0
GeoParquet R / W 2023.1
SpatioTemporal Asset Catalog
(Metadata + Asset)
R 2024.0 (FME Hub)*
ZARR R / W 2023.1 (2024.0)

2
STAC
(SpatioTemporal
Asset Catalog)

●Spatio-Temporal Asset Catalog
is a format that stores cloud-based
assets that relate to a
geographic area or time.
●The assets are templated in a
JSON catalog/collection.
●Supports raster and vector
assets
○For example, a STAC
Collection can have Assets
that store geopackage layers
or COG bands as asset
items.




STAC

STAC Package (FME Hub)
-STAC Package V2.1.1 now available on the FME Hub.
-STAC Metadata Reader*
-STAC Asset Reader
-V2.0.0 requires FME 24.0 minimum build 24134

STAC Metadata Reader
Images demonstrating how to use
the STAC Metadata Reader to dig
down into a STAC Collection
https://spot-canada-ortho.s3.amazonaws.com/catalog.json
Catalog > Collection > Item > Asset
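
The Catalog > Collection > Item > Asset drill-down can be sketched as a walk over linked JSON documents. This is an illustration only: the three documents below are hypothetical, held in memory rather than fetched, and reduced to the fields the walk needs (`type`, `links` with `rel` of `child` or `item`, and `assets`). It is not how the STAC Metadata Reader is implemented.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical, heavily trimmed STAC documents keyed by their href.
DOCS: Dict[str, dict] = {
    "catalog.json": {
        "type": "Catalog",
        "id": "demo-catalog",
        "links": [{"rel": "child", "href": "collection.json"}],
    },
    "collection.json": {
        "type": "Collection",
        "id": "demo-collection",
        "links": [{"rel": "item", "href": "item-1.json"}],
    },
    "item-1.json": {
        "type": "Feature",  # STAC Items are GeoJSON Features
        "id": "item-1",
        "links": [],
        "assets": {
            "visual": {
                "href": "https://example.com/item-1/visual.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            }
        },
    },
}


def walk_assets(href: str, docs: Dict[str, dict] = DOCS) -> Iterator[Tuple[str, str, str]]:
    """Yield (item id, asset key, asset href) for every asset reachable from `href`."""
    doc = docs[href]
    if doc["type"] == "Feature":
        for key, asset in doc.get("assets", {}).items():
            yield doc["id"], key, asset["href"]
    for link in doc.get("links", []):
        if link["rel"] in ("child", "item"):
            yield from walk_assets(link["href"], docs)


assets = list(walk_assets("catalog.json"))
print(assets)
```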

Working with STAC Asset Reader in FME Form
Goal: Consume a GeoTIFF in STAC and convert to Cloud Optimized GeoTIFF
Key: Use the FME platform to refine and translate data from one location to another
Result: Output Cloud Optimized GeoTIFF ready for further analysis on S3

Demo

●Use raster transformers to post-process STAC assets
○Combining raster bands
○Setting & removing no data
●FME’s S3Connector can publish COGs to the cloud



Demo Summary
Removing no data
FME Form Workspace

Demo Results

FME & STAC Overview
●One set of item assets can be read or
accessed by a single reader feature type
●STAC Metadata Reader can be used to
filter and select assets of interest
●Coming Soon
○The ability to access authentication
required assets
○Pre-defined popular STAC API
options to improve usability

3
COGs
(Cloud Optimized
GeoTIFFs)

●Supports raster data
●Built off the GeoTIFF
specification, which offers
functionality for
compression and
pyramiding
●Benefits from partial reads,
through MinMax extents or
clipping.



COG

COG Reader
●Search Envelope
●Pyramid level options

COG Writer
●Writer feature type
○Compression
○Layout: Cloud
Optimized Tiles
○Pyramid level
options

COG Reader in FME Form
https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/36/Q/WD/2020/7/S2A_36QWD_20200701_0_L2A/TCI.tif

COG Reader - Search Envelope
Reading entire dataset
Reading with Search Envelope constraint

Current Fire Mapping for West Kelowna
Goal: Create an insightful report on recent fires west of Kelowna
Key: Use transformers to extract, combine & reformat data
Result: An interactive HTML report with embedded images and links

Demo

Demo Results

●FlatGeoBuf and COG readers support
spatial filter operations
●Use polygon mask to refine points on
Nodata areas
●XMLTemplater can be used to help format
HTML elements, such as tables


Key Demo Takeaways

4
FlatGeoBuf

●Vector format built on
Google’s Flatbuffers library
●A buffer is considered a file
and everything within it.
●Although it is not required,
FlatGeobuf uses indexing to
help reduce the amount of
data that would need to be
transferred over a potentially
slow network.


FlatGeoBuf
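
The value of the index is that a client can decide which byte ranges to fetch before downloading any feature data. The sketch below illustrates that idea with a flat list of bounding boxes and byte offsets. It is a simplification: FlatGeobuf itself stores a packed Hilbert R-tree, whereas this example does a linear scan, and the `IndexEntry` structure and sample values are hypothetical.

```python
from typing import List, NamedTuple, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class IndexEntry(NamedTuple):
    bbox: BBox
    offset: int   # byte offset of the feature in the file
    length: int   # byte length of the feature


def intersects(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def byte_ranges_for_envelope(index: List[IndexEntry], envelope: BBox) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for features whose bbox meets the envelope."""
    return [(e.offset, e.length) for e in index if intersects(e.bbox, envelope)]


# Hypothetical index with three features.
index = [
    IndexEntry((-123.2, 49.0, -122.9, 49.3), offset=0, length=400),
    IndexEntry((-119.7, 49.8, -119.4, 50.0), offset=400, length=250),
    IndexEntry((-97.3, 49.7, -97.0, 50.0), offset=650, length=300),
]

hits = byte_ranges_for_envelope(index, (-120.0, 49.5, -119.0, 50.5))
print(hits)
```

Only the matching ranges then need to travel over the network, which is the saving the last bullet above describes.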

FlatGeoBuf Reader
●Verify file buffers
●Search envelope

FlatGeoBuf Writer
●Create spatial index

FlatGeoBuf S3 Uploader App
Goal: Create a service that automatically uploads a range of vector data to S3 as FlatGeoBuf
Key: Generic Reader paired with user parameters
Result: Uploaded buffers and an HTML upload report

Demo

●User parameters help make workspaces
more dynamic
●PROJReprojector with online grids
enabled

Summary of FlatGeoBuf S3
Uploader App

5
COPC
(Cloud Optimized
Point Cloud)

●Point cloud storage
optimized for the web
●Based on the LAS standard
●Only read what you need.
This is especially powerful for
point clouds, given that 3D data
volumes can be huge
●Query XY min/max
●Essentially uses the LAS
reader / writer but with the
COPC structure


COPC

●Point cloud
generated from
drone imagery
using dense point
matching: ODM
●1.1 million points
●Converted from
LAS to COPC and
loaded to S3


COPC - White Rock Pier Post Storm from Drone Survey

●Uses the LAS reader / writer
but with the COPC structure
●LAZ compression
●Select Write as Cloud
Optimized Point Cloud
●Set CRS


COPC Writing

●Use S3Loader to upload
COPC to S3 bucket
●Compressed vs
uncompressed tradeoff
COPC S3 Loader

COPC Reading: Convert PointCloud to Polygon Features
1. Read COPC from S3
2. Filter out ground points
3. Convert to raster
4. Classify raster
5. Convert raster to polygons
6. Filter out donuts & small polygons
7. Write result to GeoJSON
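
The first two steps, reading only an extent of interest and dropping ground points, can be sketched in plain Python as below. This is not the FME workflow itself: the `Point` tuple, the sample coordinates, and `filter_points` are hypothetical, and the points are held in memory rather than read from a COPC file. The one fixed fact is that the ASPRS LAS standard uses classification code 2 for ground.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float
    classification: int


GROUND = 2  # ASPRS LAS classification code for ground


def filter_points(points: List[Point], min_x: float, min_y: float,
                  max_x: float, max_y: float, drop_class: int = GROUND) -> List[Point]:
    """Keep points inside the XY extent that are not in `drop_class`."""
    return [
        p for p in points
        if min_x <= p.x <= max_x
        and min_y <= p.y <= max_y
        and p.classification != drop_class
    ]


# Hypothetical sample: one ground point, one structure point, one point outside the extent.
cloud = [
    Point(10.0, 5.0, 0.2, GROUND),
    Point(11.0, 5.5, 3.1, 6),    # 6 = building in the LAS standard
    Point(500.0, 5.0, 2.0, 6),
]

kept = filter_points(cloud, min_x=0.0, min_y=0.0, max_x=100.0, max_y=100.0)
print(kept)
```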

Read only points close to pier


COPC Reading - Extents filter

Convert pier point cloud to areas, calculate distance of collapse (47m)


COPC Reading: Pier Polygons Written to GeoJSON

●Point cloud storage
optimized for the web
●2024.0 fixes and
enhancements (round trip)
●Extents query supported and
optimized
●Coming: FeatureReader
clipping by extents


COPC Summary

6
ZARR
Format

●Multidimensional raster array /
time series storage optimized for
the web
●Based on NetCDF / HDF data
cube formats
●Only read what you need
●Particularly powerful for raster
time series, as multidimensional
arrays often mean huge volumes
●Query XY & band* extents
●Zarr reads cube with each time
step as a separate band with
properties - easy to work with


ZARR
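
"Only read what you need" works for Zarr because the array is split into chunks that are stored as separate objects. The sketch below lists which chunk objects a request touches for a (time, y, x) cube. The chunk shape and request window are made up, `chunk_keys` is a hypothetical helper, and the dot-separated key form is the Zarr v2 default; stores using another separator or Zarr v3 name their chunks differently.

```python
from itertools import product
from typing import List, Sequence


def chunk_keys(start: Sequence[int], stop: Sequence[int], chunks: Sequence[int]) -> List[str]:
    """Chunk keys covering the half-open index range [start, stop) on each axis."""
    ranges = [
        range(s // c, (e - 1) // c + 1)
        for s, e, c in zip(start, stop, chunks)
    ]
    return [".".join(str(i) for i in idx) for idx in product(*ranges)]


# Hypothetical cube chunked as 12 time steps x 128 x 128 cells.
keys = chunk_keys(start=(12, 100, 250), stop=(24, 200, 350), chunks=(12, 128, 128))
print(keys)
```

A request for one year of time steps over a small window therefore fetches a handful of chunk objects, regardless of how large the full cube is.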

●Time series raster storage
optimized for the web
●Based on NetCDF data cube
●NetCDF reads cube as multigrid
with 1 band for each time step
(hundreds of bands) and
properties in attribute lists
●Zarr reads cube with each time
step as a separate band with
properties - easier to work with
●Default translation from NetCDF
to Zarr just works*


NetCDF to ZARR

ZARR CMIP5 Climate Model Temp Analysis: Winnipeg, MB

ZARR CMIP5 Climate Model Temp Analysis: Hotspots in Winnipeg, MB

ZARR Climate Model Band Range Request: Python Parameters

OGC Climate Resilience Pilot 2023
Pilot Goals:
●Build climate resilience
●Expand audience for climate
services
●Demonstrate the value of OGC
standards and SDI’s (FAIR)
●Show how OGC can support
international climate change goals
●Build a community of stakeholders

better understand the range of possible
impacts - allows us to better prepare and
compensate for them
https://www.ogc.org/initiatives/crp/

How to provide the data needed for climate impact and
disaster indicators to a wider audience?
●Goal: Connect Climate and Disaster Pilots
●Data: Current situational awareness
○Base map: physical, land use, infrastructure, pop
○EO data: hazards and impacts
○Drought & hydrologic monitoring
●Data: Future change awareness - risk scenarios due to
climate change
○Climate model outputs - time series data cubes
○Temperature, precipitation and moisture projections
○Analysis Ready Data (ARD) model results summary
○Climate services known in climate community but not well
known or utilized across affected impact domains
NetCDF from Environment Canada
Disaster Pilot 2023:
Disaster and Climate Data Sources to ARD & Impacts

MB Drought Risk: Combined Precip Temp Query
OGC API Features Query Parameters:
Start Year: 2020
End Year: 2060
BBox: -100.0,49.0,-96.0,50.5
Limit: 2,000,000
MinPeriodValue: 0 (PrecipDelta)
MaxPeriodValue: 0.75 (PrecipDelta)
MinTemp: 23C (Min Mean Monthly Temp)

Find all time step points over the next 40
years for southern Manitoba where
projections indicate:
●> 25% dryer than historical mean
AND
●mean monthly temperature > 23C
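
As a rough illustration, the query parameters listed above could be assembled into a request URL as follows. The base URL is a placeholder and the custom parameter names (`StartYear`, `EndYear`, `MinPeriodValue`, `MaxPeriodValue`, `MinTemp`) are assumed spellings of the labels on the slide; only `bbox` and `limit` are standard OGC API Features parameters.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real service URL is not given in the slides.
BASE_URL = "https://example.com/collections/mb-drought/items"

params = {
    "bbox": "-100.0,49.0,-96.0,50.5",   # southern Manitoba
    "limit": 2000000,
    "StartYear": 2020,                  # assumed parameter name
    "EndYear": 2060,                    # assumed parameter name
    "MinPeriodValue": 0,                # PrecipDelta lower bound
    "MaxPeriodValue": 0.75,             # PrecipDelta upper bound
    "MinTemp": 23,                      # minimum mean monthly temperature, C
}

# Keep commas literal so the bbox stays readable in the query string.
query = urlencode(params, safe=",")
url = f"{BASE_URL}?{query}"
print(url)
```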

MB Precipitation: Future Delta
PrecipDelta = PrecipFuture / PrecipHistoricalMean
Yields normalized value from 0 to N where 0 = no precipitation and 1.0 = 100% of historical mean
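
A small worked example of the PrecipDelta ratio and the combined drought test follows. The thresholds come from the query slide (PrecipDelta at most 0.75, mean monthly temperature above 23C); the function names, the sample precipitation values, and the choice to treat 0.75 as inclusive are assumptions.

```python
def precip_delta(precip_future: float, precip_historical_mean: float) -> float:
    """Future precipitation as a fraction of the historical mean (1.0 = 100%)."""
    if precip_historical_mean <= 0:
        raise ValueError("historical mean must be positive")
    return precip_future / precip_historical_mean


def is_drought_risk(delta: float, mean_monthly_temp_c: float,
                    max_delta: float = 0.75, min_temp_c: float = 23.0) -> bool:
    """True when a time step is at least 25% drier than the mean and hotter than min_temp_c."""
    return 0.0 <= delta <= max_delta and mean_monthly_temp_c > min_temp_c


# Hypothetical time step: 45 mm projected against a 75 mm historical mean, 24.5 C.
delta = precip_delta(45.0, 75.0)
print(round(delta, 2), is_drought_risk(delta, 24.5))
```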

MB Drought Risk: Combined Precip Temp Output

●Multidimensional raster array / time series
storage optimized for the web
●Based on NetCDF / HDF data cube formats
●Only read what you need
●Zarr reads cube with each time step as a
separate band
●Query XY extents
●Band range - supports max not min
●Be careful with feature cache
●Data Inspector refresh needed with stacked
rasters: select range and then select cell again


ZARR Summary

7
GeoParquet

GeoParquet
●Cloud native / cloud friendly vector data storage
●Built on & follows Parquet standards
●Column oriented
●Highly optimized for accessing very large data
volumes where you need access to a few columns
and geometry, such as for analysis
●Benefits from a mature set of applications, libraries
& tools available for Parquet
●Supports a range of geometries
●Not spatially indexed yet (use partitioning, duckDB)

GeoParquet reader performance demo
Goal: Optimize reading and analysis of published large vector dataset
Block: Internet bandwidth and local processing limitations
Key: Structure data so you only read what you need
Result: Test case: GeoParquet is 2-3X faster than other alternatives

Demo

Performance: Geoparquet vs OSM, Geopackage
1 million records, select and spatially analyze 107k water areas

GeoParquet Partitioning
Only read the features with the
feature type and values you want
Nested structure with folders by
feature type and separate files for
each value for selected attribute
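
The partitioning idea can be sketched as plain path selection: with one folder per feature type and one file per attribute value, a reader picks files by name before opening anything. The folder and file names below are hypothetical, and `select_partitions` is an illustrative helper, not an FME or Parquet API.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def select_partitions(paths: Iterable[str], feature_type: str, values: Set[str]) -> List[str]:
    """Keep files laid out as <root>/<feature_type>/<value>.parquet that match the request."""
    selected = []
    for path in paths:
        p = PurePosixPath(path)
        if p.suffix == ".parquet" and p.parent.name == feature_type and p.stem in values:
            selected.append(path)
    return selected


# Hypothetical listing of a partitioned dataset.
listing = [
    "osm/polygons/water.parquet",
    "osm/polygons/forest.parquet",
    "osm/polygons/building.parquet",
    "osm/lines/highway.parquet",
]

to_read = select_partitions(listing, feature_type="polygons", values={"water"})
print(to_read)
```

Everything not selected is never downloaded, which is where the extra speed of the partitioned case comes from.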

Performance: Geoparquet vs OSM, Geopackage (process time in seconds)

Reader                     Local    S3 Cloud -> local    S3 Cloud -> FME Hosted
OSM reporter*              23.2     60.4                 38.1
Geopackage reporter*       1.2      102.8                14
GeoParquet reporter*       1.3      37.5                 7.2
GeoParquet partitioned*    0.3      15.2                 4.9

*1 million records, select and spatially analyze 100k water areas.
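
The relative speedups can be recomputed directly from the timings in the table. The numbers below are copied from the slide; the dictionary layout and the `speedup` helper are just one way to arrange them.

```python
# Process time in seconds: (Local, S3 Cloud -> local, S3 Cloud -> FME Hosted)
TIMES = {
    "OSM": (23.2, 60.4, 38.1),
    "Geopackage": (1.2, 102.8, 14.0),
    "GeoParquet": (1.3, 37.5, 7.2),
    "GeoParquet partitioned": (0.3, 15.2, 4.9),
}
LOCAL, S3_TO_LOCAL, S3_TO_HOSTED = 0, 1, 2


def speedup(baseline: str, candidate: str, column: int) -> float:
    """How many times faster `candidate` is than `baseline` in one scenario."""
    return TIMES[baseline][column] / TIMES[candidate][column]


for baseline in ("OSM", "Geopackage"):
    for column, label in ((S3_TO_LOCAL, "S3 -> local"), (S3_TO_HOSTED, "S3 -> FME Hosted")):
        ratio = speedup(baseline, "GeoParquet", column)
        print(f"GeoParquet vs {baseline}, {label}: {ratio:.2f}x")
```

For the cloud scenarios the unpartitioned GeoParquet ratios fall between roughly 1.6x and 5.3x depending on the baseline, and partitioning widens the gap further.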

●Column oriented vector format
●Geoparquet test: 2-3X faster than
others
●Cloud native for vector not as easy
as for raster, point cloud
●Adds requirement for appropriate
cataloging
●Additional speed improvements
with more attribute level partitioning
●This addresses some of the debate
around geoparquet as cloud native
●DuckDB with Geoparquet to
improve cloud native performance


Key Lessons
GeoParquet

Other Cloud Data Stores: Cloud Databases
… to name a few

8
Key limitations
& Integration
Strategies

●Start publishing now!
●Keep the processing close to the data
●Minimize traffic footprint - select just what you need
●Leverage data side filtering, microservices, lazy evaluation
●Metadata: enrich and update
●Optimization strategy: transactions volume vs data volume, response time requirements
●Test! Especially your core usage scenarios

Integration Strategies
Key limitations & Integration Strategies

Considerations
●Heavier preprocessing, larger size required to structure and store data for optimized read
●Updates are a challenge - automation helps
●FME’s implementation based on third party libraries - collaboration for fixes,
enhancements
●Newer cloud native formats: less data publicly available so far: COPC, ZARR
●Cloud optimized vector options - choice depends on use case: GeoParquet, FlatGeoBuf
●Supporting infrastructure: duckDB for Geoparquet etc

Key limitations & Integration Strategies

Geoparquet & FlatGeoBuf
Yes

9
Conclusions

Optimize
your web
data flows

Lessons Learned
●Cloud native is all about making it easy to publish data
without a server, optimizing responses to web data
requests: read just what you need!
●No one size fits all: each format has its strengths and
limitations
●STAC: steeper learning curve, collections within
catalogs and vice versa, implementations vary; security
●COG, COPC: perhaps the most intuitive - 2D and 3D
arrays are just easier to manage. STAC/COG have the most data available
●Vector - still evolving: FlatGeoBuf more effective in its
cloud native support but newer, less widely adopted.
Geoparquet has more tooling but design and support
needed to make effectively cloud native
●ZARR - powerful but complex - as a very new and niche
format, support is still growing

Summary
●Cloud native is all about making it easy to publish
data without a server, optimizing responses to
web data requests: read just what you need!
●Safe’s strategy is to track and support emerging
standards across a range of data types so FME
users can stay ahead of evolving web technologies
●FME allows you to integrate between hybrid
environments as needed
●Keep the processing close to the data
●Minimize traffic footprint - reader filtering
●Open standards enable community-wide adoption
and access
●No one size fits all - know your key requirements &
test!

One platform, two technologies
FME Form FME Flow
Build and run data workflows Automate data workflows

FME Flow Hosted
Safe Software managed instance
fme.safe.com/platform
FME Enterprise Integration Platform
Safe & FME

10
Resources

Resources
●Geospatial Cloud Native Overview
●Webinar: Cloud Revolution: Exploring the New Wave of
Serverless Spatial Data
●Webinar: Leveraging FME for Cloud Native Databases
●Cloud Native Databases - Blog
●Radiant Earth Blog: Cloud Native Geospatial Solutions
●Cloud Native Geospatial Foundation & Slack channel
●OGC - Cloud Native Geospatial
●Source Cooperative
●Chris Holmes: FOSS4G NA 2023 | Towards a Cloud
Native Spatial Data Infrastructure
●guide.cloudnativegeo.org
●Cloud Native Data Formats
●Safe’s Participation in OGC Pilots

Data Sources
STAC / COG:
●catalogue.dataspace.copernicus.eu/stac/
●cmr.earthdata.nasa.gov/stac/
●planetarycomputer.microsoft.com/catalog
●usgs.gov/landsat-missions/landsat-collection-2
●planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a
●https://planetarycomputer.microsoft.com/api/stac/v1/collections/nrcan-landcover
ZARR:
●https://console.cloud.google.com/marketplace/product/noaa-public/cmip6
COPC:
●github.com/PDAL/data/tree/master/autzen
●copc.io/#example-data

11
Next Steps

Next Steps
●Coming:
○Blogs
○Tutorials
●Community involvement: Cloud Native
Geospatial Foundation, OGC
●Events:
○cncf.io/events/
○safe.com/company-updates/2024-upcoming-events/
○ogcmeet.org/
●New functionality: what are your priorities?
○DuckDB?
○ZARR band range?
○Other STAC media types?

Get our Ebook
Spatial Data for the
Enterprise

fme.ly/gzc


Guided learning
experiences at your
fingertips
academy.safe.com


FME Academy
Resources
Check out how-to’s &
demos in the knowledge
base
community.safe.com
/s/knowledge-base
Knowledge Base Webinars
Upcoming &
on-demand webinars

safe.com/webinars

Claim Your Community Badge
●Get community badges for watching
webinars!
●fme.ly/WebinarBadge
●Today’s code: SLMWB

Join the Community today!

12
Q&A

Thank You
Recap of Next Steps

1 Contact us: [email protected]
2 Check out the new landing page to get started!

Please fill out our
webinar survey