
Scaling GeoServer in the cloud: clustering state of the art

Andrea Aime
Simone Giannecchini
GeoSolutions

GeoSolutions
•Offices in Italy & US, Global Clients/Team
•30+ collaborators, 25+ Engineers
•Our products
•Our Offer: Enterprise Support Services, Deployment, Subscription, Professional Training, Customized Solutions, GeoNode

Affiliations
•We strongly support Open Source, it is in our core
•We actively participate in OGC working groups and get funded to advance new open standards
•We support standards critical to GEOINT

Introduction

GeoServer can run in the cloud
•Not cloud native

•But ready to deploy in the cloud
•Databases
•Blob storage support
•COG

•GeoServer Cloud: Kubernetes, microservices and more cloud-readiness facilities → see the dedicated presentation

GeoServer clustering
•OGC services are (mostly!) stateless

•Exception(s)
•WPS async requests
•Importer async requests


•Do we need a clustering plugin?
•Most of the time, no

Static configuration case

The backoffice/production model
•Backoffice environment (aka TEST or STG)
•Set up new layers
•Design their styles for optimal output and performance
•Set up metadata and descriptions correctly
•Test everything carefully before going live

The backoffice/production model
•Production environment
•Static, shared configuration
•For the few async requests, shared state in an external database
•Auto-scale as you see fit

Putting it all together
•Version control the data directory (check out when ready): git, svn, whatever

•Rolling reload of production (see the sketch below)
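
A minimal sketch of such a rolling reload, assuming the nodes are reachable directly; the node URLs, credentials and pause between nodes are hypothetical placeholders (taking nodes out of the load balancer during the reload is left out). Each instance is asked to reload its configuration through GeoServer's REST reload endpoint:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RollingReload {
    public static void main(String[] args) throws Exception {
        // Hypothetical node list and credentials; adapt to your deployment
        String[] nodes = {"http://gs-node-1:8080", "http://gs-node-2:8080", "http://gs-node-3:8080"};
        String auth = Base64.getEncoder().encodeToString("admin:geoserver".getBytes());
        HttpClient client = HttpClient.newHttpClient();
        for (String node : nodes) {
            // POST /geoserver/rest/reload asks the instance to reload its catalog/configuration
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(node + "/geoserver/rest/reload"))
                    .header("Authorization", "Basic " + auth)
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(node + " -> " + response.statusCode());
            // Pause so that, at any point in time, most nodes are still serving requests
            Thread.sleep(30_000);
        }
    }
}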

But pay attention to
•Keep data separate from configuration
•Keep logs separate, and have each node use a different file

•With many layers, a rolling restart can take some time
[Diagram: separate locations for DATA, LOGS, TILE CACHES, CONFIGURATION and ENV PARAMS]

Tile cache deployment, opt 1
[Diagram: several GS/GWC nodes sharing a file system, plus a dedicated GS/GWC node for seeding]
•Most relaxed layout

•Shared filesystem can have issues with heavy concurrent writers

•Can work, if the cluster mostly reads

•Separate, dedicated machines for focused seeding (temporary docker images)

Tile cache deployment, opt 2
[Diagram: load balancer in front of GS/GWC nodes, each with its own memory cache and local filesystem]
•Layout useful for short-lived caches and fragile network filesystems

•Duplicates work to get better stability

•Common case: weather forecasts

Tile cache deployment, opt 3
[Diagram: a standalone GWC with a dedicated filesystem, in front of a load balancer and several GS nodes]
•Layout useful for few layers (GWC config is XML files)

•Optimal hardware usage
•Double configuration effort (automate using REST, see the sketch below)

•Want to have GWC read configuration directly from GeoServer instead? Good idea, funding wanted!
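
A minimal sketch of automating the GWC side of that double configuration, assuming a standalone GWC exposing its REST API; the host names, layer name, credentials and XML details below are illustrative and should be checked against the GWC REST documentation for your version:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RegisterGwcLayer {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoints and credentials
        String gwc = "http://gwc-host:8080/geowebcache";
        String layer = "ne:countries";
        String auth = Base64.getEncoder().encodeToString("geowebcache:secured".getBytes());

        // Minimal layer definition pointing GWC at the load-balanced GeoServer WMS
        String body =
            "<wmsLayer>" +
            "<name>" + layer + "</name>" +
            "<mimeFormats><string>image/png</string></mimeFormats>" +
            "<gridSubsets><gridSubset><gridSetName>EPSG:4326</gridSetName></gridSubset></gridSubsets>" +
            "<wmsUrl><string>http://gs-lb/geoserver/wms</string></wmsUrl>" +
            "</wmsLayer>";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(gwc + "/rest/layers/" + layer + ".xml"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "text/xml")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("GWC layer registration: " + response.statusCode());
    }
}

The same script would also PUT the matching layer configuration to GeoServer's own REST API, which is where the "double effort" comes from.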

No no, I need to change config in
production all the time!

Do you really, though?
•In our experience, most of the time you do not! (well, not at a high rate!)

•You can use static configuration with dynamic data loading, filtering and styling

I receive new data continuously!
•Fine, but why set up a new layer for each data batch?

•If the structure is regular
•Use dimensions (time, elevation, custom ones), see the request below
•Use client side filtering

•Much better option to keep a moving time window (e.g., the last 3 months of data)
[Diagram: granules T1…Tn along the time axis]
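
With a time dimension enabled, clients select the slice they need directly in the request and no new layer has to be configured; for example (layer name and timestamp are illustrative):

/geoserver/wms?SERVICE=WMS&REQUEST=GetMap&LAYERS=forecast:temperature
&TIME=2024-07-16T00:00:00Z&...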

Work on the mosaic index
•Just record new entries and remove older ones in the database (see the sketch below)

•No need to touch the configuration
[Diagram: granule Tn+1 enters the index while the oldest granule drops out of the moving window]
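
A minimal sketch of that index maintenance, assuming the mosaic index lives in a PostGIS table; the connection details, table and column names are hypothetical and should be matched to your own mosaic index schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MosaicIndexRoll {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; requires the PostgreSQL JDBC driver on the classpath
        try (Connection cx = DriverManager.getConnection(
                "jdbc:postgresql://db-host:5432/mosaics", "geoserver", "secret")) {

            // Register the newly ingested granule in the index
            try (PreparedStatement insert = cx.prepareStatement(
                    "INSERT INTO forecast_index (location, the_geom, ingestion) " +
                    "VALUES (?, ST_GeomFromText(?, 4326), now())")) {
                insert.setString(1, "forecast_2024_07_16.tif");
                insert.setString(2, "POLYGON((-180 -90, 180 -90, 180 90, -180 90, -180 -90))");
                insert.executeUpdate();
            }

            // Drop granules that fell out of the moving window (e.g. older than 3 months)
            try (PreparedStatement delete = cx.prepareStatement(
                    "DELETE FROM forecast_index WHERE ingestion < now() - interval '3 months'")) {
                delete.executeUpdate();
            }
        }
    }
}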

“Mosaics” everywhere
•Storage options
•Image mosaic store (STAC index too)
•(Partitioned) database tables
•Vector mosaic store (external storage)

Different uses -> different views
•You “just” need to filter data
•The GeoFence plugin can filter layers per user with alphanumeric/spatial filters

Client limitations? Trick them!
•Clients that cannot deal with dimensions, or vendor parameters (e.g. CQL_FILTER)
•Use the “parameter extractor” community module

/geoserver/tiger/wms/H11?SERVICE=WMS…

/geoserver/tiger/wms?SERVICE=WMS&CQL_FILTER=CFCC='H11'&...

Client allows changing the style!
•You allow users to change the style of the maps?
•Just use &sld and &sld_body in your requests
•Or parametric styles! The “env” function is your friend! (see the example below)
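
For example, a request can pass style variables through the env vendor parameter (the variable names and values here are illustrative):

&env=low:0.2;high:0.8&...

and the style reads them back with the env() function, providing a default for when the parameter is missing, e.g. env('low', 0.1).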

But if you really need to change
configuration all the time, then…

Typical use case
•Case A:
•The application allows users to upload their custom data
•They are responsible for its configuration
•Hopefully many small data sets

•Case B:
•Hum… wait, haven’t really met another case yet!
(maybe tell me during the Q&A at the end?)

Clustering community modules
•Not enough traction to have a dedicated maintainer
•Few deployments use either of them



JMS clustering JDBC clustering

JMS config
•Loads data directory from XML files
•Sends JMS messages to distribute changes

JDBC config
•Copies the catalog into a database
•Loads configuration on demand
•Caching (too slow otherwise, many queries)
•hz-cluster sibling module sends drop cache messages

Some testing

Testing WMS requests
•Using ne-styles repository:
•Natural Earth Data
•CSS styling
•Political map

Configuration cases
•Data volumes
•Case A: ne-styles as is (25 layers)
•Case B: “ne” workspace duplicated up to 40,000 layers

•Clustering
•Vanilla/JMS config
•JDBCConfig

•Builds: 2.24.x nightly, June 24th 2023

Load testing results
•Mostly unaffected by the number of layers
•JDBCConfig between 10% and 20% slower
•JDBCConfig was 50% slower with lots of layers a couple of years ago, it has improved!

Startup times with 40k layers
•JDBC Config has a constant startup time, as it does not load the configuration up front
→ 13 seconds

•Vanilla/JMS is proportional to the number of layers:
→ 56 seconds
(would take longer on a completely cold disk)

•An experimental new XML config loader in GeoServer Cloud could do better than this.

Administrative GUI access, 40k layers
•Access to home page as admin:
•JDBCConfig: 72 seconds
•Vanilla/JMS: 1 second

•Access to layers page as admin:
•JDBCConfig: 300 secs!
•Vanilla/JMS: 2 seconds

The future:
eat your cake and have it too!

Reality check - 1
•GeoServer code is built to have the full catalog in memory
•Code expects it's quick to:
•Get any configuration object, possibly multiple times per request
•Get the full list of anything (layers, workspaces, stores, styles, …)

•Makes it hard to have any solution based on external storage

Reality check - 2
•A small number of deployments actually need dynamic configuration changes in production

•Core developers are already busy with GIS

•Configuration data structures change over time (core modules) → we don't want a clustering plugin that needs to be constantly aligned with core changes

Idea: Hazelcast distributed memory
•Will use the same serialization as GeoServer configuration for messaging (maintained across configuration changes)

•Declare distributed Maps, let HZ do the message passing for us (see the sketch after the diagram below)

•Distribution library, among other things:
•Distributed data structures, with “near cache”
•Distributed locks
•Messaging
•Integration with various clouds

Idea: Hazelcast distributed memory
[Diagram: GS1, GS2, … GSn nodes sharing a Hazelcast-backed distributed catalog, each with a local near cache]
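
A minimal sketch of the idea using the Hazelcast Java API; this is not the actual plugin design, and the map name, key and payload below are hypothetical:

import com.hazelcast.config.Config;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class DistributedCatalogSketch {
    public static void main(String[] args) {
        // Distributed map with a near cache: repeated reads on the same node are served
        // from local memory, while writes invalidate the other members' near caches
        Config config = new Config();
        config.getMapConfig("catalog")
              .setNearCacheConfig(new NearCacheConfig().setInvalidateOnChange(true));

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        IMap<String, String> catalog = hz.getMap("catalog");

        // Hypothetical key/payload: a put on any member becomes visible to all of them
        catalog.put("layer:ne:countries", "<serialized layer configuration>");
        System.out.println(catalog.get("layer:ne:countries"));
    }
}

Hazelcast handles the membership and message passing, so the plugin would only need to keep the map contents aligned with the catalog serialization it already maintains.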

Conclusions

Let’s summarize
•Most of the time, use static data directory, share, load balance, auto-scale, live happily

•When you really need to change configuration at runtime, JMS cluster or JDBCConfig, with some limitations

•Moving forward, we’re going to develop a new plugin that hopefully matches low maintenance with good performance
