How to build more fault-tolerant APIs to withstand scale, security and maintainability challenges
Added: Jul 26, 2024
Slides: 11 pages
Slide Content
Failure Engineering - API Edition
Failure Engineering
Planning to fail, so that you react better during incidents
• Requires first-principles thinking
• Simplicity at its core
• Paranoia is good
• Tool - Imagine an impossible scenario and hold your system up to that
API - The Usual Stuff
• Status Codes
• Latency
• Resource utilisation - only because your CFO makes you :)
• Scaling Ladders
• Performance Testing
• Endurance Testing
What do we usually miss?
• Missing the forest for the trees - We focus on individual services, not the whole system
• Serviceability, not uptime
• Some services can die
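One way to read "some services can die" is that a request should fail only when a critical dependency fails, while optional ones degrade quietly. The sketch below illustrates that idea under assumptions: the function name build_response and the critical/optional split are hypothetical and not from the slides.

```python
from typing import Callable, Dict


def build_response(
    critical: Dict[str, Callable[[], object]],
    optional: Dict[str, Callable[[], object]],
) -> dict:
    """Assemble a response where only critical dependencies can fail the request."""
    body: dict = {}

    # Critical dependencies: any exception propagates and the request fails.
    for name, fetch in critical.items():
        body[name] = fetch()

    # Optional dependencies: a failure is recorded, not propagated.
    degraded = []
    for name, fetch in optional.items():
        try:
            body[name] = fetch()
        except Exception:
            body[name] = None
            degraded.append(name)

    # Tell the caller which parts are missing so it can render accordingly.
    body["degraded"] = degraded
    return body
```

The "degraded" list is what makes this about serviceability rather than uptime: the user still gets an answer, and the system can report exactly what it left out.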
Enough with the suspense!
• Observability - Where is it smoking?
• Causality with Topology - It's smoking here, but broken there
• Lazy Origins - Do less at origin, create runoffs
• Escape Hatches - What's your plan B, C, X?
Observability
• Dashboard - Visualise your topology
• Monitor - Know your breaking points & monitor them
• Noise Management - Traffic patterns are temporal + seasonal
• Alerting Maturity - Build a mature alert & incident response SOP
• Vendors - Vendors are part of your fabric as well, don't skip monitoring them
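Because traffic is temporal and seasonal, a static threshold tends to be noisy at peak and blind at night. A minimal sketch of the alternative, comparing the current value against history for the same seasonal slot (for example the same hour of the week), could look like this. The function name, the k multiplier and the 5% floor are all assumptions for illustration, not values from the slides.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(current: float, same_slot_history: Sequence[float], k: float = 3.0) -> bool:
    """Flag `current` when it is far from the baseline of the same seasonal slot."""
    if len(same_slot_history) < 2:
        return False  # not enough history to judge, stay quiet

    baseline = mean(same_slot_history)
    spread = pstdev(same_slot_history)

    # Assumed minimum tolerance (5% of baseline) so that a very flat history
    # does not alert on tiny wiggles.
    floor = 0.05 * baseline
    return abs(current - baseline) > max(k * spread, floor)
```

With this shape, 1,080 requests per second is unremarkable in a slot that usually sees about 1,000, but is flagged in a slot that usually sees about 50.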
Protect your origin
• Scale Selectively - Answer new questions only
• Walled Garden - Only whitelisted requests, create a perimeter for security
• Create Runoffs - Nobody needs to see your origin is down
• Pre-Warm - Anticipate the wave
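The walled garden and runoff ideas can be sketched together as a thin layer in front of the origin: requests outside the allowlist never reach it, and an origin failure is answered with a prepared runoff response instead of a raw error. Everything named here (ALLOWED, RUNOFF_BODY, handle, the example paths) is hypothetical and only illustrates the shape.

```python
from typing import Callable, Set, Tuple

# Hypothetical allowlist of (method, path) pairs the origin is willing to answer.
ALLOWED: Set[Tuple[str, str]] = {("GET", "/v1/products"), ("POST", "/v1/orders")}

# Hypothetical runoff message served when the origin is in trouble.
RUNOFF_BODY = "We are experiencing heavy traffic. Please try again shortly."


def handle(method: str, path: str, origin: Callable[[str, str], str]) -> Tuple[int, str]:
    """Return (status, body); only allowlisted requests ever touch the origin."""
    if (method.upper(), path) not in ALLOWED:
        return 403, "forbidden"  # rejected at the perimeter, origin untouched

    try:
        return 200, origin(method, path)
    except Exception:
        # Runoff: the client gets a prepared answer, not the origin's failure.
        return 503, RUNOFF_BODY
```

In practice this check usually lives at the edge or gateway rather than in application code; the sketch only shows the decision order.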
Should you CDN?
• Is it cacheable? - Then Yes
• Leverage the middle tier
• Leverage the edge device
• CDN = Shock Absorption + Hooks to inject behaviours
• Some high performance use-cases should skip CDN
• Evaluate value of "drag" - Intermediate blocks allow for more "in-transit" decision making, but they also add some drag
• Nothing is free - CDNs require tuning & monitoring too
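"Is it cacheable?" usually ends up expressed as a Cache-Control response header. The sketch below builds one from a yes/no decision, using the standard max-age and s-maxage directives plus the stale-if-error extension. The function name and default TTLs are assumptions, and support for stale-if-error varies between CDNs, so treat this as a starting point to tune rather than a recommended policy.

```python
def cache_control(
    cacheable: bool,
    edge_ttl: int = 60,
    browser_ttl: int = 0,
    stale_if_error: int = 300,
) -> str:
    """Build a Cache-Control header value for a response."""
    if not cacheable:
        return "no-store"  # nothing along the path may keep a copy

    return ", ".join(
        [
            "public",
            f"max-age={browser_ttl}",            # how long the client may reuse it
            f"s-maxage={edge_ttl}",              # how long shared caches (CDN) may reuse it
            f"stale-if-error={stale_if_error}",  # serve stale if the origin errors
        ]
    )
```

Keeping the browser TTL separate from the edge TTL is what lets the CDN absorb shocks while clients still pick up changes quickly.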
Danger - Here be dragons
• Row of houses fires - Cooperating APIs fail, mid tiers fail
• Dam Bursts - Complete failure coupled with untested "side-mitigations" brings tsunamis
• Poor Eviction Strategies - Don't forget the basics
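As a reminder of what "the basics" of eviction look like, here is a minimal sketch of a cache that bounds both size (least recently used entries go first) and age (entries expire after a TTL). The class name LruTtlCache and its parameters are hypothetical; the clock is injectable only so the behaviour can be tested without sleeping.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable


class LruTtlCache:
    """Bounded cache: evicts least recently used entries and expires old ones."""

    def __init__(self, max_items: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max = max_items
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: Hashable):
        item = self._data.get(key)
        if item is None:
            return None  # miss
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Without the size bound the cache grows until the process dies; without the TTL it serves stale answers indefinitely. Both are the kind of side-mitigation that fails quietly until the dam bursts.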
Patterns - To Take Away
• Delegate: Don’t come to the origin, resolve known answers closer to the client. Move behaviors to the edge.
• Facade: Create an escape hatch in case the origin has trouble
• Principle: Don’t answer questions for which answers are given already
• Bare Metal - Monitor as close to the metal as possible, monitor what matters
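The Facade pattern above can be sketched as a wrapper that remembers the last good answer per key and serves it when the origin fails. This is one possible escape hatch, not the only one; the class name OriginFacade and the "fresh"/"stale" labels are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple


class OriginFacade:
    """Front for the origin that falls back to the last known good answer."""

    def __init__(self, origin: Callable[[str], str]):
        self._origin = origin
        self._last_good: Dict[str, str] = {}

    def get(self, key: str) -> Tuple[str, str]:
        """Return (value, freshness) where freshness is 'fresh' or 'stale'."""
        try:
            value = self._origin(key)
        except Exception:
            if key in self._last_good:
                # Escape hatch: answer from memory instead of surfacing the failure.
                return self._last_good[key], "stale"
            raise  # nothing to fall back on, the failure is real
        self._last_good[key] = value
        return value, "fresh"
```

Returning the freshness label keeps the fallback observable, so a dashboard can show how much traffic is currently being served from the escape hatch.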
There is no replacement for rigour & discipline in production