Apidays New York 2024 - The subtle art of API rate limiting by Josh Twist, Zuplo


About This Presentation

The subtle art of API rate limiting
Josh Twist, Co-founder & CEO at Zuplo

Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024)

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apiday...


Slide Content

The subtle art of API rate limiting
Josh Twist
CEO @zuplo

Me
Founded several services in Microsoft Azure:
API Management
Logic Apps
Mobile Services
Power Automate Pro
Head of Product at Stripe and Facebook
Been a vendor and a customer of API gateways / management

Agenda
Why do we need this?
What’s the canonical approach?
Private or Public
Latency tradeoffs and optimizing the customer experience
Observability
Demo

Am I actually going to be attacked?
Yes, but not by who you think.

Why do we need rate limiting?
Protect against excessive resource consumption
Prevent resource starvation
Ensure a good experience for all customers
Discourage abusers
Enforce the business model

A canonical response
Status - 429
Status Text - Too Many Requests
Header - Retry-After: 3600
Body - Use Problem Details format (IETF RFC 7807)
Do you want to do this?
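To make the canonical response concrete, here is a minimal TypeScript sketch using the Fetch API's Response type. The problem "type" URL and the retry interval are illustrative assumptions, not part of the talk.

```typescript
// Minimal sketch of the canonical 429 response described above.
function tooManyRequests(retryAfterSeconds: number): Response {
  // Problem Details body per IETF RFC 7807 (application/problem+json)
  const problem = {
    type: "https://example.com/errors/rate-limited", // hypothetical docs URL
    title: "Too Many Requests",
    status: 429,
    detail: `Rate limit exceeded. Retry after ${retryAfterSeconds} seconds.`,
  };
  return new Response(JSON.stringify(problem), {
    status: 429,
    statusText: "Too Many Requests",
    headers: {
      "Content-Type": "application/problem+json",
      // Retry-After may be a number of seconds or an HTTP date
      "Retry-After": String(retryAfterSeconds),
    },
  });
}
```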

Private or public limits?
Disclosing limits allows malicious users to easily maximize their consumption of your resources without hitting the limit
Not disclosing limits makes it hard for consumers to avoid hitting your rate limit
Informal partnership (e.g. free tier): hide limits
Formal partnership (e.g. contracted customer): disclose limits
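Where limits are disclosed (e.g. a contracted customer), one common approach is to attach informational headers to each response. This is a hedged sketch using the widespread X-RateLimit-* naming convention (an IETF httpapi draft proposes standardized RateLimit-* equivalents); the quota fields are assumptions for illustration.

```typescript
interface Quota {
  limit: number;          // requests allowed per window
  remaining: number;      // requests left in the current window
  resetEpochSecs: number; // when the window resets (Unix seconds)
}

// Copy an upstream response and add informational rate-limit headers.
function withDisclosedLimits(response: Response, quota: Quota): Response {
  const headers = new Headers(response.headers);
  headers.set("X-RateLimit-Limit", String(quota.limit));
  headers.set("X-RateLimit-Remaining", String(quota.remaining));
  headers.set("X-RateLimit-Reset", String(quota.resetEpochSecs));
  return new Response(response.body, { status: response.status, headers });
}
```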

How to apply rate limiting
IP based?
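IP-based limiting is a blunt instrument: many consumers can share an IP behind a NAT or proxy, so an authenticated identity such as an API key is usually a better bucket when one is available. The sketch below shows one way to pick the bucket key; the header names are assumptions, not a specific gateway's API.

```typescript
// Choose a rate-limit bucket key: prefer an authenticated identity, fall back to IP.
function rateLimitKey(request: Request): string {
  const apiKey = request.headers.get("Authorization");
  if (apiKey) {
    return `key:${apiKey}`; // per-consumer bucket
  }
  // Fall back to the client IP (e.g. from a trusted proxy header)
  const forwarded = request.headers.get("X-Forwarded-For") ?? "unknown";
  return `ip:${forwarded.split(",")[0].trim()}`;
}
```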

Rate limiting in distributed systems is hard

The latency / accuracy tradeoff
Being accurate means completing the rate limit check before allowing the request to proceed == a slower API for everyone
You incur the latency on every request, whether a problem is afoot or not
Consider a more lenient approach where you run the check asynchronously and cache blocked consumers
Here’s how

Async rate limiting
Keep a local cache to store known limited consumers (store the retry time also)
Check the rate limit in parallel with performing the primary request
If the rate limit check comes back blocked, update the cache and, if the primary request has not completed, override the response. Otherwise, the next request will be blocked anyway.
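A minimal sketch of this pattern, assuming a Fetch-style runtime and a hypothetical checkRateLimit call to a shared rate-limit service (the function name and response shape are assumptions, not Zuplo's API):

```typescript
type LimitResult = { blocked: boolean; retryAfterSeconds: number };

// Assumed remote call to a shared rate-limit service.
declare function checkRateLimit(consumer: string): Promise<LimitResult>;

// Local cache of known limited consumers: consumer -> epoch ms when they may retry.
const blockedUntil = new Map<string, number>();

function limitedResponse(retryAfterSeconds: number): Response {
  return new Response("Too Many Requests", {
    status: 429,
    headers: { "Retry-After": String(Math.max(1, Math.ceil(retryAfterSeconds))) },
  });
}

async function handle(
  consumer: string,
  request: Request,
  origin: (req: Request) => Promise<Response>
): Promise<Response> {
  // 1. Cheap local check: if this consumer is already known to be limited, block now.
  const until = blockedUntil.get(consumer);
  if (until !== undefined && Date.now() < until) {
    return limitedResponse((until - Date.now()) / 1000);
  }

  // 2. Run the rate-limit check in parallel with the primary request.
  const limitCheck = checkRateLimit(consumer).then((result) => {
    if (result.blocked) {
      // Remember the block locally so the *next* request is rejected cheaply.
      blockedUntil.set(consumer, Date.now() + result.retryAfterSeconds * 1000);
    }
    return result;
  });
  const originResponse = origin(request);

  // 3. If the check finishes first and is blocked, override the response.
  //    If the primary request finishes first, let it through; the cache update
  //    above means the next request will be blocked anyway.
  const winner = await Promise.race([
    limitCheck.then((r) => ({ source: "limit" as const, result: r })),
    originResponse.then(() => ({ source: "origin" as const })),
  ]);

  if (winner.source === "limit" && winner.result.blocked) {
    return limitedResponse(winner.result.retryAfterSeconds);
  }
  return originResponse;
}
```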

Observability?
What are the minimum requirements?
RPS (requests per second) report
See rate limited responses
Break down by bucket (IP, user, etc.)
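As a minimum, one structured log entry per request is enough to chart RPS, count 429s, and break results down by bucket. A hedged sketch, with assumed field names and a plain console transport standing in for a real logging/metrics pipeline:

```typescript
interface RequestLogEntry {
  timestamp: string;    // ISO 8601, for RPS aggregation
  bucket: string;       // rate-limit key, e.g. "ip:203.0.113.7" or "user:1234"
  status: number;       // 429 entries give the rate-limited count
  rateLimited: boolean;
}

function logRequest(bucket: string, status: number): void {
  const entry: RequestLogEntry = {
    timestamp: new Date().toISOString(),
    bucket,
    status,
    rateLimited: status === 429,
  };
  // In practice this would go to your logging/metrics pipeline.
  console.log(JSON.stringify(entry));
}
```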

This content available online now
Blog post: zuplo.link/rate-limiting-blog
Video: zuplo.link/rate-limiting-video