Get Low (Latency) by Benjamin Cane and Tyler Wedin

ScyllaDB · Oct 11, 2024

About This Presentation

Building a real-time, low-latency card payments system is a challenge. Join the Amex Payments Network team to learn about their 100% containerized, globally distributed platform powered by Kubernetes. Discover how they tackled latency with HTTP/2, local affinity, and more. #DevOps #Kubernetes


Slide Content

A ScyllaDB Community
Get Low (Latency)
Benjamin Cane, Distinguished Engineer
Tyler Wedin, VP Global Payments Network SRE

Introductions: Tyler Wedin
▪Vice President of Core Platforms SRE at American Express
▪My primary focus is engineering and instrumenting high availability and resiliency into our most critical customer journeys
▪I spent a considerable amount of my career building high-speed and fault-tolerant infrastructure
▪My favorite greeting is a three-way handshake

Introductions: Benjamin Cane
▪Distinguished Engineer at American Express
▪I work on our core payments platforms
▪Throughout my career I’ve held roles in both infrastructure and software engineering
▪Building fast, scalable, and reliable distributed systems is my passion

Payment Network Modernization
In 2018, American Express started an initiative to rebuild its payment network from the ground up.
We wanted to build a platform that could adapt to the future of payments. We designed the system to be flexible as we continue to enable new products and capabilities.

So, we chose:
▪A microservices-based architecture
▪Modern API-based interactions for internal communications and integrations
▪Containers and Kubernetes

Payment Network Characteristics
▪Scalable
▪Resilient
▪Low-Latency

Understanding the Problem with Microservices

Death by a Thousand Paper Cuts
▪Each service-to-service call increases
–Network overhead
–Latency
–Chances of request failure
▪Cross-region calls make these problems dramatically worse
–~60 milliseconds of latency between New York and Los Angeles*
–~260 milliseconds of latency between Singapore and Los Angeles*
–For example, a transaction requiring three sequential Singapore-to-Los Angeles round trips accrues roughly 3 × 260 ms ≈ 780 ms of pure network time before any processing happens
* As per https://wondernetwork.com/pings

How American Express Optimized Its Payment Network Architecture
ACHIEVING SCALE, LOW LATENCY, AND RESILIENCY

Optimizations: Today’s Focus
▪Local affinity and cross-region routing
▪HTTP/2-based protocols for service-to-service requests
▪Caching that ensures data is locally available before transactions arrive
▪Asynchronous logging, with an emphasis on metrics over logs (see the sketch after this list)
▪Local disks for databases instead of software-defined storage
▪Go as the language of choice for critical routing services
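The asynchronous-logging bullet maps to a simple pattern in Go. Below is a minimal, illustrative sketch, not the Amex implementation: handlers enqueue log entries on a buffered channel and a background goroutine performs the actual write, keeping I/O off the transaction path. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// asyncLogger decouples request handling from log I/O: callers enqueue
// entries on a buffered channel and a single goroutine writes them out.
type asyncLogger struct {
	entries chan string
	done    chan struct{}
}

func newAsyncLogger(buffer int) *asyncLogger {
	l := &asyncLogger{
		entries: make(chan string, buffer),
		done:    make(chan struct{}),
	}
	go l.drain()
	return l
}

func (l *asyncLogger) drain() {
	for e := range l.entries {
		fmt.Fprintln(os.Stderr, e) // a real system would batch and flush
	}
	close(l.done)
}

// Log never blocks the hot path: if the buffer is full, the entry is
// dropped (a metric counter would record the drop instead).
func (l *asyncLogger) Log(msg string) {
	select {
	case l.entries <- msg:
	default: // buffer full; prefer dropping a log line over adding latency
	}
}

// Close flushes remaining entries and waits for the writer to finish.
func (l *asyncLogger) Close() {
	close(l.entries)
	<-l.done
}

func main() {
	log := newAsyncLogger(1024)
	defer log.Close()
	log.Log("transaction routed locally")
	time.Sleep(10 * time.Millisecond)
}
```

The deliberate drop-on-full behavior reflects the slide's priority: a lost log line is cheaper than added latency, with metrics covering the gap.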

Keeping Transactions Localized by Design
Design Principles:
▪Each cell is independent
▪Cells must leverage local data
▪Transactions are fully processed within the nearest available cell
▪Communication across cells, availability zones, or regions is limited to a custom router (see the sketch below)
–Microservices are prevented from communicating across availability zones, enforced via network controls
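As a rough illustration of the routing principle, here is a minimal sketch of cell-affinity selection: prefer the local cell, fall over to the nearest healthy one. The Cell type and the precomputed distance metric are assumptions for illustration; the actual custom router is not described in this deck.

```go
package main

import "fmt"

// Cell is one independent processing unit; DistanceMS is an assumed
// network-distance metric from the router's own location.
type Cell struct {
	Name       string
	Healthy    bool
	DistanceMS int
}

// nearestAvailable returns the closest healthy cell, so a transaction is
// fully processed in the local cell unless it is unavailable.
func nearestAvailable(cells []Cell) (Cell, bool) {
	var best Cell
	found := false
	for _, c := range cells {
		if !c.Healthy {
			continue
		}
		if !found || c.DistanceMS < best.DistanceMS {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	cells := []Cell{
		{Name: "us-east-cell-1", Healthy: false, DistanceMS: 0}, // local, but down
		{Name: "us-east-cell-2", Healthy: true, DistanceMS: 2},  // same region
		{Name: "ap-southeast-cell-1", Healthy: true, DistanceMS: 260},
	}
	if c, ok := nearestAvailable(cells); ok {
		fmt.Println("routing to", c.Name) // falls over to us-east-cell-2
	}
}
```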

Pod-to-Pod Communications

Caching and Data Locality
TO LOCALIZE DATA, WE FOLLOW THREE PATTERNS:
▪Preloaded, read-through caching (sketched below)
▪Message-based replication
▪Transaction affinity
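A minimal sketch of the first pattern, preloaded read-through caching, under assumed types: lookups are served from local memory, fall through to a loader (standing in for a regional source of record) only on a miss, and can be warmed ahead of traffic, e.g., from a replication message stream.

```go
package main

import (
	"fmt"
	"sync"
)

// ReadThroughCache serves lookups from local memory and falls back to a
// loader (e.g., the regional database) only on a miss, caching the result.
type ReadThroughCache struct {
	mu     sync.RWMutex
	data   map[string]string
	loader func(key string) (string, error)
}

func New(loader func(string) (string, error)) *ReadThroughCache {
	return &ReadThroughCache{data: make(map[string]string), loader: loader}
}

// Preload warms the cache before transactions arrive (the "preloaded"
// pattern), e.g., fed by message-based replication.
func (c *ReadThroughCache) Preload(entries map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, v := range entries {
		c.data[k] = v
	}
}

// Get is the read-through path: hit local memory first, then the loader.
func (c *ReadThroughCache) Get(key string) (string, error) {
	c.mu.RLock()
	v, ok := c.data[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}
	v, err := c.loader(key)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
	return v, nil
}

func main() {
	cache := New(func(key string) (string, error) {
		return "loaded:" + key, nil // stand-in for a database read
	})
	cache.Preload(map[string]string{"card:123": "profile-a"})
	v, _ := cache.Get("card:123") // served locally, no remote call
	fmt.Println(v)
}
```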

Optimize Service Request Performance
A significant optimization for our platform was the selection of HTTP/2 and gRPC (which leverages HTTP/2).
▪HTTP/1.1: Synchronous Requests
HTTP/1.1 is by design a synchronous protocol: with each request, the server must respond before the next request is sent on the connection.
▪HTTP/2: Asynchronous Requests with Connection Reuse
HTTP/2 is an asynchronous protocol. Multiple requests can be multiplexed over the same connection.
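To see the difference in practice, the sketch below issues several concurrent requests through one shared client. Against a TLS server that negotiates HTTP/2 (Go's default http.Client does this automatically), they are multiplexed as streams on a single connection rather than serialized as HTTP/1.1 would require. The endpoint is a placeholder, not anything from the deck.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// One shared client: over HTTP/2, these concurrent requests become
	// streams on a single TCP connection instead of queuing one-at-a-time
	// behind each response as HTTP/1.1 requires.
	client := &http.Client{}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			resp, err := client.Get("https://example.com/") // placeholder endpoint
			if err != nil {
				fmt.Println("request", n, "failed:", err)
				return
			}
			resp.Body.Close()
			fmt.Println("request", n, "proto:", resp.Proto) // "HTTP/2.0" when h2 is negotiated
		}(i)
	}
	wg.Wait()
}
```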

Service Mesh
While the use of HTTP/2 for service-to-service calls reduced our latency, it also added complexity around load balancing.
Kube-proxy is a layer 4, connection-based load balancer. With HTTP/2, multiple transactions are sent down a single connection, which can overload individual pods while others sit idle.
To properly distribute load across pods, we introduced a service mesh, deploying Envoy sidecars.
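Envoy sidecar configuration is beyond a short sketch, but the underlying requirement, balancing per request (layer 7) rather than per connection (layer 4), can also be illustrated client-side with gRPC-Go's built-in round-robin policy. This is a contrast example, not the approach described on the slide; the service name is hypothetical and assumes a Kubernetes headless Service so DNS resolves to individual pod IPs.

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "payments-svc" is a hypothetical headless Service, so the DNS
	// resolver returns every pod IP rather than a single ClusterIP.
	conn, err := grpc.Dial(
		"dns:///payments-svc.payments.svc.cluster.local:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// Round-robin spreads RPCs across all resolved pods, instead of
		// pinning every stream to whichever pod the one TCP connection hit.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	// RPC stubs created from conn now balance per call, not per connection.
}
```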

Summary
▪Get Low
–By focusing on locality, taking the most direct path
▪Get Low
–By limiting dependencies and pushing data ahead of time
▪Get Low
–By using asynchronous communications
▪Get Low
–By making latency and resiliency first-class features of your platform

Outro
Does any of this sound interesting? American Express is hiring! Check out our open positions at americanexpress.com/techcareers
Benjamin Cane – Distinguished Engineer
bencane.com
linkedin.com/in/bencane


Tyler Wedin – Vice President, Core Platforms Site Reliability Engineering
linkedin.com/in/tyler-wedin-47304ba/

Thank you