Route It Like It’s Hot: Scaling Payments Routing at American Express by Benjamin Cane

ScyllaDB · 17 slides · Mar 05, 2025

About This Presentation

Join the American Express Payment Acquiring and Network team as they share insights on building their Global Transaction Router, which powers payment routing at Amex scale. Learn how they design, build, and operate it to handle record-breaking shopping days, ticket sales, and unpredictable demand.


Slide Content

A ScyllaDB Community
Route It Like It’s Hot
Scaling Payments Routing at American Express
Benjamin Cane
Distinguished Engineer

Introductions: Benjamin Cane
▪ Distinguished Engineer at American Express.
▪ I work on our core payments platforms.
▪ Throughout my career, I’ve held roles in both infrastructure and software engineering.
▪ Building fast, scalable, and reliable distributed systems is my passion.

Payment Network Modernization
In 2018, American Express started an initiative to rebuild its payment network from the ground up.
We wanted to build a platform that could adapt to the future of payments.
We designed the system to be cloud-ready, flexible, secure, resilient, low-latency, and scalable.

American Express Global Transaction Router
An essential component of this new network is the Global Transaction Router. It manages connectivity and routes payment transactions to and from partner financial institutions. Of course, it must perform this role at the scale of American Express Payments.

Unique Challenges
Sitting at the edge of our Payment Network presents several challenges:
▪ Managing and load balancing long-lived TCP sessions.
▪ Instant, rapid-fire, asynchronous ISO 8583 messages.
  ▪ ISO 8583 is an international standard for payment messaging.
▪ Unexpected behavior and traffic spikes.

Design
SECURE, RESILIENT, LOW-LATENCY & SCALABLE

Design: Selection of Go
There is a lot to like about Go as a language. It’s easy to learn, has excellent tooling, has a strong standard library (especially for networking-based applications), and is a language built for software engineering.
It also fits nicely with challenges commonly faced by payments platforms:
▪ Concurrency: Enables us to handle high volumes of connections and transactions.
▪ Ahead-of-time compilation: Enables us to handle volume immediately, with no warm-up time.
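
As a minimal illustration of the concurrency point, the sketch below fans transactions out to a fixed pool of goroutines and collects the results over a channel. `Txn`, `Result`, `process`, and `ProcessAll` are hypothetical, and the approval rule is invented; the point is only the shape of goroutines, channels, and `sync.WaitGroup` working together.

```go
package main

import (
	"fmt"
	"sync"
)

// Txn is a hypothetical, simplified transaction used only for this sketch.
type Txn struct {
	ID     int
	Amount int64 // minor units, e.g. cents
}

// Result pairs a transaction ID with the outcome of processing it.
type Result struct {
	ID       int
	Approved bool
}

// process stands in for real routing work; the approval rule is invented.
func process(t Txn) Result {
	return Result{ID: t.ID, Approved: t.Amount <= 50_000}
}

// ProcessAll fans transactions out to a fixed number of worker goroutines
// (workers must be >= 1) and collects every result.
func ProcessAll(txns []Txn, workers int) []Result {
	in := make(chan Txn)
	out := make(chan Result, len(txns)) // buffered so workers never block on send
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range in {
				out <- process(t)
			}
		}()
	}
	for _, t := range txns {
		in <- t
	}
	close(in) // ends each worker's range loop
	wg.Wait() // every result is now buffered in out
	close(out)

	results := make([]Result, 0, len(txns))
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	txns := []Txn{{1, 1_200}, {2, 75_000}, {3, 9_900}, {4, 50_000}}
	for _, r := range ProcessAll(txns, 2) {
		fmt.Printf("txn %d approved=%v\n", r.ID, r.Approved)
	}
}
```

Results arrive in completion order, not input order, so callers match them by ID.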

Design: gRPC for Internal Communications
While we support multiple protocols and message formats externally (like any good payment switch), internally we opted for gRPC.
We selected gRPC because it offers the convenience of a modern protocol without sacrificing performance.
▪ Protobuf: Its binary format is more efficient to transmit and to serialize.
▪ HTTP/2: Provides asynchronous communication over a well-understood protocol.

Design: Asynchronous Logging
Logging can be a silent killer.
▪ We use asynchronous instead of synchronous logging.
▪ We reduced the number of log entries per transaction while keeping the system manageable.
▪ Per-transaction logs are always at Debug level, which is off by default.

Build
A GREAT DESIGN CAN BE POORLY IMPLEMENTED

Build: Incremental Optimizations
As we build the platform, we use tools like pprof and Go’s benchmark tests to improve the system incrementally.
Doing so has revealed opportunities for optimization:
▪ Replacing Mutex with Reader/Writer Mutex.
▪ Reducing goroutine creation in key areas.
▪ Avoiding Tickers in favor of AfterFunc.

Build: Reader/Writer Mutex vs. Mutex

Build: Overusing Channels

Operate
POORLY MANAGED PLATFORMS DEGRADE OVER TIME,
WELL MANAGED PLATFORMS IMPROVE OVER TIME

Operate: Continuous Improvement
We continuously monitor and improve the system to ensure our payment router stays scalable and performant.
With every release, we:
▪ Performance test to identify any degradation.
▪ Chaos test to identify resiliency gaps and verify failover and recovery capabilities.
▪ Simulate bad behavior to ensure we can handle unexpected use of the system.

Conclusion
When building a high-performance, low-latency transaction router, every design decision and implementation detail matters.
But the key is not getting it right the first time; it's constantly iterating, testing, and fine-tuning the system to stay ahead of challenges.
In our world, milliseconds matter. Payments don't wait.

Benjamin Cane – Distinguished Engineer
EOF
Does any of this sound interesting? American Express is hiring! Check out our open positions at
americanexpress.com/techcareers
bencane.com
linkedin.com/in/bencane