"$10 thousand per minute of downtime: architecture, queues, streaming and fintech", Max Baginskiy

fwdays, Jun 18, 2024

About This Presentation

Direct losses from one minute of downtime are $5-$10 thousand. Reputation is priceless.

This talk covers the architectural strategies needed to build highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage...


Slide Content

$10 thousand per minute
of downtime: architecture,
queues, streaming and fintech
Max Baginskiy
Solidgate

About me
Head of Engineering
Previously Tech Lead and Platform Engineer
10 years in software engineering
Last 6 years in Go, fan of DevOps
Built teams (5 teams, 30+ people hired)
And architecture

Agenda
- Company intro
- Architecture of the system
- Queues and streams to choose from
- Low-latency streaming using outbox
- CDC vs our solution - comparison
- Questions

About company
7+ years online
70 engineers: 50 software engineers, 20 infra + data engineers + AQA
PCI DSS compliant European acquirer

Business figures
- $2.5B annually
- 15-18M transactions monthly
- $10K per 1 minute of downtime
- 40+ integrated payment methods and providers

ALB Traffic
We have 100x less traffic on the ALB during high season than Shopify.
Stripe served 250 million API calls per day in 2020.

Kafka Producer
We started integrating Kafka last year: 20 RPS on average, 2 million events per day.

RabbitMQ Producer
100-120 RPS on average, 10 million events per day.

Logs
1.5-2K RPS of logs, 150 million events per day, 200-300 GB of logs daily.

Architecture

Let the story begin
Go
Project "Taxer"

Non-functional requirements
- Durability out of the box
- Queue replay
- Single active consumer support
- Easy to set up and maintain
- Partitioning
- Easy scaling for publisher and consumer
- Extensibility: schema registry support, dynamic routing, enrichment

NFR - explanation
What if a message is lost between services while processing and we retry the payment?
What if a message is lost between the callback service and the callback processor?
What if a message is lost between the payment and finance systems?
what if ... what if ... what if ...?

RabbitMQ dive in
Erlang
Written in Erlang. Erlang was made by Ericsson, which builds telecommunication equipment.
Proof of fail-safety
The ATM AXD301 example: calculated uptime of 99.9999999%, only one problem in many years.
Mnesia as storage
Mnesia doesn't support recovery from split brain and other types of failures.

RabbitMQ Durability
Mechanisms
- Publisher confirms are a MUST have!
- RabbitMQ can store data to disk and has different autoheal modes.
- Different types of queues: quorum, mirrored.
- Has streaming in "beta".

What if publisher confirms are disabled?
- Delivery after the exchange might not happen.
- Persistence might not happen.
- A few replicas might not acknowledge the message in a quorum queue.
- An overwhelmed cluster will not accept messages, but the publisher will not know.
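A minimal sketch of publishing with confirms enabled, assuming the github.com/rabbitmq/amqp091-go client and a local broker; the exchange, routing key and payload are illustrative:

```go
package main

import (
	"context"
	"log"
	"time"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// Put the channel into confirm mode: the broker now acks/nacks every publish.
	if err := ch.Confirm(false); err != nil {
		log.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Publish a persistent message and keep a handle to its confirmation.
	conf, err := ch.PublishWithDeferredConfirmWithContext(ctx,
		"payments",       // exchange (illustrative)
		"payment.update", // routing key (illustrative)
		true,             // mandatory: return the message if it cannot be routed
		false,            // immediate: unsupported by modern RabbitMQ, keep false
		amqp.Publishing{
			DeliveryMode: amqp.Persistent, // ask the broker to write it to disk
			ContentType:  "application/json",
			Body:         []byte(`{"payment_id":"123"}`),
		})
	if err != nil {
		log.Fatal(err)
	}
	// Block until the broker confirms; without confirm mode this failure
	// would be silent and the message could be lost.
	if !conf.Wait() {
		log.Fatal("message was nacked by the broker")
	}
}
```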

Quorum queues
+ Pros
- Have consensus built in
- Data written to disk, metadata in memory
- Can easily handle restarts
− Cons
- Don't scale well: millions of messages after a restart can take hours to replicate
- No "replay" mechanism
- Consumers don't scale
- Don't preserve the order of messages
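For reference, a sketch of declaring a quorum queue with the same amqp091-go client; the x-queue-type argument selects the queue type, and the queue name is illustrative:

```go
package queues

import amqp "github.com/rabbitmq/amqp091-go"

// DeclareQuorum declares a durable quorum queue. The type is chosen via
// the x-queue-type argument at declaration time.
func DeclareQuorum(ch *amqp.Channel, name string) (amqp.Queue, error) {
	return ch.QueueDeclare(
		name,
		true,  // durable: required for quorum queues
		false, // autoDelete
		false, // exclusive
		false, // noWait
		amqp.Table{"x-queue-type": "quorum"},
	)
}
```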

Split brain

Split brain - autoheal
ignore
Use when network reliability is the highest practically possible and node availability is of topmost importance.
pause_minority
Appropriate when clustering across racks or availability zones in a single region and the probability of losing a majority of nodes (zones) at once is considered to be very low.
autoheal
Appropriate when you are more concerned with continuity of service than with data consistency across nodes.
Summary: there is no way to guarantee that autoheal will work properly.

RabbitMQ streaming, problem #1

RabbitMQ streaming. Go client, issue #2

RabbitMQ streaming. Go client, issue #3

RabbitMQ + RabbitMQ streaming
A new feature that not a lot of companies use.
The Go client is not ready; about Python or Node.js, I'm afraid to ask.
Hard to support: requires updates of Erlang and then RabbitMQ.
Streaming is a plugin that requires a specific version of RabbitMQ.
Not made for fintech: lacks proper durability and functionality.

Kafka dive in
Java
Written in Java by LinkedIn, then open-sourced and licensed under the Apache license.
Highly available and durable
Has a WAL, works in a cluster, saves data to disk by default.
Blazing fast
Sequential writes, zero copy.

Kafka dive in
Kafka uses optimizations around sequential writes and zero copy to optimize disk usage.
Has a WAL for replication and durability.
ZooKeeper, as a separate system, tracks the health of the cluster.
Can work even without ZooKeeper (KRaft mode).
Chaos engineering shows that Kafka is a highly available and durable solution.

Debezium
Debezium how-to: create a replication slot, run the Debezium Java service in a cluster, configure it with Groovy.

Debezium
+ Pros
- Uses WAL directly: doesn't create additional load on the WAL (no additional data is written)
- Production-ready, tested solution
- Low latency
− Cons
- How to replay data? Can you specify a Log Sequence Number? What if you need to stream only a fraction of what is written to the WAL?
- Missing Buf (protobuf on steroids)
- Low flexibility and hard configurability
- DB isolation
- Groovy, which is not easy to use
- Random disconnects and the need to restart

Transactional outbox
Why use the Transactional Outbox?
- Neither Kafka nor CDC can flexibly re-stream data.
- Without specific instruments you can't remove specific events from Kafka.
- Replay with Kafka requires setting up additional services.
- Consistent state through the use of transactions.

Outbox table - WAL
- ID: ULID (sortable UUIDs)
- Bucket: partitioning; read/write partitioning
- Payload: JSON body of the domain model
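A sketch of the pattern in Go with database/sql, under assumptions: the payments table, column names and status update are hypothetical, and the outbox DDL just mirrors the fields on the slide. The point is that the domain update and the outbox insert commit or roll back together:

```go
package outbox

import (
	"context"
	"database/sql"
	"encoding/json"
)

// Illustrative DDL matching the slide: a ULID primary key (lexicographically
// sortable), a bucket for read/write partitioning, and the JSON payload of
// the domain model.
const schema = `
CREATE TABLE IF NOT EXISTS outbox (
    id      TEXT  PRIMARY KEY, -- ULID
    bucket  INT   NOT NULL,    -- partitioning key
    payload JSONB NOT NULL     -- JSON body of the domain model
)`

// SavePaymentUpdate writes the domain change and the outbox record in one
// transaction: either both become visible to the streamer, or neither does.
func SavePaymentUpdate(ctx context.Context, db *sql.DB, paymentID, eventID string, bucket int, event any) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	// Hypothetical domain update.
	if _, err := tx.ExecContext(ctx,
		`UPDATE payments SET status = 'settled' WHERE id = $1`, paymentID); err != nil {
		return err
	}

	payload, err := json.Marshal(event)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO outbox (id, bucket, payload) VALUES ($1, $2, $3)`,
		eventID, bucket, payload); err != nil {
		return err
	}
	return tx.Commit()
}
```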

Choosing a Go lib
- confluent-kafka-go: CGO + librdkafka
- ibm/sarama
- segmentio/kafka-go

Kafka producer settings
- BatchSize: usually 10
- BatchTimeout: 100ms
- RequiredAcks: all nodes should confirm the message
- Async: false, for synchronous error handling
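A sketch of these settings with segmentio/kafka-go; the broker address and topic name are assumptions:

```go
package main

import (
	"time"

	"github.com/segmentio/kafka-go"
)

// newWriter mirrors the settings from the slide.
func newWriter(brokers []string) *kafka.Writer {
	return &kafka.Writer{
		Addr:         kafka.TCP(brokers...),
		Topic:        "orders",               // illustrative topic name
		BatchSize:    10,                     // usually 10
		BatchTimeout: 100 * time.Millisecond, // flush at most every 100ms
		RequiredAcks: kafka.RequireAll,       // all replicas must confirm the message
		Async:        false,                  // synchronous error handling
	}
}
```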

Kafka write latency

Schema registry
- "Spec-first" approach speeds up development.
- Backward compatibility support via linters.
- Reusable "mental model" = simplified migration from API to stream.
- Client, server and models are generated for various languages.
- Simplified versioning.

Taxer v1 - the option we built
1. Update payment in Gate (Kotlin).
2. In a transaction: save the payment update and create a record in the outbox.
3. Order streamer (Go) reads a batch from the outbox.
4. Publishes the data to the stream.
5. Updates the offset in the meta table.
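A sketch of one streamer iteration, assuming hypothetical outbox/outbox_meta tables (with a seeded row per bucket) and the writer settings above. Delivery is at-least-once: if the offset update fails, the batch is re-published on the next run:

```go
package streamer

import (
	"context"
	"database/sql"

	"github.com/segmentio/kafka-go"
)

// streamBatch reads a batch from the outbox past the stored offset,
// publishes it to Kafka, then advances the offset in the meta table.
func streamBatch(ctx context.Context, db *sql.DB, w *kafka.Writer, bucket int) error {
	var offset string
	if err := db.QueryRowContext(ctx,
		`SELECT last_id FROM outbox_meta WHERE bucket = $1`, bucket).Scan(&offset); err != nil {
		return err
	}

	// ULIDs are lexicographically sortable, so "id > offset" walks forward.
	rows, err := db.QueryContext(ctx,
		`SELECT id, payload FROM outbox WHERE bucket = $1 AND id > $2 ORDER BY id LIMIT 100`,
		bucket, offset)
	if err != nil {
		return err
	}
	defer rows.Close()

	var msgs []kafka.Message
	lastID := offset
	for rows.Next() {
		var id string
		var payload []byte
		if err := rows.Scan(&id, &payload); err != nil {
			return err
		}
		msgs = append(msgs, kafka.Message{Key: []byte(id), Value: payload})
		lastID = id
	}
	if err := rows.Err(); err != nil || len(msgs) == 0 {
		return err
	}

	// RequiredAcks=all + Async=false: returns only after the batch is confirmed.
	if err := w.WriteMessages(ctx, msgs...); err != nil {
		return err
	}

	_, err = db.ExecContext(ctx,
		`UPDATE outbox_meta SET last_id = $1 WHERE bucket = $2`, lastID, bucket)
	return err
}
```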

Streamer

Meta table

Architecture comparison

v1 comparison with typical architecture
+ Pros
- We have a full transaction log that can be replayed, reworked, saved, fixed.
- Only 1 new tech: Kafka.
- Streamer + Leaser = 200 lines of code + 800 lines of tests. It can be used as a library, not a service.
- Buf/Go/PostgreSQL: everything reused, maintenance simplified.
− Cons
- WAL amplification: 2x. The transactional outbox adds one more write to each operation.
- High delay: 2 minutes for events.
- More CPU load than just reading from the WAL.

The end.

Nah, I’m kidding.

Order streamer delay: 2 min

You learned about the 2 min delay

Order streamer delay: 2 min
ULIDs don't allow us to understand the commit order of events or to find missing parts.

Solution: Logical Clock + auto-increment. Logical clocks allow a distributed system to enforce a partial ordering of events without physical clocks.

You can also detect missing events with them.

v2 Implementation
Auto-increment instead of ULID helps you report and look for missing IDs. It is more like "logical time".
Look for missing IDs, save them in the meta table for 2 minutes, and restream them when they appear.
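A sketch of the gap detection this implies, with illustrative names: auto-increment IDs make every skipped ID visible, and the caller keeps gaps for ~2 minutes before reporting and dropping them:

```go
package streamer

import "time"

// gap is a sequence ID that was skipped in a batch: either its transaction
// has not committed yet, or it was rolled back and will never appear.
type gap struct {
	id   int64
	seen time.Time
}

// findGaps compares a sorted batch of auto-increment IDs against the last
// streamed ID and returns every missing ID. The caller stores the gaps in
// the meta table and restreams the IDs if they show up within the deadline.
func findGaps(lastID int64, batch []int64, now time.Time) []gap {
	var gaps []gap
	next := lastID + 1
	for _, id := range batch {
		for ; next < id; next++ {
			gaps = append(gaps, gap{id: next, seen: now})
		}
		next = id + 1
	}
	return gaps
}
```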

v2 Summary
+ Pros
* Reduces delay time from 2 minutes to literally seconds.
* We can use this approach not only in reports/taxes but also in processing.
− Cons
* Ordering can be broken, but we can support several models of eventual consistency.
* Higher DB CPU utilization.

The end.

Nah, I’m kidding x2

v3 reading WAL
WAL "reading" can be implemented in just 200 lines of code, along with replication slot creation and publication creation.
In PostgreSQL replication slots you have access to received_lsn, latest_end_lsn.

It seems like you can replay changes.
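A sketch of the slot and publication setup in Go; the slot, publication and table names are hypothetical, and pgoutput is PostgreSQL's built-in logical decoding plugin:

```go
package wal

import (
	"context"
	"database/sql"
)

// Prepare creates the logical replication slot and the publication that a
// WAL-reading streamer would consume.
func Prepare(ctx context.Context, db *sql.DB) error {
	// Create a logical replication slot decoded with pgoutput.
	if _, err := db.ExecContext(ctx,
		`SELECT pg_create_logical_replication_slot('taxer_slot', 'pgoutput')`); err != nil {
		return err
	}
	// Publish changes from the tables the streamer cares about.
	_, err := db.ExecContext(ctx,
		`CREATE PUBLICATION taxer_pub FOR TABLE outbox`)
	return err
}
```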

V3 WAL lib - Next Time
See you at the next Highload!
.....

making online payments simple