From 1M to 1B Features Per Second: Scaling ShareChat's ML Feature Store

ScyllaDB, 33 slides, Jun 26, 2024


About This Presentation

ShareChat's Ivan Burmistrov walks through how they built a low-latency ML Feature Store on ScyllaDB. The first version failed to meet the scalability requirements and broke down at a load of 1 million features per second, but it was successfully scaled 1000x to handle 1 billion features per second...

Size: 20.82 MB

Language: en

Added: Jun 26, 2024

Slides: 33 pages

Slide Content

From 1M to 1B Features Per Second: Scaling ShareChat’s ML Feature Store
Ivan Burmistrov, Principal Software Engineer, ShareChat
Andrei Manakov, Staff Software Engineer, ShareChat

Context

Moj, a short video app

What are the features anyway?

The story

The architecture

Tiles
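The slide text does not spell out the tiling scheme. As a minimal sketch, assuming a tile is a fixed-width time bucket identified by its start time (which the tile_time column in the schemas below suggests), a lookup window maps onto tiles like this. The function names, the UTC epoch alignment, and the tile width are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: tiles are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def tile_start(ts: datetime, tile_width: timedelta) -> datetime:
    """Return the start of the fixed-width tile that contains ts."""
    # timedelta // timedelta gives a whole number of tiles since the epoch.
    offset = (ts - EPOCH) // tile_width
    return EPOCH + offset * tile_width

def tiles_for_window(end: datetime, window: timedelta,
                     tile_width: timedelta) -> list[datetime]:
    """List the tile start times that cover the window [end - window, end)."""
    tiles = []
    t = tile_start(end - window, tile_width)
    while t < end:
        tiles.append(t)
        t += tile_width
    return tiles
```

For a 2-hour window ending at 12:30 with 1-hour tiles, this returns the tiles starting at 10:00, 11:00, and 12:00. The edge tiles only partially overlap the window, which is one reason tile width is a tuning knob: wider tiles mean fewer rows to read but coarser window boundaries.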

High-level architecture

Why did it fail?

ScyllaDB Schema (Bad)

CREATE TABLE features (
    entity_id text,
    tile_time timestamp,
    feature_name text,
    value blob,
    PRIMARY KEY ((entity_id), tile_time, feature_name)
);

Tiling Configuration (Bad)

Let’s do some math
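The slide's actual numbers are not in the extracted text. As a hypothetical cost model: with the bad schema (one row per entity, tile, and feature), the rows a request touches grow as entities × tiles × features, whereas storing one row per entity and tile removes the features factor. All parameter names and values below are illustrative, not ShareChat's figures.

```python
def rows_per_second(requests_per_sec: int, entities_per_request: int,
                    tiles_per_entity: int, features_per_tile: int,
                    one_row_per_feature: bool) -> int:
    """Hypothetical model of database rows read per second by feature lookups."""
    # Bad layout: every (tile, feature) pair is its own clustering row.
    # Packed layout: one row per tile regardless of how many features it holds.
    rows_per_tile = features_per_tile if one_row_per_feature else 1
    rows_per_entity = tiles_per_entity * rows_per_tile
    return requests_per_sec * entities_per_request * rows_per_entity
```

With made-up inputs of 1,000 requests/s, 100 entities per request, 10 tiles, and 20 features, the per-feature layout reads 20,000,000 rows/s versus 1,000,000 rows/s for one row per tile. The point is the multiplicative structure: cutting either the tile count or the rows per tile shrinks the load proportionally.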

Optimisations

ScyllaDB Schema (Good)

CREATE TABLE features (
    entity_id text,
    tile_time timestamp,
    features blob,
    PRIMARY KEY (entity_id, tile_time)
);
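The good schema stores all features of one entity and tile in a single features blob, so one row read returns every feature. The slides do not state the serialization format; this sketch assumes a JSON-encoded map from feature name to numeric value, and it further assumes the features are additive counters so that tiles can be merged by summing. Both are illustrative assumptions, and the helper names are hypothetical.

```python
import json

def encode_features(features: dict[str, float]) -> bytes:
    """Pack all features of one (entity_id, tile_time) row into one blob (assumed JSON)."""
    # sort_keys makes the bytes deterministic for identical inputs.
    return json.dumps(features, sort_keys=True, separators=(",", ":")).encode("utf-8")

def decode_features(blob: bytes) -> dict[str, float]:
    """Unpack a features blob back into a name -> value map."""
    return json.loads(blob.decode("utf-8"))

def merge_tiles(blobs: list[bytes]) -> dict[str, float]:
    """Aggregate a window by summing each feature across its tiles (assumes additive features)."""
    totals: dict[str, float] = {}
    for blob in blobs:
        for name, value in decode_features(blob).items():
            totals[name] = totals.get(name, 0.0) + value
    return totals
```

A production store would likely use a more compact binary encoding than JSON; the trade-off shown here is only that packing turns many small clustering rows into one larger cell per tile.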

Tiling Configuration (Good)

Compaction Strategy

Would 4x Scylla Be Enough?

Cache locality

Consistent hashing
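Consistent hashing routes requests for the same key to the same service instance, which keeps each instance's in-process cache warm for the keys it owns. Below is a minimal, generic hash ring with virtual nodes; it illustrates the technique and is not the ShareChat implementation. The node names and the vnode count are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Each node appears vnodes times on the ring to even out the load.
        self._ring: list[tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position, wrapping at the end."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for caching: when a node is added, only the keys that land on the new node's ring segments move, so most caches stay valid.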

Consistent hashing: ingress

nginx.ingress.kubernetes.io/upstream-hash-by: "$bucket-value"
nginx.ingress.kubernetes.io/upstream-hash-by-subset: "true"
nginx.ingress.kubernetes.io/upstream-hash-by-subset-size: "3"

Consistent hashing: ingress

Consistent hashing: subset
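With upstream-hash-by-subset enabled, ingress-nginx groups the upstream endpoints into subsets (of size upstream-hash-by-subset-size, 3 in the annotations above), hashes each request's key to one subset, and then spreads requests randomly among the endpoints in that subset. The sketch below only mimics that idea to show the trade-off; it is not ingress-nginx's Lua implementation, uses plain modulo hashing over subsets rather than a full ring, and its helper names are hypothetical.

```python
import hashlib
import random

def _hash(key: str) -> int:
    """Stable 64-bit hash of a routing key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def build_subsets(endpoints: list[str], subset_size: int = 3) -> list[list[str]]:
    """Split sorted endpoints into fixed-size groups (the last group may be smaller)."""
    ordered = sorted(endpoints)
    return [ordered[i:i + subset_size] for i in range(0, len(ordered), subset_size)]

def pick_endpoint(key: str, subsets: list[list[str]], rng: random.Random) -> str:
    """Hash the key to one subset, then choose a random endpoint inside it."""
    subset = subsets[_hash(key) % len(subsets)]  # simple modulo, not a hash ring
    return rng.choice(subset)
```

The design choice: a hot key is served by up to three pods instead of one, at the cost of caching that key's data on each of them, so cache locality is traded for load spreading.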

Consistent hashing: ingress path rewriting

nginx.ingress.kubernetes.io/use-regex: "true"
nginx.ingress.kubernetes.io/rewrite-target: /$1
nginx.ingress.kubernetes.io/upstream-hash-by: $2
hosts:
  - host: feature-service.internal
    paths:
      - path: /(method)(/hash-by-.*)
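In this rule the client appends the hash key to the URL; nginx captures it as $2 for upstream-hash-by, and rewrite-target /$1 strips it again before the request reaches the service. The sketch reproduces that capture-and-rewrite logic with Python's re module to make the two groups explicit. "method" appears to be a placeholder for a real method path in the slide, and the route helper is hypothetical.

```python
import re
from typing import Optional, Tuple

# Path regex from the Ingress rule; nginx exposes the capture groups as $1 and $2.
PATH_RE = re.compile(r"/(method)(/hash-by-.*)")

def route(path: str) -> Optional[Tuple[str, str]]:
    """Return (rewritten upstream path, hash key), or None if the path does not match."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    rewritten = f"/{m.group(1)}"  # rewrite-target: /$1 drops the hash-by suffix
    hash_key = m.group(2)         # upstream-hash-by: $2 hashes on the suffix
    return rewritten, hash_key
```

For example, route("/method/hash-by-user-123") yields ("/method", "/hash-by-user-123"): the service sees a clean path while the ingress still hashes on the key the client chose.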

Consistent hashing: ingress

Improve cache locality: 27 deployments

What’s next?

Envoy proxy

Feature Service Optimisation

Envoy proxy - result

Conclusion

- Robust, proven technologies pay off (ScyllaDB, Flink, ...)
- Each next step is harder than the previous one
- The simplest, practical solution does work
- The most optimized solution isn’t human-friendly
- Don’t be afraid to fork a library and adjust it for your system

Thank you! Let’s connect.
Ivan Burmistrov: burmistrov.ivan@gmail.com, @isburmistrov
Andrei Manakov: andection@gmail.com, @AndreyManakov, andection@threads
