Feature Store Evolution Under Cost Constraints: When Cost is Part of the Architecture

ScyllaDB · Oct 15, 2024

About This Presentation

ShareChat's scaling ML Feature Store to handle 1B features/sec was just the start. Next challenge: cutting costs while keeping quality. Join Ivan & David to explore cloud cost optimization, Kubernetes waste reduction, and autoscaling Apache Flink. Perfect for #ML & #CloudDev. #P99Conf


Slide Content

A ScyllaDB Community
Feature Store Evolution Under Cost
Constraints: When Cost is Part of
the Architecture
David Malinge
Sr. Staff Software Engineer
ShareChat
Ivan Burmistrov
Principal Software Engineer

Previously on P99 Conf…
■ScyllaDB ftw
■Smart data model
■Various optimizations
■Cache locality
■…

Leadership: “Great system! Can you now build the one that does the same, but 10x cheaper? Thanks!”
Our reaction:

Cloud billing is complicated (or: the cloud wants to rob you)
■Billing is confusing: both AWS and GCP have ~40K SKUs
■Providers are not motivated to make it easy to understand and debug
■Easy to lose track of cost components


Step 1: ruthless cost inventory / attribution

Cost savings funnel

Cost savings funnel: Cloud traps

k8s wastage

Cloud traps: k8s wastage

Solution: Advanced k8s scheduler
■Open Source
■Cloud specific
■Commercial / Multi Cloud

“Cloud Tax”

Cloud traps: cross-AZ network egress (NIZE) cost
■3 zones
■Replication Factor = 3
(data gets copied to
all zones)

Naive reads: no settings in the client driver

Smarter reads: use token-aware routing

Smartest reads: + zone-aware routing
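The three read-routing steps above are, in practice, client-driver configuration. Below is a minimal sketch using the gocql driver (the slides don't show ShareChat's actual code; the hosts, keyspace, datacenter and rack names are made up, and it assumes a driver version that ships RackAwareRoundRobinPolicy, with cloud AZs mapped to racks):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("scylla-1.internal", "scylla-2.internal")
	cluster.Keyspace = "feature_store" // hypothetical keyspace

	// Token-aware: send each request directly to a replica that owns the
	// partition, skipping an extra coordinator hop.
	// Rack-aware fallback: among those replicas, prefer the one in our own
	// rack (= cloud AZ), which is what removes cross-AZ egress on reads.
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
		gocql.RackAwareRoundRobinPolicy("dc1", "us-east1-b"),
	)

	// With RF=3 spread over 3 AZs, quorum reads still have to cross zones;
	// CL=ONE served from the local rack stays zone-local (a consistency vs.
	// cost/latency trade-off to make per use case).
	cluster.Consistency = gocql.One

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	// ... issue feature-store reads through session ...
}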

Smartest writes: everything we learned

Smartest+ writes: 2 zones

Smartest++ writes: 2 datacenters

Smartest+++ writes: a different cloud

Cost savings funnel: Compute optimization

ShareChat’s Feature Store
Case Study

Feature store: Architecture
More on compute: The Harsh Reality of Building a Realtime ML Feature Store (slides), QCon 2024

Feature Store

#1 Database scalability
■ScyllaDB clusters
■Flat-ish cost, no autoscaling
■Why flat-ish? Not autoscalable != not scalable
●Don’t scale for yearly peak!
●Kudos to Scylla support!
■True autoscaling incoming?
●Tablets in ScyllaDB 6.0

#2 Different workloads
■Read/Write pattern and how they affect each other

LIST ALL SERVICE_LEVELS;
 service_level | timeout | workload_type | shares
---------------+---------+---------------+--------
       serving |    null |          null |   1000
     computing |    null |          null |    100
        manual |    null |          null |     50
[Chart: read latency over time]

#2 Different workloads

■Solutions
●Overscale Scylla cluster $$$
●Dual-datacenter - the classic advice: “isolate reads and writes”
●usually leads to underutilized resources $$
●What if we could choose the latency level we want for reads vs writes?
■Scylla Workload prioritization $

#2 Different workloads

■Scylla only: use Workload Prioritization (WP)
●WP is great when different workloads are clear and consistent






■Other example: compaction strategy, see Scaling ShareChat’s ML Feature Store
LIST ALL SERVICE_LEVELS;
 service_level | timeout | workload_type | shares
---------------+---------+---------------+--------
       serving |    null |          null |   1000
     computing |    null |          null |    100
        manual |    null |          null |     50
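The service levels in that listing are plain CQL objects; below is a sketch of creating and attaching them from Go via gocql (statement shapes follow ScyllaDB's workload prioritization docs; the shares values mirror the listing above, while the role names are hypothetical):

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("scylla-1.internal")
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	stmts := []string{
		// More shares => larger CPU/IO share when workloads compete.
		"CREATE SERVICE_LEVEL IF NOT EXISTS serving WITH shares = 1000",
		"CREATE SERVICE_LEVEL IF NOT EXISTS computing WITH shares = 100",
		// Each level takes effect for the role a workload authenticates as.
		"ATTACH SERVICE_LEVEL serving TO serving_role",
		"ATTACH SERVICE_LEVEL computing TO computing_role",
	}
	for _, stmt := range stmts {
		if err := session.Query(stmt).Exec(); err != nil {
			log.Fatalf("%s: %v", stmt, err)
		}
	}
}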

Feature Serving Layer

Feature store costs: serving features
■Distributed gRPC service deployed on K8s
■Vast majority of the cost is in compute
●Note: network costs can also be a problem
■For compute-heavy K8s deployments, once you are past the obvious pod autoscaling (HPA/VPA), it's time to look into optimizations.
■Don't optimize blindly: SET UP CONTINUOUS PROFILING!
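Continuous profiling needs a profile source on every pod. A minimal Go sketch that exposes the standard pprof endpoints so a profiling agent (or go tool pprof) can scrape them; the port and the serving logic are placeholders:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Serve pprof on a separate, non-public port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the gRPC feature-serving server here ...
	select {}
}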

Feature store costs: serving features
■The usual suspect: ser/de
■95%+ of our requests are cached
■Hottest path: request multiple entities => cache hits => merge => return
●aka: read from cache, deserialize, merge, serialize

Naive solution

Feature store costs: serving features
■The usual suspect: ser/de
■95%+ of our requests are cached
■Hottest path: request multiple entities => cache hits => merge => return
●aka: read from cache, deserialize, merge, serialize
■Biggest Win: partial deserialization!

Proto serde trick: encoding primer
■A message is a series of key-value pairs (“records”)
●the key is the field number
●the value is encoded depending on its type
●repeated fields emit one record per element
●maps are a repeated field under the hood:
map<T, U> ⇔ repeated entry<T, U>

// Example message (as JSON)
{
  "name": "hello",
  "inner": [{"val": 1}, {"val": 2}]
}

// protoscope output of the inner records
2: {7: 1} // inner: {val: 1}
2: {7: 2} // inner: {val: 2}
source: https://protobuf.dev/programming-guides/encoding/#structure

Proto serde trick: encoding primer
message InnerMessage {
  int32 val = 7;
}

message OuterMessage {
  string name = 1;
  repeated InnerMessage inner = 2;
}

// Example OuterMessage (as JSON)
{
  "name": "hello",
  "inner": [{"val": 1}, {"val": 2}]
}

// protoscope output, 3 records:
1: {"hello"} // name: "hello"
2: {7: 1}    // inner: {val: 1}
2: {7: 2}    // inner: {val: 2}
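To make the record structure concrete, here is an illustrative Go snippet (not from the talk) that hand-assembles exactly those wire bytes with google.golang.org/protobuf/encoding/protowire and prints them; protoscope would decode the result into the three records shown above:

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	var inner1, inner2 []byte
	inner1 = protowire.AppendTag(inner1, 7, protowire.VarintType) // val = 1
	inner1 = protowire.AppendVarint(inner1, 1)
	inner2 = protowire.AppendTag(inner2, 7, protowire.VarintType) // val = 2
	inner2 = protowire.AppendVarint(inner2, 2)

	var outer []byte
	outer = protowire.AppendTag(outer, 1, protowire.BytesType) // name: "hello"
	outer = protowire.AppendString(outer, "hello")
	outer = protowire.AppendTag(outer, 2, protowire.BytesType) // inner[0]
	outer = protowire.AppendBytes(outer, inner1)
	outer = protowire.AppendTag(outer, 2, protowire.BytesType) // inner[1]
	outer = protowire.AppendBytes(outer, inner2)

	// One record per repeated element; the embedded messages are just
	// length-delimited byte blobs, indistinguishable from a bytes field.
	fmt.Printf("% x\n", outer)
}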

Proto serde trick: how?
■Trick enabler:
●embedded messages and bytes have the same wire representation (len-delimited)
■Why is this good?
●both are interchangeable!
●bytes deserialization is just a copy
■OK… but what do I do with bytes?
●operations on maps and repeated fields!
●append, delete, swap, insert…
●all done without deserializing elements!
source: https://protobuf.dev/programming-guides/encoding/#structure

Proto serde trick: back to example
message InnerMessage {
  int32 val = 7;
}

message OuterMessage {
  string name = 1;
  repeated InnerMessage inner = 2;
}

// OuterMessage can also
// be deserialized as:
message LazyOuterMessage {
  bytes name = 1;
  repeated bytes inner = 2;
}
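Putting the two pieces together, a minimal sketch (not ShareChat's library) of the lazy merge: append the inner records of one serialized OuterMessage onto another by copying raw length-delimited records with protowire, never decoding InnerMessage:

// Package lazymerge sketches merging the `inner` elements of two serialized
// OuterMessage blobs without decoding the InnerMessage payloads.
package lazymerge

import (
	"google.golang.org/protobuf/encoding/protowire"
)

// appendInner scans src's top-level records and copies every `inner`
// (field 2, length-delimited) record onto dst verbatim. Other fields of src
// (e.g. name) are skipped; dst is assumed to already hold a full message.
func appendInner(dst, src []byte) ([]byte, error) {
	for len(src) > 0 {
		num, typ, n := protowire.ConsumeTag(src)
		if n < 0 {
			return nil, protowire.ParseError(n)
		}
		src = src[n:]

		// Length of the value bytes that follow the tag.
		m := protowire.ConsumeFieldValue(num, typ, src)
		if m < 0 {
			return nil, protowire.ParseError(m)
		}

		if num == 2 && typ == protowire.BytesType {
			// Re-emit the record as-is: tag, then the original
			// length prefix + payload, no InnerMessage decoding.
			dst = protowire.AppendTag(dst, num, typ)
			dst = append(dst, src[:m]...)
		}
		src = src[m:]
	}
	return dst, nil
}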

Proto serde trick: back to cache

■Our cache stores the serialized Features proto for a given entity
■Our response is basically a collection of Features for each entity requested

Proto serde trick: simple bench
■Repo: https://github.com/david-sharechat/lazy-proto
■Benchmarking “naive” merge vs lazy merge
●Appending 2 serialized protos with map and repeated field

Proto serde trick: simple bench

          │ Naive        │ Lazy
sec/op    │ 16.821m ± 3% │ 2.677m ± 6%  -84.09% (p=0.000 n=10)
B/op      │ 8.479Mi ± 0% │ 5.993Mi ± 0% -29.31% (p=0.000 n=10)
allocs/op │ 141.16k ± 0% │ 40.08k ± 0%  -71.61% (p=0.000 n=10)

●6x faster with ⅓ of allocs!
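The repo above has the real benchmark; for orientation only, this is roughly how such a testing.B measurement can be shaped for the lazy path, reusing the appendInner sketch on synthetic wire bytes (the naive baseline would unmarshal both blobs into generated OuterMessage structs, proto.Merge them and re-marshal, which needs the generated code and is omitted here):

package lazymerge

import (
	"testing"

	"google.golang.org/protobuf/encoding/protowire"
)

// buildOuter assembles a synthetic serialized OuterMessage with n inner
// elements (field numbers as in the slides: name = 1, inner = 2, val = 7).
func buildOuter(n int) []byte {
	var b []byte
	b = protowire.AppendTag(b, 1, protowire.BytesType)
	b = protowire.AppendString(b, "hello")
	for i := 0; i < n; i++ {
		var inner []byte
		inner = protowire.AppendTag(inner, 7, protowire.VarintType)
		inner = protowire.AppendVarint(inner, uint64(i))
		b = protowire.AppendTag(b, 2, protowire.BytesType)
		b = protowire.AppendBytes(b, inner)
	}
	return b
}

func BenchmarkLazyAppend(b *testing.B) {
	dst := buildOuter(1000)
	src := buildOuter(1000)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Lazy merge: copy dst, then append src's inner records byte-for-byte.
		out := append([]byte(nil), dst...)
		if _, err := appendInner(out, src); err != nil {
			b.Fatal(err)
		}
	}
}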

Conclusion

Learnings

Before we part…
■We could not cover everything; please reach out if you are interested in these topics:
●Computing windowed counter features (slashed costs by 5x)
●Flink autoscaling & state recovery (2x cost lever)
●Specialized Golang cache library (will likely open-source soon)
●… anything cheap and performant ;)

A ScyllaDB Community
Thanks for watching!
ShareChat


Feature Compute

Feature store costs: computing features
■Multiple Apache Flink jobs
■Problem: Flat cost, scaled for peak
■Obvious solution: autoscaling



[Chart placeholder: CPU utilization before autoscaling]
[Chart placeholder: CPU utilization after autoscaling]
More on autoscaling Flink jobs: The Harsh Reality of Building a Realtime ML Feature Store (slides), QCon 2024