Using Change Point Detection to Fight Noisy Benchmark Results by Matt Fleming

ScyllaDB 273 views 23 slides Oct 16, 2024
Slide 1
Slide 1 of 23
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23

About This Presentation

Discovering performance regressions in modern systems is tough due to inevitable noise. Change Point Detection (CPD) algorithms are gaining traction for tackling this challenge. This talk covers how CPD works and shares examples of real regressions found in open source projects. #DevOps #CPD


Slide Content

A ScyllaDB Community
Matt Fleming
Co-founder at Nyrkiö / Systems Engineer at Cloudflare
Using Change Point Detection to
Fight Noisy Benchmark Results

■Former Linux kernel maintainer
■Focused on performance of OS, DB, and dist sys
■Co-authored papers on Change Point Detection
and testing distributed systems
■Enjoys running
<Matt Fleming> (he/him)

Co-founder at Nyrkiö

What is noise?

What is noise?
■Noise means “high variability”
■Large range of possible benchmark values without changing the system

What is noise?

What is noise?
This is noise

What is noise?
■Noise comes from indeterminism
■All modern systems have indeterminism
■Usually because of peak performance

Examples of noise

Examples of noise

All Systems Contain Noise

Fighting Noise

Fighting noise
■Eyeball it
●Possible to spend hours every month staring at graphs
●Laborious and time consuming
●Difficult to onboard new people because of institutional knowledge

Fighting noise
■Eyeball it
●Possible to spend hours every month staring at graphs
●Laborious and time consuming
●Difficult to onboard new people because of institutional knowledge
■Thresholds
●Hard to calculate
●Hard to maintain (doesn’t scale)

Change Point Detection

■CPD has existed for decades
■Used in many areas
●Climate change
●Fraud detection
●Biomedical research
CPD History

CPD Algorithm Steps
search() cmp() filter()

■Signal-processing-library (MongoDB)
■Perfolizer (Used in .NET)
■Ruptures
■Hunter
Open Source CPD libraries

Example

Linux Kernel syscall

Drawbacks
■CPD needs multiple data points
■There’s a “delay” in when change points are identified
■Computational complexity can be prohibitive

Resources
■A Select Review of offline change point detection methods
■A Survey of Methods for time series change point detection
■MongoDB paper
■Hunter paper

Thank you! Let’s connect.
Matt Fleming
[email protected]
@fleming_matt
Tags