Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throughput Data Pipelines
ScyllaDB
26 slides
Jul 02, 2024
About This Presentation
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations.
Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.
Slide Content
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throughput Data Pipelines
Zamir Paltiel, Head of Engineering at Hyperspace
Zamir Paltiel, Head of Engineering at Hyperspace
- Performance enthusiast
- Focused on search performance in recent years
- Loves building things from scratch
- Father of 4 children
Agenda
- Intro: the performance optimization process
- How developers measure their pipelines today
- The issues one might miss
- Case study at Hyperspace
- A new approach to finding bottlenecks
Intro: the performance optimization process
How to increase the speed of your code: run the test, measure, analyze, fix or optimize, and repeat.
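The loop depends on a repeatable measurement step. A minimal Python sketch of that step, assuming the pipeline stage under test can be wrapped in a zero-argument function (`workload` is a hypothetical stand-in, not code from the talk):

```python
import statistics
import time
from typing import Callable


def measure(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> dict:
    """Run fn several times and report wall-clock timing statistics in seconds."""
    # Warm-up runs fill caches and trigger lazy initialization; they are not recorded.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def workload() -> int:
    # Hypothetical placeholder for one pipeline stage.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    result = measure(workload)
    print(f"median: {result['median'] * 1000:.2f} ms")
```

Comparing the median before and after each fix keeps a single noisy run from being mistaken for an improvement.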
Pareto Principle
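Applied to performance, the Pareto principle suggests that a small fraction of the code usually accounts for most of the time. A minimal sketch that picks the fewest functions covering a target share of total time, assuming per-function timings are already available (the function names and numbers below are hypothetical):

```python
def pareto_hotspots(times: dict[str, float], share: float = 0.8) -> list[str]:
    """Return the fewest functions whose combined time reaches `share` of the total."""
    total = sum(times.values())
    hotspots: list[str] = []
    covered = 0.0
    # Visit functions from most to least expensive until the target share is reached.
    for name, t in sorted(times.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        hotspots.append(name)
        covered += t
    return hotspots


# Hypothetical profile: milliseconds spent per function.
profile = {
    "parse_json": 650.0,
    "write_device": 200.0,
    "validate": 70.0,
    "log": 50.0,
    "metrics": 30.0,
}

if __name__ == "__main__":
    print(pareto_hotspots(profile))
```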
The bottleneck syndrome
What should I measure? CPU time, cycles, context switches, cache misses, memory, disk, network, and mutexes.
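Several of these counters are exposed by the operating system without any profiler. A minimal Unix-only sketch using Python's standard `resource.getrusage` that snapshots CPU time, context switches, and block I/O around a piece of code. Cycles and cache misses need hardware performance counters (for example via Linux perf) and are not covered here; the `snapshot` and `delta` helpers are hypothetical names:

```python
import resource  # Unix-only standard library module
import time


def snapshot() -> dict:
    """Capture process-wide counters for a few of the metrics listed above."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall_s": time.perf_counter(),
        "user_cpu_s": ru.ru_utime,
        "sys_cpu_s": ru.ru_stime,
        "voluntary_ctx_switches": ru.ru_nvcsw,     # process gave up the CPU, e.g. blocked on I/O or a lock
        "involuntary_ctx_switches": ru.ru_nivcsw,  # scheduler preempted the process
        "block_reads": ru.ru_inblock,              # filesystem input operations
        "block_writes": ru.ru_oublock,             # filesystem output operations
    }


def delta(before: dict, after: dict) -> dict:
    """Difference between two snapshots."""
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    start = snapshot()
    for _ in range(10):
        time.sleep(0.01)  # blocking call: wall time passes while CPU time barely moves
    print(delta(start, snapshot()))
```

A large gap between wall time and user plus system CPU time, together with a rising voluntary context-switch count, is a hint that the code is waiting rather than computing.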
How devs measure code speed today
CPU profilers
A CPU profiler shows what percentage of CPU time each function consumes.
- Event-based profilers use built-in hooks that the runtime framework provides. They are applicable to managed code such as Java, Go, and C#.
- Instrumentation profilers add code to the program that collects the required information, injected either at runtime or during compilation. This adds overhead to code execution, which sometimes distorts the measurements.
- Sampling profilers collect the call stack of the process at regular intervals. They are less intrusive, because the code runs as usual, and give a very good approximation of the percentage of CPU time each function takes. However, they have no knowledge of how long an operation actually took: off-CPU time is missing.
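To see why sampling misses off-CPU time, here is a minimal Unix-only sketch of a CPU-sampling profiler in Python, built on the standard `signal.setitimer` with `ITIMER_PROF`. It samples only the main thread, and `busy` and `blocked` are hypothetical stand-ins. Because the profiling timer advances only while the process consumes CPU, `blocked` collects almost no samples even though it spends half a second of wall-clock time:

```python
import collections
import signal
import time

# Root-first call stacks (tuples of function names) mapped to sample counts.
samples: collections.Counter = collections.Counter()
_previous_handler = None


def _on_sigprof(signum, frame):
    # `frame` is the Python frame that was executing when the signal was handled.
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back
    if stack:
        samples[tuple(reversed(stack))] += 1


def start_sampling(interval_s: float = 0.001) -> None:
    global _previous_handler
    # Must be called from the main thread; ITIMER_PROF only ticks on CPU time (user + kernel).
    _previous_handler = signal.signal(signal.SIGPROF, _on_sigprof)
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)


def stop_sampling() -> None:
    signal.setitimer(signal.ITIMER_PROF, 0, 0)
    previous = _previous_handler if _previous_handler is not None else signal.SIG_DFL
    signal.signal(signal.SIGPROF, previous)


def busy(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def blocked(seconds: float) -> None:
    time.sleep(seconds)  # off-CPU: the profiling timer does not tick while sleeping


if __name__ == "__main__":
    start_sampling()
    busy(3_000_000)
    blocked(0.5)
    stop_sampling()
    for stack, count in samples.most_common(5):
        print(";".join(stack), count)
```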
Flame Graph
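A flame graph is commonly drawn from "folded" stacks: one line per unique call stack, frames separated by semicolons, followed by a sample count (the format used by Brendan Gregg's FlameGraph scripts). The width of each frame is the share of samples in which that function appears on the stack. A short sketch of that computation, with hypothetical function names and counts:

```python
from collections import Counter


def inclusive_share(folded_lines: list[str]) -> dict[str, float]:
    """Fraction of samples in which each function appears anywhere on the stack."""
    totals: Counter = Counter()
    all_samples = 0
    for line in folded_lines:
        stack_text, _, count_text = line.rpartition(" ")
        count = int(count_text)
        all_samples += count
        # Count each function once per stack so recursion is not double-counted.
        for func in set(stack_text.split(";")):
            totals[func] += count
    return {func: n / all_samples for func, n in totals.items()}


# Hypothetical folded stacks from an ingestion process.
folded = [
    "main;ingest;parse_json 60",
    "main;ingest;write_device 25",
    "main;ingest 5",
    "main;log 10",
]

if __name__ == "__main__":
    for func, share in sorted(inclusive_share(folded).items(), key=lambda kv: -kv[1]):
        print(f"{func:<14}{share:6.0%}")
```

Because the input is built only from on-CPU samples, a function that mostly waits would look narrow here even if it dominates wall-clock time.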
The issues one might miss
Off-CPU Time
Case study at Hyperspace
Sample Ingestion Pipeline
Let’s profile it!
Flame graph analysis
- JSON parsing appears to be the most significant cost; it could be replaced by binary serialization.
- Challenge: the top command shows CPU utilization of only 2 out of 8 CPUs, which suggests that a bottleneck is preventing us from reaching maximum ingestion speed.
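The "2 out of 8 CPUs" observation can be reproduced for a specific block of code by dividing process CPU time by wall-clock time, which gives roughly the number of cores kept busy (comparable to top's per-process %CPU divided by 100). A minimal sketch, assuming Python's `time.process_time()`, which sums user and system CPU time over all threads of the process; `cores_used` is a hypothetical helper:

```python
import os
import time


def cores_used(fn, *args) -> float:
    """Average number of CPUs the process kept busy while fn ran."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time, summed over all threads
    fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    busy = cores_used(sum, range(5_000_000))
    idle = cores_used(time.sleep, 0.2)
    print(f"busy: {busy:.2f} of {os.cpu_count()} CPUs, sleeping: {idle:.2f}")
```

A value well below the number of available cores while throughput has stopped growing points to time spent off the CPU rather than to slow computation.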
A new approach to finding bottlenecks
Finding the real bottleneck
What we found
- If you have an off-CPU bottleneck, it takes a higher percentage of the time the more load you add.
- A redundant lock: it turns out that the FPGA SDK is already thread-safe.
- A single file descriptor was used to communicate with the PCI interface.
- The speed of the PCI bus was underutilized: no multithreading was used.
- The 4 channels that exist in the PCI interface were not utilized.
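A lock like the redundant one above is invisible to a CPU profiler because the threads waiting on it are not running. One form of targeted instrumentation, sketched here as an assumption rather than as the method used in the talk, is to wrap the lock so that every acquisition records how long the caller waited. `TimedLock` is a hypothetical class, and the sleep stands in for a device write:

```python
import threading
import time


class TimedLock:
    """A lock wrapper that records how long callers waited to acquire it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.total_wait_s = 0.0

    def __enter__(self) -> "TimedLock":
        start = time.perf_counter()
        self._lock.acquire()
        # The counters are updated while holding the lock, so they need no extra guard.
        self.acquisitions += 1
        self.total_wait_s += time.perf_counter() - start
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._lock.release()
        return False  # do not suppress exceptions


def write_batch(lock: TimedLock, barrier: threading.Barrier) -> None:
    barrier.wait()  # start all writers together so they contend for the lock
    with lock:
        time.sleep(0.02)  # hypothetical stand-in for a device write


if __name__ == "__main__":
    lock = TimedLock()
    barrier = threading.Barrier(4)
    writers = [threading.Thread(target=write_batch, args=(lock, barrier)) for _ in range(4)]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    print(f"{lock.acquisitions} acquisitions, {lock.total_wait_s:.3f}s spent waiting")
```

If the total wait grows with the number of writers while each critical section stays short, the lock itself is the bottleneck, and it is worth checking whether the protected resource needs it at all.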
Increase of ~30% in ingestion speed
Summary
- There are important factors to measure apart from CPU time.
- We reviewed a new methodology for finding off-CPU bottlenecks.
- This new process yielded a significant improvement in Hyperspace's ingestion speed.