Apache Cassandra makes it possible to execute millions of operations per second in a scalable fashion. Harnessing the power of C* leaves many developers pondering the following:
- Is my data model appropriate, or will it end up as wide partition(s) causing heap pressure and other issues?
- How do I tune my connection pool configuration? What are the optimal settings for my environment?
- What is my C* cluster capacity, in terms of number of IOPS, for a given 95th and 99th percentile latency?
- How do I perf-test my data access layer?
In this talk, Vinay Chella, Cloud Data Architect @ Netflix, will share the open source tools, techniques, and platform (NDBench) that Netflix uses to perf-test its C* fleet by simulating millions of operations per second.
About the Speaker
Vinay Chella, Cloud Data Architect, Netflix Inc.
Vinay Chella is a Cloud Data Architect at Netflix with a deep understanding of Cassandra and of RDBMSs. As an engineer and architect, he works extensively on data modeling, performance tuning, and guiding best practices for various persistence stores, and helps teams at Netflix build next-generation data access layers.
Size: 23.03 MB
Language: en
Added: Oct 02, 2016
Slides: 50 pages
Slide Content
Honest Performance Testing with NDBench: Netflix Data Benchmark
Vinay Chella, Cloud Data Architect, Cassandra MVP, Cloud Database Engineering @ Netflix
Who are we? CDE, Cloud Database Engineering: providing data stores as a service (Cassandra, Dynomite, Elasticsearch and RDS)
Cassandra @ Netflix: 98% of streaming data is stored in Cassandra. Data ranges from customer details to viewing history / streaming bookmarks to billing and payment.
Agenda: Background, Why NDBench?, Architecture, Usage, Achievements, Roadmap, Take away
Perf testing persistence layer?
Capacity in my existing fleet?
Why? Why not use already existing perf testing tools?
What is NDBench? Netflix Data Benchmark (NDBench) is a pluggable, cloud-enabled benchmarking tool that can be used across any data store system.
Side by Side comparison
Different driver/software versions
Different instance types
Dynamically tunable configurations
Varying data models
Pluggable Patterns & Loads
Different Client APIs
Netflix homegrown: well integrated with the Netflix OSS infrastructure
Architecture
What is pluggable? Load patterns, load tests
Load Patterns: Random, Sliding Window
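The two load patterns named on this slide can be illustrated with a minimal sketch. This is not NDBench's implementation: the function names, the `window_size` and `ops_per_window` parameters, and the rule that the window advances after a fixed number of operations are all assumptions made for illustration.

```python
import random
from itertools import islice
from typing import Iterator, List


def random_keys(num_keys: int, seed: int = 0) -> Iterator[int]:
    """Yield key indices drawn uniformly at random from [0, num_keys)."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(num_keys)


def sliding_window_keys(num_keys: int, window_size: int,
                        ops_per_window: int, seed: int = 0) -> Iterator[int]:
    """Yield keys from a window that moves forward after ops_per_window draws.

    Assumed behavior (hypothetical): keys are picked uniformly inside the
    current window, and the window wraps around at num_keys.
    """
    rng = random.Random(seed)
    start = 0
    while True:
        for _ in range(ops_per_window):
            yield (start + rng.randrange(window_size)) % num_keys
        start = (start + window_size) % num_keys


def take(gen: Iterator[int], n: int) -> List[int]:
    """Materialize the first n keys of an endless generator."""
    return list(islice(gen, n))


if __name__ == "__main__":
    print(take(random_keys(1_000_000), 5))
    print(take(sliding_window_keys(1_000_000, 1_000, 3), 6))
```

A random pattern spreads reads and writes over the whole key space, while a sliding window concentrates traffic on a small, moving set of keys, which is closer to workloads where recently written data is read most often.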
Load Tests: Cassandra-JavaDriver, Elastic Search, Dynomite, Cassandra-Astyanax, In-Memory Test
What can be configured?
- ndbench.driver.numKeys: 1000000
- ndbench.driver.dataSize: 200 bytes
- ndbench.driver.numWriters: 1
- ndbench.driver.numReaders: 1
- ndbench.driver.writeRateLimit: 100
- ndbench.driver.readRateLimit: 200
- ndbench.driver.useVariableDataSize: false
- Many more…
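To make these settings concrete, here is a small sketch that holds the values from the slide and derives two back-of-the-envelope numbers from them. The `DriverSettings` class and its methods are hypothetical helpers, not part of NDBench. The arithmetic assumes that the rate limits are operations per second and that every value is exactly `dataSize` bytes (that is, `useVariableDataSize` is false); the slide does not state the units.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DriverSettings:
    """Hypothetical container for the property values shown on the slide."""
    num_keys: int = 1_000_000
    data_size_bytes: int = 200
    num_writers: int = 1
    num_readers: int = 1
    write_rate_limit: int = 100   # assumed unit: operations per second
    read_rate_limit: int = 200    # assumed unit: operations per second
    use_variable_data_size: bool = False

    def to_properties(self) -> Dict[str, str]:
        """Map the fields back to the property names listed on the slide."""
        return {
            "ndbench.driver.numKeys": str(self.num_keys),
            "ndbench.driver.dataSize": str(self.data_size_bytes),
            "ndbench.driver.numWriters": str(self.num_writers),
            "ndbench.driver.numReaders": str(self.num_readers),
            "ndbench.driver.writeRateLimit": str(self.write_rate_limit),
            "ndbench.driver.readRateLimit": str(self.read_rate_limit),
            "ndbench.driver.useVariableDataSize":
                str(self.use_variable_data_size).lower(),
        }

    def write_bytes_per_second(self) -> int:
        """Payload bytes written per second under the assumptions above."""
        return self.write_rate_limit * self.data_size_bytes

    def dataset_bytes(self) -> int:
        """Raw payload size if every key is written once (no overhead)."""
        return self.num_keys * self.data_size_bytes


if __name__ == "__main__":
    settings = DriverSettings()
    print(settings.write_bytes_per_second())  # 100 * 200
    print(settings.dataset_bytes())           # 1,000,000 * 200
```

Under these assumptions the listed values correspond to 20,000 payload bytes written per second and a 200,000,000-byte raw data set, before replication and storage overhead.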
Dynamic Script
How to use it: REST API, UI
REST API
Initialization
- Initialize: /pappy/driver/init/{DriverName}
- Init Script: /pappy/driver/initfromscript
Perf API
- Start writes: /pappy/driver/startWrites
- Start reads: /pappy/driver/startReads
- Stop everything: /pappy/driver/stop
Sanity check
- Verify Read: /pappy/driver/readSingle/key
- Verify Write: /pappy/driver/writeSingle/key
- Verify Read: /pappy/driver/readSingle/key
Backfill
- Data Backfill: /pappy/driver/startDataFill
- DataBackfill Async: /pappy/driver/startDataFillAsync
Status API
- /pappy/driver/{getRead/Write}Status
- /pappy/driver/getserverstatus
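The endpoint paths above can be assembled into full URLs with a short helper. This sketch only builds URLs and sends no requests, because the slide does not say which HTTP method each endpoint expects. The base URL, the helper names, and the `ExampleDriver` name are hypothetical; reading `{getRead/Write}Status` as two paths, `getReadStatus` and `getWriteStatus`, is also an assumption.

```python
from urllib.parse import quote

# Hypothetical base URL: the slides do not say where NDBench is deployed.
BASE_URL = "http://localhost:8080"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def init_url(driver_name: str) -> str:
    """URL for /pappy/driver/init/{DriverName}."""
    return endpoint("/pappy/driver/init/" + quote(driver_name, safe=""))


def read_single_url(key: str) -> str:
    """URL for the sanity-check read of a single key."""
    return endpoint("/pappy/driver/readSingle/" + quote(key, safe=""))


def write_single_url(key: str) -> str:
    """URL for the sanity-check write of a single key."""
    return endpoint("/pappy/driver/writeSingle/" + quote(key, safe=""))


# Endpoints without path parameters, keyed by a descriptive label.
FIXED_ENDPOINTS = {
    "init_from_script": endpoint("/pappy/driver/initfromscript"),
    "start_writes": endpoint("/pappy/driver/startWrites"),
    "start_reads": endpoint("/pappy/driver/startReads"),
    "stop": endpoint("/pappy/driver/stop"),
    "start_data_fill": endpoint("/pappy/driver/startDataFill"),
    "start_data_fill_async": endpoint("/pappy/driver/startDataFillAsync"),
    # Assumed expansion of the slide's {getRead/Write}Status shorthand.
    "read_status": endpoint("/pappy/driver/getReadStatus"),
    "write_status": endpoint("/pappy/driver/getWriteStatus"),
    "server_status": endpoint("/pappy/driver/getserverstatus"),
}


if __name__ == "__main__":
    print(init_url("ExampleDriver"))  # "ExampleDriver" is a made-up name
    print(read_single_url("key 1"))
    for label, url in FIXED_ENDPOINTS.items():
        print(label, url)
```

A typical session, following the order on the slide, would initialize a driver, optionally backfill data, start writes and reads, poll the status endpoints, and finally call stop.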
NDBench Demo...
NDBench @ Netflix: as a benchmarking tool, integration tests, deployment validation
NDBench’s Achievements @ Netflix
N+1
C* 1.2 → C* 2.0, C* 2.0 → C* 2.1
C* 2.0 vs C* 2.1 (Reads - Thrift)
C* 2.0 vs C* 2.1 (Writes - Thrift)
CentOS → Trusty
CentOS → Trusty Migration
LCS on CentOS vs Trusty (writes)
LCS on CentOS vs Trusty (Reads)
Java 7 → Java 8
C* on Java 7 vs Java 8 (Writes)
C* on Java 7 vs Java 8 (Reads)
C* AMI Certification Pipeline
Dynomite benchmarking: generating millions of ops/sec
Dynomite benchmarking
Roadmap: performance profile management, automated metrics analysis, dynamic load generation based on destination schema
https://github.com/Netflix/ndbench
Take away: “Test the honesty of your data models, persistence layers in Cloud ecosystem using NDBench”