Inside Freshworks' Migration from Cassandra to ScyllaDB by Premkumar Patturaj
About This Presentation
Freshworks migrated from Cassandra to ScyllaDB to handle growing audit log data efficiently. Cassandra required frequent scaling, complex repairs, and had non-linear scaling. ScyllaDB reduced costs with fewer machines and improved operations. Using Zero Downtime Migration (ZDM), they bulk-migrated data, performed dual writes, and validated consistency.
Slide Content
A ScyllaDB Community
Freshworks Migration Journey from Cassandra to ScyllaDB
Premkumar Patturaj
Senior Engineering Manager
Premkumar Patturaj
■Senior Engineering Manager at Freshworks with 15 years of IT experience, including 10 years at Freshworks.
■Expertise in Relational and NoSQL databases, specializing
in designing and optimizing scalable, high-performance
systems.
■Experienced in solving complex technical challenges,
mentoring teams, and fostering a culture of continuous learning.
■Committed to engineering excellence, leveraging best
practices to create efficient and reliable software solutions.
[Slide: Freshworks platform overview]
SOLUTIONS: Employee Experience and Customer Experience, spanning Freshservice, Freshservice for Business Teams, Device42, Customer Service Suite, Freshdesk, Freshchat, Freshsales, and Freshmarketer.
PLATFORM: Freddy AI (Freddy AI Copilot for Customer Service, Sales, Marketing, IT & Developers; Freddy AI Insights for Business Leaders; Freddy AI Self Service for Customers & Employees), Freshworks Neo, Unify (Data, Analytics, Admin, Security), Integrate & Extend (Developer tools, Marketplace), and Manage & Secure.
Presentation Agenda
■Background and Motivation
■Goals
■Approach
■Challenges
■Optimization
Background
We manage all databases in Freshworks:
■Availability
■Reliability
■Monitoring
■Recovery
■Keep Current
■RDS MySQL, Postgres; Redis; MongoDB; Kafka; ClickHouse; …
■A mix of self-hosted and cloud solutions
■Identify the best balance for Freshworks
Uber goal for Dataverse:
■Application teams stay agnostic to the underlying database
■e.g., use the Cassandra client while the backend is ScyllaDB (see the sketch below)
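Because ScyllaDB is wire-compatible with Cassandra's CQL native protocol, an application can keep its existing Cassandra driver and simply point it at a ScyllaDB cluster. A minimal sketch, assuming the Python cassandra-driver package; host, port, and keyspace names are hypothetical placeholders:

```python
# A minimal sketch: the same Cassandra client works against ScyllaDB,
# because ScyllaDB speaks the CQL native protocol on the usual port 9042.
from cassandra.cluster import Cluster

# Only the contact points change; no application-side code changes.
cluster = Cluster(contact_points=["scylla-node-1.internal"], port=9042)
session = cluster.connect("hypertrail")  # hypothetical keyspace

row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)

cluster.shutdown()
```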
Hypertrail
■Hypertrail aims to provide a scalable, cost-effective, and fault-tolerant timeline solution that enables
products to capture and query activity and audit logs for any custom entity, with flexible filtering
capabilities to meet specific business needs
Workflow Automator
■Workflows can be configured for project and task creation and for associating them with tickets/changes.
Users can configure workflows with any conditions they want for tickets/changes. This is currently used
for the alerts module.
Hypertrail Cassandra Overview
Cassandra cluster:
■24TB of unreplicated data.
■Spread across 56 Cassandra nodes.
Challenges in Cassandra:
■Repair & Consistency Issues
■High Tailend Latencies
■Backup & Restore Overheads
■Manual Toil with more nodes
Performance Benchmark
[Benchmark chart in the original slides]
Motivation
ScyllaDB Advantages Over Cassandra:
Hardware Efficiency:
■Few large machines replace many small ones.
Operational Simplicity:
■Reduced overhead for repairs, compactions, and scaling.
Cost Reduction:
■Lower infrastructure costs due to fewer machines.
Goals
Zero Downtime:
■Ensure the application remains fully operational during migration.
Low Latency Overhead:
■Minimize the impact on application latency during the process.
Accuracy:
■Validate the migrated data for completeness and correctness.
Efficiency:
■Perform the migration in the shortest duration possible to reduce infrastructure costs.
■Complete migration and validations in a time and cost-efficient manner.
Migration Approach
Historical Data Migration:
■Bulk migration of existing data from Cassandra to ScyllaDB cluster.
Dual Writes:
■Writing data to both the Cassandra and ScyllaDB clusters while the migration is in progress, using the ZDM (Zero Downtime Migration) proxy.
Data Validation:
■Validating data consistency between source and destination using CDM (Cassandra Data Migrator).
Historical Data Migration
Evaluated options for bulk data migration
■DataStax CDM Tool
■Stream SSTables via Tools
■Load and Stream using nodetool
Advantages of Load and Stream (see the sketch below):
■Fastest approach
■Minimal impact on the ScyllaDB cluster.
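ScyllaDB's load-and-stream takes SSTables placed in a table's upload directory on one node and streams each row to the replicas that actually own it. A hedged sketch of that flow, assuming nodetool is available on the node; the keyspace, table, and directory layout shown are hypothetical placeholders:

```python
# A sketch of the load-and-stream flow; keyspace, table, and the data
# directory path here are hypothetical placeholders.
import subprocess

KEYSPACE = "hypertrail"
TABLE = "audit_logs"

# Step 1 (outside this script): copy the SSTables exported from the
# Cassandra snapshot into the table's upload directory on one ScyllaDB
# node, e.g. /var/lib/scylla/data/<keyspace>/<table>-<uuid>/upload.

# Step 2: nodetool refresh with --load-and-stream streams each SSTable's
# rows to the nodes that own them, so a single node can ingest data on
# behalf of the whole cluster.
subprocess.run(
    ["nodetool", "refresh", "--load-and-stream", KEYSPACE, TABLE],
    check=True,
)
```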
Dual Writes
■ZDM Proxy performed dual writes, handling all use cases required for the migration process.
■Latency added by ZDM Proxy was benchmarked at under 10 milliseconds.
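A sketch of how the proxy is typically deployed for dual writes, assuming the open-source zdm-proxy binary and its environment-variable configuration; the variable names follow the zdm-proxy documentation, while hosts and values are hypothetical:

```python
# A sketch of launching zdm-proxy for dual writes; hosts and values are
# hypothetical placeholders.
import os
import subprocess

env = {
    **os.environ,
    # Origin = existing Cassandra cluster; Target = new ScyllaDB cluster.
    "ZDM_ORIGIN_CONTACT_POINTS": "cassandra-node-1.internal",
    "ZDM_TARGET_CONTACT_POINTS": "scylla-node-1.internal",
    # Applications connect to the proxy as if it were the database.
    "ZDM_PROXY_LISTEN_ADDRESS": "0.0.0.0",
    # Reads are served by the primary (origin) cluster; every write is
    # forwarded to both clusters, keeping them in sync during migration.
    "ZDM_PRIMARY_CLUSTER": "ORIGIN",
}
subprocess.run(["zdm-proxy"], env=env, check=True)
```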
Infrastructure Setup
Hosted on EC2 c6.2xlarge instances with 3 replicas distributed across availability zones (AZs).
■Prometheus Metrics:
■Exported by ZDM Proxy by default.
■Node exporter service ran alongside ZDM to monitor system-level bottlenecks.
ZDM Proxy
Reads from Source Only:
■Used during the initial migration phase.
Async Reads to Target:
■Enabled after historical data migration and validation.
■Allowed performance measurement of ScyllaDB before switching the traffic.
Migration Workflow:
■ZDM Proxy initially operated with reads coming from the source only.
■After completing bulk data migration and validation, reconfigured ZDM Proxy to read asynchronously from the target.
■Measured ScyllaDB performance before fully transitioning application traffic (see the configuration sketch below).
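A sketch of the read phases above expressed as zdm-proxy settings; ZDM_READ_MODE and ZDM_PRIMARY_CLUSTER are documented proxy options, while the grouping into phase dictionaries is purely illustrative:

```python
# Illustrative phase configurations for zdm-proxy reads.

# Phase 1: reads served from the source (origin) cluster only.
PHASE_READS_FROM_SOURCE = {
    "ZDM_READ_MODE": "PRIMARY_ONLY",
    "ZDM_PRIMARY_CLUSTER": "ORIGIN",
}

# Phase 2: after bulk migration and validation, mirror each read to the
# target asynchronously; responses still come from the origin, so the
# application sees no latency change while ScyllaDB is exercised.
PHASE_ASYNC_READS_TO_TARGET = {
    "ZDM_READ_MODE": "DUAL_ASYNC_ON_SECONDARY",
    "ZDM_PRIMARY_CLUSTER": "ORIGIN",
}

# Phase 3: once ScyllaDB performance is confirmed, promote it to primary
# before routing applications directly at the new cluster.
PHASE_TARGET_PRIMARY = {
    "ZDM_READ_MODE": "PRIMARY_ONLY",
    "ZDM_PRIMARY_CLUSTER": "TARGET",
}
```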
Data Validation
CDM for Data Validation
■Validating terabytes of data is time-intensive.
■Optimized validation to reduce time by 80%
Validation Steps
■CDM reads from the source in bulk.
■Compares corresponding data in the target cluster.
■Repeats for the entire partition range.
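Conceptually, the validation pass reads a slice of the source by token range and looks up each row on the target; the per-row point read is also the access pattern later identified as an I/O bottleneck. A simplified sketch, assuming the Python cassandra-driver; host, keyspace, table, and column names are hypothetical placeholders:

```python
# A simplified sketch of per-range validation, not CDM's actual code.
from cassandra.cluster import Cluster

src = Cluster(["cassandra-node-1.internal"]).connect("hypertrail")
dst = Cluster(["scylla-node-1.internal"]).connect("hypertrail")

def validate_range(lo: int, hi: int) -> int:
    """Compare one token-range slice of audit_logs; return mismatch count."""
    mismatches = 0
    rows = src.execute(
        "SELECT id, payload FROM audit_logs "
        "WHERE token(id) > %s AND token(id) <= %s",
        (lo, hi),
    )
    for row in rows:
        # One point read per source row: correct, but I/O heavy at scale.
        target = dst.execute(
            "SELECT payload FROM audit_logs WHERE id = %s", (row.id,)
        ).one()
        if target is None or target.payload != row.payload:
            mismatches += 1
    return mismatches
```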
Tuning CDM Properties:
■Enabled spark.cdm.autocorrect.missing and spark.cdm.autocorrect.mismatch (see the sketch below).
■Bridges gaps in data consistency automatically.
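A hedged sketch of launching CDM's validation job with autocorrect enabled, following the spark-submit pattern from the cassandra-data-migrator README; the jar name, properties file, and keyspace/table are hypothetical placeholders:

```python
# A sketch of a CDM validation run with autocorrect; the property names
# are real CDM settings, while file and keyspace/table names are made up.
import subprocess

subprocess.run([
    "spark-submit",
    "--properties-file", "cdm.properties",
    # Autocorrect inserts rows missing from the target and rewrites
    # mismatched rows as they are found, closing consistency gaps in place.
    "--conf", "spark.cdm.autocorrect.missing=true",
    "--conf", "spark.cdm.autocorrect.mismatch=true",
    "--conf", "spark.cdm.schema.origin.keyspaceTable=hypertrail.audit_logs",
    "--master", "local[*]",
    "--class", "com.datastax.cdm.job.DiffData",  # validation job class
    "cassandra-data-migrator.jar",
], check=True)
```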
Challenges
Large Partitions
■The CDM migrator processes large partitions by loading entire slices into memory, causing out-of-memory (OOM) errors.
Large-Scale Validation:
■Validating over 20 TB of unreplicated data was estimated to take weeks.
■CDM jobs scanned partitions, retrieving rows individually.
■High I/O latency due to individual select operations for each row.
Optimization
Large Partitions
■Split the partition range into smaller chunks (see the sketch below).
■Controls the amount of data loaded into memory for each slice.
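A minimal sketch of the chunking idea, assuming the Murmur3 partitioner's token bounds; the chunk count is illustrative:

```python
# A minimal sketch: split the full Murmur3 token ring into fixed-size
# sub-ranges so each slice bounds how much data sits in memory at once.
MIN_TOKEN = -(2**63)      # Murmur3 partitioner lower bound
MAX_TOKEN = 2**63 - 1     # Murmur3 partitioner upper bound

def token_chunks(num_chunks: int):
    """Yield (lo, hi] token sub-ranges that together cover the ring."""
    step = (MAX_TOKEN - MIN_TOKEN) // num_chunks
    lo = MIN_TOKEN
    for i in range(num_chunks):
        hi = MAX_TOKEN if i == num_chunks - 1 else lo + step
        yield lo, hi
        lo = hi

# Each chunk is then migrated and validated independently instead of
# loading an entire large partition slice at once.
for lo, hi in token_chunks(8):
    print(lo, hi)
```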
Large-Scale Validation
■Adopted range-based reads from the target.
■Bypassed value validation by checking key presence only.
■Customized CDM validation (sketched below).
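A sketch of the optimized pass: one sequential range read per cluster per chunk, comparing primary keys only instead of issuing a point read per row. Assumes the Python cassandra-driver; host, keyspace, table, and column names are hypothetical placeholders:

```python
# A sketch of key-presence validation over token ranges.
from cassandra.cluster import Cluster

src = Cluster(["cassandra-node-1.internal"]).connect("hypertrail")
dst = Cluster(["scylla-node-1.internal"]).connect("hypertrail")

RANGE_KEYS = (
    "SELECT id FROM audit_logs WHERE token(id) > %s AND token(id) <= %s"
)

def missing_keys(lo: int, hi: int) -> set:
    """Return source keys absent from the target within one token range."""
    source_keys = {row.id for row in src.execute(RANGE_KEYS, (lo, hi))}
    target_keys = {row.id for row in dst.execute(RANGE_KEYS, (lo, hi))}
    # Key presence only: row values are not compared, trading detection of
    # value drift for the large reduction in validation time noted above.
    return source_keys - target_keys
```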
Optimization Outcome
■Reduced validation times by over 80%, ensuring efficiency for large-scale data validations.
■Enhanced scalability and practicality for production environments.
■Achieved significant cost savings, particularly in infrastructure expenses.
■Enabled faster and more frequent validation cycles, ensuring data accuracy and consistency.
Future Use Cases
■BLOB Store
■UCR
■DynamoDB use cases