Inside Freshworks' Migration from Cassandra to ScyllaDB by Premkumar Patturaj

ScyllaDB · Mar 04, 2025

About This Presentation

Freshworks migrated from Cassandra to ScyllaDB to handle growing audit log data efficiently. Cassandra required frequent scaling, complex repairs, and had non-linear scaling. ScyllaDB reduced costs with fewer machines and improved operations. Using Zero Downtime Migration (ZDM), they bulk-migrated d...


Slide Content

A ScyllaDB Community
Freshworks Migration Journey
from Cassandra to ScyllaDB
Premkumar Patturaj
Senior Manager

Prem Kumar Patturaj
■Senior Engineering Manager at Freshworks with 15 years of
IT experience, 10 of them at Freshworks.
■Expertise in Relational and NoSQL databases, specializing
in designing and optimizing scalable, high-performance
systems.
■Experienced in solving complex technical challenges,
mentoring teams, and fostering a culture of continuous learning.
■Committed to engineering excellence, leveraging best
practices to create efficient and reliable software solutions.

Freshworks at a glance
■Founded: 2010
■Employees: 4,500
■2024 annual revenue guidance: $700M+
■Total customers: 67,000+
■Recognition: Leader in 3 Gartner Magic Quadrants and 3 major peer reviews
■IPO: September 2021 (ticker: FRSH)

Neo Platform and Freddy infuse AI across all products
Freshworks Solutions
■Solutions: Employee Experience (Freshservice, Device42) and Customer Experience (Customer Service Suite: Freshdesk, Freshchat, Freshsales, Freshmarketer) for business teams
■Platform (Freshworks Neo): Integrate & Extend (developer tools, Marketplace); Unify (data, analytics, admin, security); Manage & Secure
■AI (Freddy AI for Customer Service, Sales, Marketing, IT & Developers): Freddy AI Insights for business leaders; Freddy AI Copilot; Freddy AI Self Service for customers & employees

Presentation Agenda
■Background and Motivation
■Goals
■Approach
■Challenges
■Optimization

We manage all databases at Freshworks
■Availability
■Reliability
■Monitoring
■Recovery
■Keep Current
■RDS MySQL, Postgres; Redis; MongoDB; Kafka; ClickHouse; …
■A mix of self-hosted and cloud solutions
■Identify the best balance for Freshworks
■Overarching goal for Dataverse
■Application teams agnostic to the underlying database
■e.g., use a Cassandra client while the backend is ScyllaDB

Dataverse

Databases at Freshworks: Scale

Database    Servers  Data Processed  Req/s   Data Persisted  Availability (%)
MySQL       1200     7.9 Gb/s        1.4M    4.5 PiB         99.992
Redis       869      1 GB/s          2M      550 GiB         99.991
Kafka       65       1 GB/s          0.7M    420 TiB         99.99
ClickHouse  16       400 Mb/s        2M      33 TiB          99.99
Memcached   72       12 Mb/s         2M      257 GiB         99.99
Postgres    110      2.2 Gb/s        0.22M   210 TiB         99.99
ScyllaDB    45       750 Mb/s        0.05M   270 TiB         99.99

ScyllaDB at Freshworks

Clusters  Nodes  IOPS  Storage
10        45     500k  270 TB

Background and Motivation

Background
Hypertrail
■Hypertrail aims to provide a scalable, cost-effective, and fault-tolerant timeline solution that enables
products to capture and query activity and audit logs for any custom entity, with flexible filtering
capabilities to meet specific business needs.
Workflow Automator
■Workflows can be configured for project and task creation and for associating them with
tickets/changes. Users can configure a workflow using any condition they want on tickets/changes;
this is currently used by the alerts module.

Hypertrail

Cassandra Overview
Cassandra Cluster Overview:
■24 TB of unreplicated data.
■Spread across 56 Cassandra nodes.

Challenges in Cassandra:
■Repair & Consistency Issues
■High Tail Latencies
■Backup & Restore Overheads
■Manual Toil as Node Count Grows

Performance Benchmark

Motivation
ScyllaDB Advantages Over Cassandra:

Hardware Efficiency:
■Few large machines replace many small ones.
Operational Simplicity:
■Reduced overhead for repairs, compactions, and scaling.
Cost Reduction:
■Lower infrastructure costs due to fewer machines.

Goals

Goals
Zero Downtime:
■Ensure the application remains fully operational during migration.
Low Latency Overhead:
■Minimize the impact on application latency during the process.
Accuracy:
■Validate the migrated data for completeness and correctness.
Efficiency:
■Perform the migration in the shortest duration possible to reduce infrastructure costs.
■Complete the migration and validations in a time- and cost-efficient manner.

Migration Approach

Migration Approach
Historical Data Migration:
■Bulk migration of existing data from the Cassandra cluster to the ScyllaDB cluster.
Dual Writes:
■Writing data to both the Cassandra and ScyllaDB clusters while the migration is in
progress, using the ZDM (Zero Downtime Migration) proxy.
Data Validation:
■Validating data consistency between the source and destination using CDM
(Cassandra Data Migrator).

Historical Data Migration
Evaluated options for bulk data migration
■DataStax CDM tool
■Stream SSTables via Tools
■Load and Stream using nodetool

Advantages of Load and Stream (sketched below)
■Fastest approach.
■Minimal impact on the ScyllaDB cluster.
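
For context, a minimal sketch of what a load-and-stream run looks like, assuming ScyllaDB's default data directory layout and the documented nodetool refresh flow; the keyspace, table, and paths below are hypothetical, not Freshworks' actual values:

```python
# A minimal sketch of the load-and-stream procedure, not Freshworks' tooling.
# Keyspace/table names and paths are hypothetical; the upload directory
# follows ScyllaDB's default /var/lib/scylla/data layout.
import shutil
import subprocess
from pathlib import Path

KEYSPACE, TABLE = "hypertrail", "audit_logs"      # hypothetical names
SSTABLE_SRC = Path("/backup/cassandra/sstables")  # snapshot taken on Cassandra
# Table directories carry a UUID suffix, hence the glob.
UPLOAD_DIR = next(Path(f"/var/lib/scylla/data/{KEYSPACE}").glob(f"{TABLE}-*")) / "upload"

# 1. Copy the Cassandra SSTables into the table's upload directory.
for sstable in SSTABLE_SRC.iterdir():
    if sstable.is_file():
        shutil.copy2(sstable, UPLOAD_DIR)

# 2. Load and stream: the node reads the uploaded SSTables and streams each
#    row to the replicas that own it.
subprocess.run(
    ["nodetool", "refresh", KEYSPACE, TABLE, "--load-and-stream"],
    check=True,
)
```

Because each node streams the uploaded SSTables to their correct owners, the snapshot does not need to be placed token-aware, which is what makes this the fastest of the three options.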

Dual Writes
■The ZDM Proxy performed dual writes, handling all use cases required for the migration process.
■Latency added by the ZDM Proxy was benchmarked at under 10 milliseconds.
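
As a rough illustration, a dual-write deployment of the proxy might be configured like this, assuming zdm-proxy's environment-variable-driven configuration and a zdm-proxy binary on PATH; hostnames and ports are placeholders, not Freshworks' actual values:

```python
# Hedged sketch of launching zdm-proxy with dual writes enabled.
import os
import subprocess

env = {
    **os.environ,
    # Origin = existing Cassandra cluster, Target = new ScyllaDB cluster.
    "ZDM_ORIGIN_CONTACT_POINTS": "cassandra-seed-1,cassandra-seed-2",
    "ZDM_TARGET_CONTACT_POINTS": "scylla-seed-1,scylla-seed-2",
    "ZDM_ORIGIN_PORT": "9042",
    "ZDM_TARGET_PORT": "9042",
    "ZDM_PROXY_LISTEN_ADDRESS": "0.0.0.0",
    # Writes are mirrored to both clusters; reads are served only by the
    # primary cluster, which is still Cassandra at this stage.
    "ZDM_PRIMARY_CLUSTER": "ORIGIN",
    "ZDM_READ_MODE": "PRIMARY_ONLY",
}
subprocess.run(["zdm-proxy"], env=env, check=True)
```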

Infrastructure Setup

Hosted on EC2 c6.2xlarge instances with 3 replicas distributed across availability zones (AZs).
■Prometheus Metrics:
■Exported by ZDM Proxy by default.
■Node exporter service ran alongside ZDM to monitor system-level bottlenecks.

ZDM Proxy
Reads from Source Only:
■Used during the initial migration phase.
Async Reads to Target:
■Enabled after historical data migration and validation.
■Allowed performance measurement of ScyllaDB before switching traffic.

Migration Workflow:
■The ZDM Proxy initially operated with reads served from the source only.
■After completing bulk data migration and validation, the ZDM Proxy was reconfigured
to asynchronously read from the target.
■ScyllaDB performance was measured before fully transitioning application traffic (see the sketch below).
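
A sketch of the phase changes, under the same assumed environment-variable configuration as the launch sketch above; the values follow the ZDM documentation as best understood, and a proxy restart is assumed between phases:

```python
# Hedged sketch of the read-mode phases; variable names/values are assumptions
# based on zdm-proxy's documented configuration, not verified settings.

# Phase 2: after bulk migration + validation, mirror reads asynchronously to
# the target so ScyllaDB can be observed under real read load without
# affecting responses served to the application.
ASYNC_READ_PHASE = {
    "ZDM_PRIMARY_CLUSTER": "ORIGIN",             # reads still answered by Cassandra
    "ZDM_READ_MODE": "DUAL_ASYNC_ON_SECONDARY",  # plus async reads to ScyllaDB
}

# Phase 3 (cutover): roles flip and ScyllaDB serves reads directly.
CUTOVER_PHASE = {
    "ZDM_PRIMARY_CLUSTER": "TARGET",
    "ZDM_READ_MODE": "PRIMARY_ONLY",
}
```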

Data Validation
CDM for Data Validation
■Validating terabytes of data is time-intensive.
■Optimized validation to reduce the time taken by 80%.
Validation Steps
■CDM reads from the source in bulk.
■Compares corresponding data in the target cluster.
■Repeats for the entire partition range.
Tuning CDM Properties:
■Enabled spark.cdm.autocorrect.missing and spark.cdm.autocorrect.mismatch.
■These bridge gaps in data consistency automatically (see the sketch below).
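
As an illustration, a CDM validation run with autocorrect enabled might be submitted like this; the DiffData job class and the two property names follow CDM's documentation as best understood, while the jar name, properties file, and Spark master are placeholders:

```python
# Hedged sketch of a CDM validation (DiffData) run with autocorrect enabled.
# With these flags, CDM writes back any rows it finds missing or mismatched
# in the target, bridging consistency gaps as it validates.
import subprocess

subprocess.run(
    [
        "spark-submit",
        "--properties-file", "cdm.properties",            # hypothetical path
        "--conf", "spark.cdm.autocorrect.missing=true",
        "--conf", "spark.cdm.autocorrect.mismatch=true",
        "--master", "local[*]",                           # placeholder master
        "--class", "com.datastax.cdm.job.DiffData",
        "cassandra-data-migrator.jar",                    # hypothetical jar name
    ],
    check=True,
)
```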

Challenges

Challenges
Large Partition:

■The CDM migrator processes large partitions by loading entire slices into memory, causing OOM errors.

Large-Scale Validation:

■Validating over 20 TB of unreplicated data was estimated to take weeks.
■CDM jobs scanned partitions, retrieving rows individually.
■High I/O latency due to individual select operations for each row.

Optimization

Optimization
Large Partition:
■Split the partition range into smaller chunks.
■This controls the amount of data loaded into memory for each slice.

Large-Scale Validation:
■Adopted range-based reads.
■Bypassed value validation by only checking key presence.

Range-Based Reads from Target
Customized CDM validation
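
A simplified sketch of the customized validation idea, not the actual CDM patch: split the token ring into chunks, read each chunk from source and target with one range query apiece, and compare key presence only. Cluster endpoints and the table/key names are hypothetical:

```python
# Illustrative sketch of chunked, range-based, key-presence validation.
from cassandra.cluster import Cluster

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3 token range
NUM_CHUNKS = 4096  # smaller chunks bound per-slice memory (avoids the OOM)

QUERY = (
    "SELECT entity_id, event_id FROM hypertrail.audit_logs "  # hypothetical schema
    "WHERE token(entity_id) >= %s AND token(entity_id) < %s"
)

def keys_in_range(session, lo, hi):
    # One range read per chunk instead of one SELECT per row.
    return {tuple(row) for row in session.execute(QUERY, (lo, hi))}

source = Cluster(["cassandra-seed-1"]).connect()
target = Cluster(["scylla-seed-1"]).connect()

step = (MAX_TOKEN - MIN_TOKEN) // NUM_CHUNKS
for i in range(NUM_CHUNKS):
    lo = MIN_TOKEN + i * step
    hi = MAX_TOKEN if i == NUM_CHUNKS - 1 else lo + step
    missing = keys_in_range(source, lo, hi) - keys_in_range(target, lo, hi)
    if missing:
        print(f"chunk {i}: {len(missing)} keys missing from target")
```

Comparing only key presence trades value-level certainty for throughput; combined with range reads, this is the kind of change that accounts for the 80%+ reduction in validation time reported below.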

Optimization Outcome
■Reduced validation times by over 80%, ensuring efficiency for large-scale data validations.
■Enhanced scalability and practicality for production environments.
■Achieved significant cost savings, particularly in infrastructure expenses.
■Enabled faster and more frequent validation cycles, ensuring data accuracy and consistency.

Future Use Cases
■BLOB Store
■UCR
■DynamoDB use cases

Thank you

Stay in Touch
Prem Kumar Patturaj
[email protected]

https://x.com/iam_prem
https://www.linkedin.com/in/prem-kumar-patturaj-27217933/