Revolutionizing Sleep: Scaling IoT Telemetry to 30+ Billion Daily Events by Deepika Sikri & Vikas Talegaonkar

ScyllaDB · 17 slides · Mar 07, 2025

About This Presentation

Sleep Number processes 30B+ sensor events daily to enhance sleep technology. This talk details a journey in scaling an IoT telemetry pipeline to handle exponential data growth, real-time processing, and high reliability—advancing sleep science for millions.


Slide Content

A ScyllaDB Community
Revolutionizing Sleep:
Scaling IoT Telemetry to
30+ Billion Daily Events
Deepika Sikri
Head of Engineering
Vikas Talegaonkar
Director of Engineering

Deepika Sikri (She/Her)
Head of Engineering

Driving innovative solutions and
scalable technologies to enhance
sleep experiences

Vikas Talegaonkar (He/Him)
Director of Engineering

A leader with experience in building,
scaling, and transforming ideas from vision
to execution

Introduction
In a world powered by big data, Sleep Number is transforming sleep
technology—processing over 30 billion sensor events daily to improve
the lives of millions.

This talk explores our journey to scale an IoT telemetry pipeline to
meet the challenges of exponential data growth, a rapidly expanding
user base, and real-time processing demands, all while ensuring
unwavering reliability.

Discover how we turned data into dreams.

Scale or Fail

The Challenge
Scaling Sleep Number’s IoT cloud infrastructure wasn’t just
about handling more data but about rewriting the playbook
for cost-efficient, resilient growth. As millions of smart
beds come online, the stakes couldn’t have been higher:

■Data Explosion: Billions of sensor events daily,
demanding real-time insights.
■User Surge: An ever-expanding customer base expecting
seamless performance.
■Resilience Under Pressure: A pipeline that could
withstand failures without skipping a beat.

The objective was clear: SCALEX—scale smarter, not
harder. The goal wasn't just to keep up but to stay ahead,
balancing cost and performance while building for the
future.

Identifying the Critical Bottleneck
Our telemetry datastore was a ‘pet’—high-maintenance
and ill-suited for scaling to meet evolving demands.

■Frequent Incidents
■Complex Debugging
■Escalating Costs
■High Operational Overhead
■Dependence on Specialized Skills
■Growing Big Data
■Data Strategy Evolution

Path Forward

Art of Datastore Selection
Moving the telemetry datastore from ‘pet’ to ‘cattle’

■Horizontal & Dynamic Scaling
■Managed/Serverless
■Multi-Tier Storage (Hot, Warm, and Cold)
■Data Strategy Evolution – Data Lakes
■CAP theorem

Scaling for Immunity Against Failures
Proactive Protection (Like Vaccines)
■Sharding and multi-write
■Redundancy (see the sketch after this list)

Reacting to External Events (Like Antigens)
■ Event-Driven Architecture

On-Demand Protection (Like a Boosted Immune System)
■Elastic Scaling
■Data Replication
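
To make the multi-write idea concrete, here is a minimal Python sketch of redundant parallel writes; the endpoint names and the write call are hypothetical placeholders, not the production implementation:

```python
# Illustrative multi-write: each event is written to two independent
# clusters in parallel, so a single cluster failure loses no data.
# Endpoint names and the write call are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

PRIMARY = "scylla-primary.internal:9042"
SECONDARY = "scylla-secondary.internal:9042"

def write_to(endpoint: str, event: dict) -> bool:
    # Stand-in for a real driver call against `endpoint`.
    print(f"wrote {event} to {endpoint}")
    return True

def multi_write(event: dict) -> bool:
    """Write the event to both clusters concurrently; succeed if either lands."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.map(lambda ep: write_to(ep, event), [PRIMARY, SECONDARY])
        return any(results)

multi_write({"user_id": 12345, "sensor": "pressure", "value": 42})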

Proactive System Design
Building for Resilience

Scaling Write Efficiency and Consistency
■Data Partitioning and User Stickiness
We implement a user ID-based sharding strategy to distribute data across multiple clusters. Each
shard is responsible for a specific subset of data, reducing the likelihood of write conflicts. User
stickiness is ensured through a deterministic modulo mapping:

shard_to_write = user_id % shard_count

This mapping guarantees that a user's data always goes to the same shard, maintaining consistency and improving read/write performance (a short sketch follows this section).
■Write Distribution
Write operations are routed to the appropriate shard using the sharding key (user ID). This ensures
that writes for a particular data subset always go to the same cluster, minimizing cross-cluster
operations and potential conflicts.
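
A minimal Python sketch of this modulo-based routing; the shard count and cluster endpoints are hypothetical, not Sleep Number's actual configuration:

```python
# Minimal sketch of user-ID-based shard routing (names are illustrative).
SHARD_COUNT = 4
CLUSTERS = [f"scylla-cluster-{i}.internal:9042" for i in range(SHARD_COUNT)]

def shard_for_user(user_id: int) -> int:
    """The same user always maps to the same shard (user stickiness)."""
    return user_id % SHARD_COUNT

def route_write(user_id: int, event: dict) -> str:
    """Pick the cluster that owns this user's shard for the write."""
    endpoint = CLUSTERS[shard_for_user(user_id)]
    # A real pipeline would issue the write against `endpoint` here;
    # returning it is enough to show the routing decision.
    return endpoint

# Example: user 12345 maps to shard 12345 % 4 == 1, and always will.
assert route_write(12345, {"pressure": 42}) == CLUSTERS[1]
```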

Managing Increasing Demand of Data Streaming
The Power of Asynchronous Data Transfer in IoT Pipelines Using Kafka
Asynchronous data transfer in IoT pipelines using Kafka is a crucial element for efficiency. It enables
multiple devices to transmit data concurrently, reducing latency and ensuring timely data processing.
Kafka's robust message queuing system effectively manages high data volumes, maintaining system
responsiveness and guaranteeing reliable delivery. This approach enhances scalability and improves
the resilience of IoT systems, making it an ideal solution for meeting the growing demands of data
streaming in dynamic environments.
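
As a concrete illustration, here is a minimal producer sketch using the open-source kafka-python client; the broker address, topic name, and tuning values are assumptions for illustration, not the production configuration:

```python
# Minimal asynchronous publish path with the kafka-python client.
# Broker, topic, and tuning values are assumptions for illustration.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-broker-1:9092"],
    # Keying by user_id keeps each user's events in one partition,
    # preserving per-user ordering.
    key_serializer=lambda k: str(k).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",    # wait for in-sync replication for reliable delivery
    linger_ms=10,  # small batching window trades a little latency for throughput
)

def publish_event(user_id: int, event: dict) -> None:
    """Non-blocking send; the client batches and transmits in the background."""
    producer.send("telemetry-events", key=user_id, value=event)

publish_event(12345, {"sensor": "pressure", "value": 42})
producer.flush()  # block until all buffered events are delivered
```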

Read Replicas and Resource Optimization
■Read Replicas and Scalability
Each cluster can have multiple read replicas to handle read traffic, improving overall system performance and scalability. This allows us to scale read capacity independently of write capacity.
■Divide and Conquer Strategies
To optimize resource allocation and workload distribution, we employ the following formulas:
■Partitions per Shard: partitions_per_shard = partition_count / shard_count
■Consumers per Shard: consumers_per_shard = consumer_count / shard_count
■Partitions per Consumer: partitions_per_consumer = partition_count / consumer_count
These calculations help us balance the workload across shards and ensure efficient utilization of resources; a small worked example follows.
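
A small worked example of these ratios in Python, with hypothetical counts chosen so the divisions come out even:

```python
# Worked example of the workload-distribution ratios above.
# Counts are hypothetical and chosen so the divisions come out even;
# real deployments round and rebalance any remainder.
partition_count = 240  # total Kafka partitions on the telemetry topic
shard_count = 4        # number of data clusters
consumer_count = 48    # total consumer instances

partitions_per_shard = partition_count // shard_count        # 60
consumers_per_shard = consumer_count // shard_count          # 12
partitions_per_consumer = partition_count // consumer_count  # 5

print(partitions_per_shard, consumers_per_shard, partitions_per_consumer)
```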

Sharding & Partitioning in Action

Smooth Transition
■Multi-Stage Development and Release
■Gradual multi-phase release, followed by a final cutover (the Big Bang)
■Dual Pipeline
■Smooth transition by running old and new systems in parallel, enabling gradual migration and risk
mitigation.
■Feature Flag
■Enables dynamic control over feature activation, allowing gradual rollouts, testing, and risk-free rollbacks.
■Traffic Controller
■Gateway Traffic Controller
■Incrementally shifting traffic from the old to the new system at controlled rates
■Move traffic gradually from the old to the new pipeline in successive steps of 1, 5, 10, 25, 40, 80, 90, and 100 percent (see the sketch after this list).
■Confidence Building
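
A minimal sketch of percentage-based traffic shifting as described above; the hash-based bucketing and stage values are illustrative, not the actual gateway implementation:

```python
# Illustrative percentage-based traffic controller for the dual-pipeline cutover.
# Hashing the user ID into a stable 0-99 bucket keeps each user's routing
# sticky: anyone on the new pipeline at 25% stays there at 40%.
import hashlib

ROLLOUT_STAGES = [1, 5, 10, 25, 40, 80, 90, 100]  # percent on the new pipeline

def bucket(user_id: int) -> int:
    """Deterministic 0-99 bucket per user."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def route(user_id: int, rollout_percent: int) -> str:
    """Route to the new pipeline once the user's bucket falls under the dial."""
    return "new-pipeline" if bucket(user_id) < rollout_percent else "old-pipeline"

print(route(12345, ROLLOUT_STAGES[3]))  # at the 25% stage
```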

Conclusion
Our journey in scaling Sleep Number's IoT infrastructure to process over 30 billion
sensor events daily has been transformative. We've created a highly scalable,
reliable, and cost-effective solution by transitioning our data store and implementing
innovative strategies such as efficient data aggregation, asynchronous data transfer
with an event-driven architecture, smart sharding, and a dynamic consumer architecture.

Stay in Touch
Deepika Sikri
[email protected]
Vikas Talegaonkar
[email protected]