Evolving Atlassian Confluence Cloud for Scale, Reliability, and Performance by Bhakti Mehta

ScyllaDB, Mar 05, 2025 (25 slides)

About This Presentation

Explore how Confluence Cloud scaled to handle billions of requests while ensuring performance & reliability. Learn about microservice sharding, dependency scaling, failure isolation, and optimizing metrics for real customer impact.


Slide Content

A ScyllaDB Community
Evolving Confluence Cloud for Scale, Reliability and Performance
Bhakti Mehta
Senior Chief Architect
Confluence Cloud

Bhakti Mehta
■Senior Chief Architect, Confluence Cloud
■Working at Atlassian for the past 8.5 years
■Author of two books and regular conference speaker
■Past experiences:
■Platform lead at BlueJeans Network
■Sun Microsystems/Oracle for 13 years

Agenda

Confluence Cloud architecture

Architecting for Scale

Reliability at Scale

Performance

Balancing Cost with Reliability and Scale

Confluence Cloud at 30,000 ft
Forked codebase from on-prem offering
Multi-tenant environment
13 region deployments
Billions of requests per day with low latency requirements

Initial high-level architecture

Architecting for Scale

Architecting for Scale
Challenges we faced
■Noisy neighbor issues
■Resource starvation, high CPU utilization on Databases
■Only vertically scalable solution initially
■Dependencies bringing down our services
■Scaling challenges during peak traffic

Where are we now?

Support for 150K users on a single tenant

Architecting for Scale
Made the following changes
■Transitioned from Amazon RDS for Postgres to Amazon Aurora and
leveraged read replicas
■Decomposed a few services into low-latency microservices backed by
DynamoDB
■Revisited scaling policies to handle peak traffic
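
To illustrate the read-replica point, here is a minimal sketch of read/write splitting. It is not Confluence's implementation: the `ReplicaRouter` class, the endpoint names, and the "starts with SELECT" heuristic are all assumptions made for illustration. Aurora also offers a cluster reader endpoint that load-balances connections across replicas, so in practice the round-robin step may be unnecessary.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReplicaRouter:
    """Hypothetical router: writes go to the writer, reads rotate over replicas."""

    writer: str
    replicas: list
    _cycle: object = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Fall back to the writer when no replicas are configured.
        self._cycle = itertools.cycle(self.replicas or [self.writer])

    def endpoint_for(self, sql: str) -> str:
        # Assumption: any statement starting with SELECT is read-only.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._cycle) if is_read else self.writer


router = ReplicaRouter(writer="writer-endpoint", replicas=["replica-1", "replica-2"])
print(router.endpoint_for("SELECT * FROM pages"))      # a replica
print(router.endpoint_for("UPDATE pages SET title=''"))  # the writer
```

A real router would also need to handle replica lag (reads that must see a just-committed write) and statements such as `SELECT ... FOR UPDATE`, which this heuristic would misroute.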

Made the following changes
■Feature gates for all code changes
■Resiliency against dependency failures via timeouts and circuit
breakers
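
The timeout and circuit breaker pattern can be sketched as follows. This is a simplified illustration under assumptions, not the code used in Confluence Cloud: the `CircuitBreaker` class, its thresholds, and the `clock` parameter are hypothetical. After `max_failures` consecutive failures (errors or timeouts) the breaker opens and rejects calls immediately; once `reset_after_s` has passed it lets one trial call through.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, timeout_s=1.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.timeout_s = timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("dependency circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure of the trial call reopens the breaker.
            self._opened_at = None
            self._failures = self.max_failures - 1
        future = self._pool.submit(fn, *args, **kwargs)
        try:
            result = future.result(timeout=self.timeout_s)
        except Exception:
            # Timeouts and dependency errors both count as failures.
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Note that a timed-out call keeps running in its worker thread; the caller just stops waiting for it. Production code would usually pass the timeout down to the client library instead, so the underlying request is actually cancelled.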

Architecting for Scale

Best practices for
Scale

Architecture reviews for newer features and identify bottlenecks
earlier
Constant monitoring of critical metrics such as tenant placement,
CPU utilization, DB connections
Load testing for features to address bottlenecks before
release to production

Reliability at Scale

Reliability at Scale
How do we address it?
■Public SLA for core experiences with 99.95% reliability
■Error budget is 21 min (downtime + experience failures, per tenant)
■Focusing on the end-to-end customer experience and tracking failures
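
The budget figure follows from the SLA percentage. As a worked example, assuming a 30-day measurement window (the slide does not state the window), 99.95% reliability leaves about 21.6 minutes of allowed unreliability, which is consistent with the 21 min budget above. The function names below are hypothetical.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unreliability in the window for a given SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, bad_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after subtracting downtime and failed-experience minutes."""
    return error_budget_minutes(slo_percent, window_days) - bad_minutes


# 30 days = 43,200 minutes; 0.05% of that is roughly 21.6 minutes.
print(round(error_budget_minutes(99.95), 1))
```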

Best practices for Resiliency

Focus on early detection and monitoring
Clear dashboards to identify impact
Optimizing metrics to be insightful of customer pain
Focus on daily errors proactively

Performance

Performance
How do we architect for performance?
■Focus on Time to Visually Complete (TTVC)
- a metric for measuring the performance of web page loads, page
transitions, and interactions.
■How long does it take for everything on the page to fully render
without visually changing?
■We have detailed dashboards to measure and quantify impact

Performance
How do we architect for performance?
■Cumulative Layout Shift (CLS) is a metric used to measure the
visual stability of a webpage.
■It quantifies the largest burst of unexpected layout shifts that
occur during the entire lifecycle of a page.
■These shifts can negatively impact user experience by causing
elements to move unexpectedly, leading to a jarring browsing
experience.
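
The "largest burst" wording can be made concrete with a small sketch. It assumes the publicly documented Web Vitals definition, which the slides do not spell out: shifts are grouped into session windows where each shift is less than 1 second after the previous one and the window spans less than 5 seconds, and CLS is the largest window total. The function name and the input format are hypothetical, and shifts caused by recent user input are assumed to be filtered out already.

```python
def cumulative_layout_shift(shifts, max_gap_ms=1000.0, max_window_ms=5000.0):
    """Return the largest session-window sum of layout-shift scores.

    shifts: iterable of (timestamp_ms, score) pairs for unexpected shifts.
    """
    best = 0.0
    window_sum = 0.0
    window_start = None
    prev_time = None
    for t, score in sorted(shifts):
        in_window = (
            prev_time is not None
            and t - prev_time < max_gap_ms
            and t - window_start < max_window_ms
        )
        if in_window:
            window_sum += score
        else:
            # Gap too long or window too old: start a new session window.
            window_start = t
            window_sum = score
        prev_time = t
        best = max(best, window_sum)
    return best


# Two bursts: 0.10 + 0.05 early on, then 0.20 + 0.02 later; the later one wins.
print(cumulative_layout_shift([(0, 0.10), (500, 0.05), (3000, 0.20), (3500, 0.02)]))
```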

Performance
How do we architect for performance?
■Reducing JS bundle size to reduce download time (network),
evaluation time (CPU), and memory consumption
■React 18 and SSR streaming to push rendering start time close
to request start time.
■Parallel fetching of macros along with content
■Prioritization score for which items to pick based on traffic
percentage, viewport size, and TTVC
■Fixing CLS for some elements, which helped TTVC
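
A prioritization score of this kind might look like the sketch below. The formula is an assumption, since the slides name only the three inputs and not how they are combined; the `Candidate` fields, the multiplication, and the sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    traffic_share: float   # fraction of page views that include the item, 0..1
    viewport_share: float  # fraction of the initial viewport it occupies, 0..1
    ttvc_cost_ms: float    # time the item adds to TTVC, in milliseconds


def priority_score(c: Candidate) -> float:
    # Assumed combination: expected visible TTVC cost across all traffic.
    return c.traffic_share * c.viewport_share * c.ttvc_cost_ms


def rank(candidates):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=priority_score, reverse=True)


items = [
    Candidate("macro-a", traffic_share=0.40, viewport_share=0.50, ttvc_cost_ms=300),
    Candidate("macro-b", traffic_share=0.05, viewport_share=0.90, ttvc_cost_ms=900),
    Candidate("macro-c", traffic_share=0.70, viewport_share=0.10, ttvc_cost_ms=200),
]
for c in rank(items):
    print(c.name, round(priority_score(c), 1))
```

Multiplying the factors favors items that are both common and visible; a weighted sum would be an equally plausible choice if one factor should dominate.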

Balancing Cost with Reliability and Scale
Special projects aimed at cost reduction
Sharing successes across teams to adopt similar practices

Metrics & Measurement
With attribution to the right microservices

Stay in Touch
Bhakti Mehta
[email protected]
bhakti_mehta
www.linkedin.com/in/bhaktihmehta