Evolving Atlassian Confluence Cloud for Scale, Reliability, and Performance by Bhakti Mehta

ScyllaDB, Mar 05, 2025

About This Presentation

Explore how Confluence Cloud scaled to handle billions of requests while ensuring performance & reliability. Learn about microservice sharding, dependency scaling, failure isolation, and optimizing metrics for real customer impact.

Slide Content

A ScyllaDB Community
Evolving Confluence Cloud for Scale, Reliability and Performance
Bhakti Mehta
Senior Chief Architect, Confluence Cloud

Bhakti Mehta
■Senior Chief Architect, Confluence Cloud
■Working at Atlassian for the past 8.5 years
■Author of two books and regular conference speaker
■Past experience:
  ■Platform lead at BlueJeans Network
  ■Sun Microsystems/Oracle for 13 years

Agenda

Confluence Cloud architecture

Architecting for Scale

Reliability at Scale

Performance

Balancing Cost with Reliability and Scale

Confluence Cloud at 30,000 ft
■Forked codebase from on-prem offering
■Multi-tenant environment
■13 region deployments
■Billions of requests per day with low latency requirements

Initial high-level architecture

Architecting for Scale

Architecting for Scale
Challenges we faced
■Noisy neighbor issues
■Resource starvation, high CPU utilization on Databases
■Only vertically scalable solution initially
■Dependencies bringing down our services
■Scaling challenges during peak traffic

Where are we now?

Support for 150K users on a single tenant

Architecting for Scale
Made the following changes
■Transitioned from Amazon RDS for Postgres to Amazon Aurora and leveraged read replicas
■Decomposed a few services to DynamoDB for low-latency microservices
■Revisited scaling policies to handle peak traffic

Made the following changes
■Feature gates for all code changes
■Resiliency for failures from dependencies via timeouts and circuit
breakers
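
The timeout and circuit-breaker idea above can be illustrated with a minimal Python sketch. This is not Confluence Cloud's implementation; the class name, thresholds, and the thread-pool based timeout are all assumptions chosen for illustration. A call that fails or exceeds the timeout counts as a failure; after enough consecutive failures the breaker opens and rejects calls immediately, then allows one trial call once a cool-down has passed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    # All defaults here are illustrative assumptions, not values from the talk.
    def __init__(self, failure_threshold=3, reset_after_s=30.0,
                 timeout_s=1.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.timeout_s = timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed
        self._pool = ThreadPoolExecutor(max_workers=4)

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after_s:
            return "half_open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("dependency call rejected")
        future = self._pool.submit(fn, *args, **kwargs)
        try:
            # The caller stops waiting after timeout_s; the worker thread
            # itself is not interrupted and finishes in the background.
            result = future.result(timeout=self.timeout_s)
        except Exception:
            # Timeouts and dependency errors both count as failures.
            self._failures += 1
            if (self._failures >= self.failure_threshold
                    or self._opened_at is not None):
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_permissions, user_id)` with a hypothetical `fetch_permissions`, and fall back to a degraded response when `CircuitOpenError` is raised.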

Architecting for Scale

Best practices for Scale

■Architecture reviews for newer features to identify bottlenecks earlier
■Constant monitoring of critical metrics such as tenant placement, CPU utilization, DB connections
■Load testing for features to address bottlenecks before release to production

Reliability at Scale

Reliability at Scale
How do we address it?
■Public SLA for core experiences with 99.95% reliability
■Budget is 21 min (downtime + experience failures per tenant)
■Focussing on end-to-end customer experience and tracking failures
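
The relationship between the 99.95% target and the 21-minute budget can be checked with a short calculation. The slides do not state the measurement window; the sketch below assumes a 30-day window, which gives 21.6 minutes and is consistent with the quoted figure. The function names are hypothetical.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed unreliability for a given SLO over a window.

    window_days=30 is an assumption; the talk does not state the window.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def remaining_budget_minutes(slo_percent: float, bad_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after subtracting downtime plus failed-experience minutes."""
    return error_budget_minutes(slo_percent, window_days) - bad_minutes


budget = error_budget_minutes(99.95)
print(f"Budget for 99.95% over 30 days: {budget:.1f} minutes")
```

Under this assumption, a tenant that has already seen 15 minutes of downtime and experience failures in the window would have roughly 6.6 minutes of budget left.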

Best practices for Resiliency

■Focus on early detection and monitoring
■Clear dashboards to identify impact
■Rendering metrics to be insightful of customer pain
■Focus on daily errors proactively

Performance

Performance
How do we architect for performance?
■Focus on Time to Visually Complete (TTVC), a metric for measuring the performance of web page loads, page transitions and interactions.
■How long does it take for everything on the page to fully render without visually changing?
■We have detailed dashboards to measure and quantify impact

Performance
How do we architect for performance?
■Cumulative Layout Shift (CLS) is a metric used to measure the
visual stability of a webpage.
■It quantifies the largest burst of unexpected layout shifts that
occur during the entire lifecycle of a page.
■These shifts can negatively impact user experience by causing
elements to move unexpectedly, leading to a jarring browsing
experience.
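
The "largest burst" wording can be made concrete with a small Python sketch. It follows the publicly documented session-window definition of CLS (shifts less than 1 second apart, with each window capped at 5 seconds) and is not taken from the talk. The input format, a list of `(timestamp_ms, score)` pairs with input-driven shifts already filtered out, is an assumption.

```python
def cumulative_layout_shift(shifts, max_gap_ms=1000, max_window_ms=5000):
    """Return the largest session-window sum of layout-shift scores.

    shifts: iterable of (timestamp_ms, score) pairs for unexpected shifts.
    A new window starts when the gap since the previous shift reaches
    max_gap_ms or the current window has lasted max_window_ms.
    """
    best = 0.0
    window_sum = 0.0
    window_start = None
    prev_ts = None
    for ts, score in sorted(shifts):
        starts_new_window = (
            prev_ts is None
            or ts - prev_ts >= max_gap_ms
            or ts - window_start >= max_window_ms
        )
        if starts_new_window:
            window_start = ts
            window_sum = 0.0
        window_sum += score
        prev_ts = ts
        best = max(best, window_sum)
    return best


# Two bursts: one around 0-500 ms, one around 3000-3200 ms.
example = [(0, 0.1), (500, 0.1), (3000, 0.05), (3200, 0.3)]
print(cumulative_layout_shift(example))
```

In this example the second burst has the larger sum, so it determines the page's CLS even though the first burst happened earlier.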

Performance
How do we architect for performance?
■Reducing JS bundle size to reduce time to download (network),
evaluate (CPU) and memory consumption
■React 18 and SSR streaming to push rendering start time close
to request start time.
■Parallel fetching of macros along with Content
■Prioritization score for which items to pick based on traffic
percentage, viewport size and TTVC
■Fixing CLS for some elements which helped in TTVC
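
As an illustration of fetching macros in parallel with content, here is a minimal `asyncio` sketch. The functions `fetch_content` and `fetch_macro` are hypothetical stand-ins that simulate network latency with a sleep; the talk does not describe the actual services or APIs involved.

```python
import asyncio


async def fetch_content(page_id: str) -> str:
    # Hypothetical stand-in for a call to a content service.
    await asyncio.sleep(0.05)
    return f"content:{page_id}"


async def fetch_macro(macro_id: str) -> str:
    # Hypothetical stand-in for rendering or fetching one macro.
    await asyncio.sleep(0.05)
    return f"macro:{macro_id}"


async def load_page(page_id: str, macro_ids: list[str]) -> dict:
    # Start the content request and every macro request together,
    # rather than waiting for content before requesting macros.
    content, *macros = await asyncio.gather(
        fetch_content(page_id),
        *(fetch_macro(m) for m in macro_ids),
    )
    return {"content": content, "macros": macros}
```

Calling `asyncio.run(load_page("p1", ["m1", "m2"]))` completes in roughly the time of the slowest single request instead of the sum of all requests, which is the effect the bullet describes.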

Balancing Cost with Reliability and Scale
■Special projects aimed at cost reduction
■Sharing successes across teams to adopt similar practices
■Metrics & measurement with attribution to the right microservices

Stay in Touch
Bhakti Mehta
bhakti_mehta
www.linkedin.com/in/bhaktihmehta
