Circuit Breaker Pattern

vkodati 1,208 views 15 slides Jul 14, 2016

About This Presentation

Circuit Breaker Pattern with problem statement, solution landscape and demo using microservices


Slide Content

FRONTLINE SYSTEMS Circuit Breaker Pattern Vikash Kodati 13th July 2016

AGENDA (4/6/2016, T-Mobile Confidential)
- Problem Statement
- Circuit Breaker Definition
- Solution Landscape
- Live Demo
- Q&A

CHARACTERISTICS OF MICROSERVICE (6/13/2016, T-Mobile Confidential)
- Componentization via services
- Organized around business capabilities
- Products not projects
- Smart endpoints and dumb pipes
- Decentralized Data Management
- Infrastructure Automation
- Design for failure

DESIGN FOR FAILURE
Typical first year for a new cluster:
- ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
- ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
- ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
- ~1 network rewiring (rolling ~5% of machines down over 2-day span)
- ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
- ~5 racks go wonky (40-80 machines see 50% packet loss)
- ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
- ~12 router reloads (takes out DNS and external vips for a couple minutes)
- ~3 router failures (have to immediately pull traffic for an hour)
- ~dozens of minor 30-second blips for DNS
- ~1000 individual machine failures
- ~thousands of hard drive failures
- slow disks, bad memory, misconfigured machines, flaky machines, etc.
Note: Data taken from Jeff Dean’s slides

PROBLEM STATEMENT
Given the types of failures that can occur, we need a fault-tolerant system that:
- Continues to operate in the event of failure of a subset of its components
- Is Highly Available (HA)
- Handles failure gracefully

SOLUTION LANDSCAPE
Development Phase
- Avoiding cascading failures
- Circuit breaker
- Timeouts
- Retry
- Bulkhead
- Cache optimizations
- Avoid malicious clients
- Rate limiting
Pre-Deploy Phase
- Load test
- A/B test
- Longevity
Post-Deploy Phase
- Health check
- Metrics
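Two of the development-phase techniques listed above, retry and bulkhead, can be sketched in a few lines. This is a minimal illustration, not something taken from the deck or from Hystrix: the names `Bulkhead`, `BulkheadFullError` and `retry`, and the default values, are assumptions chosen for the example.

```python
import threading
import time


class BulkheadFullError(Exception):
    """Raised when the bulkhead has no free slot for another call."""


class Bulkhead:
    """Caps concurrent calls to one dependency (hypothetical helper)."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

The bulkhead rejects extra callers rather than letting them pile up, so one slow dependency cannot consume every worker thread. The `sleep` parameter is injectable only so the backoff can be exercised without real waiting.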

CIRCUIT BREAKER PATTERN
- If a power surge occurs in the electrical wiring, the breaker will trip (“On” to “Off”).
- Netflix Hystrix follows the circuit breaker pattern.
- If a service’s error rate exceeds a threshold, it trips the circuit breaker and blocks requests for a specific period of time.
- The threshold is configurable, for example:
  - Endpoint takes > 1 sec to respond
  - Endpoint returns a 500 error
  - Endpoint returns a 500 error 6 times in a row
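The threshold examples on this slide can be expressed as a small trip policy. The sketch below is an illustration only and is not how Hystrix is configured: `CallResult` and `TripPolicy` are hypothetical names, and treating a slow response the same as a 500 error when counting consecutive failures is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    status_code: int         # HTTP status returned by the endpoint
    elapsed_seconds: float   # how long the call took


class TripPolicy:
    """Decides when the breaker should trip, based on consecutive failed calls."""

    def __init__(self, max_consecutive_failures=6, slow_call_seconds=1.0):
        self.max_consecutive_failures = max_consecutive_failures
        self.slow_call_seconds = slow_call_seconds
        self.consecutive_failures = 0

    def is_failure(self, result):
        # A call counts as failed if it returned 500 or took longer than the limit.
        return (result.status_code == 500
                or result.elapsed_seconds > self.slow_call_seconds)

    def record(self, result):
        """Record one call; return True when the breaker should trip."""
        if self.is_failure(result):
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0  # any healthy call resets the streak
        return self.consecutive_failures >= self.max_consecutive_failures
```

With the defaults, five failures followed by one healthy call do not trip the breaker; six failures in a row do.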

CIRCUIT BREAKER ILLUSTRATION

CIRCUIT BREAKER STATE TRANSITIONS
- Closed -> Closed: Success
- Closed -> Open: Trip Breaker
- Open: Calls failing fast
- Open -> Half-Open: Attempt Reset
- Half-Open -> Open: Trip Breaker
- Half-Open -> Closed: Reset Breaker
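The Closed, Open and Half-Open states can be sketched as a small state machine. This is a simplified, single-threaded illustration rather than Hystrix's implementation: the class and parameter names are assumptions, the breaker trips on a count of consecutive failures, and the clock is injectable so the reset timeout can be exercised without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, failure_threshold=6, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = self.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")  # Open: calls fail fast
            self.state = self.HALF_OPEN  # Open -> Half-Open: attempt reset
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed trial call, or too many failures while closed, trips the breaker.
        if (self.state == self.HALF_OPEN
                or self._failures >= self.failure_threshold):
            self.state = self.OPEN
            self._opened_at = self._clock()

    def _on_success(self):
        # Success keeps a closed breaker closed and resets a half-open one.
        self._failures = 0
        self.state = self.CLOSED
```

A production breaker would also need locking around the state changes and would usually limit the half-open state to a single trial call.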

DEMO TOPOLOGY
Components: Web browser, Zuul (Proxy), Eureka Server, Reading Service, BookStore

ROLES
The pattern includes Service Discovery (Eureka), Circuit Breaker (Hystrix), Intelligent Routing & Reverse Proxy (Zuul) and Microservices (Spring Cloud).

HYSTRIX DASHBOARD

HYSTRIX DASHBOARD DRILL DOWN

SUMMARY
- Like a physical circuit breaker, the circuit breaker pattern allows a subsystem to fail gracefully without a complete system failure.
- Failure is inevitable; be prepared for it.
- Primarily used in aggregation scenarios.
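In an aggregation scenario, failing gracefully usually means returning a default for the part of the response whose backend is down instead of failing the whole request. A minimal sketch, assuming hypothetical helper names (`call_with_fallback`, `build_page`) and made-up section names:

```python
def call_with_fallback(fetch, fallback):
    """Return fetch(); if it raises, return the fallback value instead."""
    try:
        return fetch()
    except Exception:
        return fallback


def build_page(sources):
    """Aggregate several backends.

    `sources` maps a section name to a (fetch, fallback) pair.
    """
    return {name: call_with_fallback(fetch, fallback)
            for name, (fetch, fallback) in sources.items()}


def broken_recommendations():
    # Stands in for a backend that is currently unavailable.
    raise ConnectionError("service unavailable")


page = build_page({
    "profile": (lambda: {"name": "Ada"}, {}),
    "recommendations": (broken_recommendations, []),
})
```

Here `page` still contains the profile section, and the recommendations section degrades to an empty list. In practice each `fetch` would typically be a call made through a circuit breaker, so that a failing backend is skipped quickly rather than awaited.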

THANK YOU & QA
Vikash Kodati
- Email: [email protected]
- Yammer: https://www.yammer.com/t-mobile.com/users/vikashkodati
- GitHub: https://github.com/vikashkodati
- LinkedIn: /in/vikashkodati
- Twitter: @vikashkodati
- Blog: https://tmobileusa.sharepoint.com/portals/hub/personal/vikashkodati