Terraform: Tales from the Trenches

RobertFox5 337 views 81 slides Feb 22, 2019

About This Presentation

Terraform is an infrastructure-as-code software solution that enables businesses to safely and predictably create, change, and improve infrastructure. It is all the rage right now in the DevOps community. In this presentation, experts from HG Data share some hands-on experiences and introduce some p...


Slide Content

Terraform: Tales from the Trenches Santa Barbara DevOps Copyright © HG Data 2019

Presenters
- Rob Fox, CTO
- Sam Chapin, Chief Architect
- Brendan Keane, DevOps Engineer
- Chris Deutsch, DevOps Architect

The Story Thus Far... Rob Fox

Background
Product: https://discovery.hgdata.com (SaaS platform and related services)
Issues included:
- Previous iteration of infrastructure as code fell way short
- Rigid configuration
- Ansible+bash hell
- No automation
- No ability to validate
- No visibility into what was actually running and where
- Holes around secrets management
- Had to run on a specific version of Ubuntu

Why Terraform @ HG Data?
- Infrastructure as Code
- Immutable Infrastructure
- Declarative
- Client Only
- Solves a lot of our current issues

Success so far...
- Customer-facing platform completely refactored using Terraform, Kubernetes, and Vault
- Blue/Green deployment tied into CI/CD
- Many shared modules and libraries created in GitHub and distributed to the rest of engineering
- Platform rock solid (easy to deploy, easy to manage and monitor)
- Big Data team now adopting Terraform and related tooling to manage large-scale Spark clusters and pipeline automation/orchestration
- Engineers much happier and way more productive!!

What we learned along the way...
...A lot!!
What follows are three mini-presentations, based on our learnings and experiences, to help you be successful. Enjoy!

Evolution of a Terraform project Sam Chapin

me.tf :

Step 1. Experimentation (Someone allowed me to play with cool new toys)
The starting state of the tf project:
- Single developer
- Toy deployment
- No "environment"
- No modularization
- No shared state

Step 1. Experimentation (The halcyon days)
main.tf: (slide image)
Hierarchy: (slide image)
Example usage: terraform apply -auto-approve
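The contents of main.tf did not survive the export. As a rough sketch of what a single-file toy deployment at this stage might look like (the region, instance size, and tag are assumptions, and the AMI ID is borrowed from a later slide purely for illustration):

```hcl
# main.tf: everything in one file, every value hard-coded.
# Illustrative only; not the presenters' actual configuration.
provider "aws" {
  region = "us-west-2" # assumed region
}

resource "aws_instance" "app" {
  ami           = "ami-0c32356aac847d7e8" # AMI ID reused from a later slide
  instance_type = "t3.micro"              # assumed size

  tags = {
    Name = "toy-app" # hypothetical name
  }
}
```

Nothing here is parameterized, which is what the next slide complains about.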

Step 1. Experimentation - What Hurts?
- no sandbox
- isn't parameterizable
- hard to test
- non-DRY
- can't execute anywhere

Step 2. Reality sets in (My boss tells me I need to actually deploy it)
The OLD state of the tf project:
- Single developer
- Toy deployment
- No "environment"
- No modularization
- No shared state
The NEW state of the tf project:
- Single developer
- Toy deployment -> Prod/Stage/Dev deployment
- No "environment" -> Environment management
- No modularization
- No shared state

Step 2. Reality sets in - Environment via configuration
prod.tfvars: (slide image)
Hierarchy: (slide image)
Example usage: terraform apply -var-file=prod.tfvars
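A minimal sketch of the per-environment variable file approach, assuming Terraform 0.12+ syntax. The variable names and values below are hypothetical; the slide's actual prod.tfvars was not captured.

```hcl
# main.tf: the same code for every environment; only the inputs differ.
variable "environment" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name        = "app-${var.environment}"
    Environment = var.environment
  }
}
```

```hcl
# prod.tfvars: values for one environment (all values are placeholders).
environment   = "prod"
ami_id        = "ami-0c32356aac847d7e8"
instance_type = "t3.large"
```

With one such file per environment (for example stage.tfvars and dev.tfvars), the `-var-file` flag selects which set of values an apply uses.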

Step 2. Reality sets in - Environment via symlinks
conf.auto.tfvars: (slide image)
Hierarchy: (slide image)
Example usage:
  cd prod
  terraform apply

Step 2. Reality sets in - Environment via workspace
conf.auto.tfvars: (slide image)
Hierarchy: (slide image)
Example usage:
  terraform workspace new prod
  terraform apply
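One way workspaces can drive per-environment settings is to key a lookup table on the built-in `terraform.workspace` value. This is a sketch in Terraform 0.12+ syntax; the map contents and variable names are assumptions, not what the slide showed.

```hcl
variable "ami_id" {
  type = string
}

locals {
  # The name of the currently selected workspace, e.g. "prod".
  env = terraform.workspace

  # Hypothetical per-environment sizing; "default" covers any other workspace.
  instance_types = {
    default = "t3.micro"
    prod    = "t3.large"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = lookup(local.instance_types, local.env, local.instance_types["default"])

  tags = {
    Name = "app-${local.env}"
  }
}
```

Each workspace keeps its own state, so `terraform workspace select prod` followed by `terraform apply` touches only the prod copy of the resources.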

Step 2. Reality sets in - Environment via workspace & consul
If you’re lucky enough to already have consul...
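The slide's example was not captured, so the following is only one plausible reading: keep Terraform state in Consul's key/value store using the `consul` backend, which supports workspaces. The address and path are placeholders.

```hcl
terraform {
  backend "consul" {
    address = "consul.example.com:8500" # placeholder Consul address
    scheme  = "https"
    path    = "terraform/my-app" # placeholder KV path for the state
  }
}
```

After `terraform init`, each workspace gets its own state entry in Consul derived from that path.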

Step 2. Reality sets in - What Hurts?
- no sandbox
- isn't parameterizable
- hard to test
- non-DRY
- can't execute anywhere

Step 3. Refactor (I panic at how much I've been copy-pasting)
The OLD state of the tf project:
- Single developer
- Prod/Stage/Dev deployment
- Environment management (confs, copies, or workspaces)
- No modularization
- No shared state
The NEW state of the tf project:
- Single developer
- Prod/Stage/Dev deployment
- Environment management (confs, copies, or workspaces)
- No modularization -> Modularization (inner modules / module library)
- No shared state

Step 3. Refactor - Modules Hierarchy: modules/instance/main.tf:

Step 3. Refactor - Modules Hierarchy: ./main.tf:
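The two files named on these slides were shown as images. A minimal sketch of an inner module and its caller, assuming Terraform 0.12+ syntax and hypothetical variable names, might look like this:

```hcl
# modules/instance/main.tf: a reusable building block.
variable "name" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name = var.name
  }
}

output "private_ip" {
  value = aws_instance.this.private_ip
}
```

```hcl
# ./main.tf: the root configuration calls the module once per instance.
variable "ami_id" {
  type = string
}

module "web" {
  source = "./modules/instance"
  name   = "web"
  ami_id = var.ami_id
}

module "worker" {
  source        = "./modules/instance"
  name          = "worker"
  ami_id        = var.ami_id
  instance_type = "t3.large"
}
```

The copy-pasted resource blocks collapse into one definition plus short module calls.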

Step 3. Refactor - Modules Hierarchy: More on this from Brendan!

Step 3. Refactor - External Modules
What if we want them to be external libs?
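Terraform can load a module straight from a Git repository, which is one way to turn inner modules into an external library. In this sketch the repository URL, subdirectory, and tag are hypothetical; only the `git::` source syntax is standard.

```hcl
variable "ami_id" {
  type = string
}

module "web" {
  # "//instance" selects a subdirectory of the repo; "?ref=" pins a tag or commit.
  source = "git::https://github.com/example-org/terraform-modules.git//instance?ref=v1.2.0"

  name   = "web"
  ami_id = var.ami_id
}
```

Pinning `ref` keeps consumers on a known version until they choose to upgrade.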

Step 3. Refactor - What Hurts?
- no sandbox
- isn't parameterizable
- hard to test
- non-DRY
- can't execute anywhere

Step 4. Make it easy (Someone is going to see the mess I made)
The OLD state of the tf project:
- Single developer
- Prod/Stage/Dev deployment
- Environment management (confs, copies, or workspaces)
- Modularization (inner modules or reusable module library)
- No shared state
The NEW state of the tf project:
- Single developer -> Multiple developers
- Prod/Stage/Dev deployment
- Environment management (confs, copies, or workspaces)
- Modularization (inner modules or reusable module library)
- No shared state -> Shared state

Step 4. Make it easy - Shared state across environments Hierarchy: ./main.tf: Modularize this!

Step 4. Make it easy - State locking
Lock your damn state. Most of the standard backends support state locking; check the documentation for the one you use.
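As an illustration of shared, locked state, here is a sketch of an S3 backend that uses a DynamoDB table for locking. The bucket, key, region, and table names are placeholders, and the talk does not say which backend HG Data used. Newer Terraform releases also offer S3-native locking, so treat the `dynamodb_table` argument as one option.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"  # placeholder bucket
    key            = "my-app/terraform.tfstate" # placeholder state path
    region         = "us-west-2"                # placeholder region
    dynamodb_table = "terraform-locks"          # placeholder lock table
    encrypt        = true
  }
}
```

With locking in place, a second `terraform apply` started while the first is still running fails to acquire the lock, so it cannot write to the same state at the same time.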

Step 4. Make it easy - State locking - A tale of two terraformers
(Diagram: two simultaneous Apply runs against one .tfstate in the backend)

Step 4. Make it easy - What Hurts?
- no sandbox
- isn't parameterizable
- hard to test
- non-DRY
- can't execute anywhere
Okay, okay... What Still Hurts?
- Collaboration with others (PR plan workflow)
- CI/CD locking your state all the time (build queue)
Check out https://www.runatlantis.io/

In summary...
- Experiment and get familiar with your new declarative friend
- Solve your environment problem with convention & opinions
- Make your codebase small and testable with modules
- Ease collaboration & FUD via state sharing & locking

Terraform, Behave! Brendan Keane

What we needed
- Shared modules
- Certainty about shared code: to build on top of shared code, we need a high level of certainty

What would make us more certain?
Tests! But what kind of tests in this scenario?

Do what I say not what I mean
Terraform does this for you with every action.
  resource "aws_instance" "my_server" { ami = "ami-0c32356aac847d7e8" }
  resource "aws_rds_instance" "my_database" { ami = "ami-0c223918e1" }

Do what I mean not what I say
Terraform does not test for what you mean.
  subject { "my_server" }
  it "should be able to reach the database" do
    # db connection test
  end

Do what I say not what I mean: great for auditing known good configuration
Do what I mean not what I say: great for iterating towards a good configuration

Base Case
Assert: A can ping B
This tests more than it seems at first sight: security groups, subnets, routing, DNS…
Terraform tells me everything is as I declared it.
Terraform does not tell me if it does what I want it to do.

Building Blocks

Building Blocks - Bats!
https://github.com/bats-core/bats-core
- Rspec-ish. No 'shelling out'; can run where the infra is stood up.
- Good helpers for stdout, stderr, and exit status.
- Before and after hooks if needed.

Building Blocks - Templating
https://www.terraform.io/docs/providers/template/d/file.html
- ssh driven: this is a plus and a minus.
- Many approaches: scripts, inline; powerful when combined with templating.
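To make the templating idea concrete, here is a sketch that renders a small Bats test with the `template_file` data source linked above. The variable name and test body are assumptions. The template provider has since been deprecated in favor of the built-in `templatefile()` function, so this mirrors the 2019-era approach and is not a current recommendation.

```hcl
variable "host_b_ip" {
  type        = string
  description = "Private IP of the host the test should reach (hypothetical input)"
}

data "template_file" "ping_test" {
  # "$${...}" escapes Terraform's own interpolation, so the placeholder
  # reaches the template engine and is filled in from "vars" below.
  template = <<-EOT
    #!/usr/bin/env bats

    @test "A can ping B" {
      run ping -c 3 $${target_ip}
      [ "$status" -eq 0 ]
    }
  EOT

  vars = {
    target_ip = var.host_b_ip
  }
}

output "rendered_test" {
  value = data.template_file.ping_test.rendered
}
```

The rendered text can then be copied to a host and executed there.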

Building Blocks - Remote Exec
https://www.terraform.io/docs/provisioners/null_resource.html

Building Blocks - Structure

Spec Run
1. Spin up: the dev machine runs terraform apply, which creates Host A and Host B.
2. Remote-exec: Terraform connects to Host A, installs bats, and runs bats.
3. Bats runs the tests in the cloud context (Host A against Host B).
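The spec run above could be wired together roughly as follows. This is a sketch under several assumptions: the hosts already exist and their IPs arrive as variables, Host A is reachable over SSH and has git and sudo available, and every variable name is hypothetical.

```hcl
variable "host_a_ip" {
  type = string
}

variable "host_b_ip" {
  type = string
}

variable "ssh_user" {
  type    = string
  default = "ubuntu"
}

variable "ssh_private_key_path" {
  type = string
}

resource "null_resource" "spec_run" {
  # Re-run the tests on every apply.
  triggers = {
    always = timestamp()
  }

  connection {
    type        = "ssh"
    host        = var.host_a_ip
    user        = var.ssh_user
    private_key = file(var.ssh_private_key_path)
  }

  # Write the test onto Host A; Terraform fills in Host B's address.
  provisioner "file" {
    content     = <<-EOT
      #!/usr/bin/env bats

      @test "A can ping B" {
        run ping -c 3 ${var.host_b_ip}
        [ "$status" -eq 0 ]
      }
    EOT
    destination = "/tmp/ping.bats"
  }

  # Install bats on Host A, then run the test from inside the cloud network.
  provisioner "remote-exec" {
    inline = [
      "rm -rf /tmp/bats-core",
      "git clone --depth 1 https://github.com/bats-core/bats-core.git /tmp/bats-core",
      "sudo /tmp/bats-core/install.sh /usr/local",
      "bats /tmp/ping.bats",
    ]
  }
}
```

A failing test makes the remote-exec step exit non-zero, which in turn fails the `terraform apply`.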

Proof Pudding (Demo Time)

Concluding Thoughts
- Keep the tests simple; too many null_resources == bad
- Positive assertions are usually functionality: A can ping B
- Negative assertions are usually security: B cannot ping A
- Future: dockerize; a dashboard of health tests orthogonal to application health checks

Terraforming Blue/Green Deployments Chris Deutsch

The problem...
“One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production.”
https://www.martinfowler.com/bliki/BlueGreenDeployment.html

Example basic setup

Blue and green deployments

Users are routed to green (live)

Dev/QA are routed to blue (next)

Bringing a new release live

Oh no!

Recovery: flip live back to green

Considerations
While the idea is relatively simple, here are three important things to keep in mind.

1. Infrastructure
- We have two clusters: blue and green
- Each cluster should be configured in the same way
- Each cluster should operate independently and be easy to maintain
In addition, the more we split things up, the more value we can get out of blue/green deploys for our infrastructure. But we also increase cost and complexity.

2. Configuration Management
- Applications should have the same configuration regardless of whether they are blue or green.
- Applications should have the same configuration regardless of whether they are the live version or the next version. (Well… as much as is reasonable.)

3. Application State
“Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.”
https://12factor.net/processes

Terraform + ELB + Kube + RDS + ElastiCache

Module 1: Terraforming a Load Balancer

Starting the app_load_balancer module

Add an aws_lb_target_group...

And an aws_lb_listener_rule...
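The module code on these slides was shown as images. The sketch below is one way an app_load_balancer module might pair a target group with a host-based listener rule for each of the "live" and "next" roles. It assumes Terraform 0.12.6+ (for resource `for_each`), a recent AWS provider, and an existing load balancer listener; all variable names and the output name are hypothetical.

```hcl
# modules/app_load_balancer/main.tf
variable "vpc_id" {
  type = string
}

variable "listener_arn" {
  type        = string
  description = "ARN of an existing load balancer listener"
}

variable "domain" {
  type    = string
  default = "example.com"
}

# One target group per role.
resource "aws_lb_target_group" "role" {
  for_each = toset(["live", "next"])

  name     = "app-${each.key}"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

# Route live.<domain> and next.<domain> to the matching target group.
resource "aws_lb_listener_rule" "role" {
  for_each = aws_lb_target_group.role

  listener_arn = var.listener_arn

  action {
    type             = "forward"
    target_group_arn = each.value.arn
  }

  condition {
    host_header {
      values = ["${each.key}.${var.domain}"]
    }
  }
}

# Map of role name to target group ARN, for the cluster modules to consume.
output "target_group_arns" {
  value = { for role, tg in aws_lb_target_group.role : role => tg.arn }
}
```

The output map has the same shape as the `my_arns` example on a later slide.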

Okay! Time to use it

Module 2: the blue deployment

Starting the app_cluster module

Attaching the Instance to a Target Group
  my_arns = { next = "some_next_arn" live = "some_live_arn" }
  lookup(var.my_arns, "next") => "some_next_arn"
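Building on the `lookup` idea, an app_cluster module could select its target group by role. This is a sketch with hypothetical variable names, and it uses a single EC2 instance as a stand-in for whatever the cluster actually contains.

```hcl
# modules/app_cluster/main.tf
variable "color" {
  type        = string
  description = "\"blue\" or \"green\""
}

variable "role" {
  type        = string
  description = "\"live\" or \"next\""
}

variable "target_group_arns" {
  type        = map(string)
  description = "Map of role name to target group ARN"
}

variable "ami_id" {
  type = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed size

  tags = {
    Name = "app-${var.color}"
  }
}

# The role decides which target group this color's instance joins.
resource "aws_lb_target_group_attachment" "app" {
  target_group_arn = lookup(var.target_group_arns, var.role)
  target_id        = aws_instance.app.id
  port             = 80
}
```

Changing `role` from "next" to "live" replaces only the attachment; the instance itself is left alone.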

Excellent! Time to use it

Module 3: the green deployment

(It’s actually almost the same as blue)

Color flipping, Terraform style

1. Change Green From Next to Live
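In a root configuration, this step can be a one-line change to the green module's role. The sketch assumes two local modules named app_load_balancer and app_cluster; the module inputs and the `target_group_arns` output are hypothetical names.

```hcl
variable "vpc_id" {
  type = string
}

variable "listener_arn" {
  type = string
}

variable "ami_id" {
  type = string
}

module "app_load_balancer" {
  source       = "./modules/app_load_balancer"
  vpc_id       = var.vpc_id
  listener_arn = var.listener_arn
}

module "blue" {
  source            = "./modules/app_cluster"
  color             = "blue"
  role              = "live" # step 2 later changes this to "next"
  ami_id            = var.ami_id
  target_group_arns = module.app_load_balancer.target_group_arns
}

module "green" {
  source            = "./modules/app_cluster"
  color             = "green"
  role              = "live" # step 1: was "next"
  ami_id            = var.ami_id
  target_group_arns = module.app_load_balancer.target_group_arns
}
```

Running `terraform apply` after this edit should plan a change to green's target group attachment only, so both colors serve live traffic until blue is moved to next.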

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.
  Enter a value: yes

Now both are live!

Verify With Curl
  $ curl http://live.example.com/
  I'm an app running in the production environment on green
  $ curl http://live.example.com/
  I'm an app running in the production environment on blue

2. Move Blue From Live To Next

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.
  Enter a value: yes

Now green is live, but blue is next

Verify With Curl
  $ curl http://live.example.com/
  I'm an app running in the production environment on green
  $ curl http://next.example.com/
  503 Service Temporarily Unavailable
  # Wait a couple of minutes...
  $ curl http://next.example.com/
  I'm an app running in the production environment on blue

Deploy Complete!

Q&A Thank you! Follow us online @ https://www.hgdata.com