BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
fenichawla
61 slides
Apr 30, 2024
About This Presentation
Data security is rapidly gaining importance as the volume of data companies collect, analyze and monetize grows exponentially. New data processing tools and platforms are emerging at an increasing rate, as are the ways in which an organization consumes data. In this presentation Mukund Sarma and Feni Chawla talk about the unique technical and cultural challenges of running a data security program and share some practical solutions that have worked well at our company.
These slides were presented at the BSides Seattle 2024 conference.
Slide Content
Stopping Ethan Hunt from Taking Your Data! Feni Chawla Mukund Sarma
Who are we Mukund Sarma Senior Director of Product Security, Chime Security Engineer turned manager Chime ← Credit Karma ← Synopsys There are no Security problems - They are all engineering and culture problems! Feni Chawla Senior Security Engineer, Chime Data engineer turned security engineer Chime ← Rally Health ← Microsoft ← Teradata Passionate about keeping data safe and user information private
<¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA Gets past the following controls to get to the database: Retinal scan, Double key card, Thermal and pressure sensors, Laser rays 🗣 Feni
<¡Spoiler Alert!> How Ethan Hunt Steals Data from CIA When he gets to the database: 🗣 Feni
Agenda 🗣 Feni
Defining Data Security
Unraveling the Roles and Responsibilities within Data team
Why working with Data teams is different for Security teams
Practical challenges of running a Data Security program
How we approached building a pragmatic Data Security program
How does “STRIDE” look for Data Security
Closing thoughts
Questions
Things That Could Be Their Own Talks (What’s not in scope for this talk) 🗣 Feni
Privacy engineering is not in scope
AI and the implications of AI in engineering and Security
“Data Perimeters” or concepts of that in the world of Public Cloud
Definitions (or rather a very brief outline) 🗣 Feni
Data warehouse - a very large database that stores integrated data from multiple sources
Snowflake - a popular SaaS data warehouse
ETL - a process of extracting data from various sources, transforming it into a format suitable for analysis, and loading it somewhere, typically to a data warehouse
Looker - a reporting & visualization tool that can connect to any database or data warehouse
Data Lake - a large centralized repository that stores vast amounts of data, typically in its native format
Defining Data Security Data security is the practice of protecting Data from unauthorized access, corruption, or theft throughout its lifecycle. 🗣 Mukund
But why Data Security? 🗣 Mukund
What Makes Data Security Challenging Data is often intangible It can mutate and be derived It can easily flow across boundaries No intrinsic constraints on data handling 🗣 Feni Hi, how can I help you? Can you provide your order number? I need help canceling my last order Sure, my SSN is 123-45-6789
What Makes Data Security Challenging Scope is wide, and growing Data is everywhere Ownership spans multiple teams Data team often has multiple functions & goals 🗣 Feni
What Makes Data Security Challenging There aren’t many precedents one can learn from Traditional Security teams don’t understand the data domain More often it's seen as a Compliance function Not enough “security” focused tutorials / documentation 🗣 Mukund Data Security Infra Security App Security
Unraveling the roles and responsibilities within data team 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Goal : Gather insights from datasets Datasets must be: Acquired Transformed Maintained Insights must be: Reliable Timely Easy to consume Granular Functional Roles Responsibilities Unraveling Roles and Responsibilities within Data Team 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Functional Roles Responsibilities Unraveling Roles and Responsibilities within Data Team Using data to improve marketing results Build real-time view into performance of digital ads by demographic Improve performance amongst the 21-25 year olds in metropolitans Provide executive reporting on Q1 results 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Unraveling Roles and Responsibilities within Data Team Build real-time view into performance of digital ads by demographics Campaign Data Proprietary Data Data Lifecycle Management Indexing & Cataloging APIs & Schedulers Scrub, Normalize, Ingest 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Unraveling Roles and Responsibilities within Data Team Improve performance amongst the 21-25 year olds in metropolitans Testing & Sampling Modeling Analysis Data Lake 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Unraveling Roles and Responsibilities within Data Team Provide executive reporting for Q1 results Data Lake Curated Data 🗣 Feni
Engineers Processing Data Platform Ops & SRE Scientists Modeling ML & AI Analysts Business Reporting Functional Roles Responsibilities Bringing It All together From a Tooling Perspective 🗣 Feni
Working with data engineers is different for security teams 🗣 Mukund
Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund Raise your hands if your company has: A dedicated security onboarding program A security training program that covers OWASP Top 10 or similar for your engineers Built specific guardrails or tools for your developer A DevEx or DevRel team
Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund Continue to keep your hands raised if: any of those were built keeping your data teams/data engineers in mind
Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund
Ownership. Software & Infra Teams: Own the software, products and infra systems they build and maintain. Data Teams: Typically do not own the data itself, just process it or manage the underlying platform.
Culture. Software & Infra Teams: Generally start with high specificity about what they want to build or design. Data Teams: Generally start with low specificity and need to explore the data to solidify requirements.
Testing. Software & Infra Teams: Rarely need access to prod data, except for specific troubleshooting purposes. Data Teams: Almost always need access to prod data for modeling and validation.
Tooling - Common Security Tools Don’t Apply 🗣 Mukund
Tools the team uses. Product Engineering: Most tools in use are mature and established, especially in production (e.g. Argo for K8s deployment, Rails, etc). Data Teams: No easy way to regulate tools operating on datasets.
Security Design Reviews. Product Engineering: Clear, established practice coupled with availability of frameworks like STRIDE. Data Teams: No established processes & frameworks.
Vulnerability Detection. Product Engineering: Code scanners, SAST/DAST, pentests, bug bounty etc. Data Teams: No way to detect problems in tools and datasets.
Asset Inventory. Product Engineering: Generally easy to itemize services, compute, infra, etc. Data Teams: No easy way to itemize datasets and models.
Working with Data Engineering Teams is Different from Infra and Software Engineering Teams 🗣 Mukund We ought to come to terms with the fact that most companies and security teams have not treated their data scientists and data engineers as first-class citizens. This must change!
Practical challenges of running a data security program IAM, Data Inventory (or lack thereof), SDLC, Culture 🗣 Feni
Practical Challenges of Running a Data Security Program #1: IAM Most databases and platforms use their own RBAC Securing service accounts is complex Limiting user permissions hinders productivity Role 1 Role 2 Role 3 Role 4 Service Role 🗣 Feni
Practical Challenges of Running a Data Security Program #2: Data Inventory & Data Discovery Finding sensitive data is not easy Hard to get classification right 🗣 Feni In case you were wondering, Waldo is here!
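To make the discovery problem concrete, here is a minimal sketch of pattern-based scanning over sampled rows. It is an illustration under stated assumptions, not the tooling used by the speakers: the `PATTERNS` table, the `classify_columns` helper and the sample rows are hypothetical, and regex matching alone produces both false positives and false negatives, which is part of why classification is hard to get right.

```python
import re

# Hypothetical patterns for illustration; real classifiers need validation,
# context, and far broader coverage than two regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def classify_columns(rows):
    """Return {column: sorted labels} for columns whose sampled values match a pattern."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # this sketch only inspects string values
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return {column: sorted(labels) for column, labels in findings.items()}

# Sample rows modeled on the support-chat example earlier in the deck.
sample = [
    {"order_id": "A-1001", "chat_message": "I need help canceling my last order"},
    {"order_id": "A-1002", "chat_message": "Sure, my SSN is 123-45-6789"},
]
findings = classify_columns(sample)
print(findings)
```

The free-text column is flagged even though nothing in its name suggests it holds sensitive data, which is the kind of surprise a data inventory has to account for.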
Practical Challenges of Running a Data Security Program #3: Redefining your SDLC to include the Data team 🗣 Mukund
Existing frameworks and processes don’t work well
Lack of an OWASP Top 10 equivalent
What do design reviews look like for a researcher?
Traditional Security tooling is not built for identifying Data Security issues
Lack of security training programs catered to data teams
Pragmatic data governance guidelines - stop using catch-all words like “PII” and complement guidelines with the right tools
Practical Challenges of Running a Data Security Program #4: Culture Data team’s culture of exploration runs counter to security team’s culture of enforcing least privilege Data teams aren’t used to Security being involved in their development lifecycle Security has focused a lot on Shift left and guardrails for our application and infrastructure engineers to do things right. What about data? Is your team grounded in pragmatism? 🗣 Mukund
What should one do to solve data security 🗣 Feni
What Should One Do? The technical stuff… 🗣 Feni
Inventory
Data Discovery: Understand where all sensitive data is
Role Discovery: Understand who has what access privileges
Access Control
Isolate Individual Services: Ensure all ETL jobs use unique credentials
JIT Access Approvals: Extend JIT access controls to all data users
Least Privileged Access: Require users to assume least privilege role
Segmentation
Data Segregation: Restrict data to specific locations based on risk
Client Segmentation: Limit specific clients to specific data based on risk
Increased complexity
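Role discovery can be sketched as a walk over role inheritance. The example below is a hedged illustration only: the `effective_privileges` helper, the three input mappings and every role, user and table name are hypothetical, and a real platform would supply this data from its own grant metadata.

```python
def effective_privileges(user_roles, inherits_from, role_grants):
    """Map each user to the (privilege, object) pairs reachable through their roles."""
    def expand(role, seen):
        if role in seen:
            return set()  # guard against cycles in the role graph
        seen.add(role)
        grants = set(role_grants.get(role, ()))
        for parent in inherits_from.get(role, ()):
            grants |= expand(parent, seen)
        return grants

    return {
        user: set().union(*(expand(role, set()) for role in roles))
        for user, roles in user_roles.items()
    }

# Hypothetical inputs: who holds which role, which roles inherit from which,
# and what each role is granted directly.
user_roles = {"analyst_a": ["reporting"], "etl_orders": ["etl_orders_role"]}
inherits_from = {"reporting": ["curated_read"]}
role_grants = {
    "curated_read": [("SELECT", "curated.q1_results")],
    "etl_orders_role": [("INSERT", "lake.orders")],
}
access = effective_privileges(user_roles, inherits_from, role_grants)
print(access)
```

Giving each ETL job its own role, as in `etl_orders_role` here, keeps the resulting map readable: every row answers "who can do what" for exactly one service.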
What Should One Do? The process stuff… 🗣 Feni
Data Environment
Data Minimization: Only collect data that is absolutely needed
Terraform Modules with Secure Defaults: Shift-left, emulate what worked in AppSec and Cloud Security
Tooling
IAM Helper Tooling: Provide tools that enable least privilege role selection
Documentation
Tooling Documentation: Ask vendors for best practices for using their tools securely
Tutorials: Ask vendors to provide tutorials on security configurations
Runbooks: Provide practical operational guidance through runbooks
Increased complexity
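One way to picture IAM helper tooling is a function that, given the privileges a task needs, suggests the role carrying the fewest extras. This is a sketch under assumptions rather than a description of any real tool: `least_privilege_role`, the role names and the privilege strings are all hypothetical.

```python
def least_privilege_role(needed, roles):
    """Pick the role that covers `needed` with the fewest extra privileges, or None."""
    candidates = [
        (len(grants - needed), name)  # extras first, name as a stable tie-breaker
        for name, grants in roles.items()
        if needed <= grants
    ]
    return min(candidates)[1] if candidates else None

# Hypothetical role catalogue: role name -> set of privileges it carries.
roles = {
    "admin": {"read:orders", "write:orders", "read:customers", "grant"},
    "orders_reader": {"read:orders"},
    "orders_rw": {"read:orders", "write:orders"},
}
print(least_privilege_role({"read:orders"}, roles))
print(least_privilege_role({"read:orders", "write:orders"}, roles))
print(least_privilege_role({"read:payroll"}, roles))
```

Returning `None` when no role fits is deliberate: it turns "just use admin" into an explicit request for a new, narrower role.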
What Should One Do? The process stuff… 🗣 Feni
Model Builder Environment
Data Minimization: Only collect the data absolutely needed for model building
Terraform Modules with Secure Defaults: Shift-left, emulate what worked in AppSec and Cloud Security
Tooling
IAM Helper Tooling: Provide tools that enable least privilege role selection
Documentation
Tooling Documentation: Ask vendors for best practices for using their tools securely
Tutorials: Ask vendors to provide tutorials on security configurations
Runbooks: Provide practical operational guidance through runbooks
Increased complexity
What Should One Do? The people stuff … Approach Invest in building bridges Collaboration > Security Transparency - One can’t fix the issues they can’t see Help do the job - builds empathy and confidence Understand that we as an industry have neglected Data teams all along Plan in the open Teach them to fish - don’t serve them fish Increased complexity 🗣 Mukund
Collaborative Threat Modeling with STRIDE Threat modeling is not just for applications and infra 🗣 Mukund
Threat Modeling is not just for Applications and Infra The exercise of collaborative threat modeling helps all the teams involved arrive at a pragmatic list of risks they can work on. If you’re new to threat modeling or wondering how it looks in the world of Data, let’s walk through it
But why run collaborative threat models at all in the first place? Data teams haven’t had to work with the Security teams on an ongoing basis We’re all trying to figure out how we work - Having this as an activity opens up opportunities for collaboration 🗣 Mukund
Collaborative Threat Modeling with STRIDE Threat Spoofing Tampering Repudiation Information Disclosure Denial of Service Elevation of Privileges Goal: Apply STRIDE-like model to data, in addition to apps, services and infrastructure Strategically focus on threat patterns that are not covered in application or infra security, e.g.: SQL injection is accounted for in appsec 🗣 Feni
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Spoofing
Applied to Data Security: Using service accounts to access data directly
Examples: Looker user runs query on Snowflake using Looker role; Looker admin logs into Snowflake using Looker service account credentials
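A simple detection for this spoofing pattern is to compare each service account login against the clients that account is expected to connect from. The sketch below assumes login events are available as dictionaries with `user` and `client` fields; those field names, the `EXPECTED_CLIENTS` allow-list and the account names are hypothetical and do not reflect any specific platform's log schema.

```python
# Hypothetical allow-list: which client applications each service account
# is expected to connect from. Names are illustrative only.
EXPECTED_CLIENTS = {"LOOKER_SVC": {"looker"}}

def suspicious_service_logins(logins, expected_clients=EXPECTED_CLIENTS):
    """Return logins where a service account connects from an unexpected client."""
    return [
        login for login in logins
        if login["user"] in expected_clients
        and login["client"] not in expected_clients[login["user"]]
    ]

logins = [
    {"user": "LOOKER_SVC", "client": "looker"},   # the tool using its own account
    {"user": "LOOKER_SVC", "client": "web_ui"},   # a person using the tool's account
    {"user": "ANALYST_A", "client": "web_ui"},    # a person using their own account
]
flagged = suspicious_service_logins(logins)
print(flagged)
```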
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Tampering
Applied to Data Security: DB admin deleting or modifying data for personal gain; Application bug resulting in data being corrupted or deleted; Creating false data for hijacking ML model; IT admin deleting or overwriting audit logs
Examples: DBA at a bank modifies balance information; Complex ETL job deletes certain records while loading them to destination; DBA creates false data for training ML model that causes fraudulent transactions to go undetected
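Dropped or altered records in a load can be caught by reconciling source and destination after the job runs. The following is a minimal sketch, assuming both sides can be read as lists of dictionaries with string field names and a shared key, and that rows are expected to arrive unchanged; `fingerprint`, `reconcile` and the sample balances are hypothetical.

```python
import hashlib

def fingerprint(rows, key):
    """Map each row's key to a SHA-256 digest of its sorted field/value pairs."""
    return {
        row[key]: hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        for row in rows
    }

def reconcile(source_rows, dest_rows, key="id"):
    """Report keys that went missing, appeared unexpectedly, or changed content."""
    src, dst = fingerprint(source_rows, key), fingerprint(dest_rows, key)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }

source = [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}, {"id": 3, "balance": 75}]
dest = [{"id": 1, "balance": 100}, {"id": 2, "balance": 9250}]
report = reconcile(source, dest)
print(report)
```

A job that legitimately transforms rows would need the comparison applied to the fields that must survive unchanged, not to the whole row.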
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Repudiation
Applied to Data Security: All activity is managed and monitored using ROLES within databases, instead of users
Examples: A user could successively assume different shared roles, performing operations piecemeal with each role, making non-repudiation very challenging
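If the audit trail records the authenticated user alongside the role in use, piecemeal activity across shared roles can be stitched back together. This sketch assumes exactly that; the event fields (`ts`, `user`, `role`, `action`), the helper names and the sample events are hypothetical.

```python
from collections import defaultdict

def actions_by_user(events):
    """Group audit events by authenticated user, in time order, keeping the role used."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        grouped[event["user"]].append((event["role"], event["action"]))
    return dict(grouped)

def multi_role_users(events):
    """Return users who acted under more than one role, sorted by name."""
    grouped = actions_by_user(events)
    return sorted(
        user for user, actions in grouped.items()
        if len({role for role, _ in actions}) > 1
    )

events = [
    {"ts": 3, "user": "user_b", "role": "reporting", "action": "SELECT curated.q1"},
    {"ts": 1, "user": "user_a", "role": "etl_shared", "action": "COPY lake.orders"},
    {"ts": 2, "user": "user_a", "role": "analyst_shared", "action": "SELECT lake.orders"},
]
timeline = actions_by_user(events)
print(multi_role_users(events))
```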
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Information Disclosure
Applied to Data Security: Sensitive data flows from production to analytics dashboard; Sensitive production data used in dev for testing or troubleshooting; Sensitive data copied to an accessible location using service account
Examples: ETL pipeline scrubs PII from SSN column in PG, but not from JSON fields, which results in SSN being present in dashboard; Developers creating a copy of sensitive data in development for testing; Developer running ETL job to copy restricted sensitive data into an S3 bucket that they have access to
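The JSON-field gap in the first example can be illustrated with a scrubber that recurses into nested values instead of looking only at named columns. This is a minimal sketch, not the pipeline described in the talk: the `scrub` helper, the redaction marker and the sample record are hypothetical, and a single SSN regex stands in for whatever detection a real pipeline would use.

```python
import re

# Matches the common 3-2-4 SSN layout only; an assumption for illustration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(value):
    """Redact SSN-shaped strings anywhere in a nested dict/list structure."""
    if isinstance(value, str):
        return SSN.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged

record = {
    "id": 7,
    "ssn": "123-45-6789",  # the column a column-based scrubber would catch
    "metadata": {"notes": ["caller gave SSN 123-45-6789"], "tier": "gold"},
}
clean = scrub(record)
print(clean)
```

A scrubber keyed only on the `ssn` column would have left the copy inside `metadata.notes` intact, which is how the value ends up in a dashboard.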
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Denial of Service
Applied to Data Security: Processing job blows up due to complex computation on large dataset
Examples: A single Spark job overwhelms AWS resources, resulting in cascading failures across other jobs
Collaborative Threat Modeling with STRIDE 🗣 Feni
Threat: Elevation of Privileges
Applied to Data Security: Admin creating backdoors on databases; Using service accounts to access data
Examples: DBA or readwrite user GRANTing increased privileges to an alternate role; Looker admin logs into Snowflake using Looker service account credentials, and can read data that they do not have access to directly
Putting it all together.. 🗣 Feni
STRIDE is a valuable framework for security and data teams to work together to conceptualize and subsequently secure against threats
There are other frameworks too that one could use, e.g.:
DREAD - developed by the company which is graciously hosting us today
PASTA - Italian food will not taste the same now that you have heard about this
VAST - why did the pond break up with the vast ocean? Because it was too shallow
Parting Thoughts 🗣 Feni
Data tools are neither infra tools, nor app tools - need to approach them differently
Concepts from infra, cloud, and app security help, but need to be properly thought through when applying to data
Security is important to everyone, but often not at the cost of productivity
Parting Thoughts Start small - You will have to do a lot of heavy lifting yourselves initially Share/include the data team with everything you do Build developer focused tools that will help them do what they need to do securely Be pragmatic Kudos go a long way! Ask for help - Builds empathy Share what worked/didn’t work with the industry - we’re all in this together! 🗣 Mukund