AI-Driven DevOps: How LLMs and Agents Are Changing Software Delivery
AllThingsOpen
19 slides
Oct 20, 2025
About This Presentation
Presented at All Things Open 2025
Presented by Kedar Kulkarni - Apple Inc.
Title: AI-Driven DevOps: How LLMs and Agents Are Changing Software Delivery
Abstract: The rise of Large Language Models (LLMs) and AI Agents is transforming DevOps, enabling faster incident resolution, intelligent automation, and self-healing infrastructure. In this talk, we’ll explore practical ways to integrate LLMs into DevOps workflows—from automating runbooks and generating IaC code to enhancing observability and predictive analytics. Using real-world examples, we’ll discuss how AI-powered agents can reduce toil, improve efficiency, and help teams focus on higher-value tasks.
Takeaways:
How LLMs can assist with troubleshooting, monitoring, and security
Building AI-powered DevOps workflows with chatbots, automation, and self-healing systems
Case studies on using AI for faster deployments, intelligent CI/CD, and cloud automation
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
Bluesky: https://bsky.app/profile/allthingsopen.bsky.social
YouTube: https://www.youtube.com/@allthingsopen
2025 conference: https://2025.allthingsopen.org/
Slide Content
AI-Driven DevOps: How LLMs and Agents Are Changing Software Delivery
Kedar Kulkarni
All Things Open 2025
About Me
1. Senior DevOps architect at Apple
Previously VMware and Red Hat
2. Co-founder of Ansible Tower Config as Code with 1M+ downloads
3. Talk to me about: Ansible, Virtualization, Kubernetes, Containers, CI/CD, motorcycles
4. Motorcycle enthusiast - Ninja650 → ZX-6R
COTA track day experience
Disclaimer: Not representing Apple; views are my own
Today's Agenda
DevOps & Software Delivery Context
Decoding AI, LLMs & Agents
DevOps Pain Points & Human Cost
AI as the Great Equalizer
Ethics & Limitations in AI DevOps
Practical AI Solutions & Demos
Open Source Tools & Getting Started
Reframing "Toil"
Not Just Busy Work, But Cognitive Load
Pattern Recognition
Scanning logs for errors
Identifying resource trends
Correlating across systems
Context Switching
Finding runbook locations
Recalling procedures
Translating between tools
Knowledge Access
Finding the right expert
Searching Slack history
Deciphering legacy docs
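The "scanning logs for errors" item above is the kind of pattern recognition that can be scripted before any LLM is involved. The following is a minimal sketch, assuming a hypothetical log format of `<timestamp> <LEVEL> <service>: <message>`; the format, sample lines, and function names are illustrative and do not come from the talk.

```python
import re
from collections import Counter

# Assumed (hypothetical) log format: "<timestamp> <LEVEL> <service>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>[\w-]+):\s+(?P<msg>.*)$"
)


def normalize(message: str) -> str:
    """Replace digit runs with a placeholder so similar errors group together."""
    return re.sub(r"\d+", "<n>", message)


def top_error_patterns(lines, limit=3):
    """Count ERROR lines per (service, normalized message), most common first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            key = (match.group("service"), normalize(match.group("msg")))
            counts[key] += 1
    return counts.most_common(limit)


# Made-up sample data for illustration only.
SAMPLE_LOGS = [
    "2025-10-20T10:00:01Z INFO checkout: request served in 12 ms",
    "2025-10-20T10:00:02Z ERROR checkout: timeout after 3000 ms calling payments",
    "2025-10-20T10:00:05Z ERROR checkout: timeout after 3012 ms calling payments",
    "2025-10-20T10:00:07Z ERROR payments: connection refused on port 5432",
]

if __name__ == "__main__":
    for (service, pattern), count in top_error_patterns(SAMPLE_LOGS):
        print(f"{count}x {service}: {pattern}")
```

Grouping by a normalized message is one simple way to surface recurring errors; a summary like this could also be handed to an LLM as compact context instead of raw logs.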
The Human Cost of DevOps Pain
"Toil" Varies By:
Experience: 5 min vs 2 hrs
Domain: Network vs App
Context: Routine vs Emergency
Access: Permissions & Knowledge
Pager fatigue & burnout
15+ scattered tools
Outside-expertise P0s
Brittle automation
Decoding AI for DevOps Teams
LLMs
Advanced pattern matching for text
Like a senior engineer who's read everything
In DevOps Terms:
Smart search + content generation
AI Agents
LLMs + tools + decision making
Reliable teammate that can execute tasks
In DevOps Terms:
Smart search + execution capabilities
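The "LLMs + tools + decision making" description of an agent can be sketched as a small loop: a model picks the next tool, the loop runs it, and the result feeds the next decision. This is a minimal sketch only; `fake_model` is a hard-coded stand-in for a real LLM call, and the tool names and their canned outputs are hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical read-only tools; a real agent might wrap kubectl or a metrics API.
def get_pod_status(name: str) -> str:
    return f"pod {name}: CrashLoopBackOff, 7 restarts"


def get_recent_logs(name: str) -> str:
    return f"{name}: OOMKilled (memory limit 256Mi exceeded)"


TOOLS: Dict[str, Callable[[str], str]] = {
    "get_pod_status": get_pod_status,
    "get_recent_logs": get_recent_logs,
}


def fake_model(observations: List[str]) -> dict:
    """Stand-in for an LLM: chooses the next step from what was observed so far."""
    if not observations:
        return {"action": "get_pod_status"}
    if len(observations) == 1:
        return {"action": "get_recent_logs"}
    return {
        "action": "finish",
        "answer": "Likely cause: memory limit too low (OOMKilled).",
    }


def run_agent(pod: str, model=fake_model, max_steps: int = 5) -> str:
    """Decide, act, observe, repeat; stop on 'finish', unknown tool, or step limit."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = model(observations)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS.get(decision["action"])
        if tool is None:
            return f"Stopped: unknown tool {decision['action']!r}"
        observations.append(tool(pod))
    return "Stopped: step limit reached"


if __name__ == "__main__":
    print(run_agent("checkout-7d9f"))
```

Restricting the agent to a fixed tool table and a step limit is a deliberate choice here: the model can only request actions that were registered in advance.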
AI as the Great Equalizer
Junior Engineers
Instant institutional knowledge
Log explanation in plain language
Guided troubleshooting steps
Senior Engineers
Handles pattern recognition
Synthesizes information
Acts as thinking partner
Non-Native English Speakers
Translates technical jargon
Assists with documentation
Different Learning Styles
Visual: Diagrams & flowcharts
Sequential: Step-by-step guides
Diverse Perspectives
Startup
5-person team, multiple hats
AI helps with knowledge gaps
Focus: Quick wins, rapid iteration
Enterprise
Specialized teams, complex approvals
AI helps with coordination
Focus: Compliance, security, standards
Mid-Size
Growing pains, some process, some chaos
AI helps with scaling operations
Focus: Building sustainable practices
Different contexts require different AI applications
Quick poll: Please indicate which category best describes your team: Startup, Mid-Size, or Enterprise?
Demo 1: AI-Assisted Troubleshooting (Kubernetes Scenario)
Scenario: Production Kubernetes cluster shows high pod restart rates for a critical service.
Junior View
Explain the pod restart issue step-by-step for a junior engineer, including common causes and initial troubleshooting commands.
Senior View
Analyze recent deployment changes and service logs to identify patterns contributing to the high pod restart rates across similar incidents.
Security View
Scan for any unusual network activity, failed authentication attempts, or privilege escalations that could indicate malicious activity related to the pod restarts.
Business View
Assess the current customer impact of the service degradation and provide an estimated time to resolution (ETA) based on available data and troubleshooting progress.
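The four views above are the same incident asked about in four different ways, which suggests keeping the prompts as templates keyed by audience. The following is a minimal sketch, assuming a hypothetical `build_prompt` helper; the prompt wording is taken from the slide, while the function, dictionary, and prompt layout are illustrative and not the demo's actual implementation.

```python
SCENARIO = (
    "Production Kubernetes cluster shows high pod restart rates "
    "for a critical service."
)

# Prompt text per audience, as worded on the slide.
VIEW_PROMPTS = {
    "junior": (
        "Explain the pod restart issue step-by-step for a junior engineer, "
        "including common causes and initial troubleshooting commands."
    ),
    "senior": (
        "Analyze recent deployment changes and service logs to identify "
        "patterns contributing to the high pod restart rates across similar "
        "incidents."
    ),
    "security": (
        "Scan for any unusual network activity, failed authentication "
        "attempts, or privilege escalations that could indicate malicious "
        "activity related to the pod restarts."
    ),
    "business": (
        "Assess the current customer impact of the service degradation and "
        "provide an estimated time to resolution (ETA) based on available "
        "data and troubleshooting progress."
    ),
}


def build_prompt(view: str, context: str = "") -> str:
    """Combine the shared scenario, the view-specific task, and optional context."""
    if view not in VIEW_PROMPTS:
        raise ValueError(
            f"Unknown view {view!r}; expected one of {sorted(VIEW_PROMPTS)}"
        )
    parts = [f"Scenario: {SCENARIO}", f"Task: {VIEW_PROMPTS[view]}"]
    if context:
        parts.append(f"Context:\n{context}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("junior", context="pod checkout-7d9f restarted 7 times"))
```

The resulting string could be sent to whichever model a team uses; that call is left out here because the talk does not specify one.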
Demo video (YouTube, 04:23): K8sGPT: AI-Driven DevOps Troubleshoot…
Demo 2: Self-healing K8s System
AI agents autonomously discover, diagnose, and remediate errors to ensure continuous system stability.
Intelligent Error Detection
AI continuously scans logs, metrics, and traces to identify subtle anomalies and errors before they can escalate into major incidents.
Automated Root Cause Analysis (RCA)
The AI correlates data across various sources (metrics, logs, deployment history) to accurately pinpoint the exact cause of operational issues.
Self-Executing Remediation
AI agents automatically apply known remediation patterns, such as restarting failing services, scaling resources, or rolling back recent deployments.
Improved MTTR
AI agents dramatically reduce Mean Time To Recovery (MTTR) by eliminating human response delays and instantly applying fixes, cutting incident resolution time from hours down to minutes.
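The detect, diagnose, and remediate steps above can be illustrated with a rule-based planner that maps a pod's signals to one of the remediation patterns the slide names (restart, scale, roll back). This is a minimal sketch under stated assumptions: the thresholds, the `PodSignal` fields, and the diagnosis labels are hypothetical, and the demo itself relies on AI agents rather than fixed rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds chosen for illustration only.
RESTART_THRESHOLD = 5
RECENT_DEPLOY_MINUTES = 30


@dataclass
class PodSignal:
    name: str
    restarts: int
    last_reason: str           # e.g. "OOMKilled" or "Error"
    minutes_since_deploy: int


def diagnose(signal: PodSignal) -> str:
    """Very small stand-in for root cause analysis."""
    if signal.restarts < RESTART_THRESHOLD:
        return "healthy"
    if signal.last_reason == "OOMKilled":
        return "memory_pressure"
    if signal.minutes_since_deploy <= RECENT_DEPLOY_MINUTES:
        return "bad_deploy"
    if signal.last_reason == "Error":
        return "transient_failure"
    return "unknown"


# Known remediation patterns from the slide, plus a human fallback.
REMEDIATIONS = {
    "memory_pressure": "scale_resources",
    "bad_deploy": "rollback_deployment",
    "transient_failure": "restart_service",
    "unknown": "escalate_to_human",
}


def plan_remediation(signal: PodSignal) -> Optional[str]:
    """Return the planned action name, or None when nothing needs doing."""
    diagnosis = diagnose(signal)
    if diagnosis == "healthy":
        return None
    return REMEDIATIONS[diagnosis]


if __name__ == "__main__":
    pod = PodSignal("checkout-7d9f", restarts=9, last_reason="OOMKilled",
                    minutes_since_deploy=500)
    print(plan_remediation(pod))
```

The planner only returns an action name and executes nothing, so a plan can be reviewed or logged before any change reaches a cluster.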
Demo video (YouTube, 01:51): How to Build a Self-Healing Kubernetes …
AI Ethics & Limitations in DevOps
Key Considerations
Bias in automation decisions
Transparency of recommendations
Human oversight for critical tasks
Guardrails
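Human oversight, transparency, and guardrails can be made concrete as a policy check that runs before an agent acts. The following is a minimal sketch, assuming a hypothetical policy in which only allowlisted actions are possible, some of them need a named approver in production, and every decision is recorded with its reason. The action names and the `Guardrail` class are illustrative, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy: what an agent may do alone vs. with a human approver.
AUTO_APPROVED = {"restart_service"}
NEEDS_HUMAN = {"scale_resources", "rollback_deployment"}


@dataclass
class Decision:
    action: str
    environment: str
    allowed: bool
    reason: str


@dataclass
class Guardrail:
    # Every decision is kept so recommendations stay transparent and auditable.
    audit_log: List[Decision] = field(default_factory=list)

    def check(self, action: str, environment: str,
              approved_by: Optional[str] = None) -> Decision:
        if action not in AUTO_APPROVED | NEEDS_HUMAN:
            decision = Decision(action, environment, False,
                                "action not on allowlist")
        elif (environment == "production" and action in NEEDS_HUMAN
              and not approved_by):
            decision = Decision(action, environment, False,
                                "human approval required in production")
        else:
            decision = Decision(action, environment, True,
                                f"approved by {approved_by or 'policy'}")
        self.audit_log.append(decision)
        return decision


if __name__ == "__main__":
    guard = Guardrail()
    print(guard.check("rollback_deployment", "production"))
    print(guard.check("rollback_deployment", "production", approved_by="oncall"))
```

Under this assumed policy, non-production environments stay unrestricted for allowlisted actions, while anything outside the allowlist is refused everywhere.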
Operational Knowledge Gap
AI training is rich in code, poor in ops wisdom
DevOps knowledge lives in:
Private Slack threads
War room discussions
Tribal knowledge
AI Knowledge + Human Experience = Infinite Possibility
Open Source AI Tools for DevOps
Infrastructure
tfgpt
k8sgpt
kubectl-ai
AWS Copilot CLI
Development
aider
cline
roo code
Monitoring
keep
Models
Llama
Mistral
DeepSeek
Getting Started: Implementation Roadmap
1. Week 1: Assessment & Quick Wins
Audit top 3 time-consuming tasks
Try existing tools: GitHub Copilot, ChatGPT
Goal: Save 30 min/person in first week
2. Month 1: Pilot Implementation
Choose: troubleshooting OR documentation
Build simple integration (Slack bot, dashboard)
Goal: 20% faster incidents OR 50% more docs
3. Quarter 1: Scale & Integrate
Expand successful pilots
Build internal knowledge base for AI
Goal: 80% team adoption of AI tools
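The roadmap goals are stated as percentages, so it helps to agree on how they are computed before a pilot starts. The following is a minimal sketch of the arithmetic, assuming incident durations are tracked in minutes and adoption is counted as active users over team size; the function names and sample numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_faster(baseline_minutes: Sequence[float],
                   pilot_minutes: Sequence[float]) -> float:
    """Percentage reduction in mean incident resolution time versus the baseline."""
    baseline = mean(baseline_minutes)
    pilot = mean(pilot_minutes)
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    return (baseline - pilot) * 100 / baseline


def adoption_rate(active_users: int, team_size: int) -> float:
    """Percentage of the team actively using the AI tools."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return active_users * 100 / team_size


def meets_goal(value: float, target: float) -> bool:
    """True when a measured percentage reaches its target."""
    return value >= target


if __name__ == "__main__":
    # Made-up sample numbers for illustration only.
    speedup = percent_faster([60, 90, 30], [45, 60, 39])
    adoption = adoption_rate(active_users=8, team_size=10)
    print(f"incidents {speedup:.0f}% faster, goal met: {meets_goal(speedup, 20)}")
    print(f"adoption {adoption:.0f}%, goal met: {meets_goal(adoption, 80)}")
```

Comparing means is the simplest choice; a team with a few very long incidents might prefer the median, which `statistics.median` provides.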
Different Starting Points
High-Traffic / Large Teams
Focus: Self-service automation and intelligent routing
Specialized / Expert Teams
Focus: Knowledge capture and junior engineer onboarding
Distributed / Remote Teams
Focus: Communication enhancement and async collaboration
Budget-Conscious Teams
Focus: Free/open-source tools and gradual integration
Risk Mitigation:
Always maintain human oversight
Start with non-production
Establish clear guidelines
Key Takeaways
AI amplifies human capabilities, doesn't replace judgment
Different teams benefit from different AI applications
Start small, measure impact, iterate based on your needs
Ethics and inclusivity should be built in from the beginning
Discussion Questions
What's your biggest "toil" challenge?
How might your team's diverse perspectives benefit from AI assistance?
What concerns do you have about AI in your workflows?
Which quick win would be most valuable for your team?
Thank You
AI-Driven DevOps: How LLMs and Agents Are Changing Software Delivery
Connect with me to continue the conversation
KEDARKULKARNI.in
https://linkedin.com/in/kkulkar3