Developing Kubernetes Integrations for the On-Premises Cloud
AllThingsOpen
Oct 20, 2025
About This Presentation
Presented at All Things Open 2025
Presented by Matthew Sanabria - Oxide Computer Company
Title: Developing Kubernetes Integrations for the On-Premises Cloud
Abstract: Cloud providers develop Kubernetes integrations to deliver a smooth Kubernetes experience on their platform. Integrations that most companies wouldn't build. Unless, of course, the company is a new cloud provider itself.
This talk walks through Oxide's journey developing Kubernetes integrations for its on-premises cloud computer. If you've ever wondered how LoadBalancer services work, how node health is reconciled, or what the heck a Cloud Controller Manager is, then this talk is for you. We'll discuss the various Kubernetes integrations available, which integrations Oxide developed using Go, and the lessons learned along the way.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
Bluesky: https://bsky.app/profile/allthingsopen.bsky.social
YouTube: https://www.youtube.com/@allthingsopen
2025 conference: https://2025.allthingsopen.org/
Slide Content
Developing Kubernetes Integrations for the On-Premises Cloud
Matthew Sanabria
All Things Open 2025
Kubernetes Pop Quiz
Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: LoadBalancer
Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example
spec:
  resources:
    requests:
      storage: 1Gi
Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: Node
metadata:
  name: example
spec:
  taints:
    - effect: NoSchedule
      key: node.cloudprovider.kubernetes.io/uninitialized
      value: "true"
What We’ll Cover
● What Kubernetes integrations exist? Understand the Kubernetes integration ecosystem.
● Which integrations did Oxide build? How we chose which integrations to build and why.
● How was the integration experience? Discuss lessons learned and problems encountered.
Kubernetes Overview
Control Plane Components
● kube-apiserver - Serves the Kubernetes API and provides different Kubernetes resources (e.g., Pod, Service).
● etcd - Distributed key-value store for API data.
● kube-scheduler - Schedules Pod resources to run on nodes.
● kube-controller-manager - Runs controllers that implement behavior for built-in Kubernetes resources.
● cloud-controller-manager (optional) - Runs controllers that integrate with an underlying cloud provider (e.g., Oxide).
Worker Components
● kubelet - Ensures Pod resources are running on the node in accordance with the data retrieved from the Kubernetes API.
● kube-proxy (optional) - Maintains network rules to implement Service resources.
● Container Runtime - Software responsible for running containers on a node (e.g., Docker Engine, containerd).
Kubernetes Integration Points
Kubernetes Integrations
Cloud Controller Manager
Cloud Controller Manager Overview
● Deployed as a control plane component.
● Runs controllers to perform cloud provider specific functions (e.g., create load balancer).
● Uses a standardized cloud provider interface to ease development.
Cloud Controller Manager Deployment
1. Implement the cloud controller manager interface.
2. Build the custom cloud controller manager binary.
3. Run Kubernetes control plane components with --cloud-provider=external.
4. Deploy the custom cloud controller manager to Kubernetes.
   a. Make sure to configure it to talk to your cloud provider!
5. Use it!
   a. Create LoadBalancer services.
   b. Check that node taints were removed.
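The last check in the deployment steps can be sketched in Go. When a kubelet runs with --cloud-provider=external, the node registers with the node.cloudprovider.kubernetes.io/uninitialized taint, and the cloud controller manager removes it once the node is initialized. The Taint type and both helper functions below are simplified stand-ins written for this illustration, not the Kubernetes API types.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// UninitializedTaintKey is the taint key shown in the pop quiz Node example.
const UninitializedTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// IsInitialized reports whether the uninitialized taint is absent.
func IsInitialized(taints []Taint) bool {
	for _, t := range taints {
		if t.Key == UninitializedTaintKey {
			return false
		}
	}
	return true
}

// RemoveUninitializedTaint returns a copy of taints without the
// uninitialized taint. Other taints are kept in their original order.
func RemoveUninitializedTaint(taints []Taint) []Taint {
	out := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key != UninitializedTaintKey {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	taints := []Taint{
		{Key: UninitializedTaintKey, Value: "true", Effect: "NoSchedule"},
		{Key: "example.com/dedicated", Value: "gpu", Effect: "NoSchedule"},
	}
	fmt.Println("initialized before:", IsInitialized(taints))
	taints = RemoveUninitializedTaint(taints)
	fmt.Println("initialized after:", IsInitialized(taints))
}
```

A node that keeps the taint after the cloud controller manager is deployed usually points at a configuration problem between the manager and the cloud provider.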
Cloud Controller Manager LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: LoadBalancer
Cloud Controller Manager InstancesV2
// Defined in k8s.io/cloud-provider; v1 is k8s.io/api/core/v1.
type InstancesV2 interface {
	InstanceExists(ctx context.Context, node *v1.Node) (bool, error)
	InstanceShutdown(ctx context.Context, node *v1.Node) (bool, error)
	InstanceMetadata(ctx context.Context, node *v1.Node) (*InstanceMetadata, error)
}
Cloud Controller Manager Oxide
● Code at oxidecomputer/oxide-cloud-controller-manager.
● InstancesV2 implemented.
  ○ Can be improved once Oxide has resource tags.
● LoadBalancer unimplemented.
  ○ Oxide doesn’t have a native load balancer… yet!
● Routes unimplemented.
  ○ Not needed since we use third-party CNIs.
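The cloud provider interface lets a provider opt in to features one at a time: each getter returns an implementation plus a boolean saying whether the feature is supported. The sketch below mirrors that shape with a trimmed, hypothetical Provider interface and an exampleProvider whose flags match the status on this slide. It is an illustration only and is not taken from the Oxide repository.

```go
package main

import "fmt"

// Empty stand-ins for the feature interfaces.
type (
	InstancesV2  interface{}
	LoadBalancer interface{}
	Routes       interface{}
)

// Provider is a trimmed stand-in for the cloud provider interface. Each
// getter returns an implementation and whether the feature is supported.
type Provider interface {
	ProviderName() string
	InstancesV2() (InstancesV2, bool)
	LoadBalancer() (LoadBalancer, bool)
	Routes() (Routes, bool)
}

// exampleProvider supports InstancesV2 only.
type exampleProvider struct{}

func (exampleProvider) ProviderName() string               { return "example" }
func (exampleProvider) InstancesV2() (InstancesV2, bool)   { return struct{}{}, true }
func (exampleProvider) LoadBalancer() (LoadBalancer, bool) { return nil, false }
func (exampleProvider) Routes() (Routes, bool)             { return nil, false }

// Supported lists the features a provider reports as implemented.
func Supported(p Provider) []string {
	var out []string
	if _, ok := p.InstancesV2(); ok {
		out = append(out, "InstancesV2")
	}
	if _, ok := p.LoadBalancer(); ok {
		out = append(out, "LoadBalancer")
	}
	if _, ok := p.Routes(); ok {
		out = append(out, "Routes")
	}
	return out
}

func main() {
	p := exampleProvider{}
	fmt.Println(p.ProviderName(), "supports", Supported(p))
}
```

Returning false from a getter means the matching controller has nothing to drive, so a provider can ship with partial coverage and grow over time.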
Cluster API
Cluster API Overview
● Create and manage Kubernetes clusters using Kubernetes custom resource definitions and controllers.
● Automates cluster lifecycle management for platform operators (e.g., managed Kubernetes).
● Quite complex.
Cluster API Usage
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OxideCluster
metadata:
  name: example
spec:
  # Infrastructure configuration (e.g., instances).
Cluster API Usage
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
spec:
  # Cluster configuration (e.g., CIDRs).
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: OxideCluster
    name: example
Cluster API Oxide
● Unimplemented!
  ○ There are other ways to manage Kubernetes clusters.
  ○ Ways that don’t require an existing Kubernetes cluster.
Rancher Node Driver
Rancher Node Driver Overview
● Provisions instances that Rancher uses to launch and manage Kubernetes clusters.
● Less complex than the Cluster API.
● Uses the docker-machine interface.
● Deployed by configuring Rancher to download the custom node driver binary.
Rancher Node Driver Interface
// There are many more methods. Omitted for space.
type Driver interface {
	Create() error
	DriverName() string
	PreCreateCheck() error
	Remove() error
	Restart() error
	Start() error
	Stop() error
}
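To make the lifecycle concrete, here is a small sketch of a driver that satisfies the subset of methods shown above. fakeDriver, its Project setting, and the Provision helper are hypothetical names for this illustration; a real node driver would call the cloud API in each method and implement the remaining methods of the interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Driver is the subset of the interface shown on the slide.
type Driver interface {
	Create() error
	DriverName() string
	PreCreateCheck() error
	Remove() error
	Restart() error
	Start() error
	Stop() error
}

// fakeDriver tracks machine state in memory instead of calling a cloud API.
type fakeDriver struct {
	Project string // hypothetical required setting
	state   string // "", "running", or "stopped"
}

// Compile-time check that fakeDriver satisfies Driver.
var _ Driver = (*fakeDriver)(nil)

func (d *fakeDriver) DriverName() string { return "fake" }

func (d *fakeDriver) PreCreateCheck() error {
	if d.Project == "" {
		return errors.New("project is required")
	}
	return nil
}

func (d *fakeDriver) Create() error { d.state = "running"; return nil }
func (d *fakeDriver) Remove() error { d.state = ""; return nil }

func (d *fakeDriver) Start() error {
	if d.state == "" {
		return errors.New("machine does not exist")
	}
	d.state = "running"
	return nil
}

func (d *fakeDriver) Stop() error {
	if d.state == "" {
		return errors.New("machine does not exist")
	}
	d.state = "stopped"
	return nil
}

func (d *fakeDriver) Restart() error {
	if err := d.Stop(); err != nil {
		return err
	}
	return d.Start()
}

// Provision validates configuration before creating the machine.
func Provision(d Driver) error {
	if err := d.PreCreateCheck(); err != nil {
		return fmt.Errorf("%s: pre-create check: %w", d.DriverName(), err)
	}
	return d.Create()
}

func main() {
	d := &fakeDriver{Project: "demo"}
	if err := Provision(d); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("state:", d.state)
}
```

Running the check before Create keeps configuration mistakes from leaving half-created machines behind.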
Rancher Node Driver Oxide
● Code at oxidecomputer/rancher-machine-driver-oxide.
● Initially developed by an Oxide customer!
● My first project when joining Oxide.
Omni Infrastructure Provider
Omni Infrastructure Provider Overview
● Made by Sidero Labs, makers of Talos Linux.
● Connects Talos Linux machines to Omni for automatic management.
● Two types of providers:
  ○ Static
  ○ Dynamic
Omni Infrastructure Provider Interface
type Provisioner[T Resource] interface {
	ProvisionSteps() []Step[T]
	Deprovision(context.Context, *Logger, T, *Request) error
}
Omni Infrastructure Provider Steps
1. Generate Talos Linux schematic ID.
2. Download Talos Linux image.
3. Upload Talos Linux image to Oxide.
4. Start Talos Linux instance with user-data to connect to Omni.
5. Profit!
Omni Infrastructure Provider Deployment
export OMNI_ENDPOINT='https://omni.example.com'
export OMNI_SERVICE_ACCOUNT_KEY='example'
./omni-infra-provider-custom --example
Omni Infrastructure Provider Oxide
● Code at oxidecomputer/omni-infra-provider-oxide.
  ○ Still under development. See pull requests!
● Great developer experience after initial ramp up.
● Sidero Labs maintainers answered a bunch of questions in siderolabs/omni/discussions/1633.
  ○ Found bugs.
  ○ Found answers.
Container Network Interface (CNI) Plugin
CNI Plugin Overview
● Responsible for container networking.
● Configures network interfaces on Kubernetes nodes.
● Can be complex.
● Many third-party CNI plugins available.
CNI Plugin Oxide
● Unimplemented!
  ○ There are great third-party CNI plugins.
  ○ Most major cloud providers use these plugins.
● Perhaps we’ll revisit this in the future.
Container Storage Interface (CSI) Plugin
CSI Plugin Overview
● Responsible for exposing arbitrary block and file storage to containers.
● Necessary for workloads that need persistent storage that remains when containers are terminated.
CSI Plugin Oxide
● Code at oxidecomputer/oxide-csi-plugin.
  ○ Internal only for now, still under development.
  ○ Will be public when we’re done!
● Solves write amplification problem.
● A few blockers:
  ○ Disk attachments require instances to be stopped.
  ○ Instances can only have 8 disks attached.
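The two blockers translate into a precondition check that a storage plugin would need before attaching a volume. This is a sketch under the constraints stated above, not code from the Oxide CSI plugin: the Instance type and CheckAttach are hypothetical, and the slide does not say whether the limit of 8 counts the boot disk.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxDisksPerInstance is the attachment limit stated on the slide.
const MaxDisksPerInstance = 8

var (
	ErrInstanceRunning = errors.New("instance must be stopped before attaching a disk")
	ErrDiskLimit       = errors.New("instance already has the maximum number of disks")
)

// Instance is a hypothetical, simplified view of a cloud instance.
type Instance struct {
	Name    string
	Running bool
	Disks   []string
}

// CheckAttach reports why a disk cannot be attached, or nil if it can.
// The limit is checked first because stopping the instance would not help.
func CheckAttach(inst Instance) error {
	if len(inst.Disks) >= MaxDisksPerInstance {
		return ErrDiskLimit
	}
	if inst.Running {
		return ErrInstanceRunning
	}
	return nil
}

func main() {
	inst := Instance{Name: "worker-0", Running: true, Disks: []string{"boot"}}
	fmt.Println("running:", CheckAttach(inst))
	inst.Running = false
	fmt.Println("stopped:", CheckAttach(inst))
}
```

For Kubernetes this matters because a Pod with a new volume may land on a node that is already running, which is exactly the case the first check rejects.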
Lessons Learned
Bootstrapping Problem
● How do you create the first dependency?
  ○ Kubernetes must exist before integrations can be deployed.
● No right answer. Documentation required.
Public Cloud Assumptions
● Kubernetes interfaces were built on public cloud assumptions.
  ○ These assumptions aren't always true on-premises.
  ○ For example, regions, zones, instance metadata.
● Unclear what to do when assumptions are missing.
Oxide Limitations
● The Oxide product is still growing.
● Certain features are missing that make integration difficult.
  ○ Load balancer.
  ○ Resource tagging.
  ○ Instance metadata service.
  ○ Instance identity authentication.
● Can work around these for now.
Documentation Woes
● The integration ecosystem is poorly documented.
● Examples are minimal without useful comments.
● Things that are documented are out of date.
● Integrations are critical to the user experience.
  ○ Document them!
Closing Thoughts
● Learned a ton!
  ○ Oxide
  ○ Kubernetes
● Built user empathy.
● Things aren't as complex after a bit of prototyping.
● Context switching is difficult and costly.
● Check out our RFDs!
  ○ RFD 0493: Initial Kubernetes Integrations
  ○ RFD 0595: Oxide CSI Plugin
About Me
● First Solutions Software Engineer hire at Oxide Computer Company.
  ○ Generalist team with specialized scope.
  ○ Now I lead a team of 4!
● Extensive Site Reliability Engineer (SRE) background.
● Blogging at matthewsanabria.dev.
● Podcasting at fallthrough.fm.