Developing Kubernetes Integrations for the On-Premises Cloud


About This Presentation

Presented at All Things Open 2025
Presented by Matthew Sanabria - Oxide Computer Company

Title: Developing Kubernetes Integrations for the On-Premises Cloud
Abstract: Cloud providers develop Kubernetes integrations to deliver a smooth Kubernetes experience on their platform. Integrations that most ...


Slide Content

Developing Kubernetes Integrations for the On-Premises Cloud
Matthew Sanabria
All Things Open 2025

Kubernetes Pop Quiz

Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: LoadBalancer

Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example
spec:
  resources:
    requests:
      storage: 1Gi

Does This Work In Your Kubernetes Cluster?
apiVersion: v1
kind: Node
metadata:
  name: example
spec:
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"

What We’ll Cover
● What Kubernetes integrations exist? Understand the Kubernetes integration ecosystem.
● Which integrations did Oxide build? How we chose which integrations to build and why.
● How was the integration experience? Discuss lessons learned and problems encountered.

Kubernetes Overview

Control Plane Components
● kube-apiserver - Serves the Kubernetes API and provides different Kubernetes resources (e.g., Pod, Service).
● etcd - Distributed key-value store for API data.
● kube-scheduler - Schedules Pod resources to run on nodes.
● kube-controller-manager - Runs controllers that implement behavior for built-in Kubernetes resources.
● cloud-controller-manager (optional) - Runs controllers that integrate with an underlying cloud provider (e.g., Oxide).

Worker Components
● kubelet - Ensures Pod resources are running on the node in accordance with the data retrieved from the Kubernetes API.
● kube-proxy (optional) - Maintains network rules to implement Service resources.
● Container Runtime - Software responsible for running containers on a node (e.g., Docker Engine, containerd).

Kubernetes Integration Points

Kubernetes Integrations

Cloud Controller Manager

Cloud Controller Manager Overview
● Deployed as a control plane component.
● Runs controllers to perform cloud-provider-specific functions (e.g., create load balancer).
● Uses a standardized cloud provider interface to ease development.

Cloud Controller Manager Interface
type Interface interface {
    Initialize(ControllerClientBuilder, <-chan struct{})
    LoadBalancer() (LoadBalancer, bool)
    InstancesV2() (InstancesV2, bool)
    Routes() (Routes, bool)
    ProviderName() string
}


Cloud Controller Manager Deployment
1. Implement the cloud controller manager interface.
2. Build the custom cloud controller manager binary.
3. Run Kubernetes control plane components with --cloud-provider=external.
4. Deploy the custom cloud controller manager to Kubernetes.
   a. Make sure to configure it to talk to your cloud provider!
5. Use it!
   a. Create LoadBalancer services.
   b. Check that node taints were removed.
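
A minimal sketch of steps 1 and 2, assuming the k8s.io/cloud-provider module; oxideCloud and its stubbed methods are hypothetical placeholders, not the actual oxidecomputer/oxide-cloud-controller-manager code:

package oxide

import (
    "io"

    cloudprovider "k8s.io/cloud-provider"
)

// oxideCloud is a hypothetical provider implementing cloudprovider.Interface.
type oxideCloud struct{ /* Oxide API client and config would live here. */ }

func (c *oxideCloud) Initialize(b cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {}
func (c *oxideCloud) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
func (c *oxideCloud) InstancesV2() (cloudprovider.InstancesV2, bool)   { return nil, false }
func (c *oxideCloud) Routes() (cloudprovider.Routes, bool)             { return nil, false }
func (c *oxideCloud) ProviderName() string                             { return "oxide" }

// The real interface has a few methods the slide omits for space.
func (c *oxideCloud) Instances() (cloudprovider.Instances, bool) { return nil, false }
func (c *oxideCloud) Zones() (cloudprovider.Zones, bool)         { return nil, false }
func (c *oxideCloud) Clusters() (cloudprovider.Clusters, bool)   { return nil, false }
func (c *oxideCloud) HasClusterID() bool                         { return true }

func init() {
    // Register under the name used to select this provider; the binary is
    // then assembled around the upstream cloud controller manager app code.
    cloudprovider.RegisterCloudProvider("oxide", func(config io.Reader) (cloudprovider.Interface, error) {
        return &oxideCloud{}, nil
    })
}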

Cloud Controller Manager LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: example
spec:
  type: LoadBalancer

Cloud Controller Manager LoadBalancer
type Interface interface {
    Initialize(ControllerClientBuilder, <-chan struct{})
    LoadBalancer() (LoadBalancer, bool) // The method this slide focuses on.
    InstancesV2() (InstancesV2, bool)
    Routes() (Routes, bool)
    ProviderName() string
}

Cloud Controller Manager LoadBalancer
type LoadBalancer interface {
    // These take context.Context. Omitted for space.
    GetLoadBalancer(string, *Service) (*Status, bool, error)
    GetLoadBalancerName(string, *Service) string
    EnsureLoadBalancer(string, *Service, []*Node) (*Status, error)
    UpdateLoadBalancer(string, *Service, []*Node) error
    EnsureLoadBalancerDeleted(string, *Service) error
}
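
A hypothetical sketch of the reconcile-style EnsureLoadBalancer a provider would write; the ensureLB helper and the returned address are placeholders, since Oxide has no native load balancer to call:

package oxide

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

type loadBalancers struct{ /* cloud API client would live here. */ }

// EnsureLoadBalancer creates or reconciles a cloud load balancer for the
// Service and reports its ingress address back to Kubernetes, which
// publishes it in svc.Status.LoadBalancer.
func (l *loadBalancers) EnsureLoadBalancer(ctx context.Context, clusterName string, svc *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error) {
    ip, err := l.ensureLB(ctx, clusterName, svc, nodes)
    if err != nil {
        return nil, err
    }
    return &v1.LoadBalancerStatus{Ingress: []v1.LoadBalancerIngress{{IP: ip}}}, nil
}

// ensureLB is a placeholder: a real implementation would create the load
// balancer if missing, then reconcile its listeners against svc.Spec.Ports
// and its backends against the current nodes.
func (l *loadBalancers) ensureLB(ctx context.Context, cluster string, svc *v1.Service, nodes []*v1.Node) (string, error) {
    return "203.0.113.10", nil // documentation IP as a stand-in
}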

Cloud Controller Manager InstancesV2
apiVersion: v1
kind: Node
metadata:
  name: example
spec:
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"

Cloud Controller Manager InstancesV2
type Interface interface {
    Initialize(ControllerClientBuilder, <-chan struct{})
    LoadBalancer() (LoadBalancer, bool)
    InstancesV2() (InstancesV2, bool) // The method this slide focuses on.
    Routes() (Routes, bool)
    ProviderName() string
}

Cloud Controller Manager InstancesV2
type InstancesV2 interface {
    // These take context.Context. Omitted for space.
    InstanceExists(*Node) (bool, error)
    InstanceShutdown(*Node) (bool, error)
    InstanceMetadata(*Node) (*InstanceMetadata, error)
}
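
A minimal sketch of InstanceMetadata using the k8s.io/cloud-provider types; the instance struct, lookupInstance helper, and "oxide://" provider ID scheme are assumptions for illustration:

package oxide

import (
    "context"

    v1 "k8s.io/api/core/v1"
    cloudprovider "k8s.io/cloud-provider"
)

type instancesV2 struct{ /* cloud API client would live here. */ }

// instance is a hypothetical stand-in for a cloud API response.
type instance struct{ ID, Type, IP string }

func (i *instancesV2) lookupInstance(ctx context.Context, name string) (*instance, error) {
    // Placeholder: query the cloud's API for the instance backing this node.
    return &instance{ID: "example", Type: "general", IP: "10.0.0.10"}, nil
}

// InstanceMetadata returns the cloud-side identity of a node. The cloud
// controller manager uses it to set the node's provider ID, instance type,
// and addresses, then removes the uninitialized taint shown earlier.
func (i *instancesV2) InstanceMetadata(ctx context.Context, node *v1.Node) (*cloudprovider.InstanceMetadata, error) {
    inst, err := i.lookupInstance(ctx, node.Name)
    if err != nil {
        return nil, err
    }
    return &cloudprovider.InstanceMetadata{
        ProviderID:    "oxide://" + inst.ID,
        InstanceType:  inst.Type,
        NodeAddresses: []v1.NodeAddress{{Type: v1.NodeInternalIP, Address: inst.IP}},
    }, nil
}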

Cloud Controller Manager Oxide
● Code at oxidecomputer/oxide-cloud-controller-manager.
● InstancesV2 implemented.
  ○ Can be improved once Oxide has resource tags.
● LoadBalancer unimplemented.
  ○ Oxide doesn’t have a native load balancer… yet!
● Routes unimplemented.
  ○ Not needed since we use third-party CNIs.

Cluster API

Cluster API Overview
● Create and manage Kubernetes clusters using Kubernetes custom resource definitions and controllers.
● Automates cluster lifecycle management for platform operators (e.g., managed Kubernetes).
● Quite complex.

Cluster API Usage
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OxideCluster
metadata:
  name: example
spec:
  # Infrastructure configuration (e.g., instances).

Cluster API Usage
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
spec:
  # Cluster configuration (e.g., CIDRs).
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: OxideCluster
    name: example

Cluster API Oxide
● Unimplemented!
  ○ There are other ways to manage Kubernetes clusters.
  ○ Ways that don’t require an existing Kubernetes cluster.

Rancher Node Driver

Rancher Node Driver Overview
● Provisions instances that Rancher uses to launch and manage Kubernetes clusters.
● Less complex than the Cluster API.
● Uses the docker-machine interface.
● Deployed by configuring Rancher to download the custom node driver binary.

Rancher Node Driver Interface
// There are many more methods. Omitted for space.
type Driver interface {
    Create() error
    DriverName() string
    PreCreateCheck() error
    Remove() error
    Restart() error
    Start() error
    Stop() error
}
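
A hypothetical skeleton of a driver implementing a few of those methods; the OxideDriver fields and stubbed bodies are placeholders, not the real oxidecomputer/rancher-machine-driver-oxide code:

package driver

import "errors"

// OxideDriver sketches the state a node driver carries between calls.
type OxideDriver struct {
    Project    string // Oxide project to create instances in
    InstanceID string // recorded by Create for later lifecycle calls
}

func (d *OxideDriver) DriverName() string { return "oxide" }

// PreCreateCheck validates configuration before any cloud resources exist.
func (d *OxideDriver) PreCreateCheck() error {
    if d.Project == "" {
        return errors.New("project is required")
    }
    return nil
}

// Create provisions the instance Rancher will turn into a cluster node.
func (d *OxideDriver) Create() error {
    // Placeholder: create the instance via the Oxide API, attach the boot
    // disk image, and record its ID.
    d.InstanceID = "example"
    return nil
}

// Remove tears the instance down when the node leaves the cluster.
// (Start, Stop, Restart, and the rest follow the same pattern.)
func (d *OxideDriver) Remove() error {
    return nil // Placeholder: stop and delete the instance via the Oxide API.
}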

Rancher Node Driver Deployment
apiVersion: management.cattle.io/v3
kind: NodeDriver
metadata:
  name: oxide
spec:
  checksum: f68726fd...
  description: "Oxide Rancher node driver."
  displayName: oxide
  url: "https://example.com/docker-machine-driver-oxide"

Rancher Node Driver OxideConfig
apiVersion: rke-machine-config.cattle.io/v1
kind: OxideConfig
metadata:
  name: oxide-machine-config
  namespace: fleet-default
bootDiskImageId: 499487b6-d857-4e91-b749-7b08ebe18cfe
project: example
sshUser: ubuntu

Rancher Node Driver Cluster
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: oxide-k8s-cluster
spec:
  rkeConfig:
    machinePools:
    - machineConfigRef:
        kind: OxideConfig
        name: oxide-machine-config

Rancher Node Driver Oxide
● Code at oxidecomputer/rancher-machine-driver-oxide.
● Initially developed by an Oxide customer!
● My first project when joining Oxide.

Omni Infrastructure Provider

Omni Infrastructure Provider Overview
● Made by Sidero Labs, makers of Talos Linux.
● Connects Talos Linux machines to Omni for automatic management.
● Two types of providers:
  ○ Static
  ○ Dynamic

Omni Infrastructure Provider Interface
type Provisioner[T Resource] interface {
    ProvisionSteps() []Step[T]
    Deprovision(context.Context, *Logger, T, *Request) error
}

Omni Infrastructure Provider Steps
1. Generate Talos Linux schematic ID.
2. Download Talos Linux image.
3. Upload Talos Linux image to Oxide.
4. Start Talos Linux instance with user-data to connect to Omni.
5. Profit!
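
A hypothetical sketch of how those steps could map onto the Provisioner interface above; Step, NewStep, Logger, Request, and machine are stand-ins for the real Sidero Labs SDK types, whose signatures differ in detail:

package provider

import "context"

// Stand-in types; the real ones come from the Omni infrastructure
// provider SDK.
type (
    Logger  struct{}
    Request struct{}
    machine struct{ ID string }

    Step[T any] struct {
        Name string
        Run  func(context.Context, T) error
    }
)

func NewStep[T any](name string, run func(context.Context, T) error) Step[T] {
    return Step[T]{Name: name, Run: run}
}

type provisioner struct{ /* Oxide API client would live here. */ }

// ProvisionSteps returns the ordered steps from the list above.
func (p *provisioner) ProvisionSteps() []Step[*machine] {
    return []Step[*machine]{
        NewStep("generate-schematic", p.generateSchematicID),
        NewStep("download-image", p.downloadTalosImage),
        NewStep("upload-image", p.uploadImageToOxide),
        NewStep("start-instance", p.startInstanceWithUserData),
    }
}

func (p *provisioner) Deprovision(ctx context.Context, log *Logger, m *machine, req *Request) error {
    return nil // Placeholder: delete the Oxide instance backing this machine.
}

// Placeholder step bodies; each would call the Oxide or Talos image APIs.
func (p *provisioner) generateSchematicID(ctx context.Context, m *machine) error       { return nil }
func (p *provisioner) downloadTalosImage(ctx context.Context, m *machine) error        { return nil }
func (p *provisioner) uploadImageToOxide(ctx context.Context, m *machine) error        { return nil }
func (p *provisioner) startInstanceWithUserData(ctx context.Context, m *machine) error { return nil }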

Omni Infrastructure Provider Deployment
export OMNI_ENDPOINT='https://omni.example.com'
export OMNI_SERVICE_ACCOUNT_KEY='example'

./omni-infra-provider-custom --example

Omni Infrastructure Provider Oxide
● Code at oxidecomputer/omni-infra-provider-oxide.
  ○ Still under development. See pull requests!
● Great developer experience after initial ramp up.
● Sidero Labs maintainers answered a bunch of questions in siderolabs/omni/discussions/1633.
  ○ Found bugs.
  ○ Found answers.

Container Network Interface (CNI) Plugin

CNI Plugin Overview
● Responsible for container networking.
● Configures network interfaces on Kubernetes nodes.
● Can be complex.
● Many third-party CNI plugins available.

CNI Plugin Oxide
● Unimplemented!
  ○ There are great third-party CNI plugins.
  ○ Most major cloud providers use these plugins.
● Perhaps we’ll revisit this in the future.

Container Storage Interface (CSI) Plugin

CSI Plugin Overview
● Responsible for exposing arbitrary block and file storage to containers.
● Necessary for workloads that need persistent storage that remains when containers are terminated.
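
For a sense of what a CSI plugin implements: a minimal sketch of the controller-side CreateVolume RPC, assuming the container-storage-interface Go bindings; the controller type and placeholder volume ID are illustrative, not the actual oxide-csi-plugin code:

package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
)

// controller sketches one RPC of the CSI controller service; a real plugin
// implements the full controller, node, and identity services and serves
// them over gRPC.
type controller struct{ /* cloud API client would live here. */ }

// CreateVolume provisions a block device sized from the request and hands
// its ID back to Kubernetes, which records it in the PersistentVolume.
func (c *controller) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
    size := req.GetCapacityRange().GetRequiredBytes()
    // Placeholder: create a disk of `size` bytes via the cloud API and
    // return its real identifier below.
    return &csi.CreateVolumeResponse{
        Volume: &csi.Volume{
            VolumeId:      req.GetName(), // placeholder volume ID
            CapacityBytes: size,
        },
    }, nil
}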

CSI Plugin Deployment CSIDriver
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi.oxide.computer
spec:
  attachRequired: true
  podInfoOnMount: false
  fsGroupPolicy: File

CSI Plugin Deployment StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: oxide-disk
provisioner: csi.oxide.computer

CSI Plugin Deployment PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example
spec:
  resources:
    requests:
      storage: 1Gi
  storageClassName: oxide-disk

CSI Plugin Oxide
● Code at oxidecomputer/oxide-csi-plugin.
  ○ Internal only for now, still under development.
  ○ Will be public when we’re done!
● Solves write amplification problem.
● A few blockers:
  ○ Disk attachments require instances to be stopped.
  ○ Instances can only have 8 disks attached.

Lessons Learned

Bootstrapping Problem
● How do you create the first dependency?
  ○ Kubernetes must exist before integrations can be deployed.
● No right answer. Documentation required.

Public Cloud Assumptions
● Kubernetes interfaces were built on public cloud assumptions.
  ○ These assumptions aren't always true on-premises.
  ○ For example: regions, zones, instance metadata.
● Unclear what to do when assumptions are missing.

Oxide Limitations
● The Oxide product is still growing.
● Some features that would ease integration are still missing:
  ○ Load balancer.
  ○ Resource tagging.
  ○ Instance metadata service.
  ○ Instance identity authentication.
● Can work around these for now.

Documentation Woes
● The integration ecosystem is poorly documented.
● Examples are minimal, with few useful comments.
● What documentation does exist is often out of date.
● Integrations are critical to the user experience.
  ○ Document them!

Closing Thoughts

Closing Thoughts
● Learned a ton!
  ○ Oxide
  ○ Kubernetes
● Built user empathy.
● Things aren't as complex after a bit of prototyping.
● Context switching is difficult and costly.
● Check out our RFDs!
  ○ RFD 0493: Initial Kubernetes Integrations
  ○ RFD 0595: Oxide CSI Plugin

About Me
● First Solutions Software Engineer hire at Oxide Computer Company.
  ○ Generalist team with specialized scope.
  ○ Now I lead a team of 4!
● Extensive Site Reliability Engineer (SRE) background.
● Blogging at matthewsanabria.dev.
● Podcasting at fallthrough.fm.

Thank you