From Gatekeeper to Kyverno: Kubernetes Policy Management with Performance by Tanat Lokejaroenlarb
ScyllaDB
32 slides
Oct 17, 2025
About This Presentation
This talk shares our journey migrating from Gatekeeper to Kyverno for Kubernetes policy management at Adevinta. Faced with the need for resource mutation beyond Gatekeeper’s capabilities, we explored Kyverno’s out-of-the-box support for both validation and mutation. We’ll cover the challenges, migration process, key lessons learned, and best practices for testing, performance, and monitoring.
Slide Content
A ScyllaDB Community
From Gatekeeper to Kyverno:
Kubernetes Policy Management
with Performance
Tanat Lokejaroenlarb
Staff Site Reliability Engineer
Tanat Lokejaroenlarb
Staff Site Reliability Engineer at Adevinta
■Runtime team focusing on SRE and Platform
Engineering
■I write blog posts about SRE and real world
incidents on https://tanatloke.medium.com
■I’m from [flag emoji], living in Barcelona [flag emoji]
Agenda overview
■What do we do at Adevinta and why do we need “policies”?
■What motivates the evaluation of different tools?
■Kyverno vs OPA: migration and lessons learned
What do we do @Adevinta?
SCHIP
Internal Kubernetes Platform with managed capabilities hosting workloads for
e-commerce marketplaces across Europe
■30+ Production clusters in 4 AWS regions
■2k+ nodes, 80k+ pods, 250k+ rps at peak time
■“Multi-tenant”*
Policy management is a
cornerstone of Multi-tenant
Kubernetes
Policy management is used in different aspects
■[Security] Enforce Ingress hostname convention
●Prevent duplicated ingress hosts across namespaces
■[Abstraction] Prohibit annotations that interfere with our integration
●This ensures we avoid tight coupling with implementation details, which is useful for migrations
■[Operation] Prevent system tolerations
●Isolates system workloads and user workloads for stability/security
OPA was originally the only sane option. It’s battle-tested.
OPA policy example
Ensure unique hostname inside the cluster
# Note: the `in` operator needs `import future.keywords.in` unless Rego v1 syntax is enabled.
violation[{"msg": msg}] {
  host := input.review.object.spec.rules[_].host
  myns := input.review.object.metadata.namespace
  other := data.inventory.namespace[_][otherapiversion]["Ingress"][name]
  re_match("^(extensions|networking.k8s.io)/.+$", otherapiversion)
  not host in input.parameters.UniqHostExceptions
  other.spec.rules[_].host == host
  other.metadata.namespace != myns
  msg := sprintf("ingress host '%v' in namespace %v conflicts with ingress '%v' in namespace %v",
    [host, myns, name, other.metadata.namespace])
}
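For readers less familiar with Rego, the same check can be sketched imperatively. This is a minimal Python illustration of the rule's logic, not part of the talk: the function name `find_host_conflicts` and the plain-dict object shape are assumptions, and `existing` stands in for the inventory of objects already in the cluster.

```python
import re
from typing import Iterable, List

# Same apiVersion filter as the Rego rule (dots escaped here).
INGRESS_API = re.compile(r"^(extensions|networking\.k8s\.io)/.+$")


def find_host_conflicts(
    incoming: dict,
    existing: Iterable[dict],
    exceptions: frozenset = frozenset(),
) -> List[str]:
    """Return one message per host of `incoming` that is already used by
    an Ingress in a different namespace. Hypothetical helper."""
    my_ns = incoming["metadata"]["namespace"]
    messages = []
    for rule in incoming.get("spec", {}).get("rules", []):
        host = rule.get("host")
        if host is None or host in exceptions:
            continue  # no host to check, or host is explicitly exempted
        for other in existing:
            if other.get("kind") != "Ingress":
                continue
            if not INGRESS_API.match(other.get("apiVersion", "")):
                continue
            other_ns = other["metadata"]["namespace"]
            if other_ns == my_ns:
                continue  # reuse inside the same namespace is allowed
            other_hosts = {r.get("host") for r in other.get("spec", {}).get("rules", [])}
            if host in other_hosts:
                messages.append(
                    f"ingress host '{host}' in namespace {my_ns} conflicts "
                    f"with ingress '{other['metadata']['name']}' in namespace {other_ns}"
                )
    return messages
```

Note that the check needs every existing Ingress in the cluster as input, which is why this kind of rule depends on a cached inventory.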
It works, but….
1. REGO Complexity
REGO is hard to understand and increases cognitive load
Only a few members of the team are confident with REGO
2. Limited Mutating capabilities*
As our use cases grow, we rely heavily on “mutating capabilities” in our multi-tenant setup
■[Abstraction] Provide features with annotations
●Add nodeSelector/tolerations automatically based on annotation for nodepool selection
■[Operation] Inject hints based on specific resource types
●Prevent unintended disruptions for CronJob/Job pods
■[Operation] Provide a sane default configuration
●PodDisruptionBudgets, resource requests/limits
OPA has Assign/AssignMetadata, but they are less flexible
Assign or replace
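To make the annotation-driven mutation concrete, here is a minimal Python sketch of what a mutating webhook would compute for the nodepool case: a list of JSON Patch (RFC 6902) operations. The annotation key, node label, and toleration shape are assumptions chosen for illustration; the talk does not name them.

```python
from typing import Any, Dict, List

# Hypothetical annotation key and node label, for illustration only.
NODEPOOL_ANNOTATION = "example.com/nodepool"
NODEPOOL_LABEL = "example.com/nodepool"


def nodepool_patch(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build JSON Patch operations that pin a Pod to the node pool
    named in its annotation. Returns [] when the annotation is absent."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    pool = annotations.get(NODEPOOL_ANNOTATION)
    if not pool:
        return []

    spec = pod.get("spec", {})
    ops: List[Dict[str, Any]] = []

    if "nodeSelector" in spec:
        # JSON Pointer escapes "~" as "~0" and "/" as "~1" inside a key.
        key = NODEPOOL_LABEL.replace("~", "~0").replace("/", "~1")
        ops.append({"op": "add", "path": f"/spec/nodeSelector/{key}", "value": pool})
    else:
        ops.append({"op": "add", "path": "/spec/nodeSelector", "value": {NODEPOOL_LABEL: pool}})

    toleration = {
        "key": NODEPOOL_LABEL,
        "operator": "Equal",
        "value": pool,
        "effect": "NoSchedule",
    }
    if "tolerations" in spec:
        # "-" appends to the end of an existing array.
        ops.append({"op": "add", "path": "/spec/tolerations/-", "value": toleration})
    else:
        ops.append({"op": "add", "path": "/spec/tolerations", "value": [toleration]})
    return ops
```

The branching on whether a field already exists is the part that a plain "assign or replace" mutator makes awkward: merging into an existing map or appending to an existing list is different from overwriting it.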
3. Resource consumption at Scale!
Accessing the real-time state of cluster objects is common when
writing anything beyond a simple validating policy.
Resource consumption grows as more objects are involved
3. Less memory with apiCall
A similar caching style is available via Global Context Entry, with
additional namespace filtering
4. Supports Kubernetes-native ValidatingAdmissionPolicy/MutatingAdmissionPolicy (VAP/MAP) and CEL
The migration
Make it official and build consensus with an ADR (Architecture Decision Record)
Gradual Migration Strategy
■No new policies will be added to OPA (bankruptcy)
●All new policies for both Validating/Mutating will be done with Kyverno
■Gradually migrate existing policies (priority)
●Ease of Moving (Rule Complexity)
■High Priority: Simple or standard policies that can be quickly expressed in Kyverno.
■Lower Priority: Complex Gatekeeper rules with intricate Rego logic.
●Resource Consumption
■High Priority: Rules that track or validate a high volume of resources (e.g., all Pods, or large subsets of
namespaced objects). Migrating these rules first will yield the biggest gains in memory and performance
improvements.
■Lower Priority: Rules that apply to smaller subsets of resources or are rarely triggered.
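The two prioritization axes above can be combined into a simple ordering. The sketch below is one possible way to do it, not the method used in the talk: the `Policy` fields, the thresholds, and the example policy names and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds; tune them to your own policy set.
SIMPLE_MAX_REGO_LINES = 20
HIGH_VOLUME_MIN_OBJECTS = 10_000


@dataclass(frozen=True)
class Policy:
    name: str
    rego_lines: int       # rough proxy for rule complexity
    objects_tracked: int  # rough proxy for resource consumption


def priority(p: Policy) -> int:
    """Count how many 'High Priority' criteria the policy meets (0 to 2)."""
    is_simple = p.rego_lines <= SIMPLE_MAX_REGO_LINES
    is_high_volume = p.objects_tracked >= HIGH_VOLUME_MIN_OBJECTS
    return int(is_simple) + int(is_high_volume)


def migration_order(policies: Iterable[Policy]) -> List[Policy]:
    """Highest priority first; ties go to the policy tracking more
    objects, then to the shorter rule."""
    return sorted(
        policies,
        key=lambda p: (-priority(p), -p.objects_tracked, p.rego_lines),
    )


if __name__ == "__main__":
    backlog = [
        Policy("unique-ingress-host", rego_lines=40, objects_tracked=5_000),
        Policy("deny-system-tolerations", rego_lines=10, objects_tracked=80_000),
        Policy("block-annotations", rego_lines=12, objects_tracked=3_000),
    ]
    for p in migration_order(backlog):
        print(priority(p), p.name)
```

Policies that are both simple and high volume sort first, which matches the "start from simple and high impact" guidance.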
Summary
■Policy management is important for a multi-tenant platform
●OPA is robust, but its mutating capabilities are limited and REGO increases cognitive load
●Kyverno is YAML-based and works well for both validating and mutating scenarios
■Migration strategy
●Start with team consensus
●Declare bankruptcy on OPA (no new policies), then migrate gradually, starting with simple, high-impact policies
■Monitor latency and denials for smooth operation