From Gatekeeper to Kyverno: Kubernetes Policy Management with Performance by Tanat Lokejaroenlarb

ScyllaDB · 32 slides · Oct 17, 2025

About This Presentation

This talk shares our journey migrating from Gatekeeper to Kyverno for Kubernetes policy management at Adevinta. Faced with the need for resource mutation beyond Gatekeeper’s capabilities, we explored Kyverno’s out-of-the-box support for both validation and mutation. We’ll cover the challenges,...


Slide Content

A ScyllaDB Community
From Gatekeeper to Kyverno:
Kubernetes Policy Management
with Performance
Tanat Lokejaroenlarb
Staff Site Reliability Engineer

Tanat Lokejaroenlarb

Staff Site Reliability Engineer at Adevinta
■Runtime team focusing on SRE and Platform
Engineering
■I write blog posts about SRE and real-world
incidents at https://tanatloke.medium.com
■I’m from Thailand, living in Barcelona

Agenda overview

■What do we do at Adevinta and why do we need “policies”?
■What motivates the evaluation of different tools?
■Kyverno vs OPA: migration and lessons learned

What do we do @Adevinta?

SCHIP
Internal Kubernetes Platform with managed capabilities hosting workloads for
e-commerce marketplaces across Europe
■30+ Production clusters in 4 AWS regions
■2k+ nodes, 80k+ pods, 250k+ rps at peak time
■“Multi-tenant”*

Policy management is a
cornerstone of Multi-tenant
Kubernetes

Policy management is used in different aspects
■[Security] Enforce Ingress hostname convention
●Prevent duplicated ingress hosts across namespaces
■[Abstraction] Prohibit annotations that interfere with our integration
●This ensures we avoid tight coupling with implementation details, which is useful for migrations
■[Operation] Prevent system tolerations
●Isolates system workloads and user workloads for stability/security



OPA was originally the only sane option. It’s battle-tested.

OPA policy example
Ensure unique hostname inside the cluster

violation[{"msg": msg}] {
  host := input.review.object.spec.rules[_].host
  myns := input.review.object.metadata.namespace
  other := data.inventory.namespace[_][otherapiversion]["Ingress"][name]
  re_match("^(extensions|networking.k8s.io)/.+$", otherapiversion)
  not host in input.parameters.UniqHostExceptions
  other.spec.rules[_].host == host
  other.metadata.namespace != input.review.object.metadata.namespace
  msg := sprintf("ingress host '%v' in namespace %v conflicts with ingress '%v' on ns %v",
    [host, myns, name, other.metadata.namespace])
}

It works, but….

1.REGO Complexity

violation[{"msg": msg}] {
  host := input.review.object.spec.rules[_].host
  myns := input.review.object.metadata.namespace
  other := data.inventory.namespace[_][otherapiversion]["Ingress"][name]
  re_match("^(extensions|networking.k8s.io)/.+$", otherapiversion)
  not host in input.parameters.UniqHostExceptions
  other.spec.rules[_].host == host
  other.metadata.namespace != input.review.object.metadata.namespace
  msg := sprintf("ingress host '%v' in namespace %v conflicts with ingress '%v' on ns %v",
    [host, myns, name, other.metadata.namespace])
}

REGO is hard to understand and increases cognitive load

Only a few members of the team are confident with REGO

2. Limited Mutating capabilities*
As use cases grow, we rely heavily on mutating capabilities in our multi-tenant setup
■[Abstraction] Provide features with annotations
●Add nodeSelector/tolerations automatically based on annotation for nodepool selection
■[Operation] Inject hints based on specific resources types
●Prevent unintended disruptions for Cronjobs/Jobs pods
■[Operation] Provide a sane default configuration
●PodDisruptionBudgets, Resources request/limits

Gatekeeper offers Assign/AssignMetadata for mutation (assign or replace), but they are less flexible

3. Resource consumption at Scale!
Accessing the real-time state of cluster objects is common when
writing anything more than a simple validating policy.

3. Resource consumption at Scale!
Resource usage grows as more objects are involved

The alternative: Kyverno

1. YAML is the SRE/DevOps best friend
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: prevent-restricted-toleration
spec:
  validationFailureAction: Enforce
  rules:
    - name: prevent-restricted-toleration
      match:
        any:
          - resources:
              kinds:
                - Pod
              operations:
                - CREATE
                - UPDATE
      preconditions:
        all:
          - key: "{{ request.object.spec.tolerations[*].key }}"
            operator: AnyIn
            value:
              - special-nodes
      validate:
        message: "Pod is denied because it has a forbidden toleration key (schip-controller)"
        anyPattern:
          - spec:
              tolerations:
                - key: "special-nodes"
                  operator: "*"
                  value: "*"
                  effect: "*"

2. Mutating made simple
mutate:
  patchStrategicMerge:
    spec:
      template:
        spec:
          dnsConfig:
            options:
              - name: ndots
                value: "{{ request.object.metadata.annotations.\"schip/extended-ndots\" }}"

mutate:
  patchesJson6902: |-
    - path: "/spec/tolerations/-"
      op: add
      value:
        key: "alpha.gpu.node.schip.io/gpu"
        operator: "Exists"
        effect: "NoSchedule"

3. Less memory with apiCall
A similar caching style is available via GlobalContextEntry, with the
addition of namespace filtering
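A minimal sketch of what this can look like (resource and cache names here are illustrative, not from the talk): a GlobalContextEntry declares the cached resource once, and policy rules reference the cache by name instead of issuing a fresh API call on every admission request:

apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: ingress-cache        # hypothetical name
spec:
  kubernetesResource:
    group: networking.k8s.io
    version: v1
    resource: ingresses

A policy rule can then read from the cache in its context:

context:
  - name: ingressHosts
    globalReference:
      name: ingress-cache
      jmesPath: "[].spec.rules[].host"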

4. Supports Kubernetes-native ValidatingAdmissionPolicy/MutatingAdmissionPolicy (VAP/MAP) and CEL
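As a sketch of the native route (assuming Kubernetes 1.30+, where ValidatingAdmissionPolicy is GA), the toleration check from the earlier slide could be expressed directly in CEL, without any policy-engine webhook in the admission path:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: prevent-restricted-toleration
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: >-
        !has(object.spec.tolerations) ||
        !object.spec.tolerations.exists(t, t.key == 'special-nodes')
      message: "Pod is denied because it has a forbidden toleration key"

Note that a ValidatingAdmissionPolicyBinding is also required before the policy takes effect; it is omitted here for brevity.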

The migration

Make it official, build consensus (ADR)

Gradual Migration Strategy
■No new policies will be added to OPA (bankruptcy)
●All new policies for both Validating/Mutating will be done with Kyverno
■Gradually migrate existing policies (priority)
●Ease of Moving (Rule Complexity)
■High Priority: Simple or standard policies that can be quickly expressed in Kyverno.
■Lower Priority: Complex Gatekeeper rules with intricate Rego logic.
●Resource Consumption
■High Priority: Rules that track or validate a high volume of resources (e.g., all Pods, or large subsets of
namespaced objects). Migrating these rules first yields the biggest memory and performance gains.
■Lower Priority: Rules that apply to smaller subsets of resources or are rarely triggered.

Result

The lesson learned

Testing strategy

More robustness via integration tests

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: deployment-replicas-higher-than-pdb
spec:
  steps:
    - name: 01 - Create policy
      try:
        - apply:
            file: ../policy.yaml
    - name: 02 - Create existing Deployments in cluster
      try:
        - apply:
            file: existing-deployments.yaml
    - name: 03 - Create bad PDBs
      try:
        - apply:
            file: bad-pdb.yaml
          expect:
            - check:
                ($error != null): true

Start from Audit before Enforce
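In Kyverno this is a one-field change: ship the policy with validationFailureAction set to Audit, watch the resulting PolicyReports for violations, and flip it to Enforce only once the reports are clean. A minimal fragment:

spec:
  validationFailureAction: Audit    # report violations via PolicyReports without blocking requests
  # later, once PolicyReports show no unexpected violations:
  # validationFailureAction: Enforce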

Webhooks can affect your cluster*

It’s important to monitor latency and failures
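Beyond monitoring, the blast radius of a misbehaving webhook can be bounded per policy. A sketch using Kyverno's per-policy knobs (the values shown are illustrative, not the talk's settings):

spec:
  failurePolicy: Ignore        # fail open: the API server admits requests if the webhook errors or times out
  webhookTimeoutSeconds: 10    # cap the admission latency this policy can add

Fail-open trades enforcement guarantees for availability, so security-critical policies may still warrant failurePolicy: Fail.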

Summary
■Policy management is important for multi-tenant platforms
●OPA is robust but lacks mutating capability, and REGO increases cognitive load
●Kyverno is YAML-based and works well for both Validating and Mutating scenarios
■Migration strategy
●Start with team consensus
●Bankrupt and gradual migration, start from simple and high impact
■Monitor Latency and Denial for smooth operation

Thank you! Let’s connect.
Tanat Lokejaroenlarb
[email protected]
www.linkedin.com/in/tanatloke
https://tanatloke.medium.com