Usage Scenario of TEE on Client Platform
•Primary OS:
•Expects the same UI as native
•Expects to manage all platform resources
→Pass through platform resources as much as possible
•TEEs:
•Confidential computing
•Do not trust the primary OS or other TEEs
→Isolated by a hypervisor, TDX, etc.
Expected: a hypervisor with a small TCB (Trusted Computing Base) that is transparent to most platform resources
Other names and brands may be claimed as the property of others
[Figure: Primary OS (acts like native) and TEEs, each in its own VM, on top of Hypervisor/TDX/SEV; labels: UI, Platform Specific Resource, Other Resource]
Protected KVM (pKVM)
* Dev Repo: https://android-kvm.googlesource.com/linux
•3rd-party trusted apps (DRM, Crypto) may not be trustworthy enough to run together, so they need isolation
•No valid hypervisor in the secure world
→
•Protected KVM
•Initiated by Google, based on ARM, for Android
•Extends TEE from the secure world to VMs
•Isolation of TEEs based on VMs
•Designed for confidential computing
•Higher privilege than the primary OS kernel
•Small footprint - reduced TCB
•Trusted guest
[Figure: Non-Secure World (EL0, EL1, EL2): Primary OS apps, Kernel (KVM-high), KVM-low (pKVM), plus VMs that each run a Trusted App (DRM, Crypto) on a Trusted OS, with a DM in the primary OS. Secure World (sEL0, sEL1, sEL2, sEL3): Trusted Apps, Trusted OS, Hypervisor, Firmware / Secure Monitor. Items marked * are optional boxes in the original figure.]
pKVM Flow Overview
1. Verified Boot  2. De-Privilege  3. Runtime
[Figure, runtime stage: Primary OS kernel with KVM-High and a Device Model (DM) providing VM management and IO emulation (virtio..); TEE VM; pKVM with labels Guest State, Stage 2 Protection, Memory, Stage 2 Translation, Context Switch, Guest Attestation, IOMMU]
Post-launch
• Verified boot as one image
• Split privilege from the primary OS kernel
Split KVM Module
• KVM-high (w/ primary OS kernel): keeps running like native
• KVM-high: keeps the legacy VMM roles and provides extra assistance to pKVM
• KVM-low (pKVM): runs at a high privilege level with a very small TCB
Trusted Guest (TEE)
• Attestation for guest
• Protected guest memory and state
• Security enforcement in pKVM
Not in Scope
• DoS attacks from the primary OS, since the primary OS owns VCPU scheduling and VM management
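[Figure, verified boot stage: Primary OS kernel with KVM and pKVM, IOMMU]
[Figure, de-privilege stage: pKVM at high privilege; Primary OS kernel with KVM-High and Device Model (DM: VM management, IO emulation via virtio..) at low privilege; IOMMU]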
De-privilege Kernel (ARM vs. X86)
ARM, Non-VHE mode: Kernel: EL1 → Kernel: EL1 + pKVM: EL2
X86, VMX operation mode: Kernel: root → Kernel: non-root + pKVM: root
[Figure, ARM: before, UI APP at EL0, Primary OS Kernel at EL1, Default Vector at EL2; after, pKVM takes EL2]
[Figure, X86: before, UI App and Primary OS Kernel with KVM run in root mode; after, the same stack runs as the Primary VM in non-root mode and pKVM runs in root mode]
Transparent Platform Resource
Common mechanism for ARM & X86:
•Identical memory mapping for primary VM
•Pass-thru system memory except pKVM code/data
•Pass-thru MMIO except IOMMU
•Pass-thru interrupts
[Figure, X86 only (legend: Extension for pKVM / Reuse Linux/KVM): Primary VM#0 in non-root mode with UI, App, Native Drivers, Native ACPI, and a kernel containing Interrupt Mgt, Scheduler, OSPM, Mem Mgt, ACPI Driver; pKVM in root mode with VMCS#0 and EPT (primary); Platform Specific Resource and Other Resource underneath]
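The identical-mapping rule can be illustrated with a small sketch: every guest-physical address of the primary VM maps to the same host-physical address, except for ranges that pKVM keeps for itself. This is only a model under stated assumptions; the address layout and the helper names (`subtract`, `primary_vm_identity_map`) are hypothetical and not taken from the pKVM sources.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open physical range [start, end)


def subtract(ranges: List[Range], hole: Range) -> List[Range]:
    """Remove `hole` from every range, splitting a range when the hole is inside it."""
    h_start, h_end = hole
    out: List[Range] = []
    for start, end in ranges:
        if h_end <= start or end <= h_start:  # no overlap, keep as-is
            out.append((start, end))
            continue
        if start < h_start:                   # part below the hole survives
            out.append((start, h_start))
        if h_end < end:                       # part above the hole survives
            out.append((h_end, end))
    return out


def primary_vm_identity_map(system_ranges: List[Range], reserved: List[Range]):
    """Return (gpa, hpa, size) entries with gpa == hpa, skipping reserved ranges."""
    ranges = list(system_ranges)
    for hole in reserved:
        ranges = subtract(ranges, hole)
    return [(start, start, end - start) for start, end in ranges]


# Hypothetical layout: 2 GiB of RAM, with 2 MiB reserved for pKVM code/data.
ram = [(0x0000_0000, 0x8000_0000)]
pkvm_image = (0x7000_0000, 0x7020_0000)
mapping = primary_vm_identity_map(ram, [pkvm_image])
for gpa, hpa, size in mapping:
    print(f"GPA {gpa:#x} -> HPA {hpa:#x}, size {size:#x}")
```

The same subtraction would apply to the IOMMU MMIO window, which the slide lists as the one MMIO range that is not passed through.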
Memory Protection
Common mechanism for ARM & X86:
•Memory transition [1] shall record page ownership
•Page ownership maintained in page tables
ARM:
•Memory transition done by KVM MMU through HVC
X86:
•Memory transition done during EPT shadowing within pKVM
oMinimizes changes in KVM MMU
o[Discuss] Should this move to a hypercall?
Add-on:
•Fd-based MM [2] to support guest private memory
[1] https://lwn.net/Articles/872889/
[2] https://lwn.net/Articles/887500/
[Figure, X86 only (legend: Extension for pKVM / Reuse Linux/KVM): pKVM (KVM-low) in root mode with EPT (primary) and Shadow EPT (TEE); Primary VM in non-root mode with KVM-high/MMU holding the Virtual EPT, and the Device Model with Fd-based MM; TEE VM; the Virtual EPT is shadowed into the Shadow EPT]
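To make "memory transition shall record page ownership" concrete, here is a minimal sketch of an ownership table with donate, share, and unshare transitions. It is an assumption-laden model: the slide says ownership is kept in page tables, whereas this sketch uses a plain dictionary, and the class, state, and owner names are illustrative only.

```python
from enum import Enum


class PageState(Enum):
    OWNED = "owned"                      # exclusively owned by one party
    SHARED_OWNED = "shared_owned"        # owned, and lent to another party
    SHARED_BORROWED = "shared_borrowed"  # visible, but owned by someone else


class OwnershipTable:
    """Toy per-page ownership record; every transition is checked before it is applied."""

    def __init__(self, pages, initial_owner):
        self.state = {pfn: {initial_owner: PageState.OWNED} for pfn in pages}

    def donate(self, pfn, src, dst):
        # Only an exclusive owner may give a page away; the donor loses access.
        if self.state[pfn] != {src: PageState.OWNED}:
            raise PermissionError(f"{src} does not exclusively own page {pfn}")
        self.state[pfn] = {dst: PageState.OWNED}

    def share(self, pfn, owner, borrower):
        # The owner keeps the page but makes it visible to the borrower.
        if self.state[pfn] != {owner: PageState.OWNED}:
            raise PermissionError(f"{owner} does not exclusively own page {pfn}")
        self.state[pfn] = {owner: PageState.SHARED_OWNED,
                           borrower: PageState.SHARED_BORROWED}

    def unshare(self, pfn, owner, borrower):
        expected = {owner: PageState.SHARED_OWNED,
                    borrower: PageState.SHARED_BORROWED}
        if self.state[pfn] != expected:
            raise PermissionError(f"page {pfn} is not shared by {owner} with {borrower}")
        self.state[pfn] = {owner: PageState.OWNED}

    def can_access(self, pfn, who):
        return who in self.state[pfn]


table = OwnershipTable(range(4), "primary")
table.donate(1, "primary", "tee")   # page 1 becomes TEE-private memory
table.share(1, "tee", "primary")    # TEE lends it back, e.g. as an I/O buffer
table.unshare(1, "tee", "primary")  # and later revokes the share
```

In this model the primary VM cannot reach page 1 between `donate` and `share`, nor after `unshare`, which is the property the TEE VM relies on.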
Interrupt Handling
Common mechanism for ARM & X86:
•Physical interrupts handled in primary VM
•Virtual interrupts managed by KVM-high
[Figure, X86 only (legend: Extension for pKVM / Reuse Linux/KVM): pKVM (KVM-low) in root mode with VMCS#0 and Shadow VMCS#1; Primary VM#0 in non-root mode with KVM-high (Virtual VMCS, vAPIC), Interrupt Mgt, Scheduler; TEE VM#1; Devices]
MMIO Handling
Common mechanism for ARM & X86:
•General MMIO emulated by VMM in primary OS
•VIRTIO: TEE VM explicitly shares memory with the primary VM
X86:
•Specific for TEE VM:
•Instruction emulation: needs access to the TEE's memory
→move it into the TEE VM + guest hypercall for IO_REQ (approach borrowed from TDX)
ARM:
•No instruction emulation needed, as it is covered by HW
[Figure, X86 only (legend: Extension for pKVM / Reuse Linux/KVM): pKVM (KVM-low) in root mode with VMCS#0, Shadow VMCS#1 and Shadow EPT (TEE), Shadow VMCS#2 and Shadow EPT (normal); Primary VM#0 in non-root mode with KVM-High (Virtual VMCS, Virtual EPT, Inst Emulation) and Device Model (VIRTIO BE); TEE VM#1 with VIRTIO FE, Inst Emul and Shared Mem; Normal VM#2 with VIRTIO FE. Two paths are drawn: general MMIO emulation for the normal VM, and MMIO emulation for the TEE VM.]
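The X86 point about moving instruction emulation into the TEE VM can be sketched as follows: the guest decodes its own faulting access and hands the VMM only the decoded request, so the VMM never has to read TEE memory. This is a toy model, not the pKVM or TDX interface; `IoReq`, `ToyDeviceModel`, and the MMIO address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IoReq:
    """What the guest passes in its hypercall: the already-decoded access."""
    gpa: int
    size: int
    is_write: bool
    value: Optional[int] = None


class ToyDeviceModel:
    """Stands in for the VMM in the primary OS; it sees IoReq fields only."""

    def __init__(self, base: int, length: int):
        self.base = base
        self.regs = bytearray(length)

    def handle(self, req: IoReq) -> Optional[int]:
        off = req.gpa - self.base
        if off < 0 or off + req.size > len(self.regs):
            raise ValueError("access outside the emulated MMIO window")
        if req.is_write:
            self.regs[off:off + req.size] = req.value.to_bytes(req.size, "little")
            return None
        return int.from_bytes(self.regs[off:off + req.size], "little")


def guest_mmio_write(dm: ToyDeviceModel, gpa: int, value: int, size: int = 4) -> None:
    # In the guest: decode the instruction locally, then issue the request.
    dm.handle(IoReq(gpa=gpa, size=size, is_write=True, value=value))


def guest_mmio_read(dm: ToyDeviceModel, gpa: int, size: int = 4) -> int:
    return dm.handle(IoReq(gpa=gpa, size=size, is_write=False))


dm = ToyDeviceModel(base=0xFE00_0000, length=0x100)
guest_mmio_write(dm, 0xFE00_0010, 0xDEADBEEF)
print(hex(guest_mmio_read(dm, 0xFE00_0010)))
```

The design point is that everything the device model needs travels in the request itself, which is why no host-side instruction decoder is required for the TEE VM.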
DMA Protection
Common requirements for ARM & X86:
•vIOMMU in primary VM: untrusted devices + protects VMs
•pIOMMU in pKVM: protects TEE VMs
•[Discuss] Align with page ownership
X86 (based on VT-d scalable mode):
•Primary VM owns first level page table
•pKVM owns second level page table
oUnified with the page state management table
ARM:
•S2MPU (not general?)
•SMMU (to be updated)
[Figure, X86 only (legend: Extension for pKVM / Reuse Linux/KVM): Primary VM in non-root mode with vIOMMU and VFIO, holding the 1st level PTs (nGPA->pGPA) and (tGPA->pGPA); pKVM (KVM-low) in root mode with the IOMMU and the 2nd level PTs (pGPA->HPA) and (tGPA->HPA), shadowed from the Primary EPT and TEE EPT; TEE VM with TEE dev (2nd level only); Normal VM with normal dev (nested mode)]
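The split of page-table ownership can be sketched as a two-stage lookup: a normal device is translated by the first-level table (owned by the primary VM) and then by the second-level table (owned by pKVM), while a TEE device uses the second level only. This is a simplified model with single-level dictionaries and made-up page numbers, not the VT-d table format.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def walk(table: dict, addr: int) -> int:
    """Translate one address through a page-number -> page-number table."""
    page, offset = addr >> PAGE_SHIFT, addr & (PAGE_SIZE - 1)
    if page not in table:
        raise LookupError(f"DMA fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | offset


def translate_nested(first_level: dict, second_level: dict, ngpa: int) -> int:
    """Normal device in nested mode: nGPA -> pGPA (primary VM), then pGPA -> HPA (pKVM)."""
    return walk(second_level, walk(first_level, ngpa))


def translate_second_only(second_level: dict, tgpa: int) -> int:
    """TEE device: tGPA -> HPA through the pKVM-owned table only."""
    return walk(second_level, tgpa)


# Hypothetical tables. The primary VM controls first_level; pKVM controls the rest.
first_level = {0x10: 0x200, 0x11: 0x300}
second_level_primary = {0x200: 0x9000}      # pGPA page 0x300 is deliberately absent
second_level_tee = {0x40: 0xA000}

print(hex(translate_nested(first_level, second_level_primary, 0x10ABC)))
print(hex(translate_second_only(second_level_tee, 0x40010)))
```

Because the final stage is always the pKVM-owned table, a first-level entry that points at a page the primary VM does not own (page 0x300 here) faults instead of reaching memory.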
Key Arch Comparison
De-privilege
  ARM: Non-VHE mode [1]; Kernel: EL1 -> Kernel: EL1, pKVM: EL2
  X86: VMX operation mode; Kernel: root -> Kernel: non-root, pKVM: root
Memory Protection
  ARM: Stage-2 memory translation table; embedded w/ KVM MMU through HVC
  X86: Extended Page Table; EPT shadowing within pKVM (no change to KVM MMU, but shares page state API w/ ARM)
Interrupt Handling
  ARM & X86: Physical interrupts handled in primary VM; virtual interrupts managed by KVM-high
MMIO Handling
  ARM & X86: General MMIO emulated by VMM
DMA Protection
  ARM: S2MPU; SMMU (TBU)
  X86: VTd (Scalable Mode) [2]
Guest Attestation
  Template bootloader [1]
[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/kvmforum-2020-edited.pdf
[2] https://cdrdv2.intel.com/v1/dl/getContent/671081
pKVM-X86 Arch Overview
[Figure (legend: Reuse Linux/KVM / Extension for pKVM): pKVM (KVM-low) with VMCS#0, EPT (primary), Shadow VMCS#1 and Shadow EPT (TEE), Shadow VMCS#2 and Shadow EPT (normal), IOMMU (pGPA->HPA), Security Enforcement. Primary VM#0 with UI, Native Dev & ACPI Drivers, Native ACPI, kernel services (Interrupt Mgt, Scheduler, OSPM, Mem Mgt, VFIO), KVM-High (Virtual VMCS, Virtual EPT, vAPIC), vIOMMU (GPA->pGPA), and Device Model (#1, #2) with VIRTIO BE. TEE VM#1 with VIRTIO FE, Inst Emulation and Shared Mem. Normal VM#2 with VIRTIO FE. Platform Specific Resource and Other Resource underneath.]
Performance Evaluation - Primary VM
NVMe Pass-thru (native = 100%):
seq read 256K     99.8%
seq write 256K   100.2%
rand read 4K      98.8%
rand write 4K    100.2%
CPU & Memory (native = 100%):
CPU: Geekbench (single core)   99.9%
Mem: Stream                   100.3%
Primary VM: Close to Native
*normalized data ("primary-VM-result/native-result")
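The normalization in the footnote is a plain ratio of each result to its native baseline. A minimal sketch follows; the raw throughput numbers are invented purely to exercise the arithmetic and are not measurements from these slides.

```python
def normalize(results: dict, baseline: dict) -> dict:
    """Return each result as a percentage of the matching baseline value."""
    return {name: round(100.0 * results[name] / baseline[name], 1)
            for name in results}


# Hypothetical raw scores (arbitrary units), not data from the evaluation.
native = {"seq_read_256K": 1000.0, "rand_read_4K": 500.0}
primary_vm = {"seq_read_256K": 998.0, "rand_read_4K": 494.0}

for name, pct in normalize(primary_vm, native).items():
    print(f"{name}: {pct}%")
```

Values slightly above 100% therefore mean only that the VM run scored marginally higher than the native run, which is typical of measurement noise.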
Performance Evaluation - pKVM VM
Virtio Block (kvm VM = 100%):
                   kvm VM    pkvm VM
seq_read(256K)     100.00%   101.18%
seq_write(256K)    100.00%   101.73%
rand_read(4K)      100.00%    93.67%
rand_write(4K)     100.00%    93.77%
CPU & Memory (native = 100%):
                               native    kvm VM    pkvm VM
MEM: stream                    100.00%   100.07%    99.74%
CPU: geekbench (multi core)    100.00%    99.02%    95.27%
NVMe Pass-thru (native = 100%):
                   native    kvm VM    pkvm VM
seq_read(256K)     100.00%    94.86%    97.67%
seq_write(256K)    100.00%   102.04%   101.88%
rand_read(4K)      100.00%    98.55%    98.26%
rand_write(4K)     100.00%   100.41%    98.64%
pKVM VM: Close to KVM VM
Status Update & Next Step
•Current Status
•De-privileged primary OS
•Run normal VM w/ emulated & pass-thru IO (based on vIOMMU)
•Run TEE VM w/ memory protection
•Current LOC: ~13K
•Next Step
•Publish pKVM-IA repo
•Align common framework for pKVM (X86 & ARM etc.)
•Support TEE VM w/ pass-thru IO (based on vIOMMU)
•Support TEE VM w/ VIRTIO (share memory + IO Request)
•Support TEE VM w/ security enforcement
•Support TEE VM w/ guest attestation
•Target LOC: <25K
Backup
Performance Evaluation - Configuration
SPR Platform w/ VTd Scalable mode
Configuration:
Native             16 CPUs, 32G Memory (test will limit to 16G), 5.17.0 kernel
pKVM primary VM    16 VCPUs, 16G Memory, 5.17.0 kernel
KVM L1 VM          16 VCPUs, 16G Memory, 5.17.0 kernel
pKVM L2 VM         16 VCPUs, 16G Memory, 5.17.0 kernel
Benchmark / Command / Comments:
CPU: geekbench
  geekbench --single-core --no-upload --cpu
  geekbench --multi-core --no-upload --cpu
  Test on TGL: One CPU + multi CPUs
MEM: stream
  stream
  Test on TGL: multi CPUs
FIO: seq_read (256K)
  fio -filename=./seq_read -allrandrepeat=1 -blocksize=256K -direct=1 -iodepth 256 -rw=read -ioengine=libaio -size=2G -numjobs=8 -name=fio_read
  Test on SPR: Multi CPUs, 256K block size, no cache
FIO: seq_write (256K)
  fio -filename=./seq_write -allrandrepeat=1 -blocksize=256K -direct=1 -iodepth 256 -rw=write -ioengine=libaio -size=2G -numjobs=8 -name=fio_write
  Test on SPR: Multi CPUs, 256K block size, no cache
FIO: rand_read (4K)
  fio -filename=./rand_read -allrandrepeat=1 -blocksize=4K -direct=1 -iodepth 256 -rw=randread -ioengine=libaio -size=1G -numjobs=8 -name=fio_randread
  Test on SPR: Multi CPUs, 4K block size, no cache
FIO: rand_write (4K)
  fio -filename=./rand_write -allrandrepeat=1 -blocksize=4K -direct=1 -iodepth 256 -rw=randwrite -ioengine=libaio -size=1G -numjobs=8 -name=fio_randwrite
  Test on SPR: Multi CPUs, 4K block size, no cache
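The four fio invocations differ only in job name, access pattern, block size, and file size, so they can be generated from one small table. The sketch below only builds and prints the command lines with the options exactly as they appear in the benchmark table; it does not run fio, and the `FIO_JOBS` and `fio_command` names are hypothetical.

```python
import shlex

# test name -> (job name, rw mode, block size, file size), copied from the table above
FIO_JOBS = {
    "seq_read":   ("fio_read",      "read",      "256K", "2G"),
    "seq_write":  ("fio_write",     "write",     "256K", "2G"),
    "rand_read":  ("fio_randread",  "randread",  "4K",   "1G"),
    "rand_write": ("fio_randwrite", "randwrite", "4K",   "1G"),
}


def fio_command(test: str) -> list:
    """Build the argument list for one of the four fio tests."""
    name, rw, blocksize, size = FIO_JOBS[test]
    return [
        "fio",
        f"-filename=./{test}",
        "-allrandrepeat=1",
        f"-blocksize={blocksize}",
        "-direct=1",             # bypass the page cache ("no cache" in the table)
        "-iodepth", "256",
        f"-rw={rw}",
        "-ioengine=libaio",
        f"-size={size}",
        "-numjobs=8",
        f"-name={name}",
    ]


for test in FIO_JOBS:
    print(shlex.join(fio_command(test)))
```

Passing the list to `subprocess.run` would execute a test; keeping the arguments as a list avoids shell quoting issues.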
TGL Platform w/ VTd legacy mode
Configuration:
Native             16 CPUs, 16G Memory, 5.17.0 kernel
pKVM primary VM    16 VCPUs, 15G Memory, 5.17.0 kernel
KVM L1 VM          16 VCPUs, 15G Memory, 5.17.0 kernel
pKVM L2 VM         16 VCPUs, 15G Memory, 5.17.0 kernel