Introduction
•Michal Orzel
oMaintainer of Xen on Arm
oSMTS at AMD
oFounding member of Xen Safety certification within AMD
oActive Xen community member (350+ patches authored/reviewed in the last 3 years)
•Ayan Kumar Halder
oSupporting Xen hypervisor on AMD products (as a part of Virtualization team)
oMTS at AMD
oCoordinating Xen functional safety efforts across different teams
o15 years of experience working on the low-level software stack (kernel, Zephyr, ethos-n-driver, bootloaders) and post-silicon validation of Arm-based products
Functional Safety
•“Functional safety is part of the overall safety that depends on a system or equipment operating correctly in response to its inputs.” (IEC 61508)
•“Absence of unreasonable risk due to hazards caused by malfunctioning behavior of E/E systems” (ISO 26262)
•“Safety is freedom from unacceptable risk” (ISO 14971)
However, it provides benefits beyond safety
[Figure: Safety Certification Efforts, linked to Safety, Quality, Usability, Maintainability, Upgradability, and a Diverse Open Source Community]
Safety certifying Xen Hypervisor
•AMD is working on making Xen safety-certifiable for AMD platforms
•ARM and AMD x86 platforms
•IEC 61508 SIL 3 (Systematic Capability 3) & ISO 26262 ASIL D
•Certification based on Xen upstream community processes and upstream codebase
•Not working with a private fork, so the certification can be updated with limited effort
•Certification docs & artifacts available for AMD customers
•Open to collaboration with other community members on upstreaming
•Assumptions and Scope
•Common code and core components in Xen
•AMD x86: AMD-V, AMD-Vi, IOMMU, HPET, vPCI
•ARM: SMMUv3, GICv3, Arch Timer, Hypervisor Extensions, vPCI
•Easy to port to future generations of hardware
•Xen enabling components for Virtio and Xen PV Drivers
•Safe memory sharing using Virtio with grant table
•Virtio with grant table enables Virtio frontends in Safe VMs
•No OS/hypervisor dependencies: run (multiple) Safe VMs and QM VMs of your choice
Core features of Xen for safety certification
•Microkernel architecture
•Small code size (less than 50K LOC on Arm)
•Dom0less
•Static partitioning
•Each domain has direct hardware access (IOMMU protected)
•Real-time
•Strong isolation between the domains
•Real-time isolation
•Failure isolation
•By default, each component has only enough privilege to do what it needs to do
oParallel boot
•Dom0 becomes optional
•Faster time to boot/service
•Real-time with the null scheduler
•Cache isolation with cache coloring
[Figure: Xen creates two HW partitions on the HW. A Safe OS (e.g. Zephyr) accesses one HW partition and a QM OS (e.g. Linux) accesses the other. QM == Non-Safe]
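The cache isolation bullet above can be made concrete with a small sketch of how page colors are usually derived. It assumes a physically indexed, set-associative last-level cache and 4 KiB pages. The cache geometry, the color split between domains, and all function names are hypothetical and are not taken from Xen's implementation. Pages with the same color compete for the same cache sets, so giving each domain a disjoint set of colors keeps one domain from evicting another domain's cache lines.

```python
PAGE_SIZE = 4096  # 4 KiB pages (assumption for this sketch)


def num_colors(cache_size: int, ways: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page colors for a physically indexed, set-associative cache."""
    way_size = cache_size // ways  # bytes covered by a single way
    return way_size // page_size


def page_color(paddr: int, colors: int, page_size: int = PAGE_SIZE) -> int:
    """Color of the page that contains physical address paddr."""
    return (paddr // page_size) % colors


# Hypothetical geometry: 1 MiB, 16-way last-level cache.
COLORS = num_colors(cache_size=1 << 20, ways=16)

# Hypothetical split: disjoint colors for the safe and the non-safe (QM) domain.
SAFE_COLORS = set(range(0, 8))
QM_COLORS = set(range(8, 16))


def may_allocate(paddr: int, allowed: set) -> bool:
    """True if the page at paddr has one of the allowed colors."""
    return page_color(paddr, COLORS) in allowed
```

With this geometry each way covers 64 KiB, so physical pages that are 64 KiB apart share a color. An allocator that only hands a domain pages for which `may_allocate` is true confines that domain to its own slice of the cache.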
Xen Safety Progress
•Xen MISRA C compliance
•MISRA C: coding guidelines for safe C programming
•Goal: improve the Xen codebase
•MISRA C compliance is never at the expense of quality
•Requirements, Test Cases, Tests
•Define the scope and structure of the requirements
•“market”, “product” and “software safety” requirements
•Traceability
•Link requirements, tests, and code
Xen Safety Progress: MISRA C
•Preliminary tailoring resulted in the selection of 143 MISRA C rule candidates
•MISRA C rules adoption in progress:
•135 discussed among maintainers and Bugseng experts
•114 rules adopted and added to docs/misra/rules.rst
•Only 6 rules left to discuss!
•Rules added to docs/misra/rules.rst
•Xen 4.18 release: 148 commits to fix MISRA C violations by
•MISRA C unjustified violations down from 2 million to 90,000!
•ECLAIR MISRA C scanner integrated in the upstream Xen Gitlab CI-loop
•76 rules checked with zero unjustified violations (“clean” and checked against regressions)
•9 more rules are also clean on arm64 only and 1 rule on x86_64 only
•21 additional rules will also be checked against regressions (some violations are present)
Xen Safety Progress: Requirements
•Derived directly from the technical safety requirements allocated to software, or are requirements for safety functions and properties that, if not fulfilled, could lead to a violation of the technical safety requirements allocated to software
•400 requirements:
•Market Requirements (or L1 reqs)
•Product Requirements (or L2 reqs)
•Software Safety Requirements (or L3 reqs)
[Figure: traceability chain. Market requirements --> Product requirements (M to N) --> Software safety requirements (M to N) --> Test case (1 to N) --> Test code (N to 1) --> Test job (1 to 1)]
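The three requirement levels and their links to tests can be pictured as a small data model. The sketch below is only an illustration of the traceability idea. The requirement IDs, field names and helper functions are hypothetical and do not reflect the format in which the Xen requirements are actually stored.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str
    level: str  # "market", "product" or "software_safety"
    parents: list = field(default_factory=list)  # IDs of higher-level requirements
    tests: list = field(default_factory=list)    # test case IDs (software safety level)


def trace_to_market(req_id: str, reqs: dict) -> set:
    """Follow parent links upwards and return the market requirement IDs reached."""
    req = reqs[req_id]
    if req.level == "market":
        return {req_id}
    found = set()
    for parent in req.parents:
        found |= trace_to_market(parent, reqs)
    return found


def untested(reqs: dict) -> list:
    """Software safety requirements that no test case covers yet."""
    return sorted(r.req_id for r in reqs.values()
                  if r.level == "software_safety" and not r.tests)


# Hypothetical IDs, for illustration only.
REQS = {r.req_id: r for r in [
    Requirement("MKT-1", "market"),
    Requirement("MKT-2", "market"),
    Requirement("PROD-1", "product", parents=["MKT-1", "MKT-2"]),
    Requirement("SSR-1", "software_safety", parents=["PROD-1"], tests=["TC-1"]),
    Requirement("SSR-2", "software_safety", parents=["PROD-1"]),
]}
```

Here `trace_to_market("SSR-1", REQS)` walks up through the product requirement to both market requirements, and `untested(REQS)` flags `SSR-2` as a traceability gap.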
Why write requirements
•Before developing a new feature, we need to answer three questions:
oWhat is the feature
oWhy is the new feature required
oHow is it designed/implemented
•Currently
oWhy/What/How are explained in the commit message
oOptionally, there may be a design note explaining how
•With requirements being written as a separate entity
oWhy and What are decoupled from the code, so it is easy to view the big picture
oHow can still be addressed in the commit message or
design note.
oLinking "Why --> What --> How" ??
[Figure: Rationale (explains why) --> What the feature is --> How the feature is being implemented by Xen (commit message, design notes, documentation)]
Market Requirements
•Identify the scope of the safety certification for Xen
•Define the expectations of Xen for automotive and embedded use cases
oSo, this is mostly of interest to the product marketing or FAE folks who have expectations from a hypervisor.
•Written with a high-level view of the system
•Example:
Name | Description
Static VM definition | Xen shall specify the resources required to boot and run safe and non-safe VMs.
Run Arm64 and AMD-x86 VMs | Xen shall run Arm64 and AMD x86 VMs.
VM device assignment | Xen shall be able to assign devices to each VM, e.g. it should be able to assign a GPU to VM1 and an MMC to VM2. Only the VM assigned to a device shall have exclusive access to the device.
1 Market Requirement --> N Product Requirements
•Product Requirements explain how Market Requirements are fulfilled by Xen
•Product Requirements are Xen specific
oSo, this is of interest to Xen architects who understand how the requirements are fulfilled by Xen.
•Still written with a high-level view of the system
•Product Requirements can sometimes be linked to more than one Market Requirement
Emulated UART
1 Product Requirement --> N Software Safety Requirements
Domain shall be able to read the frequency of the system counter (either via
CNTFRQ_EL0 register or "clock-frequency" device tree property if present).
Access virtual timer from a domain
Trigger the physical timer interrupt from a domain
Trigger the virtual timer interrupt from a domain
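The system counter frequency requirement above names two possible sources for the value. The sketch below shows one way to picture the selection. It assumes that the "clock-frequency" device tree property, when present, takes precedence over the value read from CNTFRQ_EL0. The function and parameter names are hypothetical and this is not code from Xen or from a guest OS.

```python
from typing import Optional


def system_counter_frequency(cntfrq_el0: int,
                             dt_clock_frequency: Optional[int] = None) -> int:
    """Return the system counter frequency in Hz.

    Assumption of this sketch: the "clock-frequency" device tree property,
    when present, takes precedence over the value read from CNTFRQ_EL0.
    """
    if dt_clock_frequency is not None:
        return dt_clock_frequency
    return cntfrq_el0
```

A test derived from this requirement would exercise both paths: once with the property absent, and once with the property present and different from the register value.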
Characteristics of Software Safety Requirements
•The requirements are written in plain English, from the perspective of what Xen is expected to fulfil
•The software safety requirements (SSR) are the most granular form
•Engineers are expected to refer to an SSR (and architecture spec) to write a test to validate it
oThis is of interest to the subsystem maintainers or folks working on specific parts of Xen (e.g. generic timer).
•Each SSR should be tested independently
•SSR should be unambiguous, complete, consistent, correct
•SSR should be traceable all the way to market requirements
Organizing the software safety requirements
Booting
Domain Creation and Runtime
•Domain Creation
•Domain Fully Emulated Resources
•Domain Partially Emulated Resources
•Hypercalls
•Physical Resources
Firmware
Physical resources
•Xen shall be able to create a domain using a specified kernel image.
•Domain shall be able to transmit data in polling mode
(i.e. without involving interrupts). (Emulated UART)
•Domain shall be able to access the counter-timer
kernel control register to allow controlling the access to
the timer from userspace (EL0). (Generic Timer)
•Domain shall be able to access
__HYPERVISOR_xen_version passing
XENVER_version as a command.
•Xen shall validate the presence of mandatory SMMU
features ….
•Xen shall be able to configure and use
HPET timer in one shot mode.
•Xen shall be able to receive HPET interrupt.
•Xen shall be able to pass SMCCC_VERSION as a parameter to PSCI_FEATURES to obtain the SMCCC version.
•Xen shall invoke PSCI (PSCI_FEATURES) to obtain the features.
•Xen shall enable Memory Management
Unit.
•Xen shall use 4KB page granularity.
•Xen shall enable instruction cache.
Still something is missing ??
Assumption of Use
Hardware
Firmware
Bootloader
Domain
Compiler
Xen relies on
them to fulfill its
functionality
•GCC version should be 5.1 or later (arm64)
•GCC version should be 4.1.2_20070115 or
later (x86)
•Xen shall be loaded at Non-Secure EL2
exception level.
•Bootloader shall pass physical address of
the host device tree in x0 register.
•The hardware needs to have the ARM Generic
Interrupt Controller, version 3
•The hardware needs to have the Arm System Memory
Management Unit, version 3 onwards
•Domain should not write to GICD_ISACTIVER<n> registers
•Domain should not use physical LPIs
without ITS
•CNTFRQ_EL0 needs to be programmed with the system timer frequency, or the "clock-frequency" dt property should be used.
•TF-A shall provide PSCI API (PSCI_VERSION) to read the version.
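Some assumptions of use, such as the minimum compiler version, lend themselves to an automated check. The sketch below encodes the two GCC minimums listed above and compares a version string against them. As a simplification it ignores the date suffix of 4.1.2_20070115 and compares only major, minor and patch numbers. The helper names are hypothetical and not part of the Xen build system.

```python
# Minimum GCC versions taken from the assumptions of use listed above.
MIN_GCC = {
    "arm64": (5, 1, 0),
    "x86": (4, 1, 2),
}


def parse_version(text: str) -> tuple:
    """Turn '12.2.0' or '4.1.2_20070115' into a (major, minor, patch) tuple."""
    numeric = text.split("_")[0]  # drop a date suffix such as '_20070115'
    parts = [int(p) for p in numeric.split(".")]
    parts += [0] * (3 - len(parts))  # pad '5.1' to (5, 1, 0)
    return tuple(parts[:3])


def gcc_is_supported(arch: str, version: str) -> bool:
    """True if the given GCC version meets the minimum for arch."""
    return parse_version(version) >= MIN_GCC[arch]
```

Tuple comparison in Python is lexicographic, which matches the usual ordering of major, minor and patch numbers.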
Software Architecture Specification
•Defines the major elements and subsystems of the software, how they are interconnected, and how the required attributes, particularly safety integrity, will be achieved
•Defines the overall behaviour of the software, and how software elements interface and interact
•Satisfies both safety and non-safety requirements
•Initial version of the document written
Software Validation Test Cases
•Validation: "confirmation by examination and provision of objective evidence that the particular requirements for a specific intended use are fulfilled" (IEC 61508)
•120 test cases (written as RST docs):
•Define:
§test objectives
§test prerequisites
§steps required to achieve the objectives
§test pass/fail criteria
Methods | ASIL A | ASIL B | ASIL C | ASIL D
1a Analysis of requirements | ++ | ++ | ++ | ++
1b Generation and analysis of equivalence classes | + | ++ | ++ | ++
1c Analysis of boundary values | + | + | ++ | ++
1d Error guessing based on knowledge or experience | + | + | ++ | ++
1e Analysis of functional dependencies | + | + | ++ | ++
1f Analysis of operational use cases | + | ++ | ++ | ++
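The four elements that each test case defines (objectives, prerequisites, steps, pass/fail criteria) can be captured in a small record, which makes it easy to spot incomplete test cases. The sketch below is an illustration only. The class name, field names and IDs are hypothetical, and the actual test cases are written as RST docs rather than Python objects. The example content is loosely based on the emulated UART polling requirement.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationTestCase:
    test_id: str
    requirement_id: str
    objectives: str = ""
    prerequisites: list = field(default_factory=list)
    steps: list = field(default_factory=list)
    pass_fail_criteria: str = ""

    def missing_sections(self) -> list:
        """Names of the sections that are still empty."""
        sections = {
            "objectives": self.objectives,
            "prerequisites": self.prerequisites,
            "steps": self.steps,
            "pass_fail_criteria": self.pass_fail_criteria,
        }
        return [name for name, value in sections.items() if not value]


# Hypothetical IDs and wording, for illustration only.
EXAMPLE = ValidationTestCase(
    test_id="TC-UART-1",
    requirement_id="SSR-UART-1",
    objectives="Domain transmits data over the emulated UART in polling mode",
    prerequisites=["Domain booted with an emulated UART"],
    steps=["Write a byte to the UART data register",
           "Poll until the transmission completes"],
    pass_fail_criteria="The byte is received and no interrupt was used",
)
```

Calling `missing_sections()` on a draft test case lists the sections a reviewer still needs to see filled in before the test case can be linked to its requirement.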
Software Validation Tests
•160 tests
•Verification criteria:
oCompliance with software design spec
oCorrect implementation of the functionality
oRobustness check
oAbsence of unintended functionality
•3 types of tests:
oLinux
§designed for high-level functionality testing through a black-box approach
§tests written as userspace apps, kernel modules
§example: device tree parsing testing
oZephyr
§target mid-level functionalities with a focus on components
such as UART, Timer, etc.
§tests written as Zephyr applications
§example: emulated UART testing
oXTF
§designed for low-level functionality testing, suited to examining core functionalities such as hypercalls
§tests written as XTF guests
§example: event channel hypercall testing
Methods | ASIL A | ASIL B | ASIL C | ASIL D
1a Requirements based test | ++ | ++ | ++ | ++
1b Interface test | ++ | ++ | ++ | ++
1c Fault injection test | + | + | ++ | ++
1d Resource usage evaluation | ++ | ++ | ++ | ++
1e Back-to-back comparison test between model and code, if applicable | + | + | ++ | ++
1f Verification of the control flow and data flow | + | + | ++ | ++
1g Static code analysis | ++ | ++ | ++ | ++
1h Static analysis based on abstract interpretation | + | + | + | +
Community Collaboration
How can the community participate
•Keep doing the stringent code reviews
•Writing and Upstreaming requirements
•Writing and Upstreaming architecture specs
How does the community benefit
•Better code quality
•Ease of onboarding new engineers
•Easier to explain to customers, FAE, management