Demo of Kubernetes at the edge for edge model deployment
Myriam Fentanes, Principal Product Manager, Red Hat
Jacqueline Koehler, Senior Manager, Engineering, Red Hat
Lifecycle of the model
- Inner loop: Gather Data → Prepare Data → Experiment → Train → Test → Review (TrustyAI) → Push
- Outer loop: Build → Deploy → Serve Model → Monitor → Retrain/Tune
- Runs across physical, virtual, private cloud, public cloud, and edge environments, with consistent observability and management
The management challenge
"72% of businesses cited manageability as the biggest obstacle in adopting edge computing"¹
¹ Source: Omdia, 2021 Trends to Watch in Cloud Computing, Jan 2021
Centralized lifecycle management
- The challenge: a growing number of model servers, increasing complexity of AI apps, inconsistent deployment interfaces, disconnected operations, and compliance requirements
- The GitOps answer: a declarative approach with an approval process, version control, and eventually consistent deployments
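The "eventually consistent" point above can be made concrete with a minimal sketch of a GitOps-style reconcile loop. Everything here is illustrative, not a real controller API: desired state comes from the reviewed, version-controlled repo, actual state from the cluster, and the controller converges one toward the other.

```python
# Hypothetical sketch of a GitOps reconcile step (not a real controller).

def reconcile(desired: dict, actual: dict) -> list[str]:
    """Compare desired state (from the Git repo) with actual cluster
    state and return the actions needed to converge."""
    actions = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actions.append(f"apply {name}")      # create or update drifted objects
    for name in actual:
        if name not in desired:
            actions.append(f"delete {name}")     # prune objects removed from Git
    return actions

# Desired state: the manifest merged into the repo after the approval process.
desired = {"inference-service": {"image": "registry.example/infer:v2"}}
# Actual state: what the edge cluster is currently running.
actual = {"inference-service": {"image": "registry.example/infer:v1"}}

print(reconcile(desired, actual))
```

Repeating this loop on every edge cluster is what makes the system eventually consistent: each cluster converges to the repo's declared state on its own schedule, without a central push.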
Centralized deployment management for models at the edge (Myriam)
- Pipeline: Code Repository → Download Dependencies → Test → Containerize → Register
- The AI Edge inference service container image is pushed to a distribution repository
- After review, the manifest lands in a manifest repository, and the Kubernetes controllers at each edge site pull and deploy it
Observability at the edge
- Each edge node runs an OTEL Collector that forwards telemetry data to the core (public/private cloud or data center)
- At the core, the data is consumed by Prometheus, PagerDuty, and Dynatrace
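The pattern on this slide is buffer-and-forward: telemetry is collected locally on the node and periodically flushed to backends at the core. A minimal illustrative sketch (plain Python, not the OpenTelemetry SDK):

```python
# Illustrative sketch of an edge collector that buffers telemetry locally
# and forwards it to core backends; all names here are hypothetical.

class EdgeCollector:
    def __init__(self, backends):
        # backends stand in for core consumers such as Prometheus,
        # PagerDuty, or Dynatrace; here they are just lists.
        self.backends = backends
        self.buffer = []

    def record(self, metric: str, value: float) -> None:
        """Buffer a data point locally on the edge node."""
        self.buffer.append((metric, value))

    def flush(self) -> int:
        """Forward all buffered data points to every core backend
        and return how many were sent."""
        sent = len(self.buffer)
        for backend in self.backends:
            backend.extend(self.buffer)
        self.buffer.clear()
        return sent

prometheus, dynatrace = [], []
collector = EdgeCollector([prometheus, dynatrace])
collector.record("inference_latency_ms", 42)
collector.record("inference_latency_ms", 57)
print(collector.flush())
```

Buffering at the edge matters because edge sites are often behind intermittent links; the node keeps collecting while offline and forwards when connectivity returns.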
Piecing it all together
Core (OCM hub):
1) Store trained models in model storage
2) Trigger the pipeline
3) Retrieve the model
4) Build and test the AI Edge inference service container image
5) Push the inference service container image to the image registry
6) Open a PR against the GitOps repo with the latest inference metadata
Edge (near/far, OCM spoke):
a) Merge the PR
b) The controller watches the GitOps repo and pulls the updated manifest
c) Pulls the inference container image
d) Deploys the pods, with the inference service container running alongside Prometheus and the OTEL collector
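The end-to-end flow above can be sketched as two functions, one for the core pipeline and one for the edge sync. Every name, store, and path below is hypothetical, standing in for the real pipeline tasks and OCM components, a sketch of the data flow rather than an implementation.

```python
# Hedged sketch of the core pipeline + edge sync; all names are hypothetical.

model_storage = {"fraud-model": "weights-v7"}   # 1) trained models stored here
image_registry = {}                              # 5) pushed container images
gitops_repo = {}                                 # 6) manifests + inference metadata

def run_pipeline(model_name: str) -> str:
    """Core side, steps 3-6: retrieve, build/test, push, open PR."""
    weights = model_storage[model_name]                    # 3) retrieve model
    image = f"registry.example/{model_name}:{weights}"     # 4) build/test image
    image_registry[model_name] = image                     # 5) push image
    gitops_repo[model_name] = {                            # 6) PR with metadata
        "image": image,
        "metadata": {"weights": weights},
    }
    return image

def edge_sync(model_name: str) -> str:
    """Edge side, steps b-d: pull merged manifest, pull image, deploy pods."""
    manifest = gitops_repo[model_name]                     # b) pull manifest
    assert manifest["image"] in image_registry.values()    # c) pull image
    return f"pod running {manifest['image']}"              # d) deploy pods

run_pipeline("fraud-model")
print(edge_sync("fraud-model"))
```

The key design point the slide makes is the decoupling: the core only ever writes to the registry and the GitOps repo, and the edge only ever pulls from them, so no inbound connection to the edge site is required.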