Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs

insideHPC, Feb 17, 2019 (31 slides)

About This Presentation

In this deck from the 2019 Stanford HPC Conference, Mohan Potheri of VMware presents: Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs.

"This session introduces machine learning on vSphere to the attendee and explains when and why GPUs are important for...


Slide Content

Accelerating & Optimizing HPC/ML on vSphere Leveraging NVIDIA GPUs
Mohan Potheri, VMware, Inc. | Justin Murray, VMware, Inc.

Agenda:
- New Demands on IT
- VMware Goal and Approach
- Why Virtualize AI & ML
- Machine Learning Landscape
- Maximizing GPU Utilization
- Extending GPU Sharing to Containers
- Summary

New Demands on IT Infrastructure
- Specialized hardware: x86, SGX, GPU, NVM, PMEM, FPGA, QAT, IPU
- Infrastructure: security, hybrid cloud, public cloud, global infrastructure and edge
- Growth of apps: business-critical apps, desktop virtualization, graphics-intensive, cloud-native apps, edge/IoT, SaaS, mobile, analytics/AI/ML, custom/other

Our Goal and Approach
- Increase agility and decrease time to discovery for researchers, data scientists, and engineers.
- Provide IT with the ability to efficiently provision, allocate, and manage research compute infrastructure, and to ensure its compliance, across an increasingly broad range of technical and business requirements.
- Do this by leveraging VMware's proven, enterprise-class virtualization and cloud technologies to meet the performance requirements of research computing, HPC, and ML workloads, and by bringing novel capabilities to bear that are not available in traditional HPC/ML environments.

Why Virtualize HPC AI/ML Infrastructure: vSphere can help data scientists get to answers faster.
Operational flexibility:
- Simple cluster expansion and contraction
- Rapidly reproduce research environments
- Higher resiliency and less downtime with vMotion
- Fault isolation (hardware and software)
- Cluster resource sharing
Reduced complexity:
- Minimize setup and configuration time with centralized management capabilities
- Simultaneously support mixed software environments
- Industry-leading virtualization platform that your IT already knows
Secure sensitive workloads:
- Easy, secure data access and sharing
- Security isolation
- Multi-tenant data security

Dispelling the Misunderstanding about GPUs on vSphere
- The hypervisor is not an intermediary when accessing the GPU.
- GPU access is direct, via either passthrough to the VM or NVIDIA GRID vGPU.
- Near-zero performance impact.

Machine Learning Infrastructure Landscape
[Diagram: machine learning, deep learning, big data, and data analytics workloads spanning on-prem and off-prem infrastructure, with training and inference flows across edge/IoT, core, and VDI.]
Two main phases in ML (illustrated in the sketch below):
Training / model building:
- Often very large data sets
- Compute, storage, and network intensive
- Server-class infrastructure
Inference / scoring:
- Apply existing models to new data
- Used for prediction
- Edge or core infrastructure
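
To make the two phases concrete, here is a minimal TensorFlow/Keras sketch (an illustration, not from the deck; the data set and model are placeholders): a compute-intensive training step, followed by a lightweight inference step on new data.

```python
# Minimal sketch of the two ML phases (illustrative placeholders only).
import numpy as np
import tensorflow as tf

# --- Training / model building: data- and compute-intensive ---
x_train = np.random.rand(1024, 20).astype("float32")      # stand-in data set
y_train = (x_train.sum(axis=1) > 10.0).astype("float32")  # stand-in labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=5, batch_size=64)  # uses the GPU if one is visible

# --- Inference / scoring: apply the trained model to new data ---
x_new = np.random.rand(8, 20).astype("float32")
print(model.predict(x_new))  # lightweight; can run on edge or core infrastructure
```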

Using GPUs with vSphere

VM DirectPath I/O for NVIDIA GPU

Virtualized GPU Passthrough, vSphere 6.5/6.7
[Diagram: an ESXi host whose GPU is passed through to a Linux VM that runs the CUDA library and driver with TensorFlow on top.]

GPU Acceleration on vSphere with DirectPath I/O
- Can provision VMs with one or more GPUs
- Easily reuse GPU infrastructure
- Same behavior as public cloud GPU instances
Benefits:
- HW isolation and workload isolation
- VM-level quality of service
- Fast environment provisioning
- Near bare-metal performance
- Passthrough device certification for vSphere is not required; the server must be compatible with the device as published by the server OEM and GPU vendor, and the server must be vSphere certified
Caveats:
- No vMotion, no suspend and resume, no DRS, no vSphere HA
A quick way to confirm that a passed-through GPU is visible inside the guest is shown below.
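
A sanity check from inside the guest VM, assuming the NVIDIA driver, CUDA, and TensorFlow 2.x are already installed (an illustration, not from the deck):

```python
# Inside the guest VM: confirm the passed-through GPU is visible and usable.
# Assumes the NVIDIA driver, CUDA, and TensorFlow 2.x are installed.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {gpus}")

if gpus:
    # Run a trivial op on the GPU to verify it actually executes there.
    with tf.device("/GPU:0"):
        a = tf.random.uniform((1000, 1000))
        b = tf.random.uniform((1000, 1000))
        print(tf.reduce_sum(tf.matmul(a, b)).numpy())
```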

VM DirectPath I/O – Multiple GPUs Attached to a Virtual Machine

vSphere GPU Sharing Mechanisms

Using GPUs with vSphere

VMware vSphere 6.7 and NVIDIA Quadro vDWS (GRID 7.0)
- Share a single GPU among multiple VMs; provision VMs with anything from a partial GPU up to one full GPU (GRID vGPU)
- VM suspend-and-resume support
- Quickly repurpose GPU infrastructure: VDI or data science by day, compute (ML) by night
Benefits:
- HW isolation and workload isolation
- VM-level quality of service and GPU quality of service
- Fast environment provisioning
- Performance comparable to bare metal
The fraction of the GPU each VM receives is set by its vGPU profile; a loose in-guest analogy is sketched below.
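
Under GRID vGPU the partitioning is enforced by the hypervisor through the vGPU profile, not by guest code. As a loose analogy to fractional GPUs, this sketch (an illustration, assuming the TensorFlow 2.x API) caps the framework to a 4 GB logical slice of whatever GPU the VM sees:

```python
# Analogy only: cap TensorFlow to a fixed slice of the GPU this VM sees.
# With NVIDIA GRID vGPU, the real partitioning happens in the hypervisor
# via the vGPU profile; this merely mimics the fractional-GPU idea in-guest.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Expose a logical device limited to 4 GB of the physical GPU.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],
    )
    print(f"Logical GPUs: {tf.config.list_logical_devices('GPU')}")
```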

NVIDIA GRID – Two Layers of Software/Drivers

NVIDIA GRID Configuration – Choosing the vGPU Profile

Using GPUs with vSphere

Bitfusion Enables Remote GPU Sharing
- Dynamic GPU attach anywhere
- Fractional GPUs for efficiency
- Application run-time virtualization
- Standards-based GPU access
[Diagram: Bitfusion client VMs on ESXi hosts attach over the network to Bitfusion server VMs in a vSphere GPU cluster, where each server VM accesses its GPUs via passthrough.]
Client applications need no changes; see the sketch below.
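
Bitfusion works by intercepting CUDA calls on the client and forwarding them to the remote GPU servers, so the application itself is unchanged. A sketch of such an unmodified client-side job (the launch command in the comment is an assumption; check the Bitfusion documentation for the exact CLI):

```python
# bench.py: an ordinary GPU job, unmodified for Bitfusion. The Bitfusion
# client intercepts the CUDA calls and forwards them to a remote GPU server.
# It would be launched through the Bitfusion client runtime, e.g. something
# like: bitfusion run -n 1 -- python bench.py   (exact syntax may differ).
import time
import tensorflow as tf

with tf.device("/GPU:0"):  # resolves to the remote GPU under Bitfusion
    a = tf.random.uniform((4096, 4096))
    b = tf.random.uniform((4096, 4096))
    start = time.time()
    result = tf.reduce_sum(tf.matmul(a, b)).numpy()  # force execution
print(f"4096x4096 matmul finished in {time.time() - start:.3f}s (sum={result:.3e})")
```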

Maximize GPU Utilization

vSphere 6.7 GPU Virtual Machine Suspend and Resume
Source: Enhancing Operations for NVIDIA GRID
Video demo: https://youtu.be/PwVReRauY50
Blog article: https://blogs.vmware.com/vsphere/2018/07/vsphere-6-7-suspend-and-resume-of-gpu-attached-virtual-machines.html

Deep Learning Virtualization Use Case: Cycle Harvesting
Challenge: data scientists submit jobs in traditional batches because of limited compute availability; they submit jobs one day and wait until the next day for the results.
What if the VDI environment has unused cycles? Could HPC jobs run in that environment when it is not needed for VDI? Will it blend?
Outcome/benefit:
- Go beyond traditional batch processing to viewing HPC resources as an engine for returning results in real time.
- Enable HPC compute jobs to harvest cycles from a VDI compute environment.

Cycle Harvesting
[Chart: share values across VMware ESXi hosts over the day (8 AM, noon, 5 PM, 10 PM); VDI VMs carry a share value of 100 during working hours while harvesting VMs run at a share value of 1, so HPC work soaks up whatever the VDI workload leaves idle. The proportional-share math is sketched below.]
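
vSphere shares divide a contended resource in proportion to each VM's share value, and shares only matter under contention; that is what lets a share value of 1 harvest idle cycles without hurting VDI users. A minimal sketch of the proportional math (illustrative only, not vSphere code):

```python
# Illustrative proportional-share math (not vSphere code): under contention,
# each VM's entitlement is its share value divided by the total shares.
def entitlements(shares):
    total = sum(shares.values())
    return {vm: s / total for vm, s in shares.items()}

# Daytime: VDI VMs at share value 100, the HPC harvester VM at 1.
print(entitlements({"vdi-1": 100, "vdi-2": 100, "hpc": 1}))
# -> hpc gets ~0.5% of a contended host, so VDI users barely notice it.

# Night: the VDI VMs are idle, so contention disappears and the HPC VM's
# tiny share no longer limits it; shares only apply under contention.
print(entitlements({"hpc": 1}))  # -> hpc is entitled to 100%
```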

Cycle Harvesting Case Study https://bit.ly/2MrBngH

Extending GPGPU Sharing to Containers

Why Singularity Containers?
- Docker is not designed for HPC architectures.
- Singularity is the container solution best suited to HPC:
  - A Singularity container is encapsulated in a single file, making it highly portable and secure.
  - Singularity is designed from the ground up for scientific computing.

Combining Virtual Machines & Containers for GPU Sharing
- Sharing GPUs among containers alone is difficult because containers provide no GPU resource management.
- A vSphere VM with NVIDIA GRID or Bitfusion can use a whole or a partial GPU.
- Containers are a great packaging mechanism for applications.
- By running one container per virtual machine, we get the best of both worlds: GPU resources can be shared with other containers, and machine learning and deep learning applications and platforms can be packaged and distributed effectively as containers.

Logical Schematic of Infrastructure Components
- One Singularity container per VM
- Containers leverage the partial or full GPUs allocated to their virtual machine
- Each container is packaged with TensorFlow, tools, etc.
- Bitfusion provides the GPU sharing
[Diagram: Singularity containers inside VMs on a generic vSphere cluster, connected through Bitfusion to Bitfusion server VMs with GPU passthrough in a vSphere GPU cluster.]
Launching a GPU job in such a container might look like the sketch below.
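
Launching a GPU-enabled job inside such a container could look like the following sketch (the image and script names are hypothetical; --nv is Singularity's flag for exposing the host's NVIDIA devices and driver libraries to the container):

```python
# Minimal sketch: run a training job inside a Singularity container from
# within the VM. The image and script names are hypothetical; "--nv" maps
# the NVIDIA devices and driver libraries into the container.
import subprocess

subprocess.run(
    [
        "singularity", "exec", "--nv",
        "tensorflow-gpu.sif",  # hypothetical container image
        "python", "train.py",  # hypothetical training script
    ],
    check=True,
)
```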

Throughput Comparison for 1 GPU (images/sec)
[Chart: aggregate images/sec with and without GPU sharing; sharing delivers 2.5-3x more throughput.]

Runtime Comparison for 1 GPU (with/without sharing)
[Chart: per-job runtime when sharing is only 17% slower, in exchange for nearly 3x aggregate throughput; the arithmetic is checked below.]
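
Those two figures are mutually consistent: if each of N concurrent jobs runs 17% longer than it would alone, aggregate throughput is N / 1.17 times the unshared rate. A quick check, assuming three concurrent jobs (the deck does not state N):

```python
# Consistency check (assumption: 3 jobs share the GPU concurrently).
# If a lone job takes time T, each shared job takes 1.17 * T, so aggregate
# throughput relative to one unshared job is n_jobs / slowdown.
n_jobs = 3
slowdown = 1.17
print(f"Aggregate throughput gain: {n_jobs / slowdown:.2f}x")  # ~2.56x, i.e. 2.5-3x
```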

Summary
- Sharing is key to enabling cloud-like capabilities on premises.
- vSphere is the best platform for leveraging the latest high-performance hardware.
- Virtualization supports device sharing and delivers near bare-metal performance.
- Hardware sharing through vSphere can increase utilization (cycle harvesting).