Grifo-Infographics-2023: THE MOST POWERFUL SINGLE NODE UNIFIED MEMORY SUPERCOMPUTER

DalportoBaldo, 26 slides, Aug 23, 2024

About This Presentation

SUPERCOMPUTER FOR HPC AND AI APPLICATIONS


Slide Content

The most powerful single-node unified-memory supercomputer for HPC & data analytics. UNIVERSAL Massive Multi-GPU Server, Model-A™. Ing. Emilio Billi, CTO.

CTO, A3Cube Inc. Ing. Emilio Billi is the founder, chief development officer, and chairman of A3Cube Inc., a company that specializes in Artificial Intelligence (AI), hardware and software design, sound-engineering research, and automotive research. He is also an inventor, engineer, speaker, and successful entrepreneur with 20 years of experience in founding, developing, and growing companies. He has proven knowledge and expertise in high-performance computing, HPC interconnection networks, machine learning, deep-learning algorithms, and hardware architectures. Billi has invented and developed many different computer architectures, networked medical devices, intelligent sensor networks, FPGA-based data center acceleration, and supercomputers for data analytics and AI. He is also an author of the HyperTransport technology specification, which is implemented in more than two million devices, such as Microsoft's X-BOX and products from AMD, Cisco, Cray, Dell, HP, and IBM. He has been a consultant for Acer, AMD, Cray, Google, NICEVT, and RSC-SKIF. Billi's business goal is to bring to market the fastest and most efficient machines that help make breakthroughs in understanding the critical connection between data

Hardware Co-Design: Ing. Baldo Alberto Luigi Dalporto. Researcher for CERN (the European Organization for Nuclear Research) in Geneva, Switzerland, 1995-2012. Has invented and developed many different computer architectures (SuperCluster, Blade Server, H.A. Storage).

FUSING HPC AND AI COMPUTING INTO A UNIFIED ARCHITECTURE: A3Cube GRIFO™, the UNIVERSAL Massive Multi-GPU Server. What is it? GRIFO™ is cluster-scale compute in a single system, with a single OS and even a single memory, holding up to 128 accelerators directly connected to the CPU host through a global shared memory. With more than 880,000 computing cores in a single system, it is faster than any other system in the world in many strategic applications. And if you need more, you can combine multiple GRIFO™ systems together.

Grifo™ A10090 Single Node: GRIFO is the most powerful single-node, end-to-end AI and HPC platform.
- Up to 128 NVIDIA® A100 GPUs (880,000 computing cores)
- A3Cube RONNIEE 4 fabric with direct GPU-to-GPU connection between all 90 GPUs
- Up to 79 PetaOPS of total AI compute
- Up to 24 NVMe storage disks supporting GPUDirect
- Up to 10.2 TB of unified memory

A unified, FULLY INTEGRATED architecture:
- Integrated NVMe storage, 24 bays
- Dual AMD EPYC CPUs with up to 4 TB of main memory
- GPU fabric with NVIDIA GPUDirect fully supported (*)
- GPU array
- Liquid-cooling subsystem
- 1600 Gbit/s InfiniBand I/O

(*) GPUs access the storage directly for fastest computation.

Under the Hood: The Pooled GPU Technology (Block Diagram)
- 128 GPUs in a pooled array, with MIPS-based smart P2P routing, direct memory access, and direct storage access
- 8 x 200 Gbit/s direct CPU/GPU-attached storage; ultra-high-speed NVMe storage pools
- Up to 4 terabytes of CPU memory, 8 concurrent channels per CPU
- RONNIEE 4 in-memory network: a non-blocking advanced PCIe fabric connecting CPU, GPU, I/O, and storage
- NVMe ultra-fast storage devices
- A100 CUDA cores and Tensor Cores, with 205 TB/s of aggregate GPU memory bandwidth
- Up to 10.2 terabytes of GPU memory per system
- Up to 880,000 CPU and GPU computing cores
- Up to 1.6 Tbit/s of I/O bandwidth

Record Performance Numbers Recap (GRIFO™ 128):
- FP64 (double precision): 1.2 PetaFLOPS
- FP64 (Tensor Cores): 2.4 PetaFLOPS
- FP32 (single precision): 2.4 PetaFLOPS
- FP32 (Tensor Cores): 19.9 PetaFLOPS
- FP16: 40 PetaFLOPS
- INT8: 79 PetaOPS
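As a sanity check, these recap figures line up with 128 times the publicly listed per-A100 peak throughputs (FP64 9.7 TFLOPS, FP64 Tensor Core 19.5, FP32 19.5, TF32 Tensor Core 156, FP16 Tensor Core 312, INT8 624 TOPS, dense). The short script below redoes that arithmetic; mapping the slide's "FP32 (Tensor-Cores)" row to TF32 is an assumption.

```python
# Sanity check: the GRIFO 128 recap is roughly 128 x the published
# per-A100 peak throughputs (dense figures; TFLOPS, or TOPS for INT8).
a100_peak = {
    "FP64":             9.7,
    "FP64 Tensor Core": 19.5,
    "FP32":             19.5,
    "TF32 Tensor Core": 156,   # assumed match for "FP32 (Tensor-Cores)"
    "FP16 Tensor Core": 312,
    "INT8 Tensor Core": 624,
}
slide_recap_pflops = {
    "FP64": 1.2, "FP64 Tensor Core": 2.4, "FP32": 2.4,
    "TF32 Tensor Core": 19.9, "FP16 Tensor Core": 40,
    "INT8 Tensor Core": 79,
}
for mode, per_gpu in a100_peak.items():
    system = 128 * per_gpu / 1000   # 128 GPUs; TFLOPS -> PFLOPS
    print(f"{mode:17s} 128 x A100 = {system:6.2f} PF (slide: {slide_recap_pflops[mode]} PF)")
```

The small gaps (e.g. 1.24 vs. 1.2 PF for FP64) are consistent with the slide rounding down.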

Under the Hood: The Pooled GPU Technology. Possible single-server configuration options:

Model     | GPU Cores | GPU Memory | AI Compute      | Power
GRIFO 16  | 110,592   | 1.2 TB     | up to 10 PFLOPS | 6.6 kW
GRIFO 24  | 165,888   | 1.9 TB     | up to 15 PFLOPS | 8.4 kW
GRIFO 32  | 221,184   | 2.5 TB     | up to 20 PFLOPS | 10.8 kW
GRIFO 48  | 331,776   | 3.8 TB     | up to 30 PFLOPS | 15.6 kW
GRIFO 64  | 442,368   | 5.2 TB     | up to 40 PFLOPS | 20.4 kW
GRIFO 96  | 663,552   | 7.6 TB     | up to 60 PFLOPS | 30 kW
GRIFO 128 | 884,736   | 10.2 TB    | up to 79 PFLOPS | 39.6 kW

RECORD PERFORMANCE: HPC, AI, and ML applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space in the data center. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, a pattern for which GRIFO™ is uniquely suited.

Traditional infrastructure is constrained: infrastructure silos starve AI workloads or waste capacity and money ($$$). GRIFO™ infrastructure is agile: run any workload on a single optimized system, maximizing utilization and minimizing cost ($). GRIFO™: run data analytics, training, and inference workloads on the same system.

The Most Powerful Tool for Companies of Any Size. One GRIFO™ Model A 128 gives every developer the power to explore: a single GRIFO™ can serve the equivalent of 96 dedicated 28-core dual-CPU servers, giving 400 individual developers the performance equivalent of a cluster.

One GRIFO™ Model A 128 (128 GPUs total) vs. a GPU clustered system (128 GPUs total):
- Today's AI datacenter: $33 million | 50 racks | >2000 kW
- GRIFO™: $2.9 million | 1.5 racks | <40 kW
1/10 of the cost, 1/30 of the space, 1/50 of the power.

One GRIFO™ Model A (32 GPUs total) vs. a GPU clustered system (32 GPUs total):
- Today's AI datacenter: $11 million | 25 racks | 630 kW
- GRIFO™: $0.9 million | 1 rack | <11 kW
>1/10 of the cost, 1/25 of the space, 1/50 of the power.

Training: Training increasingly complex models faster is key to improving productivity for data scientists and delivering AI services more quickly. Servers powered by NVIDIA® GPUs use the performance of accelerated computing to cut deep-learning training time from months to hours or minutes.

Inference: Inference is where a trained neural network really goes to work. As new data points come in, such as images, speech, and visual and video search, inference gives the answers and recommendations at the heart of many AI services. A server with a single GPU can deliver 27X higher inference throughput than a single-socket CPU-only server, resulting in dramatic cost savings. (Based on analysis using public data and industry research reports.)

Grifo™ Inference: Inference is where AI goes to work, powering innovation across every industry. But as data scientists and engineers push the boundaries of what is possible in computer vision, speech, natural language processing (NLP), and recommender systems, AI models are rapidly evolving and expanding in size, complexity, and diversity. To take full advantage of this opportunity, organizations have to adopt a full-stack approach to AI inference.

UNMATCHED ACCELERATED COMPUTING PLATFORM. Data Analytics: The most complex big-data, machine learning, AI, and high-performance computing problems require massive computing power. 90% of applications require a single-node multi-accelerator architecture, with shared memory between CPUs and shared memory between accelerators. This in turn demands more intelligence in the system infrastructure, which must support this never-ending need for increasing functionality and performance with the features of a single system. GRIFO™ is the answer to the most demanding computing needs: as easy to use as a single computer, but on steroids. The GRIFO™ architecture enables accelerated analytics at record-breaking speed using standard SQL and NoSQL models, making it possible to execute hundreds of billions of sophisticated queries per second over the data in a single unified system.
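The "standard SQL models with no recoding" claim rests on GPU dataframe stacks such as RAPIDS cuDF, whose API mirrors pandas. A minimal sketch of the pattern, shown here with CPU pandas (on a GPU system, swapping the import for cuDF would be the assumed only change; the table is toy data, not a GRIFO benchmark):

```python
import pandas as pd  # on a GPU system, RAPIDS cuDF mirrors this API

# Toy retail table standing in for a multi-terabyte analytics dataset.
sales = pd.DataFrame({
    "store":  ["A", "A", "B", "B", "B"],
    "item":   ["x", "y", "x", "y", "y"],
    "amount": [10.0, 20.0, 15.0, 5.0, 30.0],
})

# SQL-style aggregation: SELECT store, SUM(amount) ... GROUP BY store
revenue = sales.groupby("store")["amount"].sum().reset_index()
print(revenue.to_string(index=False))
```

Because the call surface is identical, the same groupby/aggregate code dispatches to GPU kernels under cuDF, which is the "no code modification" point the deck keeps returning to.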

Genomics: Genome analysis is a computationally intensive effort that needs a high-performance computing environment powered by CPUs and coprocessors. Sequencing platforms generate as much as 6 terabytes of data every day, which is analyzed by scientists performing whole-genome sequencing. GRIFO™ dramatically accelerates genomics work: it is fully compliant with NVIDIA Clara™ Parabricks and does not require any recoding. Parabricks is a computational framework supporting genomics applications from DNA to RNA. It employs NVIDIA's CUDA, HPC, AI, and data-analytics stacks to build GPU-accelerated libraries, pipelines, and reference application workflows for primary, secondary, and tertiary analysis. Each GRIFO™ delivers up to 10x the performance of the latest DGX A100 while maintaining the same programming model, at a fraction of the power and cost.

High-Performance Computing: High-performance computing (HPC) is one of the most essential tools fueling the advancement of science. By leveraging GRIFO™'s parallel processing, users can run advanced, large-scale application programs efficiently, reliably, and quickly. This acceleration delivers a dramatic boost in throughput and cost savings, paving the way to scientific discovery. HPC and AI are converging to extend the reach of science and accelerate the pace of scientific innovation. With AI, HPC is tackling previously unsolvable problems by modeling the world using experimental and simulation data. It is also delivering real-time results with models that used to take days or months to simulate.

Military: GRIFO™ implements 128 GPUs under a global shared-memory architecture so that they can work together like a single one. It is designed to easily handle deep-learning models of elevated complexity. All the GPUs are interconnected using a programmable non-blocking switching fabric that guarantees high bandwidth and direct communication between all the GPUs, between the GPUs and the integrated storage, and between the GPUs and the NPUs (Network Processing Units), enabling the creation of larger clustered systems. GRIFO™ implements a sophisticated, ultra-fast flash-memory subsystem that lets the GPUs receive data at extraordinary speed. Flash memory is used in numerous defense and intelligence operations today, and the number is growing rapidly. The need to store and transcribe huge amounts of data through data and image processing is becoming overwhelming. The more GPUs and flash memory available, the quicker the data can be used. GRIFO™ represents the best and most affordable way of accomplishing this tremendous feat.
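A3Cube's fabric is proprietary, but the core idea of a global shared memory (many workers addressing one pool directly, rather than copying data through a coordinator) can be illustrated in miniature with ordinary OS shared memory. This is an analogy only, not GRIFO™'s implementation:

```python
from multiprocessing import shared_memory

# One memory pool; independent handles attach to it by name, much as
# devices on a shared-memory fabric all map one global pool.
pool = shared_memory.SharedMemory(create=True, size=4)

# A second, independent handle onto the very same bytes.
peer = shared_memory.SharedMemory(name=pool.name)

peer.buf[0] = 42       # "remote" writer stores directly in place
result = pool.buf[0]   # reader sees it with no copy and no message
print(result)          # -> 42

peer.close()
pool.close()
pool.unlink()
```

The write through `peer` is visible through `pool` immediately because both handles reference the same physical bytes; that zero-copy visibility, scaled to 128 accelerators, is the property the slide is claiming for the GPU pool.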

MACHINE LEARNING PERFORMANCE: The GRIFO™ A 128 multi-precision computing platform allows high-precision calculations using FP64 and FP32 for scientific computing and simulations, while also enabling FP16 and INT8 for AI training and inference. This unprecedented versatility provides unique flexibility to support the future of computing.

BERT-Large inference, fully loaded GPU GRIFO configuration (higher is better). Benchmark setup: BERT-Large inference | CPU only: dual Xeon Gold 6240 @ 2.60 GHz, precision = FP32, batch size = 128 | V100: NVIDIA TensorRT™ (TRT) 7.2, precision = INT8, batch size = 256 | A100 40 GB and 80 GB: batch size = 256, precision = INT8 with sparsity.
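The multi-precision trade-off described above (wide floats preserve accuracy for simulation; narrow types trade accuracy for throughput and memory bandwidth) is easy to demonstrate with NumPy. The dtype behavior below is standard IEEE arithmetic, not anything GRIFO-specific:

```python
import numpy as np

x = np.float64(1.0) + np.float64(1e-12)   # FP64 resolves tiny increments
y = np.float16(1.0) + np.float16(1e-12)   # FP16 rounds them away entirely

print(x > 1.0)    # FP64 keeps the increment
print(y == 1.0)   # FP16 has lost it

# Narrower types also mean more values per byte of memory bandwidth,
# which is why inference runs in FP16/INT8 when accuracy allows:
for dt in (np.float64, np.float32, np.float16, np.int8):
    print(np.dtype(dt).name, np.dtype(dt).itemsize, "bytes/element")
```

Eight times as many INT8 values as FP64 values fit in the same memory traffic, which is where the large INT8-vs-FP64 throughput gap in the recap table comes from.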

ANALYTICS PERFORMANCE: With up to 10 TB of unified memory and all-to-all GPU communications, GRIFO™ A 128 can load and perform calculations on enormous datasets to derive actionable insights quickly. One single GRIFO™ A 128 is equivalent in power to a datacenter with 10,000s of dual-CPU servers, and to a cluster of more than 12 NVIDIA DGX A100 systems.

Big-data analytics benchmark: 30 analytical retail queries (ETL, ML, NLP) on a 10 TB dataset | CPU: Intel Xeon Gold 6252 @ 2.10 GHz, Hadoop | V100 32 GB: RAPIDS/Dask | A100 80 GB: RAPIDS/Dask/BlazingSQL.

GREENER DATACENTER EFFICIENCY (Density, Performance, Savings): GRIFO™ enables users to gain key insights from massive amounts of data that were previously unmanageable. Given the critical research these systems are tasked to perform, such high-density clusters are expected to run at 100% utilization for sustained periods, making cooling performance critical. GRIFO™ optimizes compute throughput, enabling high-performance, high-wattage densities. GRIFO™ Direct Liquid Cooling (DLC) uses the exceptional thermal conductivity of liquid to provide dense, concentrated cooling to targeted areas. The system is designed to be installed in traditional air-cooled data centers.

Software: GPU-accelerated standard software runs on GRIFO™ natively, without code changes. GRIFO™ speeds up data processing, model training, and high-performance computing simulations while substantially lowering infrastructure costs.
- Thousands of applications ready, from HPC to analytics, inference, training, deep learning, and more
- No code modification
- Fully compliant with standard development tools
- Fully compliant with NVIDIA programming tools
- Fully compliant with GPU-based commercial applications
- Supports virtualization and GPU sharing among multiple users

Universal System for Every Workload. Fastest Time to Solution. Unmatched Data Center Scalability. GRIFO™ is the universal system for all AI, HPC, and data-driven infrastructure, from analytics to training to inference. It sets a new bar for compute density, packing 39 petaFLOPS of AI performance into a 42U form factor and replacing legacy datacenter infrastructure with one platform (one system instead of thousands). GRIFO™ is the world's first large-scale unified-memory AI system built on the NVIDIA A100 Tensor Core GPU. Integrating 128 A100 GPUs with up to 10,000 GB of GPU memory, the system provides unprecedented acceleration and is fully optimized for NVIDIA CUDA-X™ software and the end-to-end NVIDIA data center solution stack. GRIFO™ features Mellanox ConnectX-6 VPI HDR InfiniBand/Ethernet network adapters with 1600 gigabits per second (Gb/s) of peak bidirectional bandwidth. This is one of the many features that make GRIFO™ the foundational building block for scalable AI infrastructure.

GRIFO™: The griffin (Ancient Greek: γρύψ, grū́ps; Classical Latin: grȳps or grȳpus) is a legendary creature with the body, tail, and back legs of a lion; the head and wings of an eagle; and sometimes an eagle's talons as its front feet. Because the lion was traditionally considered the king of the beasts and the eagle the king of the birds, by the Middle Ages the GRIFO was regarded as an especially powerful and majestic creature.