Mirabilis Design | Chiplet Summit | 2024

DeepakShankar4 126 views 23 slides Aug 09, 2024

About This Presentation

As the demand for higher performance and energy efficiency in semiconductor design grows, the use of chiplets has emerged as a powerful approach to scaling complex systems. However, the rapid deployment of chiplets presents a unique set of architectural challenges, particularly in balancing power co...


Slide Content

Title: Architecture challenges in meeting power, thermal and performance needs in partitioning Chiplets for rapid deployment Chiplet Summit 2024 Deepak Shankar Founder, Mirabilis Design Inc. Email: [email protected]

About Mirabilis Design
- EDA software company based in Silicon Valley
- System-level modeling and simulation software: VisualSim
- Architecture exploration of semiconductors, networks, systems and software
- Generating digital twins to empower shift-left methodology
- Networking Best Embedded Paper at DAC 2024, the second time in 3 years

What is System Level Design

System Design (System Engineer)
- Focuses on: designing the right product
- System decisions are optimized, repeatable and linked to implementation
- Robust system design can survive implementation compromises and get to market faster
- Mirabilis Design is focused here

Implementation (Implementation Engineer)
- Focuses on: implementing the product right
- Perfect implementation cannot rescue a product from bad design assumptions
- Historical EDA companies focus here

Escalating complexity means an increasing need for system design.

What we do

VisualSim is used to analyze:
- Performance (latency, throughput)
- Power (peak, instant, cumulative, heat, temperature and battery lifecycle)
- Functionality (algorithm, arbitration, scheduling, flow control)

Analysis is used to:
- Explore product feasibility
- Size systems
- Map task graphs to heterogeneous resources
- Partition into hardware and software
- Generate documentation, collaborative engineering
- Integrate with the EDA tool flow

VisualSim provides:
- System-level modelling IP and generators
- Graphical, hierarchical, with polymorphic types
- Multi-domain simulator
- AI-based multi-core diagnostic systems
- API to integrate software and other simulators

VisualSim System-Level IP Library
- Custom Creator: Scripting, RegEx, Task graph, Use cases, Hardware Builder, C/C++/Java/Python, MatLab, STK
- Communication: RF, Baseband, Channels, Communication systems, A/D transceivers, Antenna, Analog, Signal/Audio/Image Processing
- Power: Power States, Allocation, Transition, Loss, Battery, Consumption, Management, Generation, Distribution, and Thermal
- Traffic: Sensors, Interfaces, Distribution, Traces, Software, VCD, ML, DNN
- Reports: Latency, Throughput, Utilization, Ave/peak power (instant, ave), hit-ratio, Heat, Temp
- RISC-V and Chiplets: SiFive, In-Order/Out-of-Order Generator, Tilelink
- RTOS and Software: Generic RTOS, ARINC 653, AUTOSAR, Task graph
- SOC: AMBA (AHB/APB/AXI/CHI), Tilelink, Corelink (600, 700), NoC (Generic, Arteris, Signature, OpenEdges), Virtual Channel, DMA, Crossbar, Serial Switch, Bridge, UCIe
- Board-Level: VME, PCI/PCI-X/PCIe 6.0, SPI 3.0, 1553B, FlexRay, CAN-FD/XL, AFDX, TTEthernet, OpenVPX
- Processors: ARM (M0-55), R5, Cortex (A8, A72, A53, A76, A77, A65, A78, A720), Nvidia (Pascal to Ampere), Generic GPU, µC, Leon, Power, X86, DSP (TI and ADI), Tensilica, Renesas SH, AI Engine, TPU
- Stochastic: Queue, Time Queue, Quantity Queue, Resources, Scheduler
- Storage: Flash, NVMe, Disk, SSD, NAS, Fibre Channel, FireWire
- Networking: TSN, AVB, 10BaseT1S, Switched Ethernet, Resilient Packet Ring, RP3, WiFi 802.11, Bluetooth, PAN, Spacewire, SpaceFibre, IEEE802.1Q, Time-Triggered Ethernet, AFDX, 5G
- Memory: Memory Controller, SDR, DDR DRAM 2, 3, 4, 5, LPDDR 2, 3, 4, 5, HBM2.0, HMC, QDR, RDRAM, MPMC, cache, Coherent cache
- FPGA: Xilinx (Versal, Zynq, Ultrascale, Kintex), Altera (Stratix, Arria), Microsemi (Smartfusion), Programmable logic generator
- Trade-Off: Requirements, Thermal, Power, Performance, Failure Verification, Upgrade

UCIe Component Library
- Unique name
- Package type
- Buffer size
- Devices connected to the port
- Protocols used at the port
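The parameters listed on this slide can be pictured as a small configuration record. The sketch below is only an illustration under stated assumptions: the class name, field names, buffer-size unit and validation rules are hypothetical and are not VisualSim's API. The "Standard" and "Advanced" package names follow the UCIe specification's two package options.

```python
from dataclasses import dataclass, field
from typing import List

# UCIe defines Standard and Advanced package options.
PACKAGE_TYPES = {"Standard", "Advanced"}


@dataclass
class UciePortConfig:
    """Hypothetical record mirroring the five parameters on the slide."""
    unique_name: str
    package_type: str
    buffer_size: int  # unit is an assumption; the slide does not state it
    connected_devices: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values that cannot describe a real port.
        if self.package_type not in PACKAGE_TYPES:
            raise ValueError(f"unknown package type: {self.package_type}")
        if self.buffer_size <= 0:
            raise ValueError("buffer_size must be positive")


# Example values are made up for illustration.
port = UciePortConfig(
    unique_name="UCIe_0",
    package_type="Standard",
    buffer_size=64,
    connected_devices=["Host", "Hub"],
    protocols=["PCIe 6.0", "Streaming (AXI)"],
)
```

Keeping the connected devices and protocols as lists reflects that one port may serve several devices and carry more than one protocol.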

Challenges in the Design of Chiplets
- Meet the latency, bandwidth and power consumption requirements
- Design scalable dies for easy use across multiple applications
- Select the right number of UCIe or die-to-die ports
- Mapping and scheduling applications across compute resources

[Diagram labels: CPU, DRAM, CMN, UCIe, CHI, DDR Mem; Die Sizing, Positioning, Topology]

Creating a Next-Generation HPC using UCIe

Comparing Interconnects to UCIe
- All die adapters using PCIe 6.0
- Die adapter with streaming protocol (AXI)
- Result: lower latency when using PCIe 6.0

Integrate Power and Thermal into the Performance Model

1. Power Generation
- 4 types of power generators in VisualSim: constant, variable, motor, solar charge
- Charge sent to battery

2. Power Storage
- Different charging schemes
- Impact of surge and shocks
- Battery lifecycle
- Battery consumption statistics

3. Power Consumption
- State-based power consumption of electronics (controller, SOC) and mechanical parts (brakes, wheels)
- Average, instant and cumulative power per device and application

4. Power Management
- Change in power state controlled by time, utilization, temperature and expected activity

5. Thermal Management
- Heat and temperature
- Impact of cooling strategy
- Add impact of power spikes

6. Verification and Debugging
- Optimize and test the power management algorithms
- Sizing of power generators and battery
- Optimize the schedule, supplynet and voltage
- Estimate power consumed by the software application

7. Downstream Integration
- Generate UPF file with power domains and associated voltage levels
- Generate SystemVerilog power testbench
- Generate powerState change VCD dump
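State-based power consumption, with peak, average and cumulative figures per device, can be illustrated with a short calculation. This is a minimal sketch, not VisualSim's power model: the state names, the wattage table and the timeline format are all assumptions made up for the example.

```python
# Hypothetical power per state in watts; the values are illustrative only.
STATE_POWER_W = {"off": 0.0, "sleep": 0.1, "idle": 0.5, "active": 2.0}


def power_stats(timeline):
    """Summarize a device's power over a timeline.

    timeline: list of (duration_seconds, state_name) tuples.
    Returns (peak_watts, average_watts, cumulative_energy_joules).
    """
    energy_j = 0.0
    total_s = 0.0
    peak_w = 0.0
    for duration_s, state in timeline:
        watts = STATE_POWER_W[state]
        energy_j += watts * duration_s  # energy = power x time in this state
        total_s += duration_s
        peak_w = max(peak_w, watts)
    average_w = energy_j / total_s if total_s else 0.0
    return peak_w, average_w, energy_j


# One second active followed by three seconds idle.
peak, average, energy = power_stats([(1.0, "active"), (3.0, "idle")])
```

A power management policy of the kind the slide describes would change the state sequence in the timeline; the same summary then shows the effect on average power and total energy.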

Reference Data: Open Architecture Management (OAM) With DDR Placement

Base Configuration

Host
- Speed: 1 GHz
- Processing cycles: 1.0
- Frame size: 128 KB
- Traffic issue rate: 30.0 µs

Hub
- Speed: 1 GHz
- Processing cycles: 1.0
- Cache size: 64 MB (95% hit ratio, 8 bytes access per cycle)

Domain-Specific Architecture
- Speed: 1 GHz
- Accelerator processing cycles: 100.0
- DMA access size: 2048
- Cache size: 32 MB (100% hit ratio, 8 bytes access per cycle)

CMN600
- Speed: 1 GHz
- Flit size: 1024 bytes

DDR
- Speed: 3200 data rate, DDR4

ACE bus
- Speed: 1 GHz
- Bus width: 256 bytes

UCIe
- Speed: 16 GT/s
- Package: Standard
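A quick back-of-the-envelope check shows what load this configuration implies. The sketch assumes that the 30.0 µs traffic issue rate is the interval between successive 128 KB frames and that 1 KB is 1024 bytes; neither is stated on the slide, and the function names are hypothetical.

```python
FRAME_BYTES = 128 * 1024        # 128 KB frame (assuming 1 KB = 1024 bytes)
ISSUE_INTERVAL_S = 30.0e-6      # assumed: one frame issued every 30 us
CLOCK_HZ = 1.0e9                # 1 GHz component clock
CACHE_BYTES_PER_CYCLE = 8       # cache access width from the slide


def offered_load_bytes_per_s():
    """Traffic the host injects, in bytes per second."""
    return FRAME_BYTES / ISSUE_INTERVAL_S


def cache_bandwidth_bytes_per_s():
    """Peak cache bandwidth at 8 bytes per cycle and 1 GHz."""
    return CLOCK_HZ * CACHE_BYTES_PER_CYCLE


def cycles_to_stream_frame():
    """Cycles needed to move one frame through the cache port."""
    return FRAME_BYTES // CACHE_BYTES_PER_CYCLE


load = offered_load_bytes_per_s()          # roughly 4.37e9 bytes/s
utilization = load / cache_bandwidth_bytes_per_s()
stream_time_s = cycles_to_stream_frame() / CLOCK_HZ
```

Under these assumptions a frame needs 16384 cycles at the cache port, which is shorter than the 30 µs issue interval, so the cache port alone would not be the bottleneck. The simulation is needed to see the effect of the remaining components.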

Chiplet System Architecture: Open Architecture Management
[Figure panels: DDR in Host CPU; DDR in Hub]

Result comparison
[Figure panels: DDR at Host; DDR at Hub]

Model Results (Base Configuration)

Reference Data: Mapping Applications onto SoC

Mapping Algorithm to Multi-Resources

Image processing algorithm, basic/starting configuration (standard HW library components):
- Grayscale_Conversion: PS [A72 Core 1]
- IIR: Logic (PL)
- FFT: AI Engine Tile
- Edge_Image: Logic (PL)
- iFFT: AI Engine Tile
- Edge_Image_Enhancement: Logic (PL)
- Segmentation: PS [A72 Core 2]
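The starting configuration is a task-to-resource mapping, which can be written down as a plain dictionary. The sketch below is illustrative only: the helper names are hypothetical and it does not model timing, it just shows how a remapping experiment changes one entry while leaving the baseline intact.

```python
from collections import defaultdict

# Task -> resource mapping taken from the slide's starting configuration.
BASE_MAPPING = {
    "Grayscale_Conversion": "PS [A72 Core 1]",
    "IIR": "Logic (PL)",
    "FFT": "AI Engine Tile",
    "Edge_Image": "Logic (PL)",
    "iFFT": "AI Engine Tile",
    "Edge_Image_Enhancement": "Logic (PL)",
    "Segmentation": "PS [A72 Core 2]",
}


def tasks_per_resource(mapping):
    """Group task names by the resource they run on."""
    grouped = defaultdict(list)
    for task, resource in mapping.items():
        grouped[resource].append(task)
    return dict(grouped)


def remap(mapping, task, resource):
    """Return a copy of the mapping with one task moved to a new resource."""
    if task not in mapping:
        raise KeyError(f"unknown task: {task}")
    updated = dict(mapping)
    updated[task] = resource
    return updated


# Example experiment: move Segmentation from the A72 core to an AI Engine tile.
run2_mapping = remap(BASE_MAPPING, "Segmentation", "AI Engine Tile")
```

Returning a new dictionary instead of editing in place keeps each experiment's mapping separate, so runs can be compared side by side.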

Experiments with Different Implementations

Run 1: Base configuration mapped to Logic and ARM
- Application latency increases over time.
- The latency increase is due to Segmentation.
- Next step: remap the Segmentation task to AI Tiles.

Run 2: Segmentation mapped to AI Engine
- Application latency stays in a bounded range.
- NoC utilization is high.
- Next step: change the interconnect for Segmentation from NoC to a direct path.

Run 3: Using a direct path between Logic and AI
- Latency is deterministic.
- The latency requirement (application latency < 80 msec) is met.
- Utilization across the NoC is acceptable.
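The two observations used to judge each run, whether latency keeps growing and whether it stays under 80 msec, can be checked with a few lines of code. This is a hedged sketch: the function names are hypothetical, a least-squares slope is only one possible way to detect growth, and no latency values from the presentation are reproduced here.

```python
def latency_slope(samples_ms):
    """Least-squares slope of latency versus sample index (ms per sample).

    A clearly positive slope suggests latency is increasing over time.
    """
    n = len(samples_ms)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples_ms) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_ms))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def meets_requirement(samples_ms, limit_ms=80.0):
    """True when every observed application latency is below the limit."""
    return all(sample < limit_ms for sample in samples_ms)
```

Applied to the latency trace of each run, a positive slope would flag behavior like Run 1, while a near-zero slope together with a passing requirement check would correspond to the outcome reported for Run 3.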

Automotive applications System Integration of SoC using Chiplet

System Overview
- Gateway: transfers messages between different CAN networks
- CAN bus: the network that connects sensors and ECUs

[Diagram labels: Wheel1, Wheel2, Wheel3, Wheel4, Gateway, CAN Bus, Engine, Proximity Sensor, Brake Pedal, Gyro Sensor, Road condition sensor, ECU]
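The gateway's role of transferring messages between CAN networks can be sketched as a routing table keyed by source bus and message identifier. Everything here is an assumption for illustration: the bus names, the identifier and the class design are hypothetical and do not describe the model in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanFrame:
    can_id: int   # message identifier
    data: bytes   # payload


class Gateway:
    """Forwards frames between CAN buses according to a static route table."""

    def __init__(self):
        # (source_bus, can_id) -> list of destination bus names
        self._routes = {}

    def add_route(self, source_bus, can_id, dest_bus):
        self._routes.setdefault((source_bus, can_id), []).append(dest_bus)

    def forward(self, source_bus, frame):
        """Return (dest_bus, frame) pairs; frames with no route are dropped."""
        destinations = self._routes.get((source_bus, frame.can_id), [])
        return [(dest, frame) for dest in destinations]


# Hypothetical setup: a brake-pedal message crosses from one bus to another.
gateway = Gateway()
gateway.add_route("chassis_bus", 0x120, "powertrain_bus")
brake_frame = CanFrame(can_id=0x120, data=b"\x01")
forwarded = gateway.forward("chassis_bus", brake_frame)
```

Filtering by identifier means only the messages another network needs are copied across, which keeps traffic on each CAN bus low.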

VisualSim Model Chiplet with Arm/Tensilica, GPU and AI Engine

Evaluation of Chiplet Performance in Automotive Application 7/17/2024 Mirabilis Design Inc. 22
