What is FPGA?

FPGA by Andriy Smolskyy

FPGA
- Programmable Logic Evolution: TTL → PLA → CPLD → FPGA → ASIC
- Development aspects
- Using FPGA for high speed data processing
- OpenCL

Programmable Logic is Found Everywhere!

TTL Logic Design

Digital Design with TTL Logic

From TTL to Programmable Logic
General features of logic implementations:
- Sum of products (AND-OR gates, combinatorial logic)
- Stored results (registered outputs)
- Wired together
What if:
- Logic functions were fixed (like TTL), but combined into a single device?
- Wiring (routing) connections could be controlled (programmed) somehow?

Programmable Array Logic (PAL)
- Simplest implementation of programmable logic
- Logic gates and registers are fixed
- Programmable sum of products array and output control
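The programmable sum-of-products array can be modelled in software. The following is a minimal sketch, not a description of any real device: the `ProductTerm` encoding, the function names and the XOR "fuse map" are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A product term is a list of (input_index, required_value) pairs:
# the AND of the selected inputs, each taken either true or complemented.
ProductTerm = List[Tuple[int, bool]]


def eval_product(term: ProductTerm, inputs: List[bool]) -> bool:
    """AND together the literals selected by the programmed connections."""
    return all(inputs[idx] == want for idx, want in term)


def eval_sum_of_products(terms: List[ProductTerm], inputs: List[bool]) -> bool:
    """Fixed OR gate fed by the programmable AND array."""
    return any(eval_product(term, inputs) for term in terms)


# Hypothetical programming for XOR of inputs 0 and 1:
# (a AND NOT b) OR (NOT a AND b)
XOR_TERMS: List[ProductTerm] = [
    [(0, True), (1, False)],
    [(0, False), (1, True)],
]


def xor_table() -> List[bool]:
    """Evaluate the programmed function for every input combination."""
    return [
        eval_sum_of_products(XOR_TERMS, [a, b])
        for a in (False, True)
        for b in (False, True)
    ]
```

Changing only the `XOR_TERMS` data changes the function, while the AND/OR evaluation stays fixed. That mirrors the split between fixed gates and a programmable array.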

Programmable Logic Advantages
- Fewer devices required
- Lower cost
- Power savings
- Simpler to test and debug
- Design security (prevent reverse engineering)
- Design flexibility
- Automated tools simplify and consolidate design flow
- In-system reprogrammability! (in some cases)

From PAL to Programmable Logic Device (PLD)
- Arrange multiple PAL arrays in a single device

From PLD to Complex PLD (CPLD)
- Combine multiple PLDs in a single device with programmable interconnect and I/O

General CPLD Advantages
- Ample amounts of logic and advanced configurable I/Os
- Programmable routing
- Instant on
- Low cost
- Non-volatile configuration
- Reprogrammable

From CPLD to FPGAs
- Higher density CPLDs don’t scale well because they require additional global routing
- Rearrange the LABs themselves into an array

Field Programmable Gate Array (FPGA)
- LABs arranged in an array
- Row and column programmable interconnect
- Interconnect may span all or part of the array

CPLD LABs vs. FPGA LABs
- FPGA LABs are made up of logic elements (LEs) instead of product terms and macrocells
- Easier to create complex functions through LE cascading

Lookup Tables (LUTs)
- Replace the product term array
- Combinational functions created with programmed “tables” (cascaded multiplexers)
- LUT inputs are mux select lines
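A LUT can be sketched as a programmed table read out through layers of 2:1 multiplexers, with the LUT inputs acting as the select lines. This is an illustrative software model only: `make_lut`, the bit ordering (input 0 is the least significant select bit) and the majority-function table are assumptions made for this example.

```python
def make_lut(truth_table):
    """Return a function that evaluates a k-input LUT.

    truth_table holds 2**k programmed bits. The inputs act as mux select
    lines that pick one of them; input 0 is the least significant select bit.
    """
    k = (len(truth_table) - 1).bit_length()
    if len(truth_table) != 1 << k:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        level = list(truth_table)
        # Each select line halves the candidates, like one layer of 2:1 muxes.
        for sel in inputs:
            level = [
                level[i + 1] if sel else level[i]
                for i in range(0, len(level), 2)
            ]
        return level[0]

    return lut


# Hypothetical programming: 3-input majority function.
# Table index = a + 2*b + 4*c.
majority = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
```

Reprogramming the LUT means loading a different table; the multiplexer structure itself never changes.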

Adaptive Logic Modules (ALM)
- Based on LE, but includes dedicated resources and an adaptive LUT (ALUT)
- Improves performance and resource utilization

FPGA Routing
- All device resources can feed into or be fed by any routing in the device
- Differing fixed lengths to adjust for timing
- Scales linearly as density increases
- Local interconnect
  - Connects between LEs or ALMs within a LAB
  - Can include direct connections between adjacent LABs
- Row and column interconnect
  - Fixed length routing segments
  - Span a number of LABs or the entire device

Other Typical FPGA Features
- Embedded multipliers
  - Useful for DSP
  - High-performance multiply/add/accumulate operations
- Memory blocks
- High-speed transceivers
- Replace some LABs with dedicated functional hardware blocks
  - PLLs
  - SDRAM controllers
  - Hard Processor System

System on Chip (SoC) + FPGA

FPGA Programming
- FPGA programming information must be stored somewhere to program the device at power on
  - Use external EEPROM, CPLD or CPU to program
- Two programming methods
  - Active: FPGA controls programming sequence automatically at power on
  - Passive: Intelligent host (typically CPU) controls programming
- Also programmable through JTAG connection

FPGA Advantages
- High density to create many complex logic functions
- High performance
- Low cost
- Integration of many functions
- Many available I/O standards and features
- Fast programming

From FPGA to ASIC
- A true ASIC: no configuration at power-on required
- Create and test design with FPGA device
- Migrate design to pin-compatible, functionally equivalent ASIC device

FPGA design development
- Verilog Hardware Description Language
- VHDL: Very High Speed Integrated Circuit Hardware Description Language

FPGA design development
- Schematic design development

Dedicated FPGA design block
- Core IP
  - SDRAM Controller
  - Ethernet PHY, Custom Transceiver PHY
  - PCIe PHY
  - SDI, DisplayPort
- Megafunctions
  - PLL
  - I/O
- Custom logic blocks

FPGA for high speed data processing
- CPU data processing optimization
- Pipelining, parallelism
- OpenCL

A simple CPU
[Figure: CPU datapath. Blocks: PC, Instruction Fetch, Registers, ALU, Load, Store. Signals: Op, Val, Aaddr, Baddr, Caddr, CWriteEnable, A, B, C, CData, LdAddr, LdData, StAddr, StData.]

The same datapath is then shown for each operation:
- Load immediate value into register
- Load memory value into register
- Store register value into memory
- Add two registers, store result in register

A simple program
Mem[100] += 42 * Mem[101]
CPU instructions:
    R0 ← Load Mem[100]
    R1 ← Load Mem[101]
    R2 ← Load #42
    R2 ← Mul R1, R2
    R0 ← Add R2, R0
    Store R0 → Mem[100]
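To make the step-by-step nature of CPU execution concrete, the six instructions can be run through a small interpreter. This is a sketch under stated assumptions: the tuple encoding of instructions, the opcode names (`load`, `loadi`, `mul`, `add`, `store`) and the initial memory contents are hypothetical and not part of the original slides.

```python
def run(program, mem):
    """Execute the instruction list one step at a time, as a CPU would."""
    regs = [0, 0, 0]
    for op, *args in program:
        if op == "load":      # Rd <- Mem[addr]
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "loadi":   # Rd <- #value
            rd, value = args
            regs[rd] = value
        elif op == "mul":     # Rd <- Ra * Rb
            rd, ra, rb = args
            regs[rd] = regs[ra] * regs[rb]
        elif op == "add":     # Rd <- Ra + Rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "store":   # Mem[addr] <- Rs
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown op {op!r}")
    return mem


# Mem[100] += 42 * Mem[101], in the assumed tuple encoding.
PROGRAM = [
    ("load", 0, 100),
    ("load", 1, 101),
    ("loadi", 2, 42),
    ("mul", 2, 1, 2),
    ("add", 0, 2, 0),
    ("store", 0, 100),
]

memory = {100: 5, 101: 2}   # example values chosen for illustration
run(PROGRAM, memory)
print(memory[100])          # 5 + 42 * 2 = 89
```

Every instruction passes through the same general-purpose loop, one after another. That shared, reused hardware is what the following slides unroll and specialize.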

CPU activity, step by step
[Figure: the six instructions executed one after another on the same CPU hardware, along a time axis]

Unroll the CPU hardware…
[Figure: one copy of the CPU hardware per instruction, laid out along a space axis]

… and specialize by position
1. Instructions are fixed. Remove “Fetch”.
2. Remove unused ALU ops.
3. Remove unused Load / Store.
4. Wire up registers properly! And propagate state.
5. Remove dead data.
6. Reschedule!
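After specialization, what remains is a fixed datapath for this one computation. A rough software analogue is shown below; the function name and the stage comments are illustrative assumptions, and Python statements still run sequentially, whereas the hardware stages would operate concurrently on different data.

```python
def specialized_datapath(mem):
    """Fixed datapath for Mem[100] += 42 * Mem[101].

    No instruction fetch, no general-purpose ALU, no register file:
    each value flows directly from the unit that produces it to the
    unit that consumes it.
    """
    # Stage 1: the two loads do not depend on each other.
    a = mem[100]
    b = mem[101]
    # Stage 2: dedicated multiplier with the constant 42 wired in.
    product = 42 * b
    # Stage 3: dedicated adder.
    total = a + product
    # Stage 4: store the result.
    mem[100] = total
    return mem
```

For example, `specialized_datapath({100: 5, 101: 2})` leaves 89 at address 100, the same result as the original six-instruction program.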

So what?
[Figure: the resulting datapath: Load, Load, the constant 42, Store]
FPGA datapath = your algorithm, in silicon
Build exactly what you need:
- Operations
- Data widths
- Memory size, configuration
Efficiency: Throughput / Latency / Power

OpenCL Programming Model
Host + Accelerator Programming Model
- Sequential host program on microprocessor
- Function offload onto a highly parallel accelerator device
[Figure: a host processor connected to accelerator devices, each with several accelerators, their local memories and a shared global memory]

Host program (outline):
    main() {
        read_data( … );
        manipulate( … );
        clEnqueueWriteBuffer( … );          /* copy inputs to the device */
        clEnqueueNDRangeKernel( …, sum, … ); /* launch the kernel */
        clEnqueueReadBuffer( … );           /* copy results back */
        display_result( … );
    }

Kernel:
    __kernel void sum(__global float *a,
                      __global float *b,
                      __global float *y)
    {
        int gid = get_global_id(0);
        y[gid] = a[gid] + b[gid];
    }

OpenCL → FPGA is NOT just ‘C’-to-HW
[Figure: the C-to-HW (HLS) flow turns pragma-annotated C into RTL for a single IP block, with the rest of the system (processor, EMIF, other IP) left as an open question; the OpenCL flow takes a kernel to a complete platform]

HLS input:
    void F(...) {
        #pragma ...
        for (int i ...) {
            #pragma ...
            for (int j ...) {
                #pragma ...
            }
        }
    }

OpenCL input:
    kernel void F(...) {
        for (int i ...) {
            for (int j ...) {
            }
        }
    }

                    C-to-HW tools         Standard OpenCL
    Users           Hardware designers    Software programmers
    Target          FPGA only             Complete platforms
    FPGA expertise  Yes                   No
    Timing closure  Manual                Automatic

Traditional OpenCL Data Parallelism
OpenCL kernels express parallelism explicitly.

Sequential loop:
    for (int i = 0; i < n; i++) {
        answer[i] = a[i] + b[i];
    }

Kernel Code:
    __kernel void sum(__global const float *a,
                      __global const float *b,
                      __global float *answer)
    {
        int xid = get_global_id(0);
        answer[xid] = a[xid] + b[xid];
    }

Host Code:
    setup_memory_buffers();
    transfer_data_to_fpga();
    size_t global_size[3] = {N, 1, 1};   /* one work-item per element */
    clEnqueueNDRangeKernel(sum_kernel, .., global_size, ..);
    read_data_from_fpga();
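The relationship between the sequential loop and the kernel can be mimicked in plain Python. This is only a model of the idea, not the OpenCL API: `enqueue_nd_range` is a hypothetical stand-in for a kernel launch, and `gid` plays the role of `get_global_id(0)`.

```python
def sum_kernel(gid, a, b, answer):
    """Body of one work-item: handles exactly one element."""
    answer[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel, global_size, *args):
    """Hypothetical stand-in for a kernel launch.

    Runs one work-item per index. A real device is free to run the
    work-items in any order or concurrently, which is safe here because
    no work-item reads what another one writes.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
answer = [0.0] * len(a)
enqueue_nd_range(sum_kernel, len(a), a, b, answer)
print(answer)  # [11.0, 22.0, 33.0]
```

The loop has moved out of the kernel and into the launch: the kernel describes a single iteration, and the global size says how many of them exist.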

Loop Pipelining
To achieve acceleration, we can pipeline each iteration of the loop:
- Analyze any dependencies between iterations
- Schedule these operations
- Launch the next iteration as soon as possible

    float array[M];
    for (int i = 0; i < n * numSets; i++) {
        for (int j = 0; j < M - 1; j++)
            array[j] = array[j + 1];
        array[M - 1] = a[i];
        /* At this point, we can launch the next iteration */
        for (int j = 0; j < M; j++)
            answer[i] += array[j] * coefs[j];
    }
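The loop above is a shift-register filter: each iteration shifts in one new sample and accumulates the products of the window with the coefficients. A minimal Python rendering of the same computation is sketched below, assuming the window and the `answer` entries start at zero (the slide does not show their initialization); `fir_filter` is a name chosen for this example.

```python
def fir_filter(samples, coefs):
    """Software model of the shift-register loop.

    The window of the last len(coefs) samples is the only state carried
    from one iteration to the next; each output depends only on the
    current window.
    """
    m = len(coefs)
    window = [0.0] * m          # float array[M], assumed zero-initialized
    answer = []
    for x in samples:
        # Shift the window by one position and append the newest sample.
        window = window[1:] + [x]
        # Multiply-accumulate across the window.
        answer.append(sum(w * c for w, c in zip(window, coefs)))
    return answer


print(fir_filter([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # [3.0, 8.0, 14.0]
```

Because the window update is the only dependency between iterations, the multiply-accumulate of one iteration can overlap with the shift of the next.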

Loop Pipelining Example
[Figure: without loop pipelining, iterations i0, i1, i2 run strictly one after another; with loop pipelining, iterations i0 to i4 overlap in time]
Looks almost like parallel thread execution!
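The benefit can be estimated with a simple cycle-count model. This is a back-of-the-envelope sketch: `latency` (cycles for one iteration) and `initiation_interval` (cycles between successive iteration launches) are terms introduced here for illustration, not taken from the slides, and the example numbers are arbitrary.

```python
def cycles_without_pipelining(iterations, latency):
    """Each iteration must finish before the next one starts."""
    return iterations * latency


def cycles_with_pipelining(iterations, latency, initiation_interval=1):
    """A new iteration is launched every initiation_interval cycles.

    The first iteration takes the full latency; each later iteration
    finishes initiation_interval cycles after the one before it.
    """
    if iterations == 0:
        return 0
    return latency + (iterations - 1) * initiation_interval


print(cycles_without_pipelining(5, 3))  # 15
print(cycles_with_pipelining(5, 3))     # 7
```

For long loops the pipelined cost approaches one iteration per initiation interval, regardless of how long a single iteration takes.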

Digital Filter

FPGA Q&A

FPGA Thank You!
