What is a DSP?
•A specialized microprocessor for real-time DSP applications
–Digital filtering (FIR and IIR)
–FFT
–Convolution, matrix multiplication, etc.
[Figure: block diagram. Analog input -> ADC -> DSP -> DAC -> analog output; the DSP also has a digital input and a digital output.]
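As a rough illustration of the digital filtering listed above, the following Python sketch computes a direct-form FIR filter as a sum of products. The function name `fir_filter` and the 3-tap coefficients are assumptions chosen for this example; they are not taken from the course material.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


# Hypothetical 3-tap smoothing filter applied to a step input
taps = [0.25, 0.5, 0.25]
step = [1.0, 1.0, 1.0, 1.0]
print(fir_filter(taps, step))  # [0.25, 0.75, 1.0, 1.0]
```

Each output sample costs one multiply-accumulate per tap, which is why this kind of workload suits a processor built around that operation.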
Hardware used in DSP
                   ASIC       FPGA    GPP     DSP
Performance        Very high  High    Medium  Medium-high
Flexibility        Very low   High    High    High
Power consumption  Very low   Low     Medium  Low-medium
Development time   Long       Medium  Short   Short
Common DSP features
•Harvard architecture
•Dedicated single-cycle Multiply-Accumulate
(MAC) instruction (hardware MAC units)
•Single-Instruction Multiple-Data (SIMD) and Very Long Instruction Word (VLIW) architectures
•Pipelining
•Cache
•DMA
Harvard Architecture
•Physically separate memories and paths for instructions and data
[Figure: the CPU connects to a data memory and to a program memory over separate paths.]
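Single-Cycle MAC unit
[Figure: a multiplier forms the product a_i x_i; an adder adds it to the running sum a_{i-1} x_{i-1} + ... held in a register, and the new sum is written back to the register.]
$\sum_{i=0}^{n} a_i x_i$
Can compute a sum of n products in n cycles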
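The accumulate-in-a-register behaviour described above can be sketched in Python. This is only a toy model: the `MacUnit` class, its `cycles` counter, and the sample values are assumptions for illustration, and each call to `mac` stands in for one hardware cycle.

```python
class MacUnit:
    """Toy model of a multiply-accumulate unit: one product is added per cycle."""

    def __init__(self):
        self.acc = 0      # accumulator register
        self.cycles = 0   # number of MAC operations issued

    def mac(self, a, x):
        self.acc += a * x  # multiply and add in a single modelled cycle
        self.cycles += 1
        return self.acc


def dot_product(a_vals, x_vals):
    """Sum of products computed with one MAC operation per term."""
    unit = MacUnit()
    for a, x in zip(a_vals, x_vals):
        unit.mac(a, x)
    return unit.acc, unit.cycles


result, cycles = dot_product([1, 2, 3], [4, 5, 6])
print(result, cycles)  # 32 3
```

Three products take three modelled cycles, matching the one-product-per-cycle claim.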
Single Instruction, Multiple Data (SIMD)
•A technique for data-level parallelism: a number of processing elements apply the same instruction to different data elements in parallel
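A minimal Python sketch of the idea, assuming a hypothetical machine with 4 processing elements (lanes). Python runs the lanes one after another, so the code only models how many instructions would be issued; the function name `simd_add` and the lane count are illustrative assumptions.

```python
def simd_add(a, b, lanes=4):
    """Add two equal-length lists, processing `lanes` elements per modelled instruction."""
    assert len(a) == len(b)
    result = []
    instructions = 0
    for start in range(0, len(a), lanes):
        # One SIMD instruction: each processing element handles one lane.
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


values, issued = simd_add([1, 2, 3, 4, 5, 6, 7, 8],
                          [10, 20, 30, 40, 50, 60, 70, 80])
print(values)  # [11, 22, 33, 44, 55, 66, 77, 88]
print(issued)  # 2 modelled instructions instead of 8 scalar additions
```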
Very Long Instruction Word (VLIW)
•A technique for
instruction-level
parallelism by executing
instructions without
dependencies (known at
compile-time) in parallel
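•Example of a single VLIW instruction:
F=a+b; c=e/g; d=x&y; w=z*h;
[Figure: the VLIW instruction holds four operation slots (F=a+b, c=e/g, d=x&y, w=z*h); each slot is issued to its own processing unit (PU), which reads its two operands and writes its result.]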
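The four-operation example above can be modelled in Python as a bundle of slots. This is a sketch under stated assumptions: the tuple layout, the helper names `is_independent` and `execute_bundle`, and the register values are invented for illustration. On a real VLIW machine the compiler performs the dependency check.

```python
import operator

# Each slot: (destination, operation, source 1, source 2)
bundle = [
    ("F", operator.add, "a", "b"),
    ("c", operator.truediv, "e", "g"),
    ("d", operator.and_, "x", "y"),
    ("w", operator.mul, "z", "h"),
]


def is_independent(bundle):
    """True if no slot reads or writes the destination of another slot."""
    dests = [slot[0] for slot in bundle]
    if len(set(dests)) != len(dests):
        return False
    for dest, _, s1, s2 in bundle:
        others = set(dests) - {dest}
        if s1 in others or s2 in others:
            return False
    return True


def execute_bundle(bundle, regs):
    """Read all operands first, then write all results, as if the slots ran in parallel."""
    results = {dest: op(regs[s1], regs[s2]) for dest, op, s1, s2 in bundle}
    regs.update(results)
    return regs


# Hypothetical register contents
regs = {"a": 1, "b": 2, "e": 8, "g": 4, "x": 6, "y": 3, "z": 5, "h": 7}
assert is_independent(bundle)
execute_bundle(bundle, regs)
print(regs["F"], regs["c"], regs["d"], regs["w"])  # 3 2.0 2 35
```

A bundle such as `F=a+b; c=F*g` would fail the check, because the second slot needs the result of the first and so could not be issued in the same instruction.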
ACOE343 -Embedded Real-Time Processor Systems -
Frederick University
CISC vs. RISC vs. VLIW
Pipelining
•DSPs commonly feature deep pipelines
•TMS320C6x processors have 3 pipeline stages, each divided into a number of phases (cycles):
–Fetch: Program Address Generate (PG), Program Address Send (PS), Program Ready Wait (PW), Program Receive (PR)
–Decode: Dispatch (DP), Decode (DC)
–Execute: 6 to 10 phases
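To see why a deep pipeline still pays off, the sketch below counts cycles for an idealized pipeline with the phase counts listed above (4 fetch, 2 decode, and a chosen number of execute phases). It assumes no stalls and assumes every instruction passes through all execute phases, which is a simplification; the function names are hypothetical.

```python
FETCH_PHASES = 4    # PG, PS, PW, PR
DECODE_PHASES = 2   # DP, DC


def pipeline_cycles(num_instructions, execute_phases):
    """Cycles to finish `num_instructions` in an ideal pipeline with no stalls."""
    if num_instructions == 0:
        return 0
    depth = FETCH_PHASES + DECODE_PHASES + execute_phases
    # The first instruction needs `depth` cycles; each later one finishes one cycle after.
    return depth + num_instructions - 1


def unpipelined_cycles(num_instructions, execute_phases):
    """Cycles if each instruction had to finish before the next one started."""
    depth = FETCH_PHASES + DECODE_PHASES + execute_phases
    return depth * num_instructions


print(pipeline_cycles(100, 6))     # 111
print(unpipelined_cycles(100, 6))  # 1200
```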
Direct Memory Access (DMA)
•A feature that allows peripherals to access main memory without the intervention of the CPU
•Typically, the CPU initiates a DMA transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete.
•Can create cache coherency problems (the data
in the cache may be different from the data in
the external memory after DMA)
•Requires a DMA controller
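The cache coherency problem noted above can be reproduced with a toy Python model. Everything here is an assumption for illustration: the `ToySystem` class, its methods, and the callback that stands in for the completion interrupt. The transfer also runs synchronously, whereas a real DMA controller works while the CPU continues executing.

```python
class ToySystem:
    """Toy model: main memory, a CPU data cache, and a DMA controller."""

    def __init__(self, size):
        self.memory = [0] * size
        self.cache = {}          # address -> cached value
        self.dma_done = False

    def cpu_read(self, addr):
        if addr not in self.cache:           # cache miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def dma_transfer(self, dest, data, on_complete):
        # The DMA controller writes straight to memory, bypassing the cache.
        for offset, value in enumerate(data):
            self.memory[dest + offset] = value
        on_complete()                        # stands in for the completion interrupt

    def invalidate(self, start, length):
        # Drop cached copies so the next read goes back to memory.
        for addr in range(start, start + length):
            self.cache.pop(addr, None)


def mark_done(system):
    system.dma_done = True


system = ToySystem(8)
system.cpu_read(0)                                    # address 0 is now cached as 0
system.dma_transfer(0, [42, 43], lambda: mark_done(system))
stale = system.cpu_read(0)                            # still 0: the cache is stale
system.invalidate(0, 2)
fresh = system.cpu_read(0)                            # 42 after invalidation
print(system.dma_done, stale, fresh)  # True 0 42
```

Invalidating the affected cache lines after the transfer is one common remedy; placing DMA buffers in non-cached memory is another.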
Cache memory
•Separate instruction and data L1 caches
(Harvard architecture)
•Most systems use DMA
DSP vs. Microcontroller
•DSP
–Harvard Architecture
–VLIW/SIMD (parallel
execution units)
–No bit level operations
–Hardware MACs
–DSP applications
•Microcontroller
–Mostly von Neumann
Architecture
–Single execution unit
–Flexible bit-level
operations
–No hardware MACs
–Control applications
•A TMS320C6713 DSP operating at 225 MHz
•16 Mbytes of synchronous DRAM
•512 Kbytes of non-volatile Flash memory (256 Kbytes usable in default configuration)
•4 user accessible LEDs and DIP switches
•Software board configuration through registers implemented in CPLD
•JTAG emulation through on-board JTAG emulator with USB host interface or external emulator