ERTS UNIT 3.ppt

UNIT 3
EMBEDDED PROGRAMMING

Component for Embedded
Programs
State machine
Stream-oriented programming and
circular buffer
Queue(used for DSP)

State machine
State machine keeps internal state as a
variable, changes state based on inputs.
Uses:
control-dominated code;
reactive systems.

State machine example
buzzer
belted
idle
seated
no seat/-
seat/timer on
no belt
and no
timer/-
no belt/timer on
belt/-
belt/
buzzer off
Belt/buzzer on
no seat/-
no seat/
buzzer off

C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
case IDLE: if (seat) { state = SEATED; timer_on = TRUE; }
break;
case SEATED: if (belt) state = BELTED;
else if (timer) state = BUZZER;
break;
…
}

Stream oriented and circular
buffer programming
Commonly used in data stream:
new data constantly arrives;
each datum has a limited lifetime.
Use a circular buffer to hold the data
stream.
d1d2d3d4d5d6d7
time ttime t+1

Circular buffer
x1x2x3x4x5x6
t
1t
2t
3
Data stream
x1x2x3x4
Circular buffer
x5x6x7

Circular buffers
Indexes locate currently used data,
current input data:
d1
d2
d3
d4
time t1
use
input
d5
d2
d3
d4
time t1+1
use
input

Circular buffer
implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int ibuf, ic;
for (f=0, ibuff=circ_buff_head, ic=0;
ic<N; ibuff=(ibuff==N-1?0:ibuff++), ic++)
f = f + c[ic]*circ_buffer[ibuf];

Queues
Queue is also used in signal processing
and event processing
It is used whenever data may arrive and
depart at unpredictable time
Elastic buffer: holds data that arrives
irregularly.

Models of programs
Modelprograms.moregeneralthan
sourcecode
Thefundamentalmodelforprogramis
thecontrol/dataflowgraphCDFG
TheCDFGhasconstructbothdata
operation&controloperation

Data flow graph
DFG:dataflowgraph.
Doesnotrepresentcontrol.
Modelsbasicblock:codewithonlyone
entryandexitpoint
Describestheminimalordering
requirementsonoperations.

Single assignment form
x = a + b;
y = c -d;
z = x * y;
y = b + d;
original basic block
x = a + b;
y = c -d;
z = x * y;
y1 = b + d;
single assignment form

Data flow graph
x = a + b;
y = c -d;
z = x * y;
y1 = b + d;
single assignment form
+ -
+*
DFG
a b c d
z
x
y
y1

DFGs and partial orders
Partial order:
a+b, c-d; b+d x*y
Can do pairs of
operations in any
order.
+ -
+*
a b c d
z
x
y
y1

Control-data flow graph
CDFG: represents control and data.
Uses data flow graphs as components.
Two types of nodes:
Decision nodes
data flow nodes

Data flow node
Encapsulates a data flow graph:
Write operations in basic block form for
simplicity.
x = a + b;
y = c + d

Control
cond
T
F
Equivalent forms
value
v1
v2
v3
v4

CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
cond1 bb1()
bb2()
bb3()
bb4()
test1
bb5() bb6()
T
F
c1
c2
c3

for loop
for (i=0; i<N; i++)
loop_body();
for loop
i=0;
while (i<N) {
loop_body(); i++; }
equivalent
i<N
loop_body()
T
F
i=0

Assembly Linking and
Loading
Last steps in compilation process:
HLL
compile
assembly
assemble
HLL
HLL assembly
assembly
linkexecutableloader
Object
codeexecution

Multiple-module programs
Programs may be composed from several
files.
Addresses become more specific during
processing:
relative addressesare measured relative to
the start of a module;
absolute addressesare measured relative to
the start of the CPU address space.

Assemblers
Major tasks:
generate binary for symbolic instructions;
translate labels into addresses;
handle pseudo-ops (data, etc.).
Generally one-to-one translation.
Assembly labels:
ORG 100
label1ADR r4,c

Symbol table
ADD r0,r1,r2
xxADD r3,r4,r5
CMP r0,r3
yySUB r5,r6,r7
assembly code
xx0x8
yy0x10
symbol table

Symbol table generation
Use program location counter (PLC) to
determine address of each location.
Scan program, keeping count of PLC.
Addresses are generated at assembly
time, not execution time.

Symbol table example
ADD r0,r1,r2
xxADD r3,r4,r5
CMP r0,r3
yySUB r5,r6,r7
xx0x8
yy0x10
PLC=0x7
PLC=0x7
PLC=0x7
PLC=0x7

Two-pass assembly
Pass 1:
generate symbol table
Pass 2:
generate binary instructions

Relative address
generation
Some label values may not be known at
assembly time.
Labels within the module may be kept in
relative form.

Pseudo-operations
Pseudo-ops do not generate instructions:
ORGsets program location.
EQUgenerates symbol table entry without
advancing PLC.
Data statementsdefine data blocks.

Linking
Combines several object modules into a
single executable module.
Jobs:
put modules in order;
resolve labels across modules.

external reference
entry point
Externals and entry points
xxxADD r1,r2,r3
B a
yyy%1
a ADR r4,yyy
ADD r3,r4,r5

Module ordering
Code modules must be placed in absolute
positions in the memory space.
Load mapor linker flags control the order
of modules.
module1
module2
module3

Dynamic linking
Some operating systems link modules
dynamically at run time:
shares one copy of library among all
executing programs;
allows programs to be updated with new
versions of libraries.

Object code design
MemoryMapdesign
InterruptvectorandotherinformationforI/O
devicesmustbeplacedinspecificlocation
Memorymanagementtablesmustbesetup
Globalvariablesusedforcommunicationb/w
processesmustbeputinlocationthatare
accessibletoalltheusersofthedata
Re-entrancy
Relocatability

Compilation Techniques
Compilation process.
Basic Compilation methods.
Compiler optimizations.
Interpreters and just-in-time compilers.

Compilation
Compilation strategy (Wirth):
compilation = translation + optimization
Compiler determines quality of code:
use of CPU resources;
memory access scheduling;
code size.

Basic compilation phases
HLL
parsing, symbol table
machine-independent
optimizations
Instruction level
optimizations
assembly

Basic compilation method
Statement translation
Procedures
Data structures

Statement translation and
optimization
Source code is translated into
intermediate form such as CDFG.
CDFG is transformed/optimized.
CDFG is translated into instructions with
optimization decisions.
Instructions are further optimized.

Arithmetic expressions
a*b + 5*(c-d)
expression
DFG
* -
*
+
ab
cd
5

Control code generation
if (a+b > 0)
x = 5;
else
x = 7;
a+b>0 x=5
x=7

Procedure linkage
Need code to:
call and return;
pass parameters and results.
Parameters and returns are passed on
stack.
Procedures with few parameters may use
registers.

Procedure stacks
proc1
growth
proc1(int a) {
proc2(5);
}
proc2
SP
stack pointer
FP
frame pointer
5
accessed relative to SP

Data structures
Different types of data structures use
different data layouts.
Some offsets into data structure can be
computed at compile time, others must be
computed at run time.

One-dimensional arrays
C array name points to 0th element:
a[0]
a[1]
a[2]
a
= *(a + 1)

Two-dimensional arrays
Column-major layout:
a[0,0]
a[0,1]
a[1,0]
a[1,1] = a[i*M+j]
...
M
...
N

Structures
Fields within structures are static offsets:
field1
field2
aptr
struct {
int field1;
char field2;
} mystruct;
struct mystruct a, *aptr = &a;
4 bytes
*(aptr+4)

Compiler optimization
If we want to writing programs in high level
language then we need to understand how to
optimize them without rewriting them in asm
language
It requires creating the proper source code that
cause the complier to do what we want

Loop transformations
Goals:
reduce loop overhead;
increase opportunities for pipelining;
improve memory system performance.

Loop unrolling
Reduces loop overhead, enables some
other optimizations.
It helps expose parallelism
for (i=0; i<4; i++)
a[i] = b[i] * c[i];
for (i=0; i<2; i++) {
a[i*2] = b[i*2] * c[i*2];
a[i*2+1] = b[i*2+1] * c[i*2+1];
}

Loop fusion and
distribution
Fusion combines two loops into 1:
for (i=0; i<N; i++) a[i] = b[i] * 5;
for (j=0; j<N; j++) w[j] = c[j] * d[j];
for (i=0; i<N; i++) {
a[i] = b[i] * 5; w[i] = c[i] * d[i];
}
Distribution breaks one loop into two.
Changes optimizations within loop body.

Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
Changes cache behavior.

Array padding
Add array elements to change mapping
into cache:
a[0,0]a[0,1]a[0,2]
a[1,0]a[1,1]a[1,2]
before
a[0,0]a[0,1]a[0,2]
a[1,0]a[1,1]a[1,2]
after
a[0,2]
a[1,2]

Dead code elimination
Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
Can be eliminated by
analysis of control flow,
constant folding.
0
dbg(p1);
1
0
•Dead code can also be generated by compiler
•Dead code can be identified by reachability analysis

Register allocation
Goals:
choose register to hold each variable;
determine lifespan of varible in the register.
Basic case: within basic block.

Register lifetime graph
w = a + b;
x = c + w;
y = c + d;
time
a
b
c
d
w
x
y
1 2 3
t=1
t=2
t=3

Instruction scheduling
Non-pipelined machines do not need
instruction scheduling: any order of
instructions that satisfies data
dependencies runs equally fast.
In pipelined machines, execution time of
one instruction depends on the nearby
instructions: opcode, operands.

Reservation table
A reservation table
relates
instructions/time to
CPU resources.
Time/instrA B
instr1 X
instr2 X X
instr3 X
instr4 X

Software pipelining
Schedules instructions across loop
iterations.
Reduces instruction latency in iteration i
by inserting instructions from iteration
i+1.

Instruction selection
May be several ways to implement an
operation or sequence of operations.
Represent operations as graphs, match
possible instruction sequences onto
graph.
*
+
expression templates
* +
*
+
MUL ADD
MADD

Using your compiler
Understand various optimization levels (-
O1, -O2, etc.)
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
correctness;
loss of hand-tweaked code.

Interpreters and JIT
compilers
Interpreter: translates and executes
program statements on-the-fly.
JIT compiler: compiles small sections of
code into instructions during program
execution.
Eliminates some translation overhead.
Often requires more memory.

Program-level performance
analysis
Need to understand
performance in detail:
Real-time behavior, not
just typical.
On complex platforms.
Program performance 
CPU performance:
Pipeline, cache are
windows into program.
We must analyze the entire
program.

Execution time calculation
Varies with input data:
Different-length paths. (because those values
select different execution pathsin the
program)
Cache effects. (Plays major effecton
program)
Instruction-level performance variations:
Floating point operation.
Fetch times.

Measure program
performance
From microprocessor manufacturers:
Simulate execution of the CPU.(Simulator)
Makes CPU state visible.
By using timer.
Requires modifying the program to control
the timer. Used to measure the Performance
of executing sections of code.
By using logic analyzer.
to measure the start & stop times of a code
segment.

Types of Performance
measures on Programs
Average-case execution time. (Typical
Time)
Typically used in application programming.
Worst-case execution time. (Longest
time)
A component in deadline satisfaction.
Best-case execution time.
This measure can be important in multi-rate
real time system.

Elements of program
performance
Basic program execution time formula:
execution time = program path + instruction timing
The path is the sequence of instructions
executed by the program
We can trace the execution path of program
through its high-level language specification
It is hard to get accurate estimates of total
execution time from a high level program.

Instruction timing
It is based on sequence of instructions traced by
the program path.
Not all instructions take the same amount of time.
Multi-cycle instructions.
Fetches.
Execution times of instructions are not independent.
Pipeline interlocks.
Cache effects.
Execution times may depend on operand values.
Floating-point operations.
Some multi-cycle integer operations.

Measurement-driven
performance analysis
The most direct way to determine the execution
time of a program is by measuring it
Program trace-Process of combining the
determination of the execution path and the
timing of that path
Measurement issues
Need to know the desired input values
May need to write software scaffolding to generate
the input values (SIMULATOR)
Software scaffolding may also need to examine
outputs to generate feedback-driven inputs

Measurement-driven
performance analysis con’t
Methods-Directly on the Hardware & By using
the Simulator
Profiling
It is simple method for analysing software
performance
It does not measure execution time
There are two major ways to profile a program

Physical performance
measurement
In-circuit emulator allows tracing.
Affects execution timing.
Logic analyzer can measure behavior at pins.
Address bus can be analyzed to look for events.
Code can be modified to make events visible.
Particularly important for real-world input
streams.

Software performance
optimization
Software performance can be optimized
using several techniques such as loop and
cache optimizations.

Loop optimizations
Loops are good targets for optimization,
because programs with loops tend to
spend a lot of time executing those loops.
Basic loop optimizations:
code motion;
induction-variable elimination;
strength reduction (x*2 -> x<<1).

Code motion
Code motion can move
unnecessary code out of
the loop.
If a computation result does
not depend on operations
performed in the loop
body, then we can safely
move it out of the loop.
for (i=0; i<N*M; i++)
z[i] = a[i] + b[i];
i<N*M
i=0;
z[i] = a[i] + b[i];
i = i+1;
N
Y
i<X
i=0; X = N*M

Induction variable
elimination
Induction variable: It is a variable whose
value is derived from the loop iteration
variable’s value. loop index.
Consider loop:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
z[i,j] = b[i,j];
Rather than recompute i*M+j for each array in
each iteration, share induction variable between
arrays, increment at end of loop body.

Strength Reduction
Strength reduction helps us to reduce the
cost of a loop iteration. Consider this
assignment,
Y=x*2
In integer arithmetic, we can use left shift
rather than a multiplication by 2. if the shift
is faster than multiply, we want to perform
the substitution.

Cache Optimization
Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of
data, cache conflicts are common.

Performance optimization
hints
Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
instruction conflicts can be handled by
rewriting code, rescheudling;
conflicting scalar data can easily be moved;
conflicting array data can be moved, padded.

Program level
Energy/power optimization
Power consumption is a particularly
important design metric for battery
powered systemsbecause the battery has
a very limited lifetime.
Energy: ability to do work.
Most important in battery-powered systems.
Power: energy per unit time.
Important even in wall-plug systems---power
becomes heat.

Measuring energy
consumption
Execute a small loop, measure current:
while (TRUE)
{
test_code();
}
CPU
I
Way to save power

Energy consumption of the
program
Energy consumption various somewhat from
instruction to instruction
The sequence of instructions has some influence
Opcode and the location of the operands also
matters.
Memory system also reduces the power
consumption
Cacheare an important factor in energy
consumption

Cache behavior is
important
Energy consumption has a sweet spot as
cache size changes:
cache too small: program thrashes, burning
energy on external memory accesses;
cache too large: cache itself burns too much
power.

Optimizing for energy
Way to optimize a program for low power
high performance = low energy.
Making the program run faster also reduce
energy consumption
Not many instructions trade speed for energy.
Memory access patterns control by the
programmer

Optimizing for energy,
cont’d.
Use registers efficiently.
Identify and eliminate cache conflicts.
Make the use of page mode access in the
memory system
Moderate loop unrolling eliminates some loop
overhead instructions.
Eliminate pipeline stalls.
Eliminate recursive procedure call

Analysis and Optimization
of Program Size
The memory size of a program is determined by
the size of its data and instructions
Data provide an excellent opportunity for
minimizing size because the data are most
highly dependent on programming style.
Many inefficient programsoften keep several
copies of data, identifying and eliminating
duplications can lead to memory utilization in
high manner.

Analysis and Optimization
of Program Size
A very low level technique for minimizing
data is to reuse values.
Data buffers can be reused at several
different points in the program.
Goal:
reduce hardware cost of memory;
reduce power consumption of memory units.
Two opportunities:
data;
instructions.

Data size minimization
Reuse constants, variables, data buffers in
different parts of code.
Requires careful verification of correctness.
Generate data using instructions.

Reducing code size
Avoid function inlining.
Choose CPU with compact instructions.
Use specialized instructions where
possible.

Program validation and
testing
Complex systems need testing to ensure that
they work as they are intended.
But bugs can be introduced particularly in
embedded system because specialized hardware
and real time responsiveness make
programming more challenging.
There are many available techniques for
software testing that can help us generate a
comprehensive set of tests to ensure that our
system works properly

Program validation and
testing
Breaking the testing problem into sub problems
and analyzing each sub programs.
Two Major type of testing strategies:
Black box doesn’t look at the source code.
Clear box (white box) does look at the source code.

Clear-box testing
Control /data flow graph is an important tool in
developing clear box test for program
Provide the program with input that exercise the test
Execute the program to perform the test
Examine the output to determine whether the test
was successful

Execution paths and
testing
Paths are important in functional testing
as well as performance analysis.
In general, an exponential number of
paths through the program.
Show that some paths dominate others.

Basis paths
Approximate CDFG
with undirected
graph.
Undirected graphs
have basis paths:
All paths are linear
combinations of basis
paths.

Cyclomatic complexity
Cyclomatic complexity
is a bound on the size
of basis sets:
e = # edges
n = # nodes
p = number of graph
components
M = e –n + 2p.

Types of Clear box testing
Strategy
Branch testing-A simple condition testing
strategy is known as branch testing.
This branch testing requires the true and
false branches of a condition.
Every simple condition in the conditional
expressions to be tested at least once.

Types of Clear box testing
Strategy
Domain testing-Another more
sophisticated strategy for testing
conditionals.
Data flow testing-It makes use of def-use
analysis (short for definition-use-analysis)
It selects the path that have some
relationship to the program function.

Loop testing
Loops need specialized tests to be tested
efficiently.
Heuristic testing strategy:
Skip loop entirely.
One loop iteration.
Two loop iterations.
# iterations much below max.
n-1, n, n+1 iterations where n is max.

Black-box testing
Two ways:
Block box testalone-Low probability of
finding all the bugs in a program, so black
box test will be combined with clear box
tests to provide a well-rounded test.
Complements clear-box testing.
May require a large number of tests.
Tests software in different ways.

Black-box test vectors
Random tests.
May weight distribution based on software
specification.
Regression tests.
Tests of previous versions, bugs, etc.
May be clear-box tests of previous versions.

How much testing is
enough?
Exhaustive testing is impractical.
One important measure of test quality---bugs
escaping into field.
Good organizations can test software to give
very low field bug report rates.
Error injection measures test quality:
Add known bugs.
Run your tests.
Determine % injected bugs that are caught.

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

ERTS UNIT 3.ppt

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 56

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Slide 75

Slide 76

Slide 77