MCES 21CS43 Module 3 microcontroller notes

Uploaded by vinodthrupthi, Sep 18, 2024 (40 slides)

About This Presentation

Microcontroller C compilers notes


Slide Content

Microcontroller and Embedded Systems
Module-3

C Compilers and Optimization: Structure Arrangement, Bit-fields, Unaligned Data and Endianness, Division, Floating Point, Inline Functions and Inline Assembly, Portability Issues.

ARM programming using Assembly language: Writing Assembly code, Profiling and cycle counting, Instruction scheduling, Register Allocation, Conditional Execution, Looping Constructs.

Laboratory Component:
1. Write a program to arrange a series of 32-bit numbers in ascending/descending order.
2. Write a program to count the number of ones and zeros in two consecutive memory locations.
3. Display "Hello World" message using Internal UART.

28-07-2023, Dr Anitha D B, CSE-DS, ATMECE, Mysore

C Compilers and Optimization

STRUCTURE ARRANGEMENT
• The way you lay out a frequently used structure can have a significant impact on its performance and code density.
• There are two issues concerning structures on the ARM:
  - the alignment of the structure entries, and
  - the overall size of the structure.
• ARM compilers will automatically align the start address of a structure to a multiple of the largest access width used within the structure (usually four or eight bytes) and align entries within structures to their access width by inserting padding.

For example, consider the structure

    struct {
        char a;
        int b;
        char c;
        short d;
    }

For a little-endian memory system the compiler will lay this out by adding padding to ensure that each object is aligned to its own size: a at byte 0, three bytes of padding, b at bytes 4-7, c at byte 8, one byte of padding, and d at bytes 10-11, giving 12 bytes in total.

• To improve the memory usage, you should reorder the elements:

    struct {
        char a;
        char c;
        short d;
        int b;
    }

• This reduces the structure size from 12 bytes to 8 bytes, with the following new layout: a at byte 0, c at byte 1, d at bytes 2-3, and b at bytes 4-7.
• Therefore, it is a good idea to group structure elements of the same size, so that the structure layout doesn't contain unnecessary padding.
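The two layouts can be checked directly with sizeof and offsetof. This is a minimal sketch, assuming an ABI where int is 4 bytes with 4-byte alignment and short is 2 bytes with 2-byte alignment (true for the ARM EABI and most desktop targets); the struct tags unordered and ordered are hypothetical names.

```c
#include <stddef.h>  /* offsetof */

/* Ordering from the slide: 3 padding bytes after a, 1 after c. */
struct unordered {
    char  a;   /* offset 0 */
    int   b;   /* offset 4 */
    char  c;   /* offset 8 */
    short d;   /* offset 10 */
};

/* Same members, smallest first: no internal padding is needed. */
struct ordered {
    char  a;   /* offset 0 */
    char  c;   /* offset 1 */
    short d;   /* offset 2 */
    int   b;   /* offset 4 */
};

/* Compile-time checks; these only hold on ABIs with a 4-byte, 4-aligned int. */
_Static_assert(sizeof(struct unordered) == 12, "expected 12 bytes with padding");
_Static_assert(sizeof(struct ordered) == 8, "expected 8 bytes after reordering");
_Static_assert(offsetof(struct unordered, d) == 10, "d follows one padding byte");
_Static_assert(offsetof(struct ordered, b) == 4, "b is naturally aligned");
```

If a target ABI differs, the static assertions fail at compile time rather than silently changing the layout.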

• The armcc compiler does include a keyword, __packed, that removes all padding.
• For example, the structure

    __packed struct {
        char a;
        int b;
        char c;
        short d;
    }

  will be laid out in memory with no padding: a at byte 0, b at bytes 1-4, c at byte 5, and d at bytes 6-7, giving 8 bytes in total.
• However, packed structures are slow and inefficient to access. The compiler emulates unaligned load and store operations by using several aligned accesses with data operations to merge the results. Only use the __packed keyword where space is far more important than speed and you can't reduce padding by rearrangement. Also use it for porting code that assumes a certain structure layout in memory.
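armcc spells this __packed. The sketch below uses the GCC/Clang equivalent, __attribute__((packed)), which is an assumption since the notes only describe armcc. It shows that the padding disappears and that member accesses through the structure remain correct (if slower) even though b is misaligned. The names packed_s, get_b and set_b are hypothetical.

```c
#include <stddef.h>  /* offsetof */

/* GCC/Clang spelling of armcc's __packed: no padding between members. */
struct __attribute__((packed)) packed_s {
    char  a;   /* offset 0 */
    int   b;   /* offset 1: misaligned */
    char  c;   /* offset 5 */
    short d;   /* offset 6 */
};

_Static_assert(sizeof(struct packed_s) == 8, "packed layout has no padding");
_Static_assert(offsetof(struct packed_s, b) == 1, "b starts at byte 1");

/* Accessing b through the struct is safe: the compiler knows it may be
   misaligned and, where the core requires it, emits several narrower
   accesses instead of one aligned word load or store. */
int get_b(const struct packed_s *p) { return p->b; }
void set_b(struct packed_s *p, int v) { p->b = v; }
```

Taking the address of p->b and passing it on as a plain int * would lose that knowledge, which is one reason packed data is best accessed only through the packed type.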

The exact layout of a structure in memory may depend on the compiler vendor and compiler version you use. In API (Application Programmer Interface) definitions it is often a good idea to insert any padding that you cannot get rid of into the structure manually. This way the structure layout is not ambiguous. It is easier to link code between compiler versions and compiler vendors if you stick to unambiguous structures.

Another point of ambiguity is enum. Different compilers use different sizes for an enumerated type, depending on the range of the enumeration. For example, consider the type

    typedef enum { FALSE, TRUE } Bool;

• The armcc compiler in ADS1.1 will treat Bool as a one-byte type, as it only uses the values 0 and 1. Bool will only take up 8 bits of space in a structure.
• However, gcc will treat Bool as a word and take up 32 bits of space in a structure.
• To avoid ambiguity it is best to avoid using enum types in structures used in the API to your code.

Another consideration is the size of the structure and the offsets of elements within the structure. This problem is most acute when you are compiling for the Thumb instruction set. Thumb instructions are only 16 bits wide and so only allow for small element offsets from a structure base pointer. The table shows the load and store base register offsets available in Thumb.

    Instructions    Element size   Immediate offset range
    LDRB, STRB      8-bit          0 to 31 bytes
    LDRH, STRH      16-bit         0 to 62 bytes (multiples of 2)
    LDR, STR        32-bit         0 to 124 bytes (multiples of 4)

• Therefore the compiler can only access an 8-bit structure element with a single instruction if it appears within the first 32 bytes of the structure. Similarly, single instructions can only access 16-bit values in the first 64 bytes and 32-bit values in the first 128 bytes. Once you exceed these limits, structure accesses become inefficient.
• The following rules generate a structure with the elements packed for maximum efficiency:
  - Place all 8-bit elements at the start of the structure.
  - Place all 16-bit elements next, then 32-bit, then 64-bit.
  - Place all arrays and larger elements at the end of the structure.
  - If the structure is too big for a single instruction to access all the elements, then group the elements into substructures. The compiler can maintain pointers to the individual substructures.

SUMMARY: Efficient Structure Arrangement
• Lay structures out in order of increasing element size. Start the structure with the smallest elements and finish with the largest.
• Avoid very large structures. Instead use a hierarchy of smaller structures.
• For portability, manually add padding (that would appear implicitly) into API structures so that the layout of the structure does not depend on the compiler.
• Beware of using enum types in API structures. The size of an enum type is compiler dependent.
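The last two summary points can be combined in one sketch, assuming <stdint.h> fixed-width types are available: the implicit padding is written out as named members, the Bool-style flag is stored in a one-byte integer instead of an enum, and the layout is checked at compile time. The names api_bool, API_FALSE, API_TRUE, api_record, valid and the pad members are hypothetical.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* fixed-width integer types */

/* Instead of  typedef enum { FALSE, TRUE } Bool;  whose size depends on
   the compiler, store the flag in a type that is one byte everywhere. */
typedef uint8_t api_bool;
#define API_FALSE ((api_bool)0)
#define API_TRUE  ((api_bool)1)

/* Hypothetical API record: the padding a compiler would insert
   implicitly is written out, so every compiler agrees on the layout. */
typedef struct {
    uint8_t  a;        /* offset 0 */
    api_bool valid;    /* offset 1 (hypothetical flag) */
    uint8_t  pad0[2];  /* explicit padding, offsets 2-3 */
    uint32_t b;        /* offset 4 */
    uint8_t  c;        /* offset 8 */
    uint8_t  pad1;     /* explicit padding, offset 9 */
    uint16_t d;        /* offset 10 */
} api_record;

/* Fail the build if a compiler would lay the record out differently. */
_Static_assert(sizeof(api_bool) == 1, "flag must be one byte");
_Static_assert(offsetof(api_record, b) == 4, "b must be at offset 4");
_Static_assert(offsetof(api_record, d) == 10, "d must be at offset 10");
_Static_assert(sizeof(api_record) == 12, "record must be 12 bytes");
```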

BIT-FIELDS
• Bit-fields are probably the least standardized part of the ANSI C specification. The compiler can choose how bits are allocated within the bit-field container. For this reason alone, avoid using bit-fields inside a union or in an API structure definition. Different compilers can assign the same bit-field different bit positions in the container.
• It is also a good idea to avoid bit-fields for efficiency. Bit-fields are structure elements and are usually accessed using structure pointers; consequently, they suffer from the pointer aliasing problems. Every bit-field access is really a memory access. Possible pointer aliasing often forces the compiler to reload the bit-field several times.

The following example, dostages_v1, illustrates this problem. It also shows that compilers do not tend to optimize bit-field testing very well.

    void dostageA(void);
    void dostageB(void);
    void dostageC(void);

    typedef struct {
        unsigned int stageA : 1;
        unsigned int stageB : 1;
        unsigned int stageC : 1;
    } Stages_v1;

    void dostages_v1(Stages_v1 *stages)
    {
        if (stages->stageA)
        {
            dostageA();
        }
        if (stages->stageB)
        {
            dostageB();
        }
        if (stages->stageC)
        {
            dostageC();
        }
    }

Here, we use three bit-field flags to enable three possible stages of processing.
• Note that, in the compiled code, the compiler accesses the memory location containing the bit-field three times. Because the bit-field is stored in memory, the dostage functions could change the value. Also, the compiler uses two instructions to test bit 1 and bit 2 of the bit-field, rather than a single instruction.
• You can generate far more efficient code by using an integer rather than a bit-field.
• Use enum or #define masks to divide the integer type into different fields.

• The following code implements the dostages function using logical operations rather than bit-fields:

    typedef unsigned long Stages_v2;

    #define STAGEA (1ul << 0)
    #define STAGEB (1ul << 1)
    #define STAGEC (1ul << 2)

    void dostages_v2(Stages_v2 *stages_v2)
    {
        Stages_v2 stages = *stages_v2;

        if (stages & STAGEA)
        {
            dostageA();
        }
        if (stages & STAGEB)
        {
            dostageB();
        }
        if (stages & STAGEC)
        {
            dostageC();
        }
    }

• Now that a single unsigned long type contains all the bit-fields, we can keep a copy of their values in a single local variable, stages, which removes the memory aliasing problem.
• The compiler must still assume that the dostageX (where X is A, B, or C) functions could change the value of *stages_v2, but the tests now read the local copy, so the memory is loaded only once.
• The compiler generates code giving a saving of 33% over the previous version using ANSI bit-fields.

• You can also use the masks to set and clear the bit-fields, just as easily as for testing them. The following code shows how to set, clear, or toggle bits using the STAGE masks:

    stages |= STAGEA;   /* enable stage A */
    stages &= ~STAGEB;  /* disable stage B */
    stages ^= STAGEC;   /* toggle stage C */

• These bit set, clear, and toggle operations take only one ARM instruction each, using ORR, BIC, and EOR instructions, respectively. Another advantage is that you can now manipulate several bit-fields at the same time, using one instruction. For example:

    stages |= (STAGEA | STAGEB);   /* enable stages A and B */
    stages &= ~(STAGEA | STAGEC);  /* disable stages A and C */
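A compilable sketch of these mask operations, including combined masks that update or test several flags at once. The masks are the ones defined for dostages_v2; the function names update_stages, enable_a_and_b, disable_a_and_c and any_of_b_or_c are hypothetical.

```c
typedef unsigned long Stages_v2;

#define STAGEA (1ul << 0)
#define STAGEB (1ul << 1)
#define STAGEC (1ul << 2)

/* Single-flag operations: ORR, BIC and EOR respectively on ARM. */
Stages_v2 update_stages(Stages_v2 stages)
{
    stages |= STAGEA;   /* enable stage A */
    stages &= ~STAGEB;  /* disable stage B */
    stages ^= STAGEC;   /* toggle stage C */
    return stages;
}

/* Several flags per operation: the combined mask is a compile-time constant. */
Stages_v2 enable_a_and_b(Stages_v2 stages)  { return stages | (STAGEA | STAGEB); }
Stages_v2 disable_a_and_c(Stages_v2 stages) { return stages & ~(STAGEA | STAGEC); }

/* One AND/compare tests whether either of two flags is set. */
int any_of_b_or_c(Stages_v2 stages) { return (stages & (STAGEB | STAGEC)) != 0; }
```

For example, update_stages(STAGEB | STAGEC) sets A, clears B and toggles C off, leaving only STAGEA.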

SUMMARY: Bit-fields
• Avoid using bit-fields. Instead use #define or enum to define mask values.
• Test, toggle, and set bit-fields using integer logical AND, OR, and exclusive OR operations with the mask values. These operations compile efficiently, and you can test, toggle, or set multiple fields at the same time.

UNALIGNED DATA AND ENDIANNESS
• Unaligned data and endianness are two issues that can complicate memory accesses and portability. Is the array pointer aligned? Is the ARM configured for a big-endian or little-endian memory system?
• The ARM load and store instructions assume that the address is a multiple of the size of the type you are loading or storing. If you load or store to an address that is not aligned to its type, then the behavior depends on the particular implementation. The core may generate a data abort or load a rotated value. For well-written, portable code you should avoid unaligned accesses.
• C compilers assume that a pointer is aligned unless you say otherwise. If a pointer isn't aligned, then the program may give unexpected results. This is sometimes an issue when you are porting code to the ARM from processors that do allow unaligned accesses. For armcc, the __packed directive tells the compiler that a data item can be positioned at any byte alignment. This is useful for porting code, but using __packed will impact performance.

To illustrate this, look at the following simple routine, readint. It returns the integer at the address pointed to by data. We've used __packed to tell the compiler that the integer may possibly not be aligned.

    int readint(__packed int *data)
    {
        return *data;
    }

This compiles to

    readint
        BIC   r3,r0,#3          ; r3 = data & 0xFFFFFFFC
        AND   r0,r0,#3          ; r0 = data & 0x00000003
        MOV   r0,r0,LSL #3      ; r0 = bit offset of data word
        LDMIA r3,{r3,r12}       ; r3, r12 = 8 bytes read from r3
        MOV   r3,r3,LSR r0      ; These three instructions
        RSB   r0,r0,#0x20       ; shift the 64 bit value r12.r3
        ORR   r0,r3,r12,LSL r0  ; right by r0 bits
        MOV   pc,r14            ; return r0

• The code is large and complex.
• The compiler emulates the unaligned access using two aligned accesses and data processing operations, which is very costly.
• You should avoid __packed.
• Instead use the type char * to point to data that can appear at any alignment.

• Alignment problems typically arise when reading data packets or files used to transfer information between computers. Network packets and compressed image files are good examples. Two- or four-byte integers may appear at arbitrary offsets in these files. Data has been squeezed as much as possible, to the detriment of alignment.
• Endianness (or byte order) is also a big issue when reading data packets or compressed files. The ARM core can be configured to work in little-endian (least significant byte at lowest address) or big-endian (most significant byte at lowest address) modes. Little-endian mode is usually the default.
• The endianness of an ARM is usually set at power-up and remains fixed thereafter. The tables illustrate how the ARM's 8-bit, 16-bit, and 32-bit load and store instructions work for different endian configurations.

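We assume that byte address A is aligned to the size of the memory transfer. The tables show how the byte addresses in memory map into the 32-bit register that the instruction loads or stores (B(A) is the byte at address A; unsigned loads fill unused upper bits with zeros, while the signed loads LDRSB and LDRSH fill them with the sign bit).

    Little-endian configuration
    Instruction   Width    Bits 31..24   Bits 23..16   Bits 15..8   Bits 7..0
    LDRB, STRB    8-bit    -             -             -            B(A)
    LDRH, STRH    16-bit   -             -             B(A+1)       B(A)
    LDR, STR      32-bit   B(A+3)        B(A+2)        B(A+1)       B(A)

    Big-endian configuration
    Instruction   Width    Bits 31..24   Bits 23..16   Bits 15..8   Bits 7..0
    LDRB, STRB    8-bit    -             -             -            B(A)
    LDRH, STRH    16-bit   -             -             B(A)         B(A+1)
    LDR, STR      32-bit   B(A)          B(A+1)        B(A+2)       B(A+3)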

• What is the best way to deal with endian and alignment problems?
• If speed is not critical, then use functions like readint_little and readint_big, which read a four-byte integer from a possibly unaligned address in memory. The address alignment is not known at compile time, only at run time.
• If you've loaded a file containing big-endian data such as a JPEG image, then use readint_big.
• For a bytestream containing little-endian data, use readint_little.
• Both routines will work correctly regardless of the memory endianness the ARM is configured for.

Example: These functions read a 32-bit integer from a bytestream pointed to by data. The bytestream contains little- or big-endian data, respectively. These functions are independent of the ARM memory system byte order since they only use byte accesses.
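A sketch of what readint_little and readint_big can look like. It takes unsigned char rather than plain char and returns uint32_t; both are assumptions made for portability, since on compilers where char is signed, bytes above 0x7F would otherwise be sign-extended before shifting.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte stream at any alignment.
   Only byte loads are used, so the result is the same on little- and
   big-endian ARM configurations. */
uint32_t readint_little(const unsigned char *data)
{
    uint32_t a0 = data[0], a1 = data[1], a2 = data[2], a3 = data[3];
    return a0 | (a1 << 8) | (a2 << 16) | (a3 << 24);
}

/* Same, for a big-endian byte stream such as a JPEG header field. */
uint32_t readint_big(const unsigned char *data)
{
    uint32_t a0 = data[0], a1 = data[1], a2 = data[2], a3 = data[3];
    return (a0 << 24) | (a1 << 16) | (a2 << 8) | a3;
}
```

For the bytes 78 56 34 12 starting at an odd address, readint_little returns 0x12345678 and readint_big returns 0x78563412.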

If speed is critical, then the fastest approach is to write several variants of the critical routine. For each possible alignment and ARM endianness configuration, you call a separate routine optimized for that situation.

SUMMARY: Endianness and Alignment
• Avoid using unaligned data if you can.
• Use the type char * for data that can be at any byte alignment. Access the data by reading bytes and combining with logical operations. Then the code won't depend on alignment or ARM endianness configuration.
• For fast access to unaligned structures, write different variants according to pointer alignment and processor endianness.
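The "several variants" approach described above can be sketched as a dispatcher that checks pointer alignment and host byte order at run time and uses a single word access only when both allow it. This is an illustrative assumption rather than a routine from the notes: read_le32 and host_is_little_endian are hypothetical names, and a four-byte memcpy from an aligned address is assumed to compile to one load, as it typically does.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Read a little-endian 32-bit value, choosing a variant per situation. */
uint32_t read_le32(const unsigned char *p)
{
    if (((uintptr_t)p & 3u) == 0 && host_is_little_endian()) {
        /* Aligned and matching byte order: one word access suffices. */
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }
    /* Any other case: portable byte-by-byte variant. */
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

A fuller version would add further variants, for example a word load followed by a byte reversal for aligned data on a big-endian configuration.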

DIVISION
• The ARM cores considered here do not have a divide instruction in hardware (later architectures such as ARMv7-M add SDIV and UDIV). Instead the compiler implements divisions by calling software routines in the C library. There are many different types of division routine that you can tailor to a specific range of numerator and denominator values.
• The standard integer division routine provided in the C library can take between 20 and 100 cycles, depending on implementation, early termination, and the ranges of the input operands.
• Division and modulus (/ and %) are such slow operations that you should avoid them as much as possible. However, division by a constant and repeated division by the same denominator can be handled efficiently.
• This section describes how to replace certain divisions by multiplications and how to minimize the number of division calls.
• Circular buffers are one area where programmers often use division, but you can avoid these divisions completely. Suppose you have a circular buffer of size buffer_size bytes and a position indicated by a buffer offset. To advance the offset by increment bytes you could write

    offset = (offset + increment) % buffer_size;

Instead it is far more efficient to write

    offset += increment;
    if (offset >= buffer_size)
    {
        offset -= buffer_size;
    }

• The first version may take 50 cycles; the second will take 3 cycles because it does not involve a division. The second version assumes that increment is no larger than buffer_size, so at most one subtraction is ever needed.
• If you can't avoid a division, then try to arrange that the numerator and denominator are unsigned integers.
• Signed division routines are slower since they take the absolute values of the numerator and denominator and then call the unsigned division routine. They fix the sign of the result afterwards.
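Packaged as a function, the division-free circular-buffer update looks like the sketch below. advance_offset is a hypothetical name; the preconditions in the comment are what make one conditional subtraction enough.

```c
/* Advance a circular-buffer offset without a division.
   Assumes offset < buffer_size and increment <= buffer_size, so the sum
   is below 2 * buffer_size and one subtraction brings it back in range. */
unsigned int advance_offset(unsigned int offset, unsigned int increment,
                            unsigned int buffer_size)
{
    offset += increment;
    if (offset >= buffer_size) {
        offset -= buffer_size;
    }
    return offset;
}
```

Under those preconditions it returns the same value as (offset + increment) % buffer_size.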

Many C library division routines return the quotient and remainder from the division. In other words, a free remainder operation is available to you with each division operation and vice versa. For example, to find the (x, y) position of a location at offset bytes into a screen buffer, it is tempting to write

    typedef struct {
        int x;
        int y;
    } point;

    point getxy_v1(unsigned int offset, unsigned int bytes_per_line)
    {
        point p;

        p.y = offset / bytes_per_line;
        p.x = offset - p.y * bytes_per_line;
        return p;
    }

It appears that we have saved a division by using a subtract and multiply to calculate p.x, but in fact, it is often more efficient to write the function with the modulus or remainder operation.

Example: In getxy_v2, the quotient and remainder operations only require a single call to a division routine:

    point getxy_v2(unsigned int offset, unsigned int bytes_per_line)
    {
        point p;

        p.x = offset % bytes_per_line;
        p.y = offset / bytes_per_line;
        return p;
    }

This version is four instructions shorter than getxy_v1 in the compiler's assembly output.

DIVISION
• REPEATED UNSIGNED DIVISION WITH REMAINDER
• CONVERTING DIVIDES INTO MULTIPLIES
• UNSIGNED DIVISION BY A CONSTANT
• SIGNED DIVISION BY A CONSTANT

REPEATED UNSIGNED DIVISION WITH REMAINDER
Often the same denominator occurs several times in code. In the previous example, bytes_per_line will probably be fixed throughout the program. If we project from three to two Cartesian coordinates, then we use the denominator twice:
• (x, y, z) → (x/z, y/z)
• In these situations it is more efficient to cache the value of 1/z in some way and use a multiplication by 1/z instead of a division.

CONVERTING DIVIDES INTO MULTIPLIES
Example: The routine scale shows how to convert divisions to multiplications in practice. It divides an array of N elements by denominator d.
• First calculate the value of s = (2^32 - 1)/d.
• Then replace each divide by d with a multiplication by s.
• Because s is rounded down, the estimate q = (n*s) >> 32 is either the exact quotient or one less; computing the remainder r = n - q*d and incrementing q when r >= d corrects it.
• The 64-bit multiply is cheap because the ARM has an instruction, UMULL, which multiplies two 32-bit values, giving a 64-bit result.

    void scale(
        unsigned int *dest,  /* destination for the scale data */
        unsigned int *src,   /* source unscaled data */
        unsigned int d,      /* denominator to divide by */
        unsigned int N)      /* data length; must be at least 1 (do-while loop) */
    {
        unsigned int s = 0xFFFFFFFFul / d;  /* s = (2^32 - 1) / d */

        do
        {
            unsigned int n, q, r;

            n = *(src++);
            q = (unsigned int)(((unsigned long long)n * s) >> 32);
            r = n - q * d;
            if (r >= d)
            {
                q++;  /* the estimate was one too small */
            }
            *(dest++) = q;
        } while (--N);
    }
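A single-value version of the same steps makes the correction visible; udiv_by_reciprocal is a hypothetical name and this is a sketch, not part of the original routine. For d = 10, s = 0xFFFFFFFF / 10 = 429496729. For n = 100, (100 * 429496729) >> 32 = 9, one less than the true quotient; the remainder 100 - 9 * 10 = 10 is not below d, so q is corrected to 10.

```c
#include <stdint.h>

/* Divide n by d using a precomputed reciprocal s = (2^32 - 1) / d.
   The estimate is either the exact quotient or one too small, so one
   remainder check is enough to correct it. Assumes d != 0. */
uint32_t udiv_by_reciprocal(uint32_t n, uint32_t d, uint32_t s)
{
    uint32_t q = (uint32_t)(((uint64_t)n * s) >> 32);  /* UMULL high word */
    uint32_t r = n - q * d;                            /* q * d <= n */
    if (r >= d) {
        q++;
    }
    return q;
}
```

Compute s once per denominator, for example s = 0xFFFFFFFFu / d, and then reuse it for every numerator.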

UNSIGNED DIVISION BY A CONSTANT: ALGORITHM

    unsigned int udiv_by_const(unsigned int n, unsigned int d)
    {
        unsigned int s, k, q;

        /* We assume d != 0 */

        /* first find k such that (1 << k) <= d < (1 << (k+1)) */
        for (k = 0; d/2 >= (1u << k); k++);

        if (d == 1u << k)
        {
            /* we can implement the divide with a shift */
            return n >> k;
        }

        /* d is in the range (1 << k) < d < (1 << (k+1)) */
        s = (unsigned int)(((1ull << (32+k)) + (1ull << k)) / d);

        if ((unsigned long long)s*d >= (1ull << (32+k)))
        {
            /* n/d = (n*s) >> (32+k) */
            q = (unsigned int)(((unsigned long long)n*s) >> 32);
            return q >> k;
        }

        /* n/d = (n*s+s) >> (32+k) */
        q = (unsigned int)(((unsigned long long)n*s + s) >> 32);
        return q >> k;
    }
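Running the algorithm by hand shows what precomputation for a fixed d produces. For d = 10: k = 3 and s = (2^35 + 2^3)/10 = 3435973837 = 0xCCCCCCCD; since s*10 >= 2^35, the first formula applies and n/10 = (n*s) >> 35. For d = 7: k = 2 and s = (2^34 + 2^2)/7 = 2454267026 = 0x92492492; since s*7 < 2^34, the second formula applies and n/7 = (n*s + s) >> 34. The sketch hard-codes these constants; udiv_by_10 and udiv_by_7 are hypothetical names.

```c
#include <stdint.h>

/* n / 10 using udiv_by_const's constants for d = 10: k = 3, s = 0xCCCCCCCD.
   s * 10 >= 2^35, so the plain formula (n * s) >> (32 + k) applies. */
uint32_t udiv_by_10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* n / 7 with k = 2, s = 0x92492492. s * 7 < 2^34, so the rounded formula
   (n * s + s) >> (32 + k) is needed. The sum cannot overflow 64 bits
   because (n + 1) * s < 2^64. */
uint32_t udiv_by_7(uint32_t n)
{
    const uint64_t s = 0x92492492u;
    return (uint32_t)(((uint64_t)n * s + s) >> 34);
}
```

Only the multiply, optional add, and shift remain at run time; the search for k and s happens once, ahead of time.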

SIGNED DIVISION BY A CONSTANT

The following routine, sdiv_by_const, shows how to divide by a signed constant d. In practice you will precalculate k and s at compile time. Only the operations involving n for your particular value of d need be executed at run time.

    int sdiv_by_const(int n, int d)
    {
        int s, k, q;
        unsigned int D;

        /* set D to be the absolute value of d, we assume d != 0 */
        if (d > 0)
        {
            D = (unsigned int)d;       /* 1 <= D <= 0x7FFFFFFF */
        }
        else
        {
            D = 0u - (unsigned int)d;  /* 1 <= D <= 0x80000000; avoids overflow of -d when d is INT_MIN */
        }

        /* first find k such that (1 << k) <= D < (1 << (k+1)) */
        for (k = 0; D/2 >= (1u << k); k++);

        if (D == 1u << k)
        {
            /* we can implement the divide with a shift */
            q = n >> 31;                                     /* 0 if n >= 0, -1 if n < 0 */
            q = n + (k ? (int)((unsigned)q >> (32-k)) : 0);  /* insert rounding; skipped for k == 0, where a shift by 32 is undefined */
            q = q >> k;                                      /* divide */
            if (d < 0)
            {
                q = -q;                                      /* correct sign */
            }
            return q;
        }

        /* Next find s in the range 0 <= s <= 0xFFFFFFFF */
        /* Note that k here is one smaller than the k in the equation */
        s = (int)(((1ull << (31+(k+1))) + (1ull << (k+1))) / D);

        if (s >= 0)
        {
            q = (int)(((signed long long)n*s) >> 32);
        }
        else
        {
            /* (unsigned)s = (signed)s + (1 << 32) */
            q = n + (int)(((signed long long)n*s) >> 32);
        }
        q = q >> k;

        /* if n < 0 then the formula requires us to add one */
        q += (unsigned)n >> 31;

        /* if d was negative we must correct the sign */
        if (d < 0)
        {
            q = -q;
        }
        return q;
    }
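As the text notes, k and s are normally precalculated. For d = 10 the routine finds D = 10, k = 3 and s = (2^35 + 2^4)/10 = 3435973838 = 0xCCCCCCCE, which does not fit in a signed int, so the s < 0 branch is taken (as a signed 32-bit value, s = -858993458). The sketch below hard-codes those values; sdiv_by_10 is a hypothetical name, and it assumes right shifts of negative values are arithmetic, as they are with armcc and GCC on ARM.

```c
#include <stdint.h>

/* n / 10 rounded toward zero (C semantics), using the constants that
   sdiv_by_const computes for d = 10: k = 3, s = 0xCCCCCCCE, which is
   negative when read as int32_t. Assumes arithmetic right shifts. */
int32_t sdiv_by_10(int32_t n)
{
    const int32_t s = -858993458;                       /* 0xCCCCCCCE */
    int32_t q = n + (int32_t)(((int64_t)n * s) >> 32);  /* s < 0 branch */
    q = q >> 3;                                         /* shift by k */
    q += (int32_t)((uint32_t)n >> 31);                  /* add one if n < 0 */
    return q;                                           /* d > 0: no sign fix */
}
```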

SUMMARY: Division
• Avoid divisions as much as possible. Do not use them for circular buffer handling.
• If you can't avoid a division, then try to take advantage of the fact that divide routines often generate the quotient n/d and modulus n % d together.
• To repeatedly divide by the same denominator d, calculate s = (2^k - 1)/d in advance. You can replace the divide of a k-bit unsigned integer by d with a 2k-bit multiply by s.
• To divide unsigned n < 2^N by an unsigned constant d, you can find a 32-bit unsigned s and shift k such that n/d is either (n*s) >> (N+k) or (n*s + s) >> (N+k). The choice depends only on d. There is a similar result for signed divisions.

FLOATING POINT
• The majority of ARM processor implementations do not provide hardware floating-point support, which saves on power and area when using ARM in a price-sensitive, embedded application. With the exceptions of the Floating Point Accelerator (FPA) used on the ARM7500FE and the Vector Floating Point accelerator (VFP) hardware, the C compiler must provide support for floating point in software.
• In practice, this means that the C compiler converts every floating-point operation into a subroutine call. The C library contains subroutines to simulate floating-point behavior using integer arithmetic. This code is written in highly optimized assembly. Even so, floating-point algorithms will execute far more slowly than corresponding integer algorithms.
• If you need fast execution and fractional values, you should use fixed-point or block-floating algorithms. Fractional values are most often used when processing digital signals such as audio and video.
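A minimal fixed-point sketch, assuming a Q16.16 format (16 integer bits and 16 fraction bits in an int32_t); the format choice and the names q16, Q16_ONE, q16_from_int and q16_mul are illustrative assumptions. A multiply needs one 64-bit integer multiply (SMULL on ARM) and a shift, with no floating-point library call.

```c
#include <stdint.h>

typedef int32_t q16;             /* represents raw / 65536 */
#define Q16_ONE ((q16)1 << 16)   /* 1.0 */

/* Convert a small integer (|x| < 32768) to Q16.16. */
static inline q16 q16_from_int(int32_t x) { return x * Q16_ONE; }

/* Multiply two Q16.16 values: the 64-bit product has 32 fraction bits,
   so shift right by 16 to return to 16. Rounds toward minus infinity
   (assumes arithmetic right shift of negative values). */
static inline q16 q16_mul(q16 a, q16 b)
{
    return (q16)(((int64_t)a * b) >> 16);
}
```

For example, 1.5 is stored as 98304, and q16_mul(98304, 131072) gives 196608, which represents 3.0.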

INLINE FUNCTIONS AND INLINE ASSEMBLY

SUMMARY: Inline Functions and Assembly
• Use inline functions to declare new operations or primitives not supported by the C compiler.
• Use inline assembly to access ARM instructions not supported by the C compiler. Examples are coprocessor instructions or ARMv5E extensions.
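A sketch of the pattern these two rules suggest, assuming GCC-style inline assembly: an inline function wraps one ARMv5E instruction (QADD, a saturating add) and falls back to a plain C reference implementation on other targets. qadd is a hypothetical name; __arm__ and __ARM_FEATURE_DSP are the macros GCC and Clang predefine for 32-bit ARM targets with the DSP extension.

```c
#include <stdint.h>

/* Saturating 32-bit add: a primitive C has no operator for.
   On 32-bit ARM cores with the ARMv5E DSP extension it maps to one QADD
   instruction; elsewhere the plain C reference version is used. */
static inline int32_t qadd(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t r;
    __asm__("qadd %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    int64_t r = (int64_t)a + b;  /* exact sum, cannot overflow 64 bits */
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
#endif
}
```

Keeping the assembly inside one small inline function means only that function has to change when the code moves to another architecture.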

PORTABILITY ISSUES
The following list covers issues you may encounter when porting C code to the ARM.
• The char type. On the ARM, char is unsigned rather than signed as for many other processors. A common problem concerns loops that use a char loop counter i and the continuation condition i >= 0; they become infinite loops. In this situation, armcc produces a warning of unsigned comparison with zero. You should either use a compiler option to make char signed or change loop counters to type int.
• The int type. Some older architectures use a 16-bit int, which may cause problems when moving to ARM's 32-bit int type, although this is rare nowadays. Note that expressions are promoted to an int type before evaluation. Therefore if i = -0x1000, the expression i == 0xF000 is true on a 16-bit machine (where 0xF000 has type unsigned int and i converts to the same bit pattern) but false on a 32-bit machine (where 0xF000 is the positive int 61440).
• Unaligned data pointers. Some processors support the loading of short and int typed values from unaligned addresses. A C program may manipulate pointers directly so that they become unaligned, for example, by casting a char * to an int *. ARM architectures up to ARMv5TE do not support unaligned pointers. To detect them, run the program on an ARM with an alignment checking trap. For example, you can configure the ARM720T to data abort on an unaligned access.

• Endian assumptions. C code may make assumptions about the endianness of a memory system, for example, by casting a char * to an int *. If you configure the ARM for the same endianness the code is expecting, then there is no issue. Otherwise, you must remove endian-dependent code sequences and replace them by endian-independent ones. See Section 5.9 for more details.
• Function prototyping. The armcc compiler passes arguments narrow, that is, reduced to the range of the argument type. If functions are not prototyped correctly, then the function may return the wrong answer. Other compilers that pass arguments wide may give the correct answer even if the function prototype is incorrect. Always use ANSI prototypes.
• Use of bit-fields. The layout of bits within a bit-field is implementation and endian dependent. If C code assumes that bits are laid out in a certain order, then the code is not portable.
• Use of enumerations. Although enum is portable, different compilers allocate different numbers of bytes to an enum. The gcc compiler will always allocate four bytes to an enum type. The armcc compiler will only allocate one byte if the enum takes only eight-bit values. Therefore you can't cross-link code and libraries between different compilers if you use enums in an API structure.

• Inline assembly. Using inline assembly in C code reduces portability between architectures. You should separate any inline assembly into small inlined functions that can easily be replaced. It is also useful to supply reference, plain C implementations of these functions that can be used on other architectures, where this is possible.
• The volatile keyword. Use the volatile keyword on the type definitions of ARM memory-mapped peripheral locations. This keyword prevents the compiler from optimizing away the memory access. It also ensures that the compiler generates a data access of the correct type. For example, if you define a memory location as a volatile short type, then the compiler will access it using 16-bit load and store instructions LDRSH and STRH.
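A few of the list's points written as portable C, as a sketch: an int loop counter instead of char, the int-promotion comparison evaluated on a 32-bit-int target, and volatile 16-bit register accessors. The register address and the names sum_countdown, promotion_example, PERIPH_STATUS, reg_read16, reg_write16 and wait_ready are hypothetical; on real hardware the address comes from the device's data sheet.

```c
#include <stdint.h>

/* char type: on ARM plain char is unsigned, so with  char i;  the loop
   for (i = 10; i >= 0; i--)  never ends. An int counter is portable. */
int sum_countdown(int start)
{
    int total = 0;
    for (int i = start; i >= 0; i--) {
        total += i;
    }
    return total;
}

/* int type: with a 32-bit int, -0x1000 stays negative and 0xF000 is the
   positive int 61440, so this returns 0 (a 16-bit-int machine gives 1). */
int promotion_example(void)
{
    int i = -0x1000;
    return i == 0xF000;
}

/* volatile: hypothetical 16-bit peripheral register. Each access through
   a volatile int16_t lvalue is a real LDRSH or STRH that the compiler may
   not remove or merge. */
#define PERIPH_STATUS_ADDR 0x40001000u  /* hypothetical address */
#define PERIPH_STATUS (*(volatile int16_t *)PERIPH_STATUS_ADDR)

static inline int16_t reg_read16(const volatile int16_t *reg) { return *reg; }
static inline void reg_write16(volatile int16_t *reg, int16_t v) { *reg = v; }

/* Spin until bit 0 of a status register is set; without volatile the
   compiler could load the register once and loop forever. */
static inline void wait_ready(const volatile int16_t *status)
{
    while ((reg_read16(status) & 1) == 0) {
        /* busy-wait */
    }
}
```

On hardware, wait_ready(&PERIPH_STATUS) would poll the register; on a host, the accessors can be exercised against an ordinary volatile variable.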