Multithreading

abshinde 20,161 views 40 slides Apr 13, 2017

About This Presentation

Multithreading concepts and its use in computer architectures.


Slide Content

Multithreading
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
P.V.P.I.T., Budhgaon

Contents…
Using ILP support to exploit thread-level parallelism
Performance and efficiency in advanced multiple-issue processors

Threads
A thread is a basic unit of CPU utilization.
A thread is a separate process with its own instructions and data.
A thread may represent a process that is part of a parallel program consisting of multiple processes, or it may represent an independent program.

Threads
A thread comprises a thread ID, a program counter, a register set, and a stack.
It shares its code section, data section, and other operating-system resources, such as open files and signals, with other threads belonging to the same process.
A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time.
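
The split between private and shared state described above can be sketched in Python, assuming the standard `threading` module. The names `shared_log` and `worker` are illustrative and not part of the slides: each thread keeps its own local variable on its own stack, while every thread appends to the same process-wide list.

```python
import threading

shared_log = []                 # data section: visible to every thread in the process
log_lock = threading.Lock()     # protects the shared list from concurrent updates

def worker(n):
    # 'local_total' lives on this thread's own stack; other threads cannot see it.
    local_total = sum(range(n))
    with log_lock:
        shared_log.append((threading.get_ident(), local_total))

threads = [threading.Thread(target=worker, args=(n,)) for n in (10, 100, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

totals = sorted(total for _, total in shared_log)
print(totals)   # [45, 4950, 499500]
```

The order in which threads append is not fixed, which is why the totals are sorted before printing.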

Threads
Many software packages that run on modern desktop PCs are multithreaded.
For example, a word processor may have:
a thread for displaying graphics,
another thread for responding to keystrokes from the user, and
a third thread for performing spelling and grammar checking in the background.

Threads
Threads also play a vital role in remote procedure call (RPC) systems.
RPCs allow interprocess communication by providing a communication mechanism similar to ordinary function or procedure calls.
Many operating system kernels are multithreaded; several threads operate in the kernel, and each thread performs a specific task, such as managing devices or interrupt handling.

Multithreading
Benefits:
1. Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked, thereby increasing responsiveness to the user.
For example: A multithreaded web browser could still allow user interaction in one thread while an image is being loaded in another thread.
2. Resource sharing: By default, threads share the memory and the resources of the process to which they belong. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
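
The responsiveness benefit can be illustrated with a small Python sketch. It is an assumption-laden stand-in for the browser example: the `image_ready` event simulates a slow image download, and `load_image` and `events` are hypothetical names. One thread blocks while the main thread keeps handling input.

```python
import threading

image_ready = threading.Event()   # stands in for "the image has finished downloading"
events = []                       # record of what happened, in order

def load_image():
    image_ready.wait()            # this thread blocks here...
    events.append("image displayed")

loader = threading.Thread(target=load_image)
loader.start()

# ...while the main thread stays responsive to the user.
for key in ("a", "b", "c"):
    events.append(f"handled key {key}")

image_ready.set()                 # the slow operation completes
loader.join()
print(events)
# ['handled key a', 'handled key b', 'handled key c', 'image displayed']
```

Both threads write to the same `events` list without any copying, which is the resource-sharing benefit in miniature.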

Multithreading
Benefits:
3. Economy: Allocating memory and resources for process creation is costly. Since threads share the resources of the process to which they belong, they are more economical to create and to context-switch.
4. Utilization of multiprocessor architectures: In a multiprocessor architecture, threads may run in parallel on different processors. A single-threaded process can run on only one CPU, no matter how many are available.
Multithreading on a multi-CPU machine increases concurrency.

Multithreading Models
Support for threads may be provided either at the user level or at the kernel level.
User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system.

Multithreading Models
Many-to-One Model:
The many-to-one model maps many user-level threads to one kernel thread.
Thread management is done by the thread library in user space, so it is efficient.
Only one thread can access the kernel at a time, so multiple threads are unable to run in parallel on multiprocessors.
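
As a rough illustration of the many-to-one idea, the sketch below uses Python generators as stand-ins for user-level threads and a hypothetical `run_many_to_one` function as the user-space thread library. Everything runs on a single kernel thread, so the "threads" interleave but never execute in parallel.

```python
from collections import deque

def user_thread(name, steps):
    # Each 'yield' is a voluntary switch point back to the user-space scheduler.
    for i in range(steps):
        yield f"{name}:{i}"

def run_many_to_one(threads):
    """Round-robin scheduler running entirely in user space on one kernel thread."""
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))   # run the thread until it yields
            ready.append(t)         # still runnable: back of the queue
        except StopIteration:
            pass                    # thread finished
    return trace

trace = run_many_to_one([user_thread("A", 2), user_thread("B", 3)])
print(trace)   # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Switching here is just a function return, with no kernel involvement, which is why the slide calls this model efficient.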

Multithreading Models
One-to-One Model:
The one-to-one model maps each user thread to a kernel thread.
It provides more concurrency than the many-to-one model and allows multiple threads to run in parallel on multiprocessors.
The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread.
The overhead of creating kernel threads can burden the performance of an application.

Multithreading Models
Many-to-Many Model:
The many-to-many model multiplexes many user-level threads to a smaller or equal number of kernel threads.
The number of kernel threads may be specific to either a particular application or a particular machine.
Developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor.
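
A loose analogy for the many-to-many mapping, assuming Python's standard `concurrent.futures` module: many units of work are multiplexed onto a smaller, fixed number of worker threads. This is not a true user-level thread library; the tasks below merely play the role of user threads and the pool workers play the role of kernel threads. The `task` function is illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def task(i):
    # Report which worker thread ran this task.
    return i * i, threading.current_thread().name

# Twelve tasks are multiplexed onto at most three worker threads.
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as pool:
    results = list(pool.map(task, range(12)))

squares = [sq for sq, _ in results]
workers_used = {name for _, name in results}
print(squares)
print(len(workers_used) <= 3)   # True
```

The application chooses how many tasks to create, while `max_workers` plays the role of the application- or machine-specific number of kernel threads.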

Multithreading: ILP Support to Exploit Thread-Level Parallelism

Multithreading: ILP Support to Exploit Thread-Level Parallelism
Although ILP increases system performance, it can be quite limited or hard to exploit in some applications.
Furthermore, there may be parallelism occurring naturally at a higher level in the application.
For example: An online transaction-processing system has parallelism among the multiple queries and updates. These queries and updates can be processed mostly in parallel, since they are largely independent of one another.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
This higher-level parallelism is called thread-level parallelism (TLP) because it is logically structured as separate threads of execution.
ILP exploits parallel operations within a loop or straight-line code segment.
TLP is represented by the use of multiple threads of execution that are inherently parallel.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
Thread-level parallelism is an important alternative to instruction-level parallelism.
In many applications, thread-level parallelism occurs naturally (as in many server applications).
If software is written from scratch, expressing the parallelism is much easier.
If established applications were written without parallelism in mind, however, rewriting them to exploit thread-level parallelism can pose significant challenges and be extremely costly.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
TLP and ILP exploit two different kinds of parallel structure.
The crucial question is: Can we exploit TLP on a processor designed for ILP?
The answer is: Yes.
A datapath designed to exploit ILP will find that many functional units are often idle because of either stalls or dependences in the code.
Threads can serve as a source of independent instructions that keep the processor busy, thereby exploiting TLP.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
To permit this sharing, the processor must duplicate the independent state of each thread.
For example: A separate copy of the register file, a separate PC, and a separate page table are required for each thread.
In addition, the hardware must support the ability to change to a different thread relatively quickly.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
There are two main approaches to multithreading:
Fine-grained multithreading
Coarse-grained multithreading

Multithreading: ILP Support to Exploit Thread-Level Parallelism
Fine-grained multithreading:
It switches between threads on each instruction, causing the execution of multiple threads to be interleaved.
This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that time.
To make fine-grained multithreading practical, the CPU must be able to switch threads on every clock cycle.
Advantage: It can hide the throughput losses that arise from both short and long stalls.
Disadvantage: It slows down the execution of the individual threads, since a thread that is ready to execute without stalls is delayed by instructions from other threads.
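
The round-robin interleaving can be sketched as a toy cycle-by-cycle simulation. This is a simplified model under stated assumptions, not a description of any real processor: one instruction issues per cycle, an instruction label ending in `!` (a made-up encoding) stalls its own thread for `stall_cycles` cycles, and stalled threads are skipped.

```python
def fine_grained(threads, stall_cycles):
    """Issue one instruction per cycle, rotating among threads that are not stalled.

    threads: dict name -> list of instruction labels; a label ending in '!'
             marks an instruction that stalls its thread (hypothetical encoding).
    Returns the per-cycle issue trace ('-' means no thread could issue).
    """
    names = list(threads)
    pc = {n: 0 for n in names}            # per-thread program counter
    ready_at = {n: 0 for n in names}      # first cycle the thread may issue again
    trace, cycle, nxt = [], 0, 0
    while any(pc[n] < len(threads[n]) for n in names):
        issued = "-"
        for k in range(len(names)):       # round-robin search starting at 'nxt'
            n = names[(nxt + k) % len(names)]
            if pc[n] < len(threads[n]) and ready_at[n] <= cycle:
                instr = threads[n][pc[n]]
                pc[n] += 1
                if instr.endswith("!"):
                    ready_at[n] = cycle + 1 + stall_cycles
                issued = f"{n}.{instr}"
                nxt = (nxt + k + 1) % len(names)
                break
        trace.append(issued)
        cycle += 1
    return trace

alone = fine_grained({"T0": ["a", "ld!", "b"]}, stall_cycles=2)
both = fine_grained({"T0": ["a", "ld!", "b"], "T1": ["x", "y", "z"]}, stall_cycles=2)
print(alone)  # ['T0.a', 'T0.ld!', '-', '-', 'T0.b']
print(both)   # ['T0.a', 'T1.x', 'T0.ld!', 'T1.y', 'T1.z', 'T0.b']
```

With two threads the stall cycles are filled and no cycle is idle, but T0 now finishes one cycle later than it did alone, which matches the advantage and disadvantage listed above.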

Multithreading: ILP Support to Exploit Thread-Level Parallelism
Coarse-grained multithreading:
It was invented as an alternative to fine-grained multithreading.
Coarse-grained multithreading switches threads only on costly (long) stalls.
Advantage: Thread switching no longer needs to be essentially free, and an individual thread is much less likely to be slowed down, since instructions from other threads are issued only when a thread encounters a costly stall.
Disadvantage: It is limited in its ability to overcome throughput losses, especially from shorter stalls, because of its pipeline start-up cost.

Multithreading: ILP Support to Exploit Thread-Level Parallelism
A CPU with coarse-grained multithreading issues instructions from a single thread.
When a stall occurs, the pipeline must be emptied or frozen.
The new thread that begins executing after the stall must fill the pipeline before its instructions can complete.
Because of this start-up overhead, coarse-grained multithreading is much more useful for reducing the penalty of high-cost stalls, where pipeline refill time is negligible compared to the stall time.
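
The trade-off between stall length and pipeline refill can be put into a back-of-the-envelope model. The function name, the switching threshold, and all cycle counts below are invented for illustration: the processor switches threads only when a stall is at least `switch_threshold` cycles long, and a switch costs `refill` cycles.

```python
def idle_cycles_on_stall(stall, refill, switch_threshold):
    """Cycles lost to one stall under a simple coarse-grained policy.

    Hypothetical model: if the stall is at least 'switch_threshold' cycles the
    processor switches threads and pays only the pipeline refill; otherwise it
    waits out the stall on the current thread.
    """
    if stall >= switch_threshold:
        return refill
    return stall

losses = {stall: idle_cycles_on_stall(stall, refill=8, switch_threshold=10)
          for stall in (2, 10, 200)}
print(losses)   # {2: 2, 10: 8, 200: 8}
```

Under these made-up numbers a 2-cycle stall is not hidden at all, a 10-cycle stall is barely worth the switch, and a 200-cycle stall loses only 8 cycles, so the refill cost is small relative to the stall it hides.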

Converting Thread-Level Parallelism into Instruction-Level Parallelism
Simultaneous multithreading (SMT) is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit TLP.
Multiple-issue processors often have more functional unit parallelism available than a single thread can effectively use, which motivates the use of SMT.
With register renaming and dynamic scheduling, multiple instructions from independent threads can be issued without regard to the dependences among them; the dynamic scheduling hardware resolves the dependences.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
The figure illustrates the differences in a processor’s ability to exploit the resources of a superscalar for the following configurations:
A superscalar with no multithreading support
A superscalar with coarse-grained multithreading
A superscalar with fine-grained multithreading
A superscalar with simultaneous multithreading

Converting Thread-Level Parallelism into Instruction-Level Parallelism
In the superscalar without multithreading support, the use of issue slots is limited by a lack of ILP.
In addition, a major stall, such as an instruction cache miss, can leave the entire processor idle.
Figure legend: An empty (white) box indicates that the corresponding issue slot is unused in that clock cycle. Black indicates the occupied issue slots.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
In the coarse-grained multithreaded superscalar, the long stalls are partially hidden by switching to another thread that uses the resources of the processor.
Although this reduces the number of completely idle clock cycles, within each clock cycle the ILP limitations still leave issue slots idle.
Furthermore, because a coarse-grained multithreaded processor switches threads only when there is a stall, and the new thread has a start-up period, some fully idle cycles are likely to remain.
Figure legend: The shades of grey and black correspond to different threads in the multithreading processors.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
In fine-grained multithreading, the interleaving of threads eliminates fully empty slots.
Because only one thread issues instructions in a given clock cycle, ILP limitations still lead to a significant number of idle slots within individual clock cycles.
Figure legend: An empty (white) box indicates that the corresponding issue slot is unused in that clock cycle. The shades of grey and black correspond to four different threads in the multithreading processors.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
In SMT, TLP and ILP are exploited simultaneously, with multiple threads using the issue slots in a single clock cycle.
Ideally, the issue slot usage is limited by imbalances in the resource needs and resource availability over multiple threads.
In practice, other factors can also restrict how many slots are used:
- how many active threads are considered,
- finite limitations on buffers,
- the ability to fetch enough instructions from multiple threads, and
- practical limitations of what instruction combinations can issue from one thread and from multiple threads.
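
The differences in issue-slot usage can be approximated with a deliberately crude slot-counting model. All numbers are invented, the function names are hypothetical, and unissued instructions are not carried over to later cycles, so this only illustrates the trend. It assumes a 4-issue machine and a table of how many instructions each thread could issue in each cycle.

```python
ISSUE_WIDTH = 4

# Instructions each thread could issue in each cycle (made-up numbers; 0 = stalled).
ready = {
    "T0": [2, 0, 1, 3],
    "T1": [1, 3, 0, 2],
}

def single_thread_slots(ready, name):
    # No multithreading: only one thread ever issues.
    return sum(min(r, ISSUE_WIDTH) for r in ready[name])

def fine_grained_slots(ready):
    # One thread per cycle, chosen round-robin; fall through to the next thread if stalled.
    names, used = list(ready), 0
    cycles = len(ready[names[0]])
    for c in range(cycles):
        start = c % len(names)
        order = names[start:] + names[:start]
        pick = next((n for n in order if ready[n][c] > 0), None)
        if pick is not None:
            used += min(ready[pick][c], ISSUE_WIDTH)
    return used

def smt_slots(ready):
    # Slots in a cycle may be filled from any mix of threads, up to the issue width.
    cycles = len(next(iter(ready.values())))
    return sum(min(sum(r[c] for r in ready.values()), ISSUE_WIDTH)
               for c in range(cycles))

print(single_thread_slots(ready, "T0"), fine_grained_slots(ready), smt_slots(ready))
# 6 8 11  (out of 16 available slots)
```

Fine-grained multithreading removes the fully idle cycle but still leaves slots empty within each cycle. SMT fills slots from both threads in the same cycle.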

Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT
Because a dynamically scheduled superscalar processor is likely to have a deep pipeline, a coarse-grained implementation of SMT is unlikely to gain much in performance.
Since SMT makes sense only in a fine-grained implementation, we should think about the impact of fine-grained scheduling on single-thread performance.
This effect can be minimized by having a preferred thread, which still permits multithreading to preserve some of its performance advantage with a smaller compromise in single-thread performance.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT
Other design challenges for an SMT processor:
Dealing with a larger register file needed to hold multiple contexts.
Not affecting the clock cycle, particularly in instruction issue, where more candidate instructions need to be considered, and in instruction completion, where choosing what instructions to commit may be challenging.
Ensuring that the cache and TLB conflicts generated by the simultaneous execution of multiple threads do not cause significant performance degradation.

Converting Thread-Level Parallelism into Instruction-Level Parallelism
Design Challenges in SMT
In many cases, the potential performance overhead due to multithreading is small.
The efficiency of current superscalars is low enough that there is scope for significant improvement, even at the cost of some overhead.

Performance and Efficiency in Advanced Multiple-Issue Processors

Performance and Efficiency in Advanced Multiple-Issue Processors
The question of efficiency in terms of silicon area and power is as critical as raw performance.
Power is the major constraint on modern processors.
The Itanium 2 is the most inefficient processor both for floating-point and integer code.
The Athlon and Pentium 4 both make good use of transistors and area in terms of efficiency.
The IBM Power5 is the most effective user of energy.
None of the processors offers a great advantage in efficiency.

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Power is a function of both static power (proportional to the transistor count, whether or not the transistors are switching) and dynamic power (proportional to the product of the number of switching transistors and the switching rate).
Although static power is certainly a design concern, dynamic power is usually the dominant energy consumer.
A microprocessor trying to achieve both a low CPI and a high clock rate (CR) must switch more transistors and switch them faster.
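
The proportionalities stated above can be written compactly as follows. The symbol names are notation chosen here, not taken from the slides. The last line is the standard CPU performance equation, with IC as the instruction count, and is included only to show why a low CPI and a high CR are both sought.

```latex
\begin{align*}
P_{\text{total}}   &= P_{\text{static}} + P_{\text{dynamic}} \\
P_{\text{static}}  &\propto N_{\text{transistors}} \\
P_{\text{dynamic}} &\propto N_{\text{switching}} \times f_{\text{switch}} \\
T_{\text{CPU}}     &= \frac{\text{IC} \times \text{CPI}}{\text{CR}}
\end{align*}
```

Lowering CPI tends to raise the number of switching transistors, and raising CR raises the switching rate, so both push dynamic power upward.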

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Most techniques used for increasing performance (including multiple cores and multithreading) will increase power consumption.
The key question is whether a technique is energy efficient: does it increase power consumption faster than it increases performance?

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
The energy inefficiency of multiple-issue techniques arises from two primary characteristics.
First, issuing multiple instructions incurs some overhead in logic that grows faster than the issue rate grows.
This logic is responsible for instruction issue analysis, including dependence checking, register renaming, and similar functions.
The combined result is that lower CPIs are likely to lead to lower ratios of performance per watt, simply due to overhead.

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Second, there is a growing gap between peak issue rates and sustained performance.
The number of transistors switching will be proportional to the peak issue rate, and the performance is proportional to the sustained rate.
For example: If we want to sustain four instructions per clock, we must fetch more, issue more, and initiate execution on more than four instructions.
The power will be proportional to the peak rate, but performance will be at the sustained rate.
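
The peak-versus-sustained argument can be turned into a small worked calculation. The proportionality constant and all rates below are made up for illustration, and `perf_per_watt` is a hypothetical helper: power is taken to track the peak issue rate, and performance to track the sustained rate.

```python
def perf_per_watt(sustained_ipc, peak_ipc, watts_per_peak_issue=1.0):
    """Relative performance per watt under the proportionality assumptions above.

    watts_per_peak_issue is a made-up constant of proportionality.
    """
    power = watts_per_peak_issue * peak_ipc     # power tracks the peak rate
    return sustained_ipc / power                # performance tracks the sustained rate

narrow = perf_per_watt(sustained_ipc=1.5, peak_ipc=2)   # hypothetical 2-issue design
wide = perf_per_watt(sustained_ipc=4.0, peak_ipc=8)     # hypothetical 8-issue design
print(narrow, wide)   # 0.75 0.5
```

In this toy comparison the wider design delivers more instructions per clock but less performance per watt, because the gap between its peak and sustained rates is larger.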

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Speculation, an important technique for increasing the exploitation of ILP, is inherently inefficient because it can never be perfect.
If speculation were perfect, it could save power, since it would reduce the execution time and save static power.
When speculation is not perfect, it rapidly becomes energy inefficient, since it requires additional dynamic power.

Performance and Efficiency in Advanced Multiple-Issue Processors
What Limits Multiple-Issue Processors?
Focusing on improving clock rate:
Increasing the clock rate will increase transistor switching frequency and directly increase power consumption.
To achieve a faster clock rate, we would need to increase pipeline depth.
Deeper pipelines incur additional overhead penalties and cause higher switching rates.

This presentation is published for educational purposes only.