GAME THEORY AND MONTE CARLO SEARCH SPACE TREE

UNIT-III
GAME PLAYING AND CSP

TOPICS
•Game theory
•Optimal decisions in games
•Alpha-beta search
•Monte-carlotree search
•Stochastic games
•Partially observable games
•Constraint satisfaction problems
•Constraint propagation
•Backtracking search for CSP
•Local search for CSP
•Structure of CSP

Game theory
•GamePlayingisanimportantdomainofartificial
intelligence.Gamesdon’trequiremuchknowledge;
theonlyknowledgeweneedtoprovideistherules,
legalmovesandtheconditionsofwinningorlosing
thegame.
•Gametheory,branchofappliedmathematicsthat
providestoolsforanalyzingsituationsinwhich
parties,calledplayers,makedecisionsthatare
interdependent.Thisinterdependencecauseseach
playertoconsidertheotherplayer’spossible
decisions,orstrategies,informulatingstrategy.
•Competitiveenvironments,inwhichtheagent’s
goalsareinconflict,giverisetoadversarialsearch
problems–oftenknownasgames.

Formal Definition of Game
•Consider games with two players, whom we will call
MAX and MIN. MAX moves first, and then they take
turns moving until the game is over. At the end of the
game, points are awarded to the winning player and
penalties are given to the loser.
•A game can be formally defined as a kind of search
problem with the following elements:

•S0:Theinitialstate,whichspecifieshowthegameisset
upatthestart.
•PLAYER(s):Defineswhichplayerhasthemoveinastate.
•ACTIONS(s):Returnsthesetoflegalmovesinastate.
•RESULT(s,a):Thetransitionmodel,whichdefinesthe
resultofamove.
•Terminaltest(s):whichistruewhenthegameisoverand
falseotherwise.Stateswherethegamehasendedare
calledterminalstates
•UTILITY(s,p):Autilityfunction(alsocalledanobjective
functionorpayofffunction),definesthefinalnumeric
valueforagamethatendsinterminalstatesforaplayer
p.

•Theinitialstate,ACTIONSfunction,andRESULTfunction
definethegametreeforthegame—atreewherethe
nodesaregamestatesandtheedgesaremoves

Optimal decisions in games
•Innormalsearchproblem,theoptimalsolution
wouldbeasequenceofmoveleadingtoagoalstate
–aterminalstatethatisawin.
•MINhassomethingtosayaboutit,MAXtherefore
mustfindacontingentstrategy,whichspecifies
MAX’smoveintheinitialstate,
•ThenMAX’smovesinthestatesresultingfromevery
possibleresponsebyMINandsoon

•AND–ORsearchalgorithm
•MAXplayingtheroleofORandMINequivalentto
AND.
•anoptimalstrategyleadstooutcomesatleastas
goodasanyotherstrategywhenoneisplayingan
infallibleopponent.
•ThepossiblemovesforMAXattherootnodeare
labeleda1,a2,anda3.Thepossiblerepliestoa1for
MINareb1,b2,b3,andsoon.

•MAXpreferstomovetoastateofmaximumvalue,
whereasMINprefersastateofminimumvalue.
•Theterminalnodesonthebottomlevelgettheirutility
valuesfromthegame’sUTILITYfunction
•ThefirstMINnode,labeledB,hasthreesuccessorstates
withvalues3,12,and8,soitsminimaxvalueis3

minimaxalgorithm
•Theminimaxalgorithmcomputesthe
minimaxdecisionfromthecurrentstate.
•Itusesasimplerecursivecomputationofthe
minimaxvaluesofeachsuccessorstate,
directlyimplementingthedefiningequations.
•Therecursionproceedsallthewaydownto
theleavesofthetree,andthentheminimax
valuesarebackedupthroughthetreeasthe
recursionunwinds.

•Theminimaxalgorithmperformsacompletedepth-
firstexplorationofthegametree.
•Ifthemaximumdepthofthetreeismandthereare
blegalmovesateachpoint,thenthetime
complexityoftheminimaxalgorithmisO(bm).
•ThespacecomplexityisO(bm)foranalgorithmthat
generatesallactionsatonce,orO(m)foran
algorithmthatgeneratesactionsoneatatime

Optimal decisions in multiplayer games

ALPHA–BETA PRUNING
•Alpha beta pruning is a modified version of the
minimaxalgorithm
•It is an optimization technique for the minimax
algorithm
•Alpha beta pruning is the pruning(cutting down) of
useless branches in decision trees
•Alpha-highest value
–Initial Value α=-∞
•Max player will only update the value of alpha
•Beta –lowest value
–Initial Value α=+∞
•Min player will only update the value of alpha
•Condition : α>=β

Monte-carlotreesearch
•MonteCarloTreeSearch(MCTS)isasearch
techniqueinthefieldofArtificialIntelligence(AI).It
isaprobabilisticandheuristicdrivensearch
algorithmthatcombinestheclassictreesearch
implementationsalongsidemachinelearning
principlesofreinforcementlearning.
•There’salwaysthepossibilitythatthecurrentbest
actionisactuallynotthemostoptimalaction.
•itcontinuestoevaluateotheralternatives
periodicallyduringthelearningphasebyexecuting
them.exploration-exploitationtrade-off

•Explorationhelpsinexploringanddiscoveringthe
unexploredpartsofthetree,whichcouldresultin
findingamoreoptimalpath.
•UseofMonteCarloTreeSearch
•HandlingComplexandStrategicGames(chess,
poker)
•UnknownorImperfectInformation(cardgames)
•LearningfromSimulations(estimatethevalueof
actionsorstates)

MCTS algorithm
Steps:
1.Selection
2.Expansion
3.Simulation
4.Back propagation

•Selection:Inthisprocess,theMCTSalgorithmtraverses
thecurrenttreefromtherootnodeusingaspecific
strategy.Thestrategyusesanevaluationfunctionto
optimallyselectnodeswiththehighestestimatedvalue.
•Itreturnsthemaximumvalue
•Expansion:In this process, a new child node is added to
the tree to that node which was optimally reached
during the selection process.

•Simulation:Inthisprocess,asimulationisperformed
bychoosingmovesorstrategiesuntilaresultor
predefinedstateisachieved.
•Backpropagation:Afterdeterminingthevalueofthe
newlyaddednode,theremainingtreemustbe
updated.So,thebackpropagationprocessis
performed,whereitbackpropagatesfromthenew
nodetotherootnode.

StochasticStrategy
•Astrategyforanagentisaprobabilitydistribution
overtheactionsforthisagent.Iftheagentisacting
deterministically,oneoftheprobabilitieswillbe1
andtherestwillbe0;thisiscalledapurestrategy.
•Iftheagentisnotfollowingapurestrategy,noneof
theprobabilitieswillbe1,andmorethanoneaction
willhaveanon-zeroprobability;thisiscalled
astochasticstrategy.
•Thesetofactionswithanon-zeroprobabilityina
strategyiscalledthesupportsetofthestrategy.

Stochasticgames
•Manyunpredictableexternaloccurrencescanplace
usinunforeseencircumstancesinreallife.
•Ex:dicetossing,havearandomelementtoreflect
thisunpredictability.
•Ex:Backgammonisaclassicgamethatmixesskilland
luck

•Agametreeinbackgammonmustincludechance
nodesinadditiontoMAXandMINnodes.
•Thebranchesleadingfromeachchancenodedenote
thepossibledicerolls;eachbranchislabeledwith
therollanditsprobability.
•Thereare36waystorolltwodice,eachequally
likely;butbecausea6–5isthesameasa5–6,there
areonly21distinctrolls.
•Thesixdoubles(1–1through6–6)eachhavea
probabilityof1/36,sowesayP(1–1)=1/36.The
other15distinctrollseachhavea1/18probability

•nextstepistounderstandhowtomakecorrect
decisions.
•wecanonlycalculatetheexpectedvalueofa
position:theaverageoverallpossibleoutcomesof
thechancenodes.
•Togeneralizetheminimaxvaluefordeterministic
gamestoanexpectiminimaxvalueforgameswith
chancenodes.
•TerminalnodesandMAXandMINnodes(forwhich
thedicerollisknown)workexactlythesamewayas
before.

Partiallyobservablegames
•Partialobservabilitymeansthatanagentdoesnot
knowthestateoftheworldorthattheagentsact
simultaneously.
•Partialobservabilityforthemultiagentcaseismore
complicatedthanthefullyobservablemultiagent
caseorthepartiallyobservablesingle-agentcase.
Thefollowingsimpleexamplesshowsome
importantissuesthatariseeveninthecaseoftwo
agents,eachwithafewchoices.

•Apartiallyobservablesystemisoneinwhichthe
entirestateofthesystemisnotfullyvisibletoan
externalsensor.
•Inapartiallyobservablesystemtheobservermay
utiliseamemorysysteminordertoaddinformation
totheobserver'sunderstandingofthesystem.
•Anexampleofapartiallyobservablesystemwould
beacardgameinwhichsomeofthecardsare
discardedintoapilefacedown.
•Inthiscasetheobserverisonlyabletoviewtheir
owncardsandpotentiallythoseofthedealer.

•Theyarenotabletoviewtheface-down(used)
cards,northecardswhichwillbedealtatsome
stageinthefuture.
•Amemorysystemcanbeusedtorememberthe
previouslydealtcardsthatarenowontheusedpile
(largecollectionarrangedoneoverother).
•Thisaddstothetotalsumofknowledgethatthe
observercanusetomakedecisions.

•Incontrast,afullyobservablesystemwouldbethat
ofchess.Inchess(apartfromthe'whoismoving
next'state)thefullstateofthesystemisobservable
atanypointintime.
•Partiallyobservableisatermusedinavarietyof
mathematicalsettings,includingthatofArtificial
IntelligenceandPartiallyobservableMarkovdecision
processes.

•Chesshasoftenbeendescribedaswarinminiature,
butitlacksatleastonemajorcharacteristicofreal
wars,namely,partialobservability.
•Inthe“fogofwar,”theexistenceanddispositionof
enemyunitsisoftenunknownuntilrevealedby
directcontact.
•Partiallyobservablegamessharethese
characteristicsandarethusqualitativelydifferent
fromotherobservablegames.

State-of-the-artGamePrograms
•State-of-the-artgameprogramsareblindinglyfast,
highlyoptimizedmachinesthatincorporatethe
latestengineeringadvances,buttheyaren’tmuch
usefordoingtheshoppingordrivingoff-road.
•Racingandgame-playinggenerateexcitementanda
steadystreamofinnovationsthathavebeenadopted
bythewidercommunity.
•Chessisatwo-playerstrategyboardgameplayed
onachessboard,acheckeredgameboard
with64squaresarrangedinan8×8grid.

•IBMcomputerDeepBluewasthefirstmachineto
overcomeareigningWorldChessChampionina
matchwhenitdefeatedGarryKasparovin1997
•Backgammon
•Go
•CHINOOK

Constraintsatisfactionproblems
•Aproblemissolvedwheneachvariablehasavalue
thatsatisfiesalltheconstraintsonthevariable.A
problemdescribedthiswayiscalledaconstraint
satisfactionproblem,orCSP.
•Mainideaistoeliminatelargeportionsofthesearch
spaceallatoncebyidentifyingvariable/value
combinationsthatviolatetheconstraints.

•Aconstraintsatisfactionproblemconsistsofthree
components,X,D,andC:
•Xisasetofvariables,{X1,...,Xn}.
•Disasetofdomains,{D1,...,Dn},oneforeach
variable.
•Cisasetofconstraintsthatspecifyallowable
combinationsofvalues.

•EachstateinaCSPisdefinedbyanassignmentof
valuestosomeorallofthevariables.
•{Xi=vi,Xj=vj,...}
•Anassignmentthatdoesnotviolateanyconstraints
iscalledaconsistentorlegalassignment
•Acompleteassignmentisoneinwhichevery
variableisassigned,andasolutiontoaCSPisa
consistent,completeassignment
•Apartialassignmentisonethatassignsvaluesto
onlysomeofthevariables.

•Example problem: Map coloring
VariablesWA, NT, Q, NSW, V, SA, T
DomainsD
i= {red,green,blue}
Constraints: adjacent regions must have different
colors
e.g., WA ≠NT, or (WA,NT) in
{(red,green),(red,blue),(green,red),
(green,blue),(blue,red),(blue,green)}

•Solutionsarecompleteandconsistent
assignments
•e.g.,WA=red,NT=green,Q=red,NSW=
green,V=red,SA=blue,T=green

Constraint graph
•Binary CSP:each constraint relates two variables
•Constraint graph:nodes are variables, arcs are
constraints

Variations on the CSP formalism
•The simplest kind of CSP involves variables that have
discrete and finite domains
1.Finite domains
•Map-coloring problems and scheduling with time
limits and 8-queens problem
2.Discrete domains
•discrete domain can be infinite, such as the set of
integers or strings.
•no longer possible to describe constraints

Typesofconstraints
•Unaryconstraintsinvolveasinglevariable,
–e.g.,SA≠green
•Binaryconstraintsinvolvepairsofvariables,
–e.g.,SA≠WA
•GlobalconstraintorHigher-orderconstraintsinvolve3or
morevariables,
–e.g.,cryptarithmeticcolumnconstraints
•Auxiliary variables representing the digit carried over into
the tens, hundreds, or thousands column. These
constraints can be represented in a constraint
hypergraphC10, C100, and C1000

CryptarithmeticExample
Variables:FTUWRO
Domains:{0,1,2,3,4,5,6,7,8,9}
Constraints:Alldiff(F,T,U,W,R,O)

Constraintpropagation
•analgorithmcansearch(chooseanewvariable
assignmentfromseveralpossibilities)ordoaspecific
typeofinferencecalledconstraintpropagation.
usingtheconstraintstoreducethenumberoflegal
valuesforavariable,whichinturncanreducethe
legalvaluesforanothervariable.
•Thekeyideaislocalconsistency

Different types of local consistency
1.Node consistency
2.Arc consistency
3.Path consistency
4.K-consistency
5.Global constraints

1.Nodeconsistency
•Asinglevariable(correspondingtoanodeintheCSP
network)isnode-consistentifallthevaluesinthe
variable’sdomainsatisfythevariable’sunary
constraints.
•Ex:Australiamap-coloringproblemwhereSouth
Australiansdislikegreen,thevariablewiththe
reduceddomain{red,blue}.
•Itisalwayspossibletoeliminatealltheunary
constraintsinaCSPbyrunningnodeconsistency.

2.Arcconsistency
•AvariableinaCSPisarc-consistentifeveryvaluein
itsdomainsatisfiesthevariable’sbinaryconstraints.
•Xiisarc-consistentwithrespecttoanothervariableXj
ifforeveryvalueinthecurrentdomainDithereis
somevalueinthedomainDjthatsatisfiesthebinary
constraintonthearc(Xi,Xj).
•(X,Y),{(0,0),(1,1),(2,4),(3,9))}

•TomakeXarc-consistentwithrespecttoY,we
reduceX’sdomainto{0,1,2,3}.IfwealsomakeY
arc-consistentwithrespecttoX,thenY’sdomain
becomes{0,1,4,9}andthewholeCSPisarc-
consistent.
•Themostpopularalgorithmforarcconsistencyis
calledAC-3
•AC-3algorithmmaintainsaqueueofarcs

•thequeuecontainsallthearcsintheCSP.AC-3then
popsoffanarbitraryarc(Xi,Xj)fromthequeueand
makesXiarc-consistentwithrespecttoXj.
•IfthisleavesDiunchanged,thealgorithmjustmoves
ontothenextarc.ButifthisrevisesDi(makesthe
domainsmaller),thenweaddtothequeueallarcs
(Xk,Xi)whereXkisaneighborofXi.
•IfDiisreviseddowntonothing,thenweknowthe
wholeCSPhasnoconsistentsolution,andAC-3can
immediatelyreturnfailure.Otherwise,wekeep
checking,tryingtoremovevaluesfromthedomains
ofvariablesuntilnomorearcsareinthequeue.

Pathconsistency
•Pathconsistencytightensthebinaryconstraintsby
usingimplicitconstraintsthatareinferredbylooking
attriplesofvariables.
•Atwo-variableset{Xi,Xj}ispath-consistentwith
respecttoathirdvariableXmif,foreveryassignment
{Xi=a,Xj=b}consistentwiththeconstraintson
{Xi,Xj},thereisanassignmenttoXmthatsatisfiesthe
constraintson{Xi,Xm}and{Xm,Xj}.Thisiscalledpath
consistencybecauseonecanthinkofitaslookingat
apathfromXitoXjwithXminthemiddle.

•4.K-consistency
•Strongerformsofpropagationcanbedefinedwith
thenotionofk-consistency.
•ACSPisk-consistentif,foranysetofk−1variables
andforanyconsistentassignmenttothosevariables,
aconsistentvaluecanalwaysbeassignedtoanykth
variable.

5.Globalconstraints
•Aglobalconstraintisoneinvolvinganarbitrary
numberofvariables
•Globalconstraintsoccurfrequentlyinrealproblems
andcanbehandledbyspecial-purposealgorithms
thataremoreefficientthanthegeneral-purpose
methods
•The Alldiffconstraint says that all the variables
involved must have distinct values
•Ex: cryptarithmeticproblem and Sudoku puzzles

•Sudoku example

BacktrackingsearchforCSP
•Variableassignmentsarecommutative,i.e.,
[WA=redthenNT=green]sameas[NT=green
thenWA=red]
•=>Onlyneedtoconsiderassignmentstoasingle
variableateachnode
•Depth-firstsearchforCSPswithsingle-variable
assignmentsiscalledbacktrackingsearch
•Cansolven-queensforn≈25

Backtracking example

Improving backtracking efficiency
•General-purposemethods can give huge gains
in speed:
–Which variable should be assigned next?
–In what order should its values be tried?
–Can we detect inevitable failure early?

Most constrained variable
•Most constrained variable:
choose the variable with the fewest legal values
•a.k.a. minimum remaining values (MRV)
heuristic

Most constraining variable
•A good idea is to use it as a tie-breaker among
most constrained variables
•Most constraining variable:
–choose the variable with the most constraints on
remaining variables
–

Least constraining value
•Given a variable to assign, choose the least
constraining value:
–the one that rules out the fewest values in the
remaining variables
–
•Combining these heuristics makes 1000
queens feasible

Forward checking
•Idea:
–Keep track of remaining legal values for unassigned variables
–Terminate search when any variable has no legal values
–

LocalsearchforCSP
•theinitialstateassignsavaluetoeveryvariable,and
thesearchchangesthevalueofonevariableata
time.Forexample,inthe8-queensproblemor4
queensproblem
•eachstepmovesasinglequeentoanewpositionin
itscolumn
•new value for a variable, the most obvious heuristic
is to select the value that results in the minimum
number of conflicts with other variables—the min-
conflicts

4 queens problem
•States:4queensin4columns(4
4
=256states)
•Actions:movequeenincolumn
•Goaltest:noattacks
•Evaluation:h(n)=numberofattacks

•Structure of CSP

1.ChooseasubsetSoftheCSP’svariablessuchthat
theconstraintgraphbecomesatreeafterremovalof
S.Siscalledacyclecutset.
2.ForeachpossibleassignmenttothevariablesinS
thatsatisfiesallconstraintsonS,
(a)removefromthedomainsoftheremaining
variablesanyvaluesthatareinconsistentwiththe
assignmentforS,and
(b)IftheremainingCSPhasasolution,returnit
togetherwiththeassignmentforS.

Tree decomposition
Everyvariableinoriginalproblemmustappear
inatleastonesubproblem
Iftwovariablesareconnectedintheoriginal
problem,theymustappeartogether(along
withtheconstraint)inatleastonesubproblem
Ifavariableoccursintwosubproblemsinthe
tree,itmustappearineverysubproblemon
thepaththatconnectsthetwo

GAME THEORY AND MONTE CARLO SEARCH SPACE TREE

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

GAME THEORY AND MONTE CARLO SEARCH SPACE TREE

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 10

Slide 11

Slide 12

Slide 14

Slide 16

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 25

Slide 26

Slide 28

Slide 30

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Slide 76

Slide 77

Slide 80

Slide 81

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

DTI BPI Pivot Small Business - BUSINESS START UP PLAN

CATHOLIC EDUCATIONAL Corporate Responsibilities

Karin Schaupp – Evocation; lançamento: 2000

Pillars of Biblical Oneness in the Book of Acts

7-10. STP + Branding and Product &amp; Services Strategies.pptx

7-10. STP + Branding and Product & Services Strategies.pptx