GAME THEORY AND MONTE CARLO SEARCH SPACE TREE

GOKULKANNANMMECLECTC 239 views 81 slides Jun 20, 2024
Slide 1
Slide 1 of 81
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81

About This Presentation

BASIC CONCEPTS OF GAME THEORY


Slide Content

UNIT-III
GAME PLAYING AND CSP

TOPICS
•Game theory
•Optimal decisions in games
•Alpha-beta search
•Monte-carlotree search
•Stochastic games
•Partially observable games
•Constraint satisfaction problems
•Constraint propagation
•Backtracking search for CSP
•Local search for CSP
•Structure of CSP

Game theory
•GamePlayingisanimportantdomainofartificial
intelligence.Gamesdon’trequiremuchknowledge;
theonlyknowledgeweneedtoprovideistherules,
legalmovesandtheconditionsofwinningorlosing
thegame.
•Gametheory,branchofappliedmathematicsthat
providestoolsforanalyzingsituationsinwhich
parties,calledplayers,makedecisionsthatare
interdependent.Thisinterdependencecauseseach
playertoconsidertheotherplayer’spossible
decisions,orstrategies,informulatingstrategy.
•Competitiveenvironments,inwhichtheagent’s
goalsareinconflict,giverisetoadversarialsearch
problems–oftenknownasgames.

Formal Definition of Game
•Consider games with two players, whom we will call
MAX and MIN. MAX moves first, and then they take
turns moving until the game is over. At the end of the
game, points are awarded to the winning player and
penalties are given to the loser.
•A game can be formally defined as a kind of search
problem with the following elements:

•S0:Theinitialstate,whichspecifieshowthegameisset
upatthestart.
•PLAYER(s):Defineswhichplayerhasthemoveinastate.
•ACTIONS(s):Returnsthesetoflegalmovesinastate.
•RESULT(s,a):Thetransitionmodel,whichdefinesthe
resultofamove.
•Terminaltest(s):whichistruewhenthegameisoverand
falseotherwise.Stateswherethegamehasendedare
calledterminalstates
•UTILITY(s,p):Autilityfunction(alsocalledanobjective
functionorpayofffunction),definesthefinalnumeric
valueforagamethatendsinterminalstatesforaplayer
p.

•Theinitialstate,ACTIONSfunction,andRESULTfunction
definethegametreeforthegame—atreewherethe
nodesaregamestatesandtheedgesaremoves

Optimal decisions in games
•Innormalsearchproblem,theoptimalsolution
wouldbeasequenceofmoveleadingtoagoalstate
–aterminalstatethatisawin.
•MINhassomethingtosayaboutit,MAXtherefore
mustfindacontingentstrategy,whichspecifies
MAX’smoveintheinitialstate,
•ThenMAX’smovesinthestatesresultingfromevery
possibleresponsebyMINandsoon

•AND–ORsearchalgorithm
•MAXplayingtheroleofORandMINequivalentto
AND.
•anoptimalstrategyleadstooutcomesatleastas
goodasanyotherstrategywhenoneisplayingan
infallibleopponent.
•ThepossiblemovesforMAXattherootnodeare
labeleda1,a2,anda3.Thepossiblerepliestoa1for
MINareb1,b2,b3,andsoon.

•MAXpreferstomovetoastateofmaximumvalue,
whereasMINprefersastateofminimumvalue.
•Theterminalnodesonthebottomlevelgettheirutility
valuesfromthegame’sUTILITYfunction
•ThefirstMINnode,labeledB,hasthreesuccessorstates
withvalues3,12,and8,soitsminimaxvalueis3

minimaxalgorithm
•Theminimaxalgorithmcomputesthe
minimaxdecisionfromthecurrentstate.
•Itusesasimplerecursivecomputationofthe
minimaxvaluesofeachsuccessorstate,
directlyimplementingthedefiningequations.
•Therecursionproceedsallthewaydownto
theleavesofthetree,andthentheminimax
valuesarebackedupthroughthetreeasthe
recursionunwinds.

•Theminimaxalgorithmperformsacompletedepth-
firstexplorationofthegametree.
•Ifthemaximumdepthofthetreeismandthereare
blegalmovesateachpoint,thenthetime
complexityoftheminimaxalgorithmisO(bm).
•ThespacecomplexityisO(bm)foranalgorithmthat
generatesallactionsatonce,orO(m)foran
algorithmthatgeneratesactionsoneatatime

Optimal decisions in multiplayer games

ALPHA–BETA PRUNING
•Alpha beta pruning is a modified version of the
minimaxalgorithm
•It is an optimization technique for the minimax
algorithm
•Alpha beta pruning is the pruning(cutting down) of
useless branches in decision trees
•Alpha-highest value
–Initial Value α=-∞
•Max player will only update the value of alpha
•Beta –lowest value
–Initial Value α=+∞
•Min player will only update the value of alpha
•Condition : α>=β

Monte-carlotreesearch
•MonteCarloTreeSearch(MCTS)isasearch
techniqueinthefieldofArtificialIntelligence(AI).It
isaprobabilisticandheuristicdrivensearch
algorithmthatcombinestheclassictreesearch
implementationsalongsidemachinelearning
principlesofreinforcementlearning.
•There’salwaysthepossibilitythatthecurrentbest
actionisactuallynotthemostoptimalaction.
•itcontinuestoevaluateotheralternatives
periodicallyduringthelearningphasebyexecuting
them.exploration-exploitationtrade-off

•Explorationhelpsinexploringanddiscoveringthe
unexploredpartsofthetree,whichcouldresultin
findingamoreoptimalpath.
•UseofMonteCarloTreeSearch
•HandlingComplexandStrategicGames(chess,
poker)
•UnknownorImperfectInformation(cardgames)
•LearningfromSimulations(estimatethevalueof
actionsorstates)

MCTS algorithm
Steps:
1.Selection
2.Expansion
3.Simulation
4.Back propagation

•Selection:Inthisprocess,theMCTSalgorithmtraverses
thecurrenttreefromtherootnodeusingaspecific
strategy.Thestrategyusesanevaluationfunctionto
optimallyselectnodeswiththehighestestimatedvalue.
•Itreturnsthemaximumvalue
•Expansion:In this process, a new child node is added to
the tree to that node which was optimally reached
during the selection process.

•Simulation:Inthisprocess,asimulationisperformed
bychoosingmovesorstrategiesuntilaresultor
predefinedstateisachieved.
•Backpropagation:Afterdeterminingthevalueofthe
newlyaddednode,theremainingtreemustbe
updated.So,thebackpropagationprocessis
performed,whereitbackpropagatesfromthenew
nodetotherootnode.

StochasticStrategy
•Astrategyforanagentisaprobabilitydistribution
overtheactionsforthisagent.Iftheagentisacting
deterministically,oneoftheprobabilitieswillbe1
andtherestwillbe0;thisiscalledapurestrategy.
•Iftheagentisnotfollowingapurestrategy,noneof
theprobabilitieswillbe1,andmorethanoneaction
willhaveanon-zeroprobability;thisiscalled
astochasticstrategy.
•Thesetofactionswithanon-zeroprobabilityina
strategyiscalledthesupportsetofthestrategy.

Stochasticgames
•Manyunpredictableexternaloccurrencescanplace
usinunforeseencircumstancesinreallife.
•Ex:dicetossing,havearandomelementtoreflect
thisunpredictability.
•Ex:Backgammonisaclassicgamethatmixesskilland
luck

•Agametreeinbackgammonmustincludechance
nodesinadditiontoMAXandMINnodes.
•Thebranchesleadingfromeachchancenodedenote
thepossibledicerolls;eachbranchislabeledwith
therollanditsprobability.
•Thereare36waystorolltwodice,eachequally
likely;butbecausea6–5isthesameasa5–6,there
areonly21distinctrolls.
•Thesixdoubles(1–1through6–6)eachhavea
probabilityof1/36,sowesayP(1–1)=1/36.The
other15distinctrollseachhavea1/18probability

•nextstepistounderstandhowtomakecorrect
decisions.
•wecanonlycalculatetheexpectedvalueofa
position:theaverageoverallpossibleoutcomesof
thechancenodes.
•Togeneralizetheminimaxvaluefordeterministic
gamestoanexpectiminimaxvalueforgameswith
chancenodes.
•TerminalnodesandMAXandMINnodes(forwhich
thedicerollisknown)workexactlythesamewayas
before.

Partiallyobservablegames
•Partialobservabilitymeansthatanagentdoesnot
knowthestateoftheworldorthattheagentsact
simultaneously.
•Partialobservabilityforthemultiagentcaseismore
complicatedthanthefullyobservablemultiagent
caseorthepartiallyobservablesingle-agentcase.
Thefollowingsimpleexamplesshowsome
importantissuesthatariseeveninthecaseoftwo
agents,eachwithafewchoices.

•Apartiallyobservablesystemisoneinwhichthe
entirestateofthesystemisnotfullyvisibletoan
externalsensor.
•Inapartiallyobservablesystemtheobservermay
utiliseamemorysysteminordertoaddinformation
totheobserver'sunderstandingofthesystem.
•Anexampleofapartiallyobservablesystemwould
beacardgameinwhichsomeofthecardsare
discardedintoapilefacedown.
•Inthiscasetheobserverisonlyabletoviewtheir
owncardsandpotentiallythoseofthedealer.

•Theyarenotabletoviewtheface-down(used)
cards,northecardswhichwillbedealtatsome
stageinthefuture.
•Amemorysystemcanbeusedtorememberthe
previouslydealtcardsthatarenowontheusedpile
(largecollectionarrangedoneoverother).
•Thisaddstothetotalsumofknowledgethatthe
observercanusetomakedecisions.

•Incontrast,afullyobservablesystemwouldbethat
ofchess.Inchess(apartfromthe'whoismoving
next'state)thefullstateofthesystemisobservable
atanypointintime.
•Partiallyobservableisatermusedinavarietyof
mathematicalsettings,includingthatofArtificial
IntelligenceandPartiallyobservableMarkovdecision
processes.

•Chesshasoftenbeendescribedaswarinminiature,
butitlacksatleastonemajorcharacteristicofreal
wars,namely,partialobservability.
•Inthe“fogofwar,”theexistenceanddispositionof
enemyunitsisoftenunknownuntilrevealedby
directcontact.
•Partiallyobservablegamessharethese
characteristicsandarethusqualitativelydifferent
fromotherobservablegames.

State-of-the-artGamePrograms
•State-of-the-artgameprogramsareblindinglyfast,
highlyoptimizedmachinesthatincorporatethe
latestengineeringadvances,buttheyaren’tmuch
usefordoingtheshoppingordrivingoff-road.
•Racingandgame-playinggenerateexcitementanda
steadystreamofinnovationsthathavebeenadopted
bythewidercommunity.
•Chessisatwo-playerstrategyboardgameplayed
onachessboard,acheckeredgameboard
with64squaresarrangedinan8×8grid.

•IBMcomputerDeepBluewasthefirstmachineto
overcomeareigningWorldChessChampionina
matchwhenitdefeatedGarryKasparovin1997
•Backgammon
•Go
•CHINOOK

Constraintsatisfactionproblems
•Aproblemissolvedwheneachvariablehasavalue
thatsatisfiesalltheconstraintsonthevariable.A
problemdescribedthiswayiscalledaconstraint
satisfactionproblem,orCSP.
•Mainideaistoeliminatelargeportionsofthesearch
spaceallatoncebyidentifyingvariable/value
combinationsthatviolatetheconstraints.

•Aconstraintsatisfactionproblemconsistsofthree
components,X,D,andC:
•Xisasetofvariables,{X1,...,Xn}.
•Disasetofdomains,{D1,...,Dn},oneforeach
variable.
•Cisasetofconstraintsthatspecifyallowable
combinationsofvalues.

•EachstateinaCSPisdefinedbyanassignmentof
valuestosomeorallofthevariables.
•{Xi=vi,Xj=vj,...}
•Anassignmentthatdoesnotviolateanyconstraints
iscalledaconsistentorlegalassignment
•Acompleteassignmentisoneinwhichevery
variableisassigned,andasolutiontoaCSPisa
consistent,completeassignment
•Apartialassignmentisonethatassignsvaluesto
onlysomeofthevariables.

•Example problem: Map coloring
VariablesWA, NT, Q, NSW, V, SA, T
DomainsD
i= {red,green,blue}
Constraints: adjacent regions must have different
colors
e.g., WA ≠NT, or (WA,NT) in
{(red,green),(red,blue),(green,red),
(green,blue),(blue,red),(blue,green)}

•Solutionsarecompleteandconsistent
assignments
•e.g.,WA=red,NT=green,Q=red,NSW=
green,V=red,SA=blue,T=green

Constraint graph
•Binary CSP:each constraint relates two variables
•Constraint graph:nodes are variables, arcs are
constraints

Variations on the CSP formalism
•The simplest kind of CSP involves variables that have
discrete and finite domains
1.Finite domains
•Map-coloring problems and scheduling with time
limits and 8-queens problem
2.Discrete domains
•discrete domain can be infinite, such as the set of
integers or strings.
•no longer possible to describe constraints

Typesofconstraints
•Unaryconstraintsinvolveasinglevariable,
–e.g.,SA≠green
•Binaryconstraintsinvolvepairsofvariables,
–e.g.,SA≠WA
•GlobalconstraintorHigher-orderconstraintsinvolve3or
morevariables,
–e.g.,cryptarithmeticcolumnconstraints
•Auxiliary variables representing the digit carried over into
the tens, hundreds, or thousands column. These
constraints can be represented in a constraint
hypergraphC10, C100, and C1000

CryptarithmeticExample
Variables:FTUWRO
Domains:{0,1,2,3,4,5,6,7,8,9}
Constraints:Alldiff(F,T,U,W,R,O)

Constraintpropagation
•analgorithmcansearch(chooseanewvariable
assignmentfromseveralpossibilities)ordoaspecific
typeofinferencecalledconstraintpropagation.
usingtheconstraintstoreducethenumberoflegal
valuesforavariable,whichinturncanreducethe
legalvaluesforanothervariable.
•Thekeyideaislocalconsistency

Different types of local consistency
1.Node consistency
2.Arc consistency
3.Path consistency
4.K-consistency
5.Global constraints

1.Nodeconsistency
•Asinglevariable(correspondingtoanodeintheCSP
network)isnode-consistentifallthevaluesinthe
variable’sdomainsatisfythevariable’sunary
constraints.
•Ex:Australiamap-coloringproblemwhereSouth
Australiansdislikegreen,thevariablewiththe
reduceddomain{red,blue}.
•Itisalwayspossibletoeliminatealltheunary
constraintsinaCSPbyrunningnodeconsistency.

2.Arcconsistency
•AvariableinaCSPisarc-consistentifeveryvaluein
itsdomainsatisfiesthevariable’sbinaryconstraints.
•Xiisarc-consistentwithrespecttoanothervariableXj
ifforeveryvalueinthecurrentdomainDithereis
somevalueinthedomainDjthatsatisfiesthebinary
constraintonthearc(Xi,Xj).
•(X,Y),{(0,0),(1,1),(2,4),(3,9))}

•TomakeXarc-consistentwithrespecttoY,we
reduceX’sdomainto{0,1,2,3}.IfwealsomakeY
arc-consistentwithrespecttoX,thenY’sdomain
becomes{0,1,4,9}andthewholeCSPisarc-
consistent.
•Themostpopularalgorithmforarcconsistencyis
calledAC-3
•AC-3algorithmmaintainsaqueueofarcs

•thequeuecontainsallthearcsintheCSP.AC-3then
popsoffanarbitraryarc(Xi,Xj)fromthequeueand
makesXiarc-consistentwithrespecttoXj.
•IfthisleavesDiunchanged,thealgorithmjustmoves
ontothenextarc.ButifthisrevisesDi(makesthe
domainsmaller),thenweaddtothequeueallarcs
(Xk,Xi)whereXkisaneighborofXi.
•IfDiisreviseddowntonothing,thenweknowthe
wholeCSPhasnoconsistentsolution,andAC-3can
immediatelyreturnfailure.Otherwise,wekeep
checking,tryingtoremovevaluesfromthedomains
ofvariablesuntilnomorearcsareinthequeue.

Pathconsistency
•Pathconsistencytightensthebinaryconstraintsby
usingimplicitconstraintsthatareinferredbylooking
attriplesofvariables.
•Atwo-variableset{Xi,Xj}ispath-consistentwith
respecttoathirdvariableXmif,foreveryassignment
{Xi=a,Xj=b}consistentwiththeconstraintson
{Xi,Xj},thereisanassignmenttoXmthatsatisfiesthe
constraintson{Xi,Xm}and{Xm,Xj}.Thisiscalledpath
consistencybecauseonecanthinkofitaslookingat
apathfromXitoXjwithXminthemiddle.

•4.K-consistency
•Strongerformsofpropagationcanbedefinedwith
thenotionofk-consistency.
•ACSPisk-consistentif,foranysetofk−1variables
andforanyconsistentassignmenttothosevariables,
aconsistentvaluecanalwaysbeassignedtoanykth
variable.

5.Globalconstraints
•Aglobalconstraintisoneinvolvinganarbitrary
numberofvariables
•Globalconstraintsoccurfrequentlyinrealproblems
andcanbehandledbyspecial-purposealgorithms
thataremoreefficientthanthegeneral-purpose
methods
•The Alldiffconstraint says that all the variables
involved must have distinct values
•Ex: cryptarithmeticproblem and Sudoku puzzles

•Sudoku example

BacktrackingsearchforCSP
•Variableassignmentsarecommutative,i.e.,
[WA=redthenNT=green]sameas[NT=green
thenWA=red]
•=>Onlyneedtoconsiderassignmentstoasingle
variableateachnode
•Depth-firstsearchforCSPswithsingle-variable
assignmentsiscalledbacktrackingsearch
•Cansolven-queensforn≈25

Backtracking example

Backtracking example

Backtracking example

Backtracking example

Improving backtracking efficiency
•General-purposemethods can give huge gains
in speed:
–Which variable should be assigned next?
–In what order should its values be tried?
–Can we detect inevitable failure early?

Most constrained variable
•Most constrained variable:
choose the variable with the fewest legal values
•a.k.a. minimum remaining values (MRV)
heuristic

Most constraining variable
•A good idea is to use it as a tie-breaker among
most constrained variables
•Most constraining variable:
–choose the variable with the most constraints on
remaining variables

Least constraining value
•Given a variable to assign, choose the least
constraining value:
–the one that rules out the fewest values in the
remaining variables

•Combining these heuristics makes 1000
queens feasible

Forward checking
•Idea:
–Keep track of remaining legal values for unassigned variables
–Terminate search when any variable has no legal values

Forward checking
•Idea:
–Keep track of remaining legal values for unassigned variables
–Terminate search when any variable has no legal values

Forward checking
•Idea:
–Keep track of remaining legal values for unassigned variables
–Terminate search when any variable has no legal values

Forward checking
•Idea:
–Keep track of remaining legal values for unassigned variables
–Terminate search when any variable has no legal values

LocalsearchforCSP
•theinitialstateassignsavaluetoeveryvariable,and
thesearchchangesthevalueofonevariableata
time.Forexample,inthe8-queensproblemor4
queensproblem
•eachstepmovesasinglequeentoanewpositionin
itscolumn
•new value for a variable, the most obvious heuristic
is to select the value that results in the minimum
number of conflicts with other variables—the min-
conflicts

4 queens problem
•States:4queensin4columns(4
4
=256states)
•Actions:movequeenincolumn
•Goaltest:noattacks
•Evaluation:h(n)=numberofattacks

•Structure of CSP

1.ChooseasubsetSoftheCSP’svariablessuchthat
theconstraintgraphbecomesatreeafterremovalof
S.Siscalledacyclecutset.
2.ForeachpossibleassignmenttothevariablesinS
thatsatisfiesallconstraintsonS,
(a)removefromthedomainsoftheremaining
variablesanyvaluesthatareinconsistentwiththe
assignmentforS,and
(b)IftheremainingCSPhasasolution,returnit
togetherwiththeassignmentforS.

Tree decomposition
Everyvariableinoriginalproblemmustappear
inatleastonesubproblem
Iftwovariablesareconnectedintheoriginal
problem,theymustappeartogether(along
withtheconstraint)inatleastonesubproblem
Ifavariableoccursintwosubproblemsinthe
tree,itmustappearineverysubproblemon
thepaththatconnectsthetwo
Tags