ARTIFICIAL INTELLIGENCE: Adversarial Search
Engr. Dr. E. M. Martey
LECTURE THREE
Adversarial Search
Adversarial search in AI is a type of search in which one must anticipate the moves of an enemy or opponent. Such searches are useful in chess, business-strategy tools, trading platforms, and war games that use AI agents. In an adversarial search the player can change the current state but has no control over the next state: the opponent controls the next state, which is therefore unpredictable. There may be instances where more than one agent is searching for a solution in the same search space, which is common in game playing. Games are treated as a search problem with a heuristic evaluation function; these are the two key components that help in the modeling and solving of games in AI.
Game Playing
Why Study Game Playing?
Different Game Scenarios using Adversarial Search
Zero-Sum Games
We focus primarily on "adversarial games": two-player, zero-sum games. As Player 1 gains strength, Player 2 loses strength, and vice versa; the sum of the two strengths is always 0.
Zero-Sum Games
- 2-person game; players alternate moves
- Zero-sum: one player's loss is the other's gain
- Perfect information: both players have access to complete information about the state of the game; no information is hidden from either player
- No chance (e.g., dice) involved
- Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
Using Search
Search could be used to find a perfect sequence of moves, except that the following problems arise:
- There exists an adversary who is trying to minimize your chance of winning on every other move, and you cannot control their moves
- Search trees can be very large, but you have finite time to move: chess has about 10^40 nodes in its search space
- With single-agent search you can afford to wait; some two-player games have time limits
Solution? Search to n levels of the tree (n ply), evaluate the nodes at the nth level, and head for the best-looking node.
Search Applied to Adversarial Games
- Initial state: the current board position (description of the current game state)
- Operators: the legal moves a player can make
- Terminal nodes: leaf nodes in the tree; they indicate the game is over
- Utility function: an objective or payoff function giving the value of the outcome of a game (example: in tic-tac-toe, the utility is -1, 0, or 1)
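As a minimal sketch of this formulation for tic-tac-toe (plain Python; names such as State, moves, result, and utility are illustrative, not from any particular library):

from dataclasses import dataclass

WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

@dataclass(frozen=True)
class State:
    board: tuple   # 9 cells: 'X', 'O', or None
    to_move: str   # 'X' is MAX, 'O' is MIN

def initial_state():
    return State((None,) * 9, 'X')

def moves(state):
    # Operators: the legal moves are the indices of the empty cells.
    return [i for i, c in enumerate(state.board) if c is None]

def result(state, move):
    # Applying an operator yields the successor state.
    b = list(state.board)
    b[move] = state.to_move
    return State(tuple(b), 'O' if state.to_move == 'X' else 'X')

def winner(state):
    for a, b, c in WIN_LINES:
        if state.board[a] is not None and state.board[a] == state.board[b] == state.board[c]:
            return state.board[a]
    return None

def is_terminal(state):
    return winner(state) is not None or all(c is not None for c in state.board)

def utility(state):
    # Payoff from MAX's ('X') point of view: 1, 0, or -1.
    return {None: 0, 'X': 1, 'O': -1}[winner(state)]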
Game Tree
Optimal Decisions in Games
- In a normal search problem, the optimal solution is a sequence of actions leading to a goal state
- In an adversarial search problem, MIN interferes with the sequence of actions, so MAX needs a strategy: specify a move for the initial state, then a move in response to every possible reply by MIN, and so on
The optimal strategy can be determined from the minimax value of each node: the minimax value is the function that tells you how good or bad a state is, assuming both players play optimally.
Terminology
The initial state is the first layer; it specifies that the board is blank and that it is MAX's turn to play. The successor function lists all possible successor moves; it is defined for every layer of the tree. A terminal state lies in the last layer of the tree and shows the final outcome, i.e., whether MAX wins, loses, or ties with the opponent. The utilities of the terminal states here are 1, 0, and -1 as discussed earlier, and they can be used to determine the utilities of the other nodes as well.
Game Trees
Tic-tac-toe: two players, MAX and MIN. Moves (and levels) alternate between the two players.
Minimax Properties
- Complete if the tree is finite
- Optimal if playing against an opponent with the same strategy (utility function)
- Time complexity is O(b^m); space complexity is O(bm) (depth-first exploration)
- If we have 100 seconds to make a move and can explore 10^4 nodes/second, we can consider 10^6 nodes per move
- The standard approach is to apply a cutoff test (depth limit, quiescence) and evaluate nodes at the cutoff (an evaluation function estimates the desirability of a position)
How does the minimax algorithm work?
Let us calculate the utility of the left (red) node in the layer above the terminal layer. We have to evaluate MIN{3, 5, 10}, which is certainly 3, so the utility of the red node is 3.
Similarly, for the green node in the same layer we have to evaluate MIN{2, 2}, which is 2.
To summarize, Minimax Decision = MAX{MIN{3,5,10},MIN{2,2}} = MAX{3,2} = 3
Minimax Algorithm
- Search the tree to the end
- Assign utility values to terminal nodes
- Find the best move for MAX (on MAX's turn), assuming that MAX will make the move that maximizes MAX's utility and MIN will make the move that minimizes MAX's utility
- Here, MAX should make the leftmost move
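A minimal recursive minimax sketch in plain Python (the nested-list tree encoding is an illustrative assumption, not a standard API), applied to the tree from the example above:

def minimax(node, maximizing):
    # A leaf is a number (its utility); an internal node is a list of children.
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The tree from the worked example: MAX{MIN{3, 5, 10}, MIN{2, 2}} = 3
tree = [[3, 5, 10], [2, 2]]
print(minimax(tree, maximizing=True))   # -> 3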
Optimization
Alpha-beta pruning
Alpha-beta pruning is an optimization of the standard minimax algorithm: it returns the same move as standard minimax, but it removes (prunes) all the nodes that cannot possibly affect the final decision.
Suppose we have the following game tree. In this case, Minimax Decision = MAX{MIN{3,5,10}, MIN{2,a,b}, MIN{2,7,3}} = MAX{3,c,2} = 3. Whatever the leaves a and b are, c = MIN{2,a,b} ≤ 2 < 3, so the middle subtree cannot change the decision and a and b never need to be examined.
Working of Alpha-Beta Pruning:
Step 1: The MAX player starts with the first move at node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.
Working of Alpha-Beta Pruning:
Step 2: At node D the value of α is calculated, as it is MAX's turn. α is compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node's value is also 3.
Working of Alpha-Beta Pruning:
Step 3: The algorithm now backtracks to node B, where the value of β changes because it is MIN's turn. β = +∞ is compared with the value just returned: min(+∞, 3) = 3, so at node B we now have α = -∞ and β = 3.
Working of Alpha-Beta Pruning:
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.
Step 4: At node E, MAX takes its turn and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α ≥ β, the right successor of E is pruned, the algorithm does not traverse it, and the value at node E is 5.
Working of Alpha-Beta Pruning:
Step 5: Next, the algorithm backtracks up the tree from node B to node A. At node A, alpha is updated to the maximum available value, 3, since max(-∞, 3) = 3, while β = +∞. These two values are now passed to the right successor of A, node C. At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Working of Alpha-Beta Pruning:
Step 6: At node F, α is compared with the left child, 0: max(3, 0) = 3. It is then compared with the right child, 1: max(3, 1) = 3. α remains 3, but the value of node F becomes 1.
Working of Alpha-Beta Pruning:
Step 7: Node F returns its value 1 to node C. At C, α = 3 and β = +∞, and the value of beta is updated: min(+∞, 1) = 1. Now at C, α = 3 and β = 1, which again satisfies the condition α ≥ β, so the next child of C, node G, is pruned and the algorithm does not compute that entire subtree.
Working of Alpha-Beta Pruning:
Step 8: C returns the value 1 to A, where the best value is max(3, 1) = 3. The final game tree shows which nodes were computed and which were never computed. The optimal value for the maximizer in this example is 3.
Pseudocode
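A minimal alpha-beta sketch in plain Python, using the same nested-list tree encoding as the minimax sketch above (an illustrative implementation, not the slide's exact pseudocode), run on a tree shaped like the walkthrough's:

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):   # terminal node: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:        # beta cutoff: MIN would never allow this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:        # alpha cutoff: MAX would never choose this branch
                break
        return value

# Tree shaped like the walkthrough: A(MAX) over B, C (MIN); B over D{2,3} and
# E{5,...}; C over F{0,1} and G. The pruned leaves' exact values (9, 7, 5 here)
# are immaterial: they are never examined.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(alphabeta(tree, maximizing=True))   # -> 3; E's second leaf and all of G are pruned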
Example
Bad and Good Cases for Alpha-Beta Pruning
- Bad: worst moves encountered first
- Good: good moves ordered first
If we can order moves, we get more benefit from alpha-beta pruning, as illustrated below.
(Figure: two game trees with the same minimax value, 4. In the badly ordered tree every leaf must be examined; in the well-ordered tree, the branches marked x are pruned.)
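A small illustration of this effect (an assumed two-ply tree, not the slide's exact figure): counting how many leaves alpha-beta actually evaluates under bad and good orderings of the same MAX-root tree.

import math

def alphabeta_counting(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    # Same alpha-beta as above, but counts how many leaves get evaluated.
    if stats is None:
        stats = {'leaves': 0}
    if not isinstance(node, list):
        stats['leaves'] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        v = alphabeta_counting(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:
            break
    return best

# Both trees have minimax value 4; only the child ordering differs.
bad  = [[6, 4, 2], [7, 5, 3], [8, 6, 4]]   # best subtree last: no cutoffs fire, 9 leaves
good = [[4, 6, 8], [3, 5, 7], [2, 4, 6]]   # best subtree first: only 5 leaves
for name, tree in (('bad order', bad), ('good order', good)):
    stats = {'leaves': 0}
    value = alphabeta_counting(tree, True, stats=stats)
    print(name, '-> value', value, ', leaves evaluated:', stats['leaves'])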
Alpha-Beta Properties
Problems with a Fixed Ply: The Horizon Effect
- Inevitable losses are postponed
- Unachievable goals appear achievable
- Short-term gains mask unavoidable consequences (traps)
(Figure: a line that appears to lose only a pawn within the "look-ahead horizon" in fact loses the queen just beyond it.)
Solutions
How to counter the horizon effect:
- Feedover: do not cut off the search at non-quiescent (dynamic) board positions, for example when the king is in danger; keep searching down that path until quiescent (stable) nodes are reached
- Secondary search: search further down the selected path to ensure it really is the best move
- Progressive deepening: search one ply, then two ply, etc., until time runs out (similar to IDS); see the sketch below
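A minimal progressive-deepening sketch (plain Python; evaluate() and the depth-limited minimax are illustrative assumptions, not a specific engine's API):

import time

def evaluate(node):
    # Heuristic evaluation at the cutoff; here leaves carry their value and
    # unexpanded internal nodes get a crude placeholder estimate of 0.
    return node if not isinstance(node, list) else 0

def depth_limited_minimax(node, maximizing, depth):
    if not isinstance(node, list):     # true terminal node
        return node
    if depth == 0:                     # cutoff: estimate instead of searching on
        return evaluate(node)
    values = [depth_limited_minimax(c, not maximizing, depth - 1) for c in node]
    return max(values) if maximizing else min(values)

def progressive_deepening(node, time_budget_s=1.0, max_depth=20):
    # Search 1 ply, then 2 ply, ..., keeping the deepest completed result.
    deadline = time.monotonic() + time_budget_s
    best = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        best = depth_limited_minimax(node, True, depth)
    return best

print(progressive_deepening([[3, 5, 10], [2, 2]]))   # -> 3 once depth >= 2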
Non-Deterministic Games
Uncertain outcomes are controlled by chance, not an adversary.
Game Trees with Chance Nodes
- Chance nodes (shown as circles) represent random events
- For a random event with N distinct outcomes, each chance node has N children, and a probability is associated with each (for 2 dice there are 21 distinct outcomes)
- Use minimax to compute values for MAX and MIN nodes; use expected values for chance nodes
- For a chance node C over a max level: expectimax(C) = Σ_i P(d_i) · maxvalue(i)
- For a chance node C over a min level: expectimin(C) = Σ_i P(d_i) · minvalue(i)
(Figure: backgammon-style game tree alternating MAX's rolls and MIN's rolls.)
Expectimax Search
The expectimax search algorithm is a game-theory algorithm used to maximize expected utility. It is a variation of the minimax algorithm: while minimax assumes that the adversary (the minimizer) plays optimally, expectimax does not. This makes it useful for modelling environments where adversarial agents are not optimal, or where their actions are based on chance.
Algorithm of Expectimax Search
Step 1: If the current call is a maximizer node, return the maximum of the state values of the node's successors.
Step 2: If the current call is a chance node, return the average of the state values of the node's successors (assuming all successors have equal probability). If the successors have different probabilities, the expected utility is Σ_i x_i p_i, where x_i is the value of successor i and p_i is its probability.
Step 3: Call the function recursively until a terminal node (a state with no successors) is reached, then return the utility of that state.
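For example, a chance node whose successors have values 8 and 2, each with probability 0.5, has expected utility 0.5 · 8 + 0.5 · 2 = 5; with probabilities 0.9 and 0.1 instead, it would be 0.9 · 8 + 0.1 · 2 = 7.4.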
Example of Expectimax Search
Nondeterministic Game Algorithm
Just like minimax, except that chance nodes are also handled. Compute the ExpectMinimaxValue of the successors:
- If n is a terminal node: ExpectMinimaxValue(n) = Utility(n)
- If n is a MAX node: ExpectMinimaxValue(n) = max over s in Successors(n) of ExpectMinimaxValue(s)
- If n is a MIN node: ExpectMinimaxValue(n) = min over s in Successors(n) of ExpectMinimaxValue(s)
- If n is a chance node: ExpectMinimaxValue(n) = Σ over s in Successors(n) of P(s) · ExpectMinimaxValue(s)
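A minimal expectiminimax sketch in plain Python. The node encoding is an illustrative assumption: a bare number is a terminal utility, ('max', children) and ('min', children) are player nodes, and ('chance', [(p, child), ...]) is a chance node with outcome probabilities p.

def expectiminimax(node):
    if not isinstance(node, tuple):   # terminal node
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    if kind == 'chance':              # expected value over random outcomes
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(kind)

# MAX chooses between two gambles:
tree = ('max', [
    ('chance', [(0.5, 8), (0.5, 2)]),    # expected value 5.0
    ('chance', [(0.9, 4), (0.1, 40)]),   # expected value 7.6
])
print(expectiminimax(tree))              # -> 7.6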
Summary
- Adversarial games can be classified as having perfect or imperfect information. Every player in a game with perfect information is fully aware of the game's present state, whereas in a game with imperfect information some information is hidden from the players
- Search strategies such as minimax and alpha-beta pruning are used in adversarial games to discover the best move. These algorithms help choose the best move for a player by weighing the possible outcomes of each move
- Because of the size of the game tree, it may not always be possible to search it exhaustively. In these cases, heuristics are used to approximate the optimal move. Heuristics are shortcuts that allow the algorithm to quickly evaluate the promise of a move without having to search the entire game tree
Status of AI Game Players
- Tic-Tac-Toe: computers are tied for best player in the world
- Othello: the computer is better than any human; human champions now refuse to play the computer
- Scrabble: Maven beat world champions Joel Sherman and Matt Graham
- Backgammon: in 1992, Tesauro combined 3-ply search with neural networks (160 hidden units), yielding a top-3 player
- Bridge: GIB ranked among the top players in the world
- Poker: Poki plays at a strong intermediate level
- Checkers: in 1994, Chinook ended the 40-year reign of human champion Marion Tinsley
- Chess: in 1997, Deep Blue beat human champion Garry Kasparov in a six-game match; Deep Blue searched 200M positions/second, up to 40 ply; the team is now looking at other applications (molecular dynamics, drug synthesis)
- Go: in 2008, MoGo running on 25 nodes (800 cores) beat Myungwan Kim; a $2M prize was available for the first computer program to defeat a top player