KR-IST - Lecture 5a Game playing with Minimax and Pruning

Chris Thornton

Introduction

An important application of AI search methods has been in the domain of 2-person games, such as draughts (checkers) and chess.

Until quite recently (the late 1990s), it was widely believed that the hard problems of intelligence would never be solved by computer.

Chess was often put forward as a good example.

Then, in May 1997, an IBM machine known as `Deep Blue' defeated chess grandmaster Garry Kasparov.

No special techniques were used to achieve the victory. Deep Blue relied on tried and trusted methods.

The version of Deep Blue which beat Kasparov was able to evaluate more than 200 million chess states per second.

Kasparov and Deep Blue

Deep Blue

Recordings

Adapting search for game playing

Deep Blue used ordinary search methods and the standard approach for adapting those methods to the problem of game-play.

Games like chess can readily be seen in terms of transitions between states. Transitions are moves; states are board configurations.

Normally, we would then solve the problem by searching for a path of transitions (i.e., moves) connecting the start state with a goal state.

Unfortunately, in this context, we `lose' control over the choice of move every other turn.

Using search for evaluation

In a 2-person game, a solution path is unobtainable because we never know what the other player is going to do at any stage.

What we need to work out is the best move.

In the minimax method we use the search process not to find a solution path, but to derive the most accurate evaluation of the possible moves, i.e., an evaluation which takes into account the implications that any given move will have later in the game.

Minimax method

There are three elements to the minimax method.

Expand the search tree all the way down to a game conclusion (win, lose or draw). If this is too much search, choose a suitable cutoff depth and treat the states at that depth as terminal.
Obtain an evaluation of the relevant terminal state (e.g., positive for a win, negative for a loss and neutral for a draw). This is known as the static evaluation.
Then back up the evaluations, level by level, working on the basis that when it is the opponent's turn, they will choose the transition which achieves the worst outcome from our point of view, and whenever it is our turn to move, we will choose the best.

To do this we need to identify the minimum evaluation in any level of the tree corresponding to the opponent's move, and the maximum otherwise.

Hence the `minimax'.
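
The backing-up step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the game tree is given explicitly as nested lists, each leaf is a number holding the static evaluation from our point of view, and the names `minimax`, `tree` and `move_values` are hypothetical rather than taken from the lecture.

```python
def minimax(node, our_turn):
    """Back up evaluations from the leaves of an explicit game tree.

    Assumption: a leaf is a plain number (the static evaluation from
    our point of view); an internal node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # terminal state: return its static evaluation
    # Evaluate every successor; control passes to the other player.
    values = [minimax(child, not our_turn) for child in node]
    # We choose the maximum; the opponent chooses the minimum (worst for us).
    return max(values) if our_turn else min(values)


# Hypothetical two-level tree: we move first, then the opponent replies.
tree = [[3, 5], [2, 9], [0, 7]]

# Backed-up value of each of our possible moves (opponent to reply).
move_values = [minimax(child, False) for child in tree]

# Value of the start state, i.e. the value of our best move.
best_value = minimax(tree, True)
```

Here the opponent would hold us to 3, 2 and 0 in the three branches, so the first move is the best one available and the start state is worth 3.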

Worked example

Cont.

Evaluation obtained

Negmax simplification

Implementing minimax can be a pain because of the need to alternate between minimisation and maximisation in the backing-up of evaluations.

The negmax idea gets around this problem.

Board states are still evaluated from the `current' player's point of view (i.e., whichever player has control at the given depth), but the value which is backed up is always the negative of the maximum.

As in minimax, the effect is to ensure that the value backed up is, from our point of view, the value of the worst outcome that the opponent can force.

But the code to implement the method can be written using a simple recursive procedure.
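
A minimal sketch of such a procedure is given here, again assuming an explicit tree of nested lists. The one assumption that matters is that each leaf holds the static evaluation from the point of view of the player who has control at that leaf. The name `negmax` and the example tree are hypothetical.

```python
def negmax(node):
    """Value of a node from the point of view of the player in control there.

    Assumption: a leaf is a number evaluated from the point of view of the
    player to move at that leaf; an internal node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    # Each child is valued from the other player's point of view, so its
    # value is negated before the maximum is taken. The value handed up
    # to the parent is therefore always the negative of a maximum.
    return max(-negmax(child) for child in node)


# Hypothetical two-level tree. The leaves sit at even depth, so the player
# in control at the leaves is the same player who moves at the root.
tree = [[3, 5], [2, 9], [0, 7]]

root_value = negmax(tree)          # value for the player moving at the root
reply_value = negmax(tree[0])      # value of the first branch for the opponent
```

There is no alternation between minimising and maximising in the code: the sign flip on each recursive call does the same job.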

Negmax illustration

Alpha-beta pruning

When using minimax (or negmax), situations can arise when search of a particular branch can be safely terminated.

Applying an alpha-cutoff means we stop search of a particular branch because we see that we already have a better opportunity elsewhere.
Applying a beta-cutoff means we stop search of a particular branch because we see that the opponent already has a better opportunity elsewhere.

Applying both forms is alpha-beta pruning.

Alpha-cutoff

If, from some state S, the opponent can achieve a state with a lower value for us than one achievable in another branch, we will certainly not move the game to S. We do not need to expand S any further.

Beta-cutoff

If, from some state S, we would be able to achieve a state which has a higher value for us than one the opponent can hold us to in another branch, we can assume the opponent will not choose S.
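
The two cutoffs can be added to a minimax procedure along the following lines. This is a sketch rather than a definitive implementation: it assumes an explicit tree of nested lists with leaves evaluated from our point of view, and the names `alphabeta`, `visited` and `tree` are hypothetical. Here `alpha` is the best value we are already sure of elsewhere and `beta` is the best value (lowest for us) the opponent is already sure of elsewhere.

```python
import math


def alphabeta(node, our_turn, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha- and beta-cutoffs on an explicit game tree.

    Assumption: a leaf is a number (static evaluation from our point of
    view); an internal node is a list of children. If `visited` is a list,
    every leaf that is actually evaluated is appended to it.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if our_turn:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta-cutoff: the opponent has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # alpha-cutoff: we already have a better option elsewhere
    return value


# Hypothetical two-level tree: we move first, then the opponent replies.
tree = [[3, 5], [2, 9], [0, 7]]
visited = []
best_value = alphabeta(tree, True, visited=visited)
```

On this tree the first branch guarantees us 3. In the second branch the opponent can reach 2 and in the third 0, so both branches are abandoned as soon as those leaves are seen; the leaves 9 and 7 are never evaluated, and the backed-up value is the same as plain minimax would give.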

Summary

Adapting search for game playing
Minimax method
Negmax simplification
Alpha-cutoff
Beta-cutoff

Questions

Could we use the A* algorithm to improve the effectiveness of minimax?
Why is there no point searching for a solution path in a game-playing problem?
What is minimised and what is maximised in minimax search?
What is special about a static evaluation?
How are static evaluations of board states in a 2-person game derived?
In a game like chess, where the full search tree is very large, how might reasonable static evaluations of board states be obtained?
What is the advantage of the negmax method?
What is the minimum depth of search for the application of alpha-cutoffs?
What is the minimum depth of search for the application of beta-cutoffs?

Exercises

In the game of noughts and crosses (tic-tac-toe), the two players take it in turns to capture an empty cell of a 3x3 grid. The game is won once a line of three cells has been captured. Devise a suitable representation scheme for states in this game.
Estimate the branching factor of the space.
Calculate the maximum depth of a search tree in this game.
Calculate the total size of the state space in this game.

Exercises cont.

On the basis that the underscore represents an unfilled cell, draw out the full tree of states that can be reached from the state

  X X O
  X O O
  _ _ _

Annotate the tree to differentiate levels where X has control (i.e., where it is X's turn) from the levels at which O has control.
Annotate nodes of the tree which represent won/lost states, giving them a value of 1 if the state is won by X, and 0 if it is won by O.

Exercises cont.

Show how the evaluations of immediate successor states can be produced by backing up evaluations of terminal states using first MINIMAX and then NEGMAX.
How well would the NEGMAX evaluation work if you used -1 as the static evaluation for a lost state?
Devise a representation scheme for states in this game which minimises the difficulty of generating successors and evaluating states.
In this game, the first player to move can always force a draw provided a certain procedure is followed. Devise a way of using the evaluation mechanism (i.e., backing up of terminal evaluations) to identify what this procedure is.

Resources

Another chess applet: http://chess.captain.at/
CNN website on Deep Blue v. Kasparov: http://www.cnn.com/WORLD/9705/11/chess.update/