Finding the best move using MinMax with Alpha-Beta pruning


Solution 1

After some research and a lot of time spent on this problem, I came up with the following solution, which seems to work.

private class MoveValue {

    public double returnValue;
    public Move returnMove;

    public MoveValue() {
        returnValue = 0;
    }

    public MoveValue(double returnValue) {
        this.returnValue = returnValue;
    }

    public MoveValue(double returnValue, Move returnMove) {
        this.returnValue = returnValue;
        this.returnMove = returnMove;
    }

}


// Searches maxDepth plies ahead and returns both the best value found and the move that achieves it.
protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) {
    if (!canContinue()) {
        return new MoveValue();
    }        
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    double value = 0;
    boolean isMaximizer = (player.equals(playerType)); // playerType is the colour the AI plays for
    if (maxDepth == 0 || board.isGameOver()) {            
        value = evaluateBoard();            
        return new MoveValue(value);
    }
    MoveValue returnMove;
    MoveValue bestMove = null;
    if (isMaximizer) { // maximizing player: raise alpha towards the best child value
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue > alpha) {
                alpha = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = beta;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    } else { // minimizing player: lower beta towards the lowest child value
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue < beta) {
                beta = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = alpha;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    }   
}
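
For completeness, the method above would typically be invoked from the AI's move-selection code roughly as sketched below; the findBestMove name and the search depth of 4 are only illustrative, and playerType is assumed to be the same field of the enclosing class that the method itself uses:

// Illustrative entry point; the method name and the depth are assumptions, not part of the original code.
public Move findBestMove() {
    MoveValue best = minMax(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY,
                            4, playerType); // search 4 plies, maximizing for the AI's own colour
    return best.returnMove;
}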

Solution 2

This is a bit difficult, as the given code is not an actual Java implementation; to achieve what you want, there must be concrete types to represent a move and a position in the game tree. Usually the game tree is not explicitly encoded but navigated in a sparse representation: the implementation performs the move in question, recursively evaluates the resulting smaller problem and then undoes the move, effectively doing a depth-first search that uses the call stack to represent the current path.

To obtain the actual best move, simply return from your method the move which maximizes the subsequent evaluation. It might be helpful to first implement the Minimax algorithm without alpha-beta pruning and add the pruning in a subsequent step, once the basic structure works.
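
As an illustration, a minimal sketch of plain Minimax that returns the chosen move together with its score might look as follows; Board, Move, Player and the helpers legalMoves, apply, undo, evaluate and isGameOver are placeholder names assumed for this sketch, not types taken from the question:

// Minimal Minimax sketch without pruning; every type and helper method here is hypothetical.
// evaluate() is assumed to score the board from the maximizing player's point of view.
static class Scored {
    final Move move;
    final int score;
    Scored(Move move, int score) { this.move = move; this.score = score; }
}

static Scored minimax(Board board, int depth, Player player, boolean maximizing) {
    if (depth == 0 || board.isGameOver()) {
        return new Scored(null, board.evaluate()); // leaf node: only a score, no move
    }
    Scored best = null;
    for (Move move : board.legalMoves(player)) {
        board.apply(move);                                      // perform the move...
        Scored child = minimax(board, depth - 1, player.opponent(), !maximizing);
        board.undo(move);                                       // ...and undo it afterwards
        if (best == null
                || (maximizing && child.score > best.score)
                || (!maximizing && child.score < best.score)) {
            best = new Scored(move, child.score);               // keep the move, not just its value
        }
    }
    return best;
}

Alpha-beta pruning can then be added by threading the two bounds through the recursion and breaking out of the loop once beta <= alpha, exactly as in the pseudocode quoted in the question.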

The implementation from the link in the question (Section 1.5) actually returns the best move, as indicated by the following comment taken from there:

/** Recursive minimax at level of depth for either
    maximizing or minimizing player.
    Return int[3] of {score, row, col}  */

Here no user-defined type is used to represent the move; instead the method returns three values: the best score found and the row and column to which the player would move in order to play that best move (a move the implementation has already performed and undone while computing the score). Together, these coordinates are a representation of the move itself.
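
In other words, after a search the caller only has to unpack that array. A tiny self-contained illustration of the convention (with a stub standing in for the tutorial's recursive search) could look like this:

// The stub below only illustrates the {score, row, col} return convention;
// in the linked code the array is filled by the recursive minimax search.
static int[] searchBestMove() {
    int bestScore = 1, bestRow = 0, bestCol = 2; // dummy values for the illustration
    return new int[] { bestScore, bestRow, bestCol };
}

static void pickMove() {
    int[] result = searchBestMove();
    int score = result[0]; // evaluation of the best move
    int row   = result[1]; // row of the cell to play
    int col   = result[2]; // column of the cell to play
    // play (row, col) here
}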


Comments

  • StepTNT
    StepTNT almost 2 years

    I'm working on an AI for a game and I want to use the MinMax algorithm with Alpha-Beta pruning.

    I have a rough idea of how it works, but I'm still not able to write the code from scratch, so I've spent the last two days looking for some kind of pseudocode online.

    My problem is that every pseudocode I've found online seems to be based on finding the value of the best move, while I need to return the best move itself, not just a number.

    My current code is based on this pseudocode (source)

    minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
        if (gameover || level == 0)
           return score
        children = all valid moves for this "player"
        if (player is computer, i.e., max's turn){
           // Find max and store in alpha
           for each child {
              score = minimax(level - 1, opponent, alpha, beta)
              if (score > alpha) alpha = score
              if (alpha >= beta) break;  // beta cut-off
           }
           return alpha
        } else (player is opponent, i.e., min's turn)
           // Find min and store in beta
           for each child {
              score = minimax(level - 1, computer, alpha, beta)
              if (score < beta) beta = score
              if (alpha >= beta) break;  // alpha cut-off
           }
           return beta
        }
    }
    
    // Initial call with alpha=-inf and beta=inf
    minimax(2, computer, -inf, +inf)
    

    As you can see, this code returns a number and I guess that this is needed to make everything work (since the returned number is used during the recursion).

    So I thought that I may use an external variable to store the best move, and this is how I've changed the previous code:

    minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
        if (gameover || level == 0)
           return score
        children = all valid moves for this "player"
        if (player is computer, i.e., max's turn){
           // Find max and store in alpha
           for each child {
              score = minimax(level - 1, opponent, alpha, beta)
              if (score > alpha) {
                  alpha = score
                  bestMove = current child // LINE THAT I ADDED TO UPDATE THE BEST MOVE
              }
              if (alpha >= beta) break;  // beta cut-off
           }
           return alpha
        } else (player is opponent, i.e., min's turn)
           // Find min and store in beta
           for each child {
              score = minimax(level - 1, computer, alpha, beta)
              if (score < beta) beta = score
              if (alpha >= beta) break;  // alpha cut-off
           }
           return beta
        }
    }
    
    // Initial call with alpha=-inf and beta=inf
    minimax(2, computer, -inf, +inf)
    

    Now, this is how it makes sense to me, because we need to update the best move only when it's the player's turn and the move is better than the previous one.

    So, while I think that this one is correct (even if I'm not 100% sure), the source also has a Java implementation which updates bestMove even in the score < beta case, and I don't understand why.

    Trying that implementation led my code to choose as best move a move belonging to the opposing player, which doesn't seem correct (assuming that I'm the black player, I'm looking for the best move that I can make, so I expect a "black" move and not a "white" one).

    I don't know if my pseudocode (the second one) is the correct way to find the best move using MinMax with alpha-beta pruning, or if I need to update the best move even in the score < beta case.

    Please feel free to suggest any new and better pseudocode if you prefer; I'm not bound to anything and I don't mind rewriting some code if it's better than mine.

    EDIT:

    Since I can't understand the replies, I guess that maybe the question doesn't ask what I want to know, so I'm trying to phrase it better here.

    Provided that I want to get the best move for only one player, and that this player, which is the maximizer, is passed to the MinMax function every time I need a new move (so that minmax(2, black, a, b) returns the best move for the black player while minmax(2, white, a, b) returns the best one for the white player), how would you change the first pseudocode (or the Java implementation in the source) to store this best move somewhere?

    EDIT 2:

    Let's see if we can make it work this way.

    This is my implementation, can you please tell me if it's correct?

    //PlayerType is an enum with just White and Black values, opponent() returns the opposite player type
    protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) {        
        if (!canContinue()) {
            return 0;
        }
        ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
        Iterator<Move> movesIterator = moves.iterator();
        int value = 0;
        boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI        
        if (maxDepth == 0 || board.isGameOver()) {
            value = evaluateBoard();
            return value;
        }
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            value = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if (isMaximizer) {
                if (value > alpha) {
                    selectedMove = currentMove;
                    alpha = value;
                }
            } else {
                if (value < beta) {
                    beta = value;
                }
            }
            if (alpha >= beta) {
                break;
            }
        }
        return (isMaximizer) ? alpha : beta;
    }
    

    EDIT 3:

    New implementation based on @Codor's answer/comments

    private class MoveValue {
        public Move move;
        public int value;
    
        public MoveValue() {
            move = null;
            value = 0;
        }
    
        public MoveValue(Move move, int value) {
            this.move = move;
            this.value = value;
        }
    
        @Override
        public String toString() {
            return "MoveValue{" + "move=" + move + ", value=" + value + '}';
        }
    
    }
    
    protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) {
        if (!canContinue()) {
            return new MoveValue();
        }
        ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
        Iterator<Move> movesIterator = moves.iterator();
        MoveValue moveValue = new MoveValue();
        boolean isMaximizer = (player.equals(playerType));
        if (maxDepth == 0 || board.isGameOver()) {            
            moveValue.value = evaluateBoard();
            return moveValue;
        }
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if (isMaximizer) {
                if (moveValue.value > alpha) {
                    selectedMove = currentMove;
                    alpha = moveValue.value;
                }
            } else {
                if (moveValue.value < beta) {
                    beta = moveValue.value;
                    selectedMove = currentMove;
                }
            }
            if (alpha >= beta) {
                break;
            }
        }
        return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta);
    }
    

    I don't know if I got it right or if I did something wrong, but I'm back to the problem I had when I posted the question:

    calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a move that can be done only by the white player and this is not what I need.

    I need the best move for the given player, not the best move for the whole board.

  • StepTNT
    StepTNT over 9 years
    How would you change Wikipedia's pseudocode to store the best move and not only the best value? By the way, the source link in the question has a Java implementation, which is what I'm using (I've just changed the type of the best move); you can use that one as a reference.
  • Codor
    Codor over 9 years
    The implementation you refer to (Section 1.5) actually returns the best move; see the updated answer.
  • StepTNT
    StepTNT over 9 years
    Ok. Now, given that this implementation returns the best move regardless of the current player (which is why it updates the best move in both the score > alpha and score < beta cases), in which of the two cases do I have to update the best move to get the best move for the maximizer?
  • Codor
    Codor over 9 years
    In both cases, as these cases correspond to the two different players evaluating alternately. To put it more colloquially, the algorithm gets called to evaluate for player A; it then asks "what moves can I do?" For each of these moves it hypothetically performs the move, calls itself but "as player B, to see what he or she would do" (which results in a board with fewer free positions), stores the value of the move for examination and undoes it to evaluate the next move. The alpha and beta values basically just cut off the checking of all possible moves as soon as a "good enough" move is found.
  • StepTNT
    StepTNT over 9 years
    I'm sorry, but I still don't get it. I've updated the question; hopefully it's now clearer what I need to know.
  • Codor
    Codor over 9 years
    Although this is not actually rocket science, it is not trivial, as it involves optimization, recursion and tree search. The presentation from the link also uses a "non-standard" scoring for evaluation; the values 1 (player 1 wins), -1 (player 2 wins) and 0 (game not in a terminal state) might be easier. Is the basic Minimax algorithm clear to you? On second inspection, the code from the link seems broken: note that the return type of minimax is int[], but it gets assigned to score, which is an int.
  • StepTNT
    StepTNT over 9 years
    Basic minmax should be clear, but I'd have the same problem that I have with alpha-beta: minmax's goal is to assign the best value to the root node, which is a maximizer, starting from the leaves. What I need, though, is the best move, not the best utility value I can achieve by applying that move. So, to summarize, even if I were using plain minmax, how would I make it store the best move somewhere? (Please forgive me, but I'm quite new to this stuff.) EDIT: I've added my implementation to the question
  • Codor
    Codor over 9 years
    You're almost there. Change the return type to Move, store 'currentMove' in either case and return selectedMove in the end. However, this is impossible with the current types; create a type which includes both the move and its value and use this type instead of Move.
  • StepTNT
    StepTNT over 9 years
    I've added the edited implementation, hoping that I wrote it correctly. I have the same problem as before, though; I've described it in the question.
  • Codor
    Codor over 9 years