Monte Carlo Tree Search UCT implementation


Solution 1

The best way to generate the tree is a series of random playouts. The trick is being able to balance exploration and exploitation (this is where UCT comes in). There are some good code samples and plenty of research paper references here: https://web.archive.org/web/20160308043415/http://mcts.ai:80/index.html

When I implemented the algorithm, I used random playouts until I hit a terminal state. I had a static evaluation function that calculated the payoff at that point, and the score was then propagated back up the tree. Each player or "team" assumes that the other team will play the best move for themselves, and the worst move possible for their opponent.
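To make the exploration/exploitation balance concrete: UCT scores each child with the UCB1 formula during selection. Here is a minimal sketch; the method and parameter names are illustrative, not from any of the libraries linked above:

```java
class Ucb1 {
    // UCB1 value of a child during UCT selection: average payoff plus an
    // exploration bonus that grows for rarely visited children.
    // wins, visits, parentVisits and c are illustrative names.
    static double value(double wins, int visits, int parentVisits, double c) {
        if (visits == 0) return Double.POSITIVE_INFINITY; // try unvisited children first
        double exploit = wins / visits; // exploitation: how well this move has paid off
        double explore = c * Math.sqrt(Math.log(parentVisits) / visits); // exploration
        return exploit + explore; // selection picks the child maximizing this value
    }
}
```

The first term pulls the search toward moves that have scored well so far; the second pushes it toward moves that have been tried rarely, which is exactly the balance described above.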

I would also recommend checking out the papers by Chaslot, including his PhD thesis, as well as some of the research that references his work (basically all of the MCTS work since then).


For example: player 1's first move could be simulated 10 moves into the future, alternating between player 1's moves and player 2's moves. Each time, you must assume that the opposing player will try to minimize your score while maximizing their own; there is an entire field based on this, known as Game Theory. Once you have simulated to the end of those 10 moves, you iterate from the start point again (because there is no point simulating only one set of decisions). Each branch of the tree must be scored, with the score propagated up the tree; the score represents the best possible payoff for the player doing the simulating, assuming the other player also chooses the best moves for themselves.
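One common way to encode that each-player-maximizes-their-own-payoff assumption is a negamax-style sign flip while propagating a playout result back up the tree. This is a minimal sketch under that convention, with a hypothetical node type (the skeleton later in this answer instead scores every node from the root player's perspective; both conventions work if applied consistently):

```java
class NegamaxBackprop {
    // Hypothetical tree node, assumed for this sketch only.
    static class Node {
        Node parent;
        int visits;
        double totalScore; // payoff accumulated from this node's player's perspective
    }

    // A result that is good for the player who just moved is equally bad for
    // the opponent, so the sign flips at every ply on the way back to the root.
    static void backpropagate(Node node, double result) {
        while (node != null) {
            node.visits++;
            node.totalScore += result;
            result = -result; // the parent position belongs to the opposing player
            node = node.parent;
        }
    }
}
```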

MCTS consists of four strategic steps, repeated as long as there is time left. The steps are as follows.

  1. In the selection step the tree is traversed from the root node until we reach a node E where we select a position that has not yet been added to the tree.

  2. Next, during the play-out step moves are played in self-play until the end of the game is reached. The result R of this “simulated” game is +1 in case of a win for Black (the first player in LOA, i.e. Lines of Action), 0 in case of a draw, and −1 in case of a win for White.

  3. Subsequently, in the expansion step children of E are added to the tree.

  4. Finally, R is propagated back along the path from E to the root node in the backpropagation step. When time is up, the move played by the program is the child of the root with the highest value. (This example is taken from this paper (PDF): www.ru.is/faculty/yngvi/pdf/WinandsBS08.pdf)
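Putting the four steps together, here is a rough, game-independent sketch of one UCT iteration in Java. Everything in it (the `GameState` interface, the method names, the exploration constant) is an assumption made for illustration, not code from the paper or from the libraries below. Note also that this sketch expands node E before playing out from one new child, whereas the quoted steps play out first; both orderings appear in the literature:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical game interface assumed by this sketch; your board class would implement it.
interface GameState {
    List<GameState> nextStates(); // all positions reachable in one legal move
    boolean isTerminal();         // has the game ended?
    double resultForRootPlayer(); // +1 win, 0 draw, -1 loss, like R above
}

class MctsNode {
    final GameState state;
    final MctsNode parent;
    final List<MctsNode> children = new ArrayList<>();
    int visits = 0;
    double score = 0;

    MctsNode(GameState state, MctsNode parent) {
        this.state = state;
        this.parent = parent;
    }
}

class Uct {
    static final Random RNG = new Random();
    static final double C = Math.sqrt(2); // a common default exploration constant

    // One MCTS iteration: selection, expansion, play-out, backpropagation.
    static void iterate(MctsNode root) {
        // 1. Selection: descend by UCB1 until we reach a node E with no children in the tree.
        MctsNode node = root;
        while (!node.children.isEmpty()) {
            node = bestUcb1Child(node);
        }

        // 2. Expansion: add the children of E to the tree and pick one to simulate from.
        if (!node.state.isTerminal()) {
            for (GameState s : node.state.nextStates()) {
                node.children.add(new MctsNode(s, node));
            }
            node = node.children.get(RNG.nextInt(node.children.size()));
        }

        // 3. Play-out: random self-play until the end of the game is reached.
        GameState s = node.state;
        while (!s.isTerminal()) {
            List<GameState> moves = s.nextStates();
            s = moves.get(RNG.nextInt(moves.size()));
        }
        double r = s.resultForRootPlayer();

        // 4. Backpropagation: propagate the result R along the path from E to the root.
        for (MctsNode n = node; n != null; n = n.parent) {
            n.visits++;
            n.score += r;
        }
    }

    static MctsNode bestUcb1Child(MctsNode parent) {
        MctsNode best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (MctsNode child : parent.children) {
            double v = (child.visits == 0)
                    ? Double.POSITIVE_INFINITY // always try unvisited children first
                    : child.score / child.visits
                      + C * Math.sqrt(Math.log(parent.visits) / child.visits);
            if (v > bestValue) {
                bestValue = v;
                best = child;
            }
        }
        return best;
    }
}
```

Repeat `iterate` for as long as time allows, then play the child of the root with the highest value (or the most visits), exactly as the quoted steps describe.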

Here are some implementations:

A list of libraries and games using some MCTS implementations http://senseis.xmp.net/?MonteCarloTreeSearch

and a game independent open source UCT MCTS library called Fuego http://fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/group__sguctgroup.html

Solution 2

I wrote this one if you're interested: https://github.com/avianey/mcts4j

Solution 3

From http://mcts.ai/code/index.html:

Below are links to some basic MCTS implementations in various programming languages. The listings are shown with timing, testing and debugging code removed for readability.

Java

Python


Comments

  • Makers_F almost 2 years

    Can you explain to me how to build the tree?

    I understand well enough how the nodes are chosen, but a clearer explanation would really help me implement this algorithm. I already have a board representing the game state, but I don't know (understand) how to generate the tree.

    Can someone point me to a well-commented implementation of the algorithm (I need to use it for an AI)? Or to better explanations/examples of it?

    I didn't find many resources on the net; this algorithm is rather new...

  • Makers_F about 12 years
    This is fairly clear. But is the tree constructed while making the decision, or is it built beforehand and then used by the AI to determine the right move? Can you write out, point by point, the steps the algorithm takes from the start (nothing in memory) to the final move decision?
  • danielbeard about 12 years
    Generally the tree is constructed while making a series of simulated decisions, and then the "actual" play-through is made based on these previous decisions. An easy way to accomplish this is to have a method that can store the state of the game - I notice you already have this - then play through x times (depending on either how much computation time you have, or the quality of the choices required), restore the initial game state you simulated from, and make a choice from there using the constructed and scored tree (see the sketch after these comments).
  • danielbeard about 12 years
    I also updated my answer with the main steps of MCTS and a link.
  • Makers_F about 12 years
    I need to run it on a mobile device (read: not much memory, no fast CPU). So I thought of running multiple simulations on my PC, saving the tree (slightly modified) to a file, and then implementing in my app a method that can easily read the saved file (modified to be more easily readable without loading it all into memory). [If I don't save changes back to the file] I will lose the learning part of it (since the matches the real player plays don't update the tree), but I'll get a fairly good AI for little expense. Is this right/feasible?
  • danielbeard about 12 years
    Depends on the size of the possible tree. Even a tic-tac-toe game can have a surprisingly large game tree, and you would have to essentially brute force every possible move. This would take forever for something like chess. A possible implementation would be to set up a server running a service-based MCTS implementation. Found it! Here are some existing implementations: senseis.xmp.net/?MonteCarloTreeSearch and a game independent UCT MCTS library called Fuego fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/…
  • Makers_F about 12 years
    I'm a student, I do not have money to set up a server ;) In addition, the user would need to always be online to play. If the game goes very well, I can plan to add it as a new play category. Btw my game doesn't have too many states (just n_player^n_total_element), which means that in a situation like 2^10 (in my game this is not so restrictive), if I build the tree in an intelligent way (not duplicating nodes) I'll have 1024 nodes representing all possible states. That is not so much. In addition I don't think I'll ever reach 3^15 = 14.34M states; 3^12 = 0.5M is really acceptable
  • Makers_F about 12 years
    I made a "little" mistake: each element has many more states than I was thinking, so building a complete tree is definitely impossible. I'll stick with standard MCTS. Can I ask a few last questions about mcts.ai/?q=code/simple_java? Why is nActions static and = 5? Where does the algorithm ask my board for the possible moves, or for the right move to make? What should rollout(TreeNode tn) return? Thank you very much!
  • danielbeard about 12 years
    The rollout method is where you play out your game; nActions is how many child nodes to create and add to an existing node.
  • Makers_F about 12 years
    So "how many child nodes to create" = all possible (intelligent) moves from the current state?
  • Thomas Ahle over 10 years
    I don't understand how this simulated approach is an improvement. If you are basically searching through the entire tree multiple times, why not just use a normal min/max, which would only need one visit of each node?
  • danielbeard over 10 years
    The entire Monte-Carlo method is for when outcomes are not deterministic (i.e. they have some randomness), when it's difficult to build a good heuristic scoring function, or when simulations are continuous (not turn-based). Minimax does not work when games are not turn-based in nature. More than one visit attempts to give an "average" outcome.
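As a rough illustration of the loop danielbeard describes (build the tree with simulated play-throughs, then choose the actual move from the scored tree), here is a minimal sketch reusing the hypothetical `GameState`, `MctsNode` and `Uct` names from the skeleton in Solution 1; the fixed iteration budget and the most-visited-child rule are illustrative choices, not part of the original comments:

```java
class MoveChooser {
    // Build the tree with a fixed budget of simulated iterations, then pick
    // the move from the constructed and scored tree.
    static GameState chooseMove(GameState current, int iterations) {
        MctsNode root = new MctsNode(current, null);
        for (int i = 0; i < iterations; i++) {
            Uct.iterate(root); // each call is one simulated play-through
        }
        MctsNode best = root.children.get(0); // assumes at least one legal move exists
        for (MctsNode c : root.children) {
            if (c.visits > best.visits) best = c; // most-visited child is the robust choice
        }
        return best.state; // the "actual" move to play
    }
}
```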