Monte-Carlo Planning: Policy Improvement
Alan Fern
arm pulls is upper bounded by $O(e^{-cn})$ for a constant $c$ that is larger than the constant for Uniform (this holds for "large enough" $n$). ...
Define: $\pi'(s) = \arg\max_a Q^{\pi}(s, a, h)$ ...
which will run all trajectories until the simulator hits a terminal state.
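As a rough illustration of the improved policy defined above as an argmax over actions, the following is a minimal Python sketch, not the slides' own code. It assumes a hypothetical simulator interface `simulator(state, action, rng) -> (next_state, reward, done)`, spends the same number of pulls on every action (uniform allocation), and treats `horizon=None` as "run each trajectory until the simulator hits a terminal state". The names `sim_q`, `improved_action`, `chain_simulator`, and `always_left` are illustrative assumptions, as is the toy chain domain.

```python
import random


def sim_q(simulator, policy, state, action, horizon, rng):
    """Return one sampled return: take `action` in `state`, then follow `policy`.

    `horizon` is the maximum number of steps; None means "until terminal".
    """
    next_state, reward, done = simulator(state, action, rng)
    total, steps = reward, 1
    while not done and (horizon is None or steps < horizon):
        next_state, reward, done = simulator(next_state, policy(next_state), rng)
        total += reward
        steps += 1
    return total


def improved_action(simulator, policy, state, actions, horizon, pulls_per_action, rng):
    """Estimate each action's value by averaging samples, then return the argmax."""
    def estimate(action):
        samples = [sim_q(simulator, policy, state, action, horizon, rng)
                   for _ in range(pulls_per_action)]
        return sum(samples) / len(samples)
    return max(actions, key=estimate)


# Hypothetical toy domain: states 0..4 on a chain; 0 and 4 are terminal,
# and only reaching state 4 pays a reward of 1. Moves slip 10% of the time.
def chain_simulator(state, action, rng):
    if rng.random() < 0.1:
        action = -action  # slip: the move goes the opposite way
    next_state = state + action
    if next_state >= 4:
        return 4, 1.0, True
    if next_state <= 0:
        return 0, 0.0, True
    return next_state, 0.0, False


def always_left(state):
    """Deliberately poor base policy: always move toward state 0."""
    return -1


rng = random.Random(0)
best = improved_action(chain_simulator, always_left, state=3, actions=[-1, 1],
                       horizon=None, pulls_per_action=200, rng=rng)
```

Uniform allocation is used here only because it is the simplest strategy to write down; any other bandit allocation could replace the body of `improved_action` without changing `sim_q`.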