Mastering the game of Go with deep neural networks and tree search
This paper introduced AlphaGo, a system combining deep convolutional neural networks (policy and value networks) trained by supervised learning from human games and reinforcement learning by self-play, integrated with Monte Carlo tree search. The networks reduce the breadth and depth of the search needed to evaluate Go positions. AlphaGo defeated other Go programs and became the first program to beat a professional human Go player (Fan Hui) on a full-size board.