Human-level control through deep reinforcement learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis · 19 authors (Google DeepMind)

Published 25 February 2015 · Nature · Journal article


Summary

The paper introduced the Deep Q-Network (DQN), which combines Q-learning with a deep convolutional network and two stabilizing techniques: experience replay and a separate target network. Trained end-to-end from raw pixels and game scores, a single architecture and hyperparameter set learned to play 49 Atari 2600 games. It performed at a level comparable to a professional human games tester on more than half of the titles, demonstrating that one general agent can learn directly from high-dimensional sensory input.
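
As a rough illustration of how a target network enters the Q-learning update summarized above, the sketch below computes one-step targets of the form y = r + gamma * max over a' of Q_target(s', a'). The function name `td_targets`, the plain-list inputs, and the default discount of 0.99 are assumptions made for this example rather than code from the paper; in a full agent the `next_q_target` values would come from a periodically refreshed copy of the online network.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,  # illustrative default, an assumption of this sketch
) -> List[float]:
    """Compute one-step Q-learning targets for a batch of transitions.

    rewards[i]       -- reward observed for transition i
    next_q_target[i] -- target-network Q-values for every action in the next state
    dones[i]         -- True if transition i ended the episode
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_target, dones):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


if __name__ == "__main__":
    batch_targets = td_targets(
        rewards=[1.0, 0.0],
        next_q_target=[[0.5, 2.0], [3.0, 1.0]],
        dones=[False, True],
        gamma=0.5,
    )
    print(batch_targets)
```

The online network would then be regressed toward these targets, for example with a squared-error loss, while the network that produced `next_q_target` is held fixed between refreshes.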

Key findings

Combined a deep convolutional network with Q-learning, using experience replay and a target network to stabilize training
A single agent learned 49 Atari 2600 games from raw pixels, outperforming the best prior reinforcement learning methods on 43 of them
Achieved performance comparable to a professional human games tester (more than 75% of the human score) on 29 of the 49 games
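
Experience replay, named in the first finding, can be sketched as a fixed-capacity buffer sampled uniformly at random. This is a minimal illustration, not the authors' implementation: the `Transition` and `ReplayBuffer` names, the field layout, and the optional `seed` parameter are all hypothetical choices made for the example.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple, Optional


class Transition(NamedTuple):
    """One step of agent experience (field layout is an assumption)."""
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity store that discards the oldest transitions first."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        # deque(maxlen=...) drops the oldest entry once capacity is reached.
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks up the correlation
        # between consecutive frames of the same episode.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3, seed=0)
    for step in range(5):
        buffer.push(Transition(step, 0, float(step), step + 1, False))
    print(len(buffer), [t.reward for t in buffer.sample(2)])
```

Training would draw a minibatch with `sample` at each update instead of learning from the most recent transition alone.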

Subjects & keywords

Artificial Intelligence · deep reinforcement learning · DQN · Atari · Q-learning · neural networks

Cite this paper

APA

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature. https://doi.org/10.1038/nature14236

BibTeX

@article{mnih2015humanlevel,
  author    = {Volodymyr Mnih and Koray Kavukcuoglu and David Silver and Andrei A. Rusu and Joel Veness and Marc G. Bellemare and Alex Graves and Martin Riedmiller and Andreas K. Fidjeland and Georg Ostrovski and Stig Petersen and Charles Beattie and Amir Sadik and Ioannis Antonoglou and Helen King and Dharshan Kumaran and Daan Wierstra and Shane Legg and Demis Hassabis},
  title     = {Human-level control through deep reinforcement learning},
  journal   = {Nature},
  year      = {2015},
  doi       = {10.1038/nature14236},
  url       = {https://doi.org/10.1038/nature14236}
}

Related in Artificial Intelligence

AI · 2023

Segment Anything

Alexander Kirillov, Eric Mintun and Nikhila Ravi

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

IEEE/CVF International Conference on Computer Vision (ICCV) · Open access

AI · 2023

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

arXiv · Open access

AI · 2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril and Gautier Izacard

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

arXiv preprint (arXiv:2302.13971) · Open access