Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Published 27 June 2016 · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · Conference paper

Summary

The authors introduced a residual learning framework that reformulates network layers to learn residual functions with reference to their inputs (via identity 'shortcut' connections), making very deep networks substantially easier to optimize. They showed that such residual networks gain accuracy from greatly increased depth, evaluating models up to 152 layers deep on ImageNet at lower complexity than VGG networks. The approach won first place in the ILSVRC 2015 classification task and yielded large improvements on detection and localization benchmarks.
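The residual reformulation described above can be illustrated with a minimal NumPy sketch. It makes several simplifying assumptions: two dense layers stand in for the convolutional layers of the actual networks, normalization is omitted, and the names `residual_block`, `w1`, and `w2` are hypothetical, chosen only for this example. The block computes a residual function F(x) and adds the input back through an identity shortcut.

```python
import numpy as np


def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F(x) = relu(x @ w1) @ w2.

    Illustrative only: dense layers replace convolutions, and
    normalization is left out.
    """
    residual = relu(x @ w1) @ w2  # F(x): the residual the layers learn
    return relu(residual + x)     # identity shortcut adds the input back


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # batch of 4 inputs with 8 features each

# With all-zero weights, F(x) is zero and the block passes x through
# the shortcut unchanged (up to the final ReLU).
w_zero = np.zeros((8, 8))
out_identity = residual_block(x, w_zero, w_zero)

# With small random weights, the block outputs x plus a learned residual.
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
out = residual_block(x, w1, w2)
```

The zero-weight case hints at why such blocks may be easier to optimize: an added block can fall back to an identity mapping by driving its weights toward zero, so extra depth need not degrade the function the network already computes.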

Key findings

Identity shortcut connections let networks learn residual mappings, addressing the degradation problem and allowing successful training of networks far deeper than previously feasible.
An ensemble of residual nets achieved 3.57% top-5 error on the ImageNet test set, winning 1st place in the ILSVRC 2015 classification competition.
Deeper residual representations also produced a 28% relative improvement on the COCO object-detection dataset and 1st-place finishes across multiple ILSVRC and COCO 2015 tracks.

Subjects & keywords

Artificial Intelligence · deep learning · computer vision · convolutional neural networks · image classification · residual networks

Cite this paper

APA

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2016.90

BibTeX

@inproceedings{he2016deep,
  author    = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title     = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.90},
  url       = {https://doi.org/10.1109/CVPR.2016.90}
}

Related in Artificial Intelligence

AI · 2023

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, et al.

This paper introduces the Segment Anything project: a promptable image segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset. SAM combines an image encoder, a flexible prompt encoder (points, boxes, masks, text), and a fast mask decoder to produce valid segmentation masks from arbitrary prompts. Trained on over 1 billion masks across 11 million images, SAM shows strong zero-shot transfer to many segmentation tasks without additional training.

IEEE/CVF International Conference on Computer Vision (ICCV) · Open access

AI · 2023

GPT-4 Technical Report

OpenAI

This technical report describes GPT-4, a large-scale multimodal Transformer model that accepts image and text inputs and produces text outputs. The report emphasizes that GPT-4 achieves human-level performance on a range of professional and academic benchmarks, and details infrastructure and optimization methods that allowed performance to be predicted from much smaller models. For competitive and safety reasons, the report withholds architecture, dataset, and training details.

arXiv · Open access

AI · 2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.

The paper presents LLaMA, a family of foundation language models ranging from 7B to 65B parameters trained exclusively on publicly available datasets. It argues that strong performance can be reached without proprietary data and at smaller parameter counts than prior models. LLaMA-13B outperforms the much larger GPT-3 175B on most benchmarks, and LLaMA-65B is competitive with the best contemporary models such as Chinchilla-70B and PaLM-540B.

arXiv preprint (arXiv:2302.13971) · Open access