Course Materials | Intro to Computer Vision (Spring '24)

There is no required textbook for this class. However, you may find the texts and resources on this page useful during the semester. Many of these resources are copied from MIT’s Advances in Computer Vision course website.

Sample Colab notebooks

Basic image operations

Colab and PyTorch

Colab Tutorial 1
Colab Tutorial 2
Colab and PyTorch
PyTorch
Basics of PyTorch
PyTorch Tutorial
Deep Learning 60 Minute Blitz with PyTorch

Computer Vision

[Sz] Szeliski, Computer Vision: Algorithms and Applications, 2022 (online draft)
[HZ] Hartley and Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004
[FP] Forsyth and Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2002
[Pa] Palmer, Vision Science, MIT Press, 1999

Learning

[GBC] Goodfellow, Bengio, Courville, Deep Learning, MIT Press, 2016
[Mi] Mitchell, Machine Learning, McGraw-Hill, 1997
[DHS] Duda, Hart and Stork, Pattern Classification (2nd Edition), Wiley-Interscience, 2000
[SB] Sutton and Barto, Reinforcement Learning: An Introduction (online book). The classic reference for the field of reinforcement learning.

Graphical Models

[KF] Koller and Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009

Image Datasets

Labelme: an online annotation tool to build image databases for computer vision research
OpenSurfaces: a large database of annotated surfaces created from real-world consumer photographs.
ImageNet: a large-scale image dataset for visual recognition organized by WordNet hierarchy
ADE20K Dataset: a benchmark for scene and instance segmentation, with pixelwise semantic annotations
Places Database: a scene-centric database with 205 scene categories and 2.5 million labeled images
NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
Flickr100M: 100 million Creative Commons Flickr images
Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
Human Pose Dataset: a benchmark for articulated human pose estimation
YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
UCF101: an action recognition dataset of realistic action videos with 101 action categories
HMDB-51: a large human motion dataset of 51 action classes
CelebA: more than 200,000 celebrity face images with labeled attributes
FFHQ: 70,000 high-res (1024 x 1024) face images sourced from Flickr

Top computer vision conferences and papers:

CVPR: IEEE Conference on Computer Vision and Pattern Recognition
ICCV: International Conference on Computer Vision
ECCV: European Conference on Computer Vision
NeurIPS (formerly NIPS): Conference on Neural Information Processing Systems

Related courses:

Advances in Computer Vision, by Bill Freeman and Phillip Isola
Introduction to Computer Vision, by Michael Black
Learning-Based Methods in Vision, by Alyosha Efros
Computer Vision, by Kristen Grauman
Computer Vision, by Rob Fergus
Introduction to Computer Vision, by Fei-Fei Li