GSAPP SMORGASBORD

AI Vision and Imagery

by William Martin

Computer Vision

Computer vision (CV) refers to non-generative AI techniques that enable computers to identify patterns in, and extract data from, raster (pixel-based) images such as those captured by cameras.

From a human perspective, these techniques are intended to mimic the visual abilities of humans and animals, from identifying objects to, in some cases, inferring spatial depth and position.

Image Analysis

These methods extract data and information from raster images. For example:

depth estimation - Computing a depth (z-distance) value for each pixel of a given image.
image classification - Producing a label that represents the content of an image.
object detection - Producing outlines (typically bounding boxes) and labels for the physical objects perceived in an image.
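
To make the difference between these three tasks concrete, the sketch below shows the shape of the output each one produces. It is a toy illustration only: no AI model is involved, the function names (classify, detect, estimate_depth) are hypothetical, and the rules inside them (mean brightness, a brightness threshold, "brighter means closer") are placeholder assumptions standing in for what a trained model would compute.

```python
# Toy stand-ins for the three image analysis tasks. No model is involved;
# each function only illustrates the shape of the output a real model returns.

# A raster image is a grid of pixel values.
# Here: 4 rows x 5 columns, grayscale, 0 (black) to 255 (white).
IMAGE = [
    [0,   0,   0, 0, 0],
    [0, 200, 220, 0, 0],
    [0, 210, 230, 0, 0],
    [0,   0,   0, 0, 0],
]


def classify(image):
    """Image classification: one label for the whole image.

    Placeholder rule (an assumption): label by mean brightness.
    """
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


def detect(image, threshold=128):
    """Object detection: a list of labels, each with a bounding box.

    Placeholder rule (an assumption): all pixels at or above the threshold
    form a single "object". The box is (x_min, y_min, x_max, y_max).
    """
    hits = [
        (x, y)
        for y, row in enumerate(image)
        for x, p in enumerate(row)
        if p >= threshold
    ]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [{"label": "bright region", "box": (min(xs), min(ys), max(xs), max(ys))}]


def estimate_depth(image):
    """Depth estimation: one depth value per pixel, same grid size as the input.

    Placeholder rule (an assumption): brighter pixels are treated as closer.
    """
    return [[round(1 - p / 255, 2) for p in row] for row in image]


if __name__ == "__main__":
    print(classify(IMAGE))        # a single label
    print(detect(IMAGE))          # a list of labelled boxes
    print(estimate_depth(IMAGE))  # a grid of depth values
```

Classification returns one value for the whole image, detection returns a list of labelled regions, and depth estimation returns a grid the same size as the input. A real model replaces the placeholder rules, but the outputs keep these general shapes.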

Image Generation

This category refers to a set of tasks, techniques, and models that generate imagery given a set of inputs, like text prompts or other images.

text-to-image - Generating an image derived from the meaning of a given text.
text-to-video - Generating an animated image derived from the meaning of a given text.
image-to-image - Producing new images based on previous images, such as style transfer.

Computer Vision with MediaPipe

Upcoming Modules

Computer Vision with Hugging Face
Image Generation (Midjourney, Stable Diffusion, DALL·E)
