CLIP: Visual Models from Natural Language (Radford et al., 2021)

Draft notes on contrastive image-text pretraining enabling zero-shot image classification.

Draft — not yet written.

Overview

Key Ideas

Further Reading