Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

Derek Lim; Joshua Robinson; Sharut Gupta; Soledad Villar; Stefanie Jegelka

arxiv: 2306.13924 · v1 · pith:HJWAQDCZnew · submitted 2023-06-24 · 💻 cs.LG · cs.CV


Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka


keywords: space, contrastive, data, embedding, augmentations, correspond, input, learning


Abstract

Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding geometric structure to the embedding space: we enforce that transformations of input space correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima force augmentations on input space to correspond to rotations on the spherical embedding space. We show that merely combining our equivariant loss with a non-collapse term results in non-trivial representations, without requiring invariance to data augmentations. Optimal performance is achieved by also encouraging approximate invariance, where input augmentations correspond to small rotations. Our method, CARE: Contrastive Augmentation-induced Rotational Equivariance, leads to improved performance on downstream tasks and ensures sensitivity in embedding space to important variations in data (e.g., color) that standard contrastive methods do not achieve. Code is available at https://github.com/Sharut/CARE.
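
The abstract does not spell out the form of the equivariance objective. As a hedged illustration only, the sketch below assumes one natural candidate: penalize any change in pairwise inner products between unit-norm embeddings when the same augmentation is applied to every input in a batch. Under that assumption the penalty is zero whenever the augmentation acts on the embeddings as an orthogonal map (a rotation or reflection of the sphere). The names `normalize` and `equivariance_loss` are hypothetical and are not taken from the CARE repository; the random arrays stand in for encoder outputs.

```python
import numpy as np


def normalize(z):
    """Project each row of z onto the unit sphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def equivariance_loss(z, z_aug):
    """Mean squared change in pairwise inner products (assumed form, not the paper's code).

    z     : (n, d) unit-norm embeddings of a batch.
    z_aug : (n, d) unit-norm embeddings of the same batch after one shared augmentation.
    """
    gram = z @ z.T              # inner products before augmentation
    gram_aug = z_aug @ z_aug.T  # inner products after augmentation
    return float(np.mean((gram_aug - gram) ** 2))


rng = np.random.default_rng(0)
z = normalize(rng.normal(size=(8, 4)))  # stand-in for encoder outputs

# A random orthogonal matrix from a QR decomposition (rotation or reflection).
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# If the augmentation acts as an orthogonal map in embedding space,
# all pairwise inner products are preserved and the penalty vanishes.
loss_rotation = equivariance_loss(z, normalize(z @ q.T))

# Unrelated embeddings do not preserve inner products, so the penalty is positive.
loss_unrelated = equivariance_loss(z, normalize(rng.normal(size=(8, 4))))

print(loss_rotation, loss_unrelated)
```

In this toy setting the first value is zero up to floating-point error and the second is clearly positive, which matches the abstract's claim that minimizers of the equivariance term make augmentations act as rotations. The non-collapse and approximate-invariance terms mentioned in the abstract are not modeled here.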



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TriMotion: Modality-Agnostic Camera Control for Video Generation
cs.CV 2026-06 unverdicted novelty 6.0

TriMotion is a modality-agnostic framework that maps video, pose, and text descriptions of the same camera trajectory into a shared motion embedding space, trained with a new triplet dataset and latent consistency obj...