Learning Correspondence from the Cycle-Consistency of Time

Alexei A. Efros; Allan Jabri; Xiaolong Wang

Not yet reviewed by Pith; the record is open.

arXiv 1903.07593 v2 · pith:MMGNBXXZ · submitted 2019-03-18 · cs.CV cs.AI cs.LG

Xiaolong Wang, Allan Jabri, Alexei A. Efros

classification: cs.CV cs.AI cs.LG

keywords: time, correspondence, learning, representation, visual, across, cycle-consistency, methods

Abstract

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation that is useful for cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
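The idea of cycle-consistent tracking with nearest neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-frame features are already given as NumPy arrays (one row per spatial location), uses hard cosine-similarity nearest neighbors, and does not reproduce the training procedure, which the abstract does not detail. The function names `nearest_neighbor`, `track`, and `cycle_error` are hypothetical.

```python
import numpy as np

def nearest_neighbor(query, feats):
    """Index of the row in feats (N x D) most cosine-similar to query (D,)."""
    q = query / np.linalg.norm(query)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return int(np.argmax(f @ q))

def track(start_idx, frames):
    """Follow one location through a list of per-frame feature arrays.

    At each step, the feature at the current location is matched to its
    nearest neighbor in the next frame.
    """
    idx = start_idx
    for prev, nxt in zip(frames[:-1], frames[1:]):
        idx = nearest_neighbor(prev[idx], nxt)
    return idx

def cycle_error(start_idx, frames, coords):
    """Track forward to the last frame, then backward to the first.

    Returns the distance between the start location and the location the
    cycle ends at; coords (N x 2) holds the position of each location.
    A value of 0 means the track is cycle-consistent.
    """
    end_idx = track(start_idx, frames)
    back_idx = track(end_idx, frames[::-1])  # same frames, reversed in time
    return float(np.linalg.norm(coords[back_idx] - coords[start_idx]))
```

In this sketch the cycle error is only measured. In a learning setting, a differentiable version of such an error would be the quantity to minimize, so that features which track consistently forward and backward in time are favored.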

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
eess.AS · 2019-07 · unverdicted · novelty 6.0

CycleVAE optimizes non-parallel voice conversion indirectly via cyclic reconstructed spectra, yielding higher spectral accuracy, latent feature correlation, and improved converted speech quality.