Scal- ing 4d representations.arXiv preprint arXiv:2412.15212

Jo ˜ao Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd · 2024 · arXiv 2412.15212

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

LA-Pose: Latent Action Pretraining Meets Pose Estimation

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.

Frozen Forecasting: A Unified Evaluation

cs.CV · 2025-07-18 · unverdicted · novelty 6.0

A new evaluation framework using latent diffusion on frozen vision backbones shows video-pretrained models consistently outperform image-based ones in forecasting entire trajectories across abstraction levels.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

cs.AI · 2025-06-11 · unverdicted · novelty 6.0

V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.

Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Freezing an image foundation model and training only a recurrent temporal module yields strong temporal performance on video tasks without large-scale video pre-training.

citing papers explorer

Showing 4 of 4 citing papers.

LA-Pose: Latent Action Pretraining Meets Pose Estimation cs.CV · 2026-04-30 · unverdicted · none · ref 8
LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.
Frozen Forecasting: A Unified Evaluation cs.CV · 2025-07-18 · unverdicted · none · ref 4
A new evaluation framework using latent diffusion on frozen vision backbones shows video-pretrained models consistently outperform image-based ones in forecasting entire trajectories across abstraction levels.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning cs.AI · 2025-06-11 · unverdicted · none · ref 12
V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.
Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 4
Freezing an image foundation model and training only a recurrent temporal module yields strong temporal performance on video tasks without large-scale video pre-training.

Scal- ing 4d representations.arXiv preprint arXiv:2412.15212

fields

years

verdicts

representative citing papers

citing papers explorer