Scaling 4D representations. arXiv preprint arXiv:2412.15212

4 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CV (3), cs.AI (1)

years: 2026 (2), 2025 (2)

verdicts: unverdicted (4)

representative citing papers

LA-Pose: Latent Action Pretraining Meets Pose Estimation

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.

Frozen Forecasting: A Unified Evaluation

cs.CV · 2025-07-18 · unverdicted · novelty 6.0

A new evaluation framework using latent diffusion on frozen vision backbones shows video-pretrained models consistently outperform image-based ones in forecasting entire trajectories across abstraction levels.

citing papers explorer

Showing 4 of 4 citing papers.

  • LA-Pose: Latent Action Pretraining Meets Pose Estimation cs.CV · 2026-04-30 · unverdicted · none · ref 8

    LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.

  • Frozen Forecasting: A Unified Evaluation cs.CV · 2025-07-18 · unverdicted · none · ref 4

    A new evaluation framework using latent diffusion on frozen vision backbones shows video-pretrained models consistently outperform image-based ones in forecasting entire trajectories across abstraction levels.

  • V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning cs.AI · 2025-06-11 · unverdicted · none · ref 12

    V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.

  • Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 4

    Freezing an image foundation model and training only a recurrent temporal module yields strong temporal performance on video tasks without large-scale video pre-training.