RAFT: Recurrent All-Pairs Field Transforms for Optical Flow. arXiv:2003.12039, 2020
6 Pith papers cite this work.
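The "all-pairs" in RAFT's title refers to a 4D correlation volume that holds the similarity between every pixel's feature vector in the first frame and every pixel's feature vector in the second. A recurrent update operator then refines the flow field iteratively by looking up this volume. The sketch below shows only the all-pairs dot-product step on tiny hand-written feature maps. The function name and toy values are illustrative assumptions, and the feature encoder, correlation pyramid, and GRU update of the actual network are omitted.

```python
from typing import List

# A feature map is H x W x D: one D-dimensional vector per pixel.
FeatureMap = List[List[List[float]]]
CorrVolume = List[List[List[List[float]]]]


def all_pairs_correlation(f1: FeatureMap, f2: FeatureMap) -> CorrVolume:
    """Return corr[i][j][k][l] = <f1[i][j], f2[k][l]> for every pixel pair.

    Hypothetical helper: builds only the H1 x W1 x H2 x W2 cost volume;
    pooling into a pyramid and recurrent refinement are not shown.
    """
    h2, w2 = len(f2), len(f2[0])
    corr: CorrVolume = []
    for row in f1:
        corr_row = []
        for v1 in row:
            # Dot product of this frame-1 pixel with every frame-2 pixel.
            plane = [
                [sum(a * b for a, b in zip(v1, f2[k][l])) for l in range(w2)]
                for k in range(h2)
            ]
            corr_row.append(plane)
        corr.append(corr_row)
    return corr


if __name__ == "__main__":
    # Toy 1 x 2 feature maps with D = 2 (made-up values).
    f1 = [[[1.0, 0.0], [0.0, 1.0]]]
    f2 = [[[0.0, 1.0], [1.0, 0.0]]]
    corr = all_pairs_correlation(f1, f2)
    # Pixel (0, 0) in frame 1 is most similar to pixel (0, 1) in frame 2.
    print(corr[0][0])  # [[0.0, 1.0]]
```

Because the volume has H·W·H·W entries, it is computed once per frame pair. RAFT builds it from features at 1/8 of the input resolution, so each refinement iteration only performs cheap lookups instead of recomputing similarities.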
Citing papers
- CaRLi-V: Camera-RADAR-LiDAR Point-Wise 3D Velocity Estimation
CaRLi-V fuses a RADAR velocity cube, camera optical flow, and LiDAR ranges in a closed-form solution to produce dense point-wise 3D velocity estimates that outperform scene flow methods on a custom dataset.
- EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: their ego-motion reasoning derives from language while visual input adds negligible signal, and supplying explicit trajectories restores consistency.
- VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
VisPhyWorld evaluates the physical reasoning of MLLMs by having them generate executable code that reconstructs videos; results on VisPhyBench show strong semantic understanding but weak physical parameter inference and dynamics simulation.
- Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute
A zero-shot subject-driven video generation framework that decomposes the task into identity injection, learned from 200K subject-image pairs, and motion preservation, learned from 4K arbitrary videos. Trained on CogVideoX-5B in 288 A100 GPU hours, it matches prior performance with 1% of the compute.
- AV1 Motion Vector Fidelity and Application for Efficient Optical Flow
AV1 motion vectors serve as a high-fidelity warm start for the RAFT optical flow network, delivering a four-fold speedup in convergence with only a minor increase in end-point error compared to standard initialization.
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames vision-language-action (VLA) models as pipelines that generate progressively grounded action tokens, and classifies those tokens into eight types to guide future development.
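The AV1 entry above describes seeding RAFT with codec motion vectors instead of the usual zero initial flow, and reports the cost in end-point error (EPE), the mean Euclidean distance between predicted and ground-truth flow vectors. The sketch below expands block-level motion vectors into a dense per-pixel initial flow and compares the EPE of that warm start against a zero initialization. It assumes the vectors are already in pixel units and follow the same direction convention as the flow. Real AV1 motion vectors are stored at sub-pixel precision and point toward a reference frame, so a conversion step would be needed. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]
Flow = List[List[Vec]]  # flow[row][col] = (dx, dy) in pixels


def blocks_to_dense(block_mvs: Flow, block_size: int, height: int, width: int) -> Flow:
    """Give every pixel the motion vector of the block that contains it.

    Hypothetical helper: assumes block_mvs are already in pixels and use the
    same direction convention as the optical flow being estimated.
    """
    return [
        [block_mvs[r // block_size][c // block_size] for c in range(width)]
        for r in range(height)
    ]


def end_point_error(pred: Flow, gt: Flow) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for (pu, pv), (gu, gv) in zip(pred_row, gt_row):
            total += math.hypot(pu - gu, pv - gv)
            count += 1
    return total / count


if __name__ == "__main__":
    h, w = 4, 4
    # Toy ground truth: every pixel moves 2 px right and 1 px down.
    gt = [[(2.0, 1.0)] * w for _ in range(h)]
    # Toy 2 x 2 grid of 2 x 2-pixel blocks; only the top-left block is off.
    block_mvs = [[(2.0, 0.0), (2.0, 1.0)], [(2.0, 1.0), (2.0, 1.0)]]
    warm = blocks_to_dense(block_mvs, block_size=2, height=h, width=w)
    zero = [[(0.0, 0.0)] * w for _ in range(h)]
    print(end_point_error(zero, gt))  # about 2.236 (sqrt(5))
    print(end_point_error(warm, gt))  # 0.25
```

In this toy case the warm start begins far closer to the answer than the zero field, which is the intuition behind fewer refinement iterations; the actual speedup and EPE trade-off are those reported in the cited paper, not something this sketch measures.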