RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
9 Pith papers cite this work.
citing papers explorer
- CaRLi-V: Camera-RADAR-LiDAR Point-Wise 3D Velocity Estimation
  CaRLi-V fuses a RADAR velocity cube, camera optical flow, and LiDAR ranges in a closed-form solution to produce dense point-wise 3D velocity estimates that outperform scene flow methods on a custom dataset.
- Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation
  A RANSAC-based geometric gate routes regions to homography or optical flow warping before SSP fusion, improving mIoU by 4.24-4.91% on synthetic UAVid with only 211K added parameters to frozen backbones.
- Turning Video Models into Generalist Robot Policies
  Decouples action-free video world models from embodiment-specific IDMs using Jacobian-based translation to achieve zero-shot cross-embodiment robot policies.
- EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
  EgoDyn-Bench reveals a perception bottleneck in vision-centric foundation models: ego-motion logic derives from language while visual input adds negligible signal, with explicit trajectories restoring consistency.
- VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
  VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
- Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute
  A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.
- AV1 Motion Vector Fidelity and Application for Efficient Optical Flow
  AV1 motion vectors serve as a high-fidelity warm-start for the RAFT optical flow network, delivering a four-fold speedup in convergence with only a minor end-point error increase compared to standard initialization.
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.