Title resolution pending

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, et al

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

browse 10 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

cs.RO · 2026-03-18 · unverdicted · novelty 7.0

HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.

KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models

cs.RO · 2026-03-02 · unverdicted · novelty 7.0

KERV integrates kinematic Kalman Filter predictions with speculative decoding in VLA models to achieve 27-37% faster inference while maintaining nearly the same task success rates.

Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

cs.RO · 2026-02-18 · unverdicted · novelty 7.0

PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

ST-$\pi$: Structured SpatioTemporal VLA for Robotic Manipulation

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

ST-π structures VLA models by having a spatiotemporal VLM produce causally ordered chunk-level prompts that guide a dual-generator action expert to jointly handle spatial and temporal control in robotic manipulation.

OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

OFlow unifies temporal foresight and object-aware reasoning inside a shared latent space via flow matching to improve VLA robustness in robotic manipulation under distribution shifts.

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.

Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.

ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

cs.RO · 2026-04-22 · unverdicted · novelty 5.0

ALAS disentangles environment and self-state streams via bio-inspired modules to deliver 23% higher subtask success and 29% better execution efficiency on long-horizon HSI tasks.

DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models

cs.RO · 2026-04-13 · unverdicted · novelty 5.0

DA-PTQ quantizes VLAs by compensating cross-space distortions and allocating mixed precision to minimize motion errors and kinematic drift in trajectories.

citing papers explorer

Showing 10 of 10 citing papers.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace cs.AI · 2026-04-09 · unverdicted · none · ref 20
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness cs.RO · 2026-03-18 · unverdicted · none · ref 8
HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.
KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models cs.RO · 2026-03-02 · unverdicted · none · ref 10
KERV integrates kinematic Kalman Filter predictions with speculative decoding in VLA models to achieve 27-37% faster inference while maintaining nearly the same task success rates.
Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation cs.RO · 2026-02-18 · unverdicted · none · ref 25
PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.
ST-$\pi$: Structured SpatioTemporal VLA for Robotic Manipulation cs.RO · 2026-04-20 · unverdicted · none · ref 16
ST-π structures VLA models by having a spatiotemporal VLM produce causally ordered chunk-level prompts that guide a dual-generator action expert to jointly handle spatial and temporal control in robotic manipulation.
OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation cs.RO · 2026-04-20 · unverdicted · none · ref 30
OFlow unifies temporal foresight and object-aware reasoning inside a shared latent space via flow matching to improve VLA robustness in robotic manipulation under distribution shifts.
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models cs.RO · 2026-04-14 · unverdicted · none · ref 26
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA cs.RO · 2026-04-03 · unverdicted · none · ref 17
SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.
ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement cs.RO · 2026-04-22 · unverdicted · none · ref 14
ALAS disentangles environment and self-state streams via bio-inspired modules to deliver 23% higher subtask success and 29% better execution efficiency on long-horizon HSI tasks.
DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models cs.RO · 2026-04-13 · unverdicted · none · ref 9
DA-PTQ quantizes VLAs by compensating cross-space distortions and allocating mixed precision to minimize motion errors and kinematic drift in trajectories.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer