
In Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025

2025 · DOI 10.15607/rss.2025.xxi.010

18 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TAP-VLA: Tactile Annotation Prompting for Vision Language Action Models

cs.RO · 2026-06-27 · unverdicted · novelty 6.0

TAP-VLA improves VLA performance in contact-rich manipulation by visually annotating tactile shear fields onto input images, reaching 78% success versus under 50% for vision-only and other tactile methods.

LocalNav: Distilling Frontier VLMs and Embodied RL for On-Device Object Goal Navigation

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

Distillation from frontier VLMs plus E-RLVR regularization produces a 4B local model that achieves 34.5% SR on OVON while cutting inference latency by 82.8%.

CoStream: Composing Simple Behaviors for Generalizable Complex Manipulation

cs.RO · 2026-06-24 · unverdicted · novelty 6.0 · 2 refs

CoStream composes semantic, predictive, and reactive behaviors on an SE(3) interface to enable precise, generalizable performance on eight real-world contact-rich manipulation tasks.

Reflective VLA: In-Context Action Consequences Make VLAs Generalize

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

Reflective VLA improves VLA generalization on LIBERO-Plus and LIBERO-Plus-Hard by 5.4 and 4.2 percentage points by conditioning on action consequences instead of reactive single-frame inputs.

dVLA-RL: Reinforcement Learning over Denoising Trajectories for Discrete Diffusion Vision-Language-Action Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

dVLA-RL models denoising as an MDP to enable RL on dVLAs via trajectory probabilities, reporting 99.7% success on LIBERO and 30.6% gains over SFT on RoboTwin 2.0.

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

cs.RO · 2026-06-15 · unverdicted · novelty 6.0

ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.

See Selectively, Act Adaptively: Dual-Level Structural Decomposition for Bimanual Robot Manipulation

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

A VLA policy using view-selective visual routing and interaction-aware action MoE improves average success by 27.7% in simulation and 43.3% in real-world bimanual tasks over monolithic baselines.

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

cs.RO · 2026-06-10 · unverdicted · novelty 6.0

DIRECT is a multimodal-context router that allocates test-time compute across chain-of-thought depth, model size, and memory history for VLM embodied planners, improving the success-cost Pareto frontier and matching stronger models at up to 65% lower latency on benchmarks and a physical Franka arm.

$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on held-out tasks while preserving performance under full observability.

CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control

cs.RO · 2026-06-08 · unverdicted · novelty 6.0

CT-VAM is a 68M-parameter cerebello-thalamic-inspired model that achieves competitive LIBERO success rates with lower inference latency than larger VLA models by using a stream-separated attention decoder called TARS.

EmbodimentSemantic: A Spatial Scene-Graph Dataset and Benchmark for Vision-Language Models on Embodied Manipulation Trajectories

cs.RO · 2026-06-06 · unverdicted · novelty 6.0

EmbodimentSemantic is a spatial scene-graph dataset and benchmark for evaluating relational grounding in vision-language models on embodied manipulation trajectories.

Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

SDP constructs sets of desired action-chunks from human correction pairs and trains diffusion policies to align with those sets, yielding better performance and robustness than standard behavior cloning on robotic tasks.

Continuous Reasoning for Vision-Language-Action

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

Continuous Reasoning for VLA introduces a shared Gaussian latent for continuous thoughts, trained with self-verification to improve action prediction on LIBERO-PRO and real robots.

Turning Video Models into Generalist Robot Policies

cs.RO · 2026-05-27 · unverdicted · novelty 6.0

Decouples action-free video world models from embodiment-specific IDMs using Jacobian-based translation to achieve zero-shot cross-embodiment robot policies.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation

cs.RO · 2026-06-07 · unverdicted · novelty 5.0

PACT is a self-evolving post-training framework that projects diffusion policies onto constraint-feasible regions via reverse-KL distillation and a tightening curriculum, reporting 31% fewer safety violations and 30.7% higher task success on embodied manipulation benchmarks.

Let It Be Simple: One-Step Action Generation for Vision-Language-Action Models

cs.CV · 2026-06-04 · conditional · novelty 5.0

Biasing the training time distribution toward high-noise states enables one-step action generation in VLA models that matches or exceeds ten-step decoding on LIBERO benchmarks and real-robot tasks.

RouterVLA: Turning Smoke Tests into Supervision for Heterogeneous VLA Selection

cs.RO · 2026-06-25 · unverdicted · novelty 4.0

RouterVLA reports that a simple probe-success rule from outcome-separated smoke tests raises held-out VLA success by 14.64pp on 34,752 LIBERO-Plus records, with learned scorers adding no further gain.

citing papers explorer

Showing 17 of 17 citing papers after filters.

TAP-VLA: Tactile Annotation Prompting for Vision Language Action Models · cs.RO · 2026-06-27 · unverdicted · none · ref 31
LocalNav: Distilling Frontier VLMs and Embodied RL for On-Device Object Goal Navigation · cs.RO · 2026-06-26 · unverdicted · none · ref 12
CoStream: Composing Simple Behaviors for Generalizable Complex Manipulation · cs.RO · 2026-06-24 · unverdicted · none · ref 11 · 2 links
Reflective VLA: In-Context Action Consequences Make VLAs Generalize · cs.CV · 2026-06-23 · unverdicted · none · ref 3
dVLA-RL: Reinforcement Learning over Denoising Trajectories for Discrete Diffusion Vision-Language-Action Models · cs.RO · 2026-06-22 · unverdicted · none · ref 3
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining · cs.RO · 2026-06-15 · unverdicted · none · ref 3
See Selectively, Act Adaptively: Dual-Level Structural Decomposition for Bimanual Robot Manipulation · cs.RO · 2026-06-11 · unverdicted · none · ref 11
DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners? · cs.RO · 2026-06-10 · unverdicted · none · ref 61
$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models · cs.LG · 2026-06-10 · unverdicted · none · ref 3
CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control · cs.RO · 2026-06-08 · unverdicted · none · ref 7
EmbodimentSemantic: A Spatial Scene-Graph Dataset and Benchmark for Vision-Language Models on Embodied Manipulation Trajectories · cs.RO · 2026-06-06 · unverdicted · none · ref 4
Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections · cs.RO · 2026-06-01 · unverdicted · none · ref 4
Continuous Reasoning for Vision-Language-Action · cs.RO · 2026-05-29 · unverdicted · none · ref 6
Turning Video Models into Generalist Robot Policies · cs.RO · 2026-05-27 · unverdicted · none · ref 6
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning · cs.RO · 2026-02-09 · unverdicted · none · ref 7
PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation · cs.RO · 2026-06-07 · unverdicted · none · ref 2
RouterVLA: Turning Smoke Tests into Supervision for Heterogeneous VLA Selection · cs.RO · 2026-06-25 · unverdicted · none · ref 23
