arXiv preprint arXiv:2406.04339 (2024)

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation · 2024 · arXiv 2406.04339

8 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary: background (3)

citation-polarity summary: background (3)

representative citing papers

AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models

cs.RO · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

AT-VLA proposes adaptive tactile injection and a dual-stream tactile reaction mechanism to enhance VLA models for contact-rich robotic manipulation with real-time responses.

Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment

cs.RO · 2026-04-27 · unverdicted · novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.

HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

cs.RO · 2026-03-18 · unverdicted · novelty 7.0

HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

cs.CV · 2025-03-13 · unverdicted · novelty 6.0

HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.

What Matters in Building Vision-Language-Action Models for Generalist Robots

cs.RO · 2024-12-18 · unverdicted · novelty 5.0

Systematic tests of VLM backbones, policy architectures, and cross-embodiment data yield RoboVLMs that set new SOTA on robot manipulation benchmarks while requiring few manual designs.

The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

cs.AR · 2026-04-09 · unverdicted · novelty 4.0

Mamba-3 architectural changes optimized for hyperscale GPUs cause 28% higher edge latency at 880M parameters and 48% at 15M parameters compared to earlier versions.

A Survey of Mamba

cs.LG · 2024-08-02 · unverdicted · novelty 2.0

The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.

CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

cs.CV · 2026-05-21

citing papers explorer


AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models cs.RO · 2026-05-08 · unverdicted · none · ref 33 · 2 links
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment cs.RO · 2026-04-27 · unverdicted · none · ref 13
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness cs.RO · 2026-03-18 · unverdicted · none · ref 16
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model cs.CV · 2025-03-13 · unverdicted · none · ref 45
What Matters in Building Vision-Language-Action Models for Generalist Robots cs.RO · 2024-12-18 · unverdicted · none · ref 28
The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency cs.AR · 2026-04-09 · unverdicted · none · ref 11
A Survey of Mamba cs.LG · 2024-08-02 · unverdicted · none · ref 121
CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models cs.CV · 2026-05-21 · unreviewed · ref 8