Beyond sight: Finetuning generalist robot policies with heterogeneous sensors via language grounding. arXiv preprint arXiv:2501.04693

· 2025 · arXiv 2501.04693

8 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

cs.RO · 2026-03-23 · unverdicted · novelty 7.0

VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.

Heterogeneous Tactile Transformer

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

HTT learns shared representations across heterogeneous tactile sensors using a new paired dataset and pretraining objectives, enabling transfer to unseen sensors and tasks.

FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy then finetuning only the IDM on short target-domain rollouts via supervised learning.

ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

cs.RO · 2026-03-26 · unverdicted · novelty 6.0

ThermoAct integrates thermal imaging into VLA models via a VLM planner to enable robots to perceive physical properties like heat and improve safety over vision-only systems.

FAST: Efficient Action Tokenization for Vision-Language-Action Models

cs.RO · 2025-01-16 · unverdicted · novelty 6.0

FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.

Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference

cs.LG · 2026-04-10 · unverdicted · novelty 5.0

SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimodal edge models.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation

cs.RO · 2025-12-29

citing papers explorer

Showing 8 of 8 citing papers.

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models cs.RO · 2026-03-23 · unverdicted · none · ref 13
Heterogeneous Tactile Transformer cs.RO · 2026-06-29 · unverdicted · none · ref 35
FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control cs.RO · 2026-06-26 · unverdicted · none · ref 9
ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making cs.RO · 2026-03-26 · unverdicted · none · ref 7
FAST: Efficient Action Tokenization for Vision-Language-Action Models cs.RO · 2025-01-16 · unverdicted · none · ref 36
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference cs.LG · 2026-04-10 · unverdicted · none · ref 12
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 141
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation cs.RO · 2025-12-29 · unreviewed · ref 15
