{"total":14,"items":[{"citing_arxiv_id":"2606.29941","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Seeing Touch from Motion: A Unified Modality-Aware Visuo-Tactile Policy with Tactile Motion Correlation","primary_cat":"cs.RO","submitted_at":"2026-06-29T08:20:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A visuo-tactile policy learning method that exploits tactile motion correlation for contact state distinction and Mixture-of-Transformers for cross-modal fusion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.29384","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Event-VLA: Action-Conditioned Event Fusion for Robust Vision-Language-Action Model","primary_cat":"cs.CV","submitted_at":"2026-06-28T13:19:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Event-VLA integrates event streams into VLA models through action-conditioned gated cross-attention to maintain performance in normal light while improving success rates under low-light and near-dark conditions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.29089","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TAP-VLA: Tactile Annotation Prompting for Vision Language Action Models","primary_cat":"cs.RO","submitted_at":"2026-06-27T21:06:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TAP-VLA improves VLA performance in contact-rich manipulation by visually annotating tactile shear fields onto input images, reaching 78% success versus under 50% for vision-only and other tactile 
methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27886","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tabero: Learning Gentle Manipulation with Closed-Loop Force Feedback from Vision, Touch, and Language","primary_cat":"cs.RO","submitted_at":"2026-05-27T03:08:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Tabero supplies a data pipeline that turns existing robot trajectories into vision-tactile-language tasks and a VTLA model that keeps task success high while cutting average grip force by over 70 percent under gentle instructions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25216","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"InvariantCloud: A Globally Invariant, Uniquely Indexed Point Cloud Framework for Robust 6-DoF Tactile Pose Tracking","primary_cat":"cs.RO","submitted_at":"2026-05-24T18:46:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"InvariantCloud registers marker-based point clouds in one shot via global invariance to deliver drift-free 6-DoF tactile pose tracking with improved yaw accuracy over prior methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17336","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms","primary_cat":"cs.RO","submitted_at":"2026-05-17T09:09:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey proposing a hierarchical taxonomy for 
multimodal tactile fusion datasets and methods across perception, generation, and interaction in embodied intelligence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13925","ref_index":116,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Robotic Dexterous Hand Intelligence: A Survey","primary_cat":"cs.RO","submitted_at":"2026-05-13T15:23:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A structured survey of dexterous robotic hand research that reviews hardware, control methods, data resources, and benchmarks while identifying major limitations and future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"RoboDexVLM [113] and Villa-X [114] decompose long-horizon instructions into structured subgoals, marking a shift from end-to-end language policies toward hierarchical architectures that interface more naturally with low-level controllers. DexVLA [115] introduces plug-in diffusion experts to handle multimodal coordination during reorientation, while OmniVLA [116] unifies visual, linguistic, and haptic inputs within a single Transformer architecture, improving feedback stability during manipulation. Other. Several additional works, although not belonging to the categories discussed above, also introduce highly constructive approaches for in-hand manipulation and have significantly advanced the field. 
MyoDex [117] introduces a"},{"citing_arxiv_id":"2605.11048","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching","primary_cat":"cs.RO","submitted_at":"2026-05-11T13:27:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ForceFlow improves success rates by 37% on six real-world contact-rich tasks over ForceVLA by treating force as a global regulatory signal in a flow-matching policy with hierarchical vision-to-force decomposition.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07308","ref_index":14,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-08T06:17:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AT-VLA proposes adaptive tactile injection and a dual-stream tactile reaction mechanism to enhance VLA models for contact-rich robotic manipulation with real-time responses.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"include tactile information, researchers [4, 21, 43] often address this limitation by incorporating these modalities during downstream tasks finetuning. Their primary goal is to enable pretrained models to interpret these new types of sensory input. This is typically achieved by using multimodal alignment strategies through representation learning [14, 44, 46, 47], or leveraging chain-of-thought (CoT) [25] reasoning to understand these signals. However, tactile feedback provides fundamentally different types of information compared to the visual or linguistic data used in the pretrained models. 
While these approaches enhance the model's ability to interpret tactile feedback, the direct introduction of these new modalities may disrupt the existing"},{"citing_arxiv_id":"2605.03269","ref_index":26,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RLDX-1 Technical Report","primary_cat":"cs.RO","submitted_at":"2026-05-05T01:40:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Training strategies for efficient embodied reasoning. In Conference on Robot Learning, 2025b. [25] Xiaoyu Chen, Hangxing Wei, Pushi Zhang, Chuheng Zhang, Kaixin Wang, Yanjiang Guo, Rushuai Yang, Yucen Wang, Xinquan Xiao, Li Zhao, et al. Villa-x: enhancing latent action modeling in vision-language-action models. arXiv preprint arXiv:2507.23682, 2025c. [26] Zhengxue Cheng, Yiqian Zhang, Wenkang Zhang, Haoyu Li, Keyu Wang, Li Song, and Hengdi Zhang. Omnivtla: Vision-tactile-language-action model with semantic-aligned tactile sensing. arXiv preprint arXiv:2508.08706, 2025. [27] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. 
Diffusion policy: Visuomotor policy learning via action diffusion."},{"citing_arxiv_id":"2604.04834","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes","primary_cat":"cs.CV","submitted_at":"2026-04-06T16:35:57+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Zhai et al. OpenVLA [28], RDT-1B [37], and Gemini Robotics [59], demonstrate that scaling multimodal transformers with large robot datasets enables generalizable language-conditioned manipulation across diverse tasks. Recent efforts further explore enhanced spatial reasoning via 3D representations [16,31,51,56], tactile integration for contact modeling [3,13,24], reinforcement learning refinement [12,23,38], and efficient deployment strategies [50,62,64]. Despite architectural diversity, existing VLA systems predominantly rely on frame-based RGB cameras as their primary perceptual interface. 
Frame-based imaging integrates photons over an exposure interval, which inherently limits signal-to-noise ratio in low illumination and induces motion blur during fast movements."},{"citing_arxiv_id":"2603.04038","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control","primary_cat":"cs.RO","submitted_at":"2026-03-04T13:18:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TER-DAgger improves robotic precision insertion success rates by over 37% via residual policies from edited trajectories and force-aware intervention triggers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.20239","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance","primary_cat":"cs.RO","submitted_at":"2026-01-28T04:22:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.23864","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation","primary_cat":"cs.RO","submitted_at":"2025-12-29T21:06:33+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}