ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents

· 2026 · cs.CV · arXiv 2604.10096

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Current embodied intelligent systems still face a substantial gap between high-level reasoning and low-level physical execution in open-world environments. Although Vision-Language-Action (VLA) models provide strong perception and intuitive responses, their open-loop nature limits long-horizon performance. Agents incorporating System 2 cognitive mechanisms improve planning, but usually operate in closed sandboxes with predefined toolkits and limited real-system control. OpenClaw provides a localized runtime with full system privileges, but lacks the embodied control architecture required for long-duration, multi-robot execution. We therefore propose ABot-Claw, an embodied extension of OpenClaw that integrates: 1) a unified embodiment interface with capability-driven scheduling for heterogeneous robot coordination; 2) a visual-centric cross-embodiment multimodal memory for persistent context retention and grounded retrieval; and 3) a critic-based closed-loop feedback mechanism with a generalist reward model for online progress evaluation, local correction, and replanning. With a decoupled architecture spanning the OpenClaw layer, shared service layer, and robot embodiment layer, ABot-Claw enables real-world interaction, closes the loop from natural language intent to physical action, and supports progressively self-evolving robotic agents in open, dynamic environments.

representative citing papers

Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

VLATIM benchmark reveals large VLMs excel at high-level planning in physics puzzles but struggle with precise visual grounding and mouse control, so they lack human-like problem-solving capabilities.

citing papers explorer

Showing 1 of 1 citing paper.

Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games? cs.AI · 2026-05-11 · unverdicted · none · ref 33 · internal anchor
VLATIM benchmark reveals large VLMs excel at high-level planning in physics puzzles but struggle with precise visual grounding and mouse control, so they lack human-like problem-solving capabilities.

ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents

fields

years

verdicts

representative citing papers

citing papers explorer