AgentClick: A Skill-Based Human-in-the-Loop Review Layer for Terminal AI Agents
Pith reviewed 2026-05-10 12:14 UTC · model grok-4.3
The pith
AgentClick adds a browser interface that lets users review and guide terminal AI agents through structured views instead of raw text.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgentClick is implemented as a localhost npm server paired with a skill-based plugin that connects the running agent to a browser interface, allowing users to supervise and collaborate with agents through a structured web UI rather than raw terminal text alone. The system supports human-in-the-loop workflows including email drafting and revision, plan review and modification, memory management, trajectory inspection and visualization, and error localization during agent execution. It also turns code generation and execution into a reviewable process and supports persistent preference capture through editable memory plus remote access over HTTP.
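To make the architecture concrete: the paper does not publish its API, but the "reviewable process" it describes implies a gate between the agent proposing an action and that action executing. A minimal sketch of such a gate, with all names and the data shape assumed rather than taken from the paper, might look like:

```javascript
// Hypothetical sketch of AgentClick's review-gate idea (class and method
// names are illustrative, not from the paper). An agent's proposed action
// is held as a pending item until a human approves, edits, or rejects it;
// only approved actions are released for execution.
class ReviewQueue {
  constructor() {
    this.pending = new Map(); // id -> { action, status }
    this.nextId = 1;
  }

  // Agent side: propose an action instead of executing it directly.
  propose(action) {
    const id = this.nextId++;
    this.pending.set(id, { action, status: "pending" });
    return id;
  }

  // Browser-UI side: everything still awaiting human review.
  listPending() {
    return [...this.pending.entries()]
      .filter(([, item]) => item.status === "pending")
      .map(([id, item]) => ({ id, action: item.action }));
  }

  // Reviewer approves, optionally substituting an edited action;
  // only the returned action is handed to the executor.
  approve(id, editedAction) {
    const item = this.pending.get(id);
    if (!item || item.status !== "pending") throw new Error(`no pending item ${id}`);
    if (editedAction !== undefined) item.action = editedAction;
    item.status = "approved";
    return item.action;
  }

  // Reviewer rejects; the action is never executed.
  reject(id) {
    const item = this.pending.get(id);
    if (!item || item.status !== "pending") throw new Error(`no pending item ${id}`);
    item.status = "rejected";
  }
}
```

Under this reading, "email drafting and revision" and "pre-execution inspection of code" are both instances of the same pattern: the draft or code is the proposed action, and the edited version the reviewer approves is what actually runs.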
What carries the argument
The skill-based plugin that routes agent state and actions through a localhost npm server into an editable browser UI for supervision and intervention.
If this is right
- Users can inspect and edit agent plans before any action executes.
- Memory and preferences become persistently editable objects that survive across sessions.
- Code generation steps become reviewable events rather than immediate executions.
- Agents running on remote servers can be supervised from personal devices over HTTP.
- Trajectory data can be visualized and localized for quicker error identification.
Where Pith is reading between the lines
- The same plugin pattern could be reused to attach review layers to other command-line tools beyond AI agents.
- Persistent memory editing might reduce repeated corrections on similar tasks over time.
- Remote HTTP access opens the possibility of collaborative review where multiple users supervise one agent.
Load-bearing premise
That a structured web UI connected via npm server and skill-based plugin will be meaningfully more efficient and less cumbersome than raw terminal text for non-expert users.
What would settle it
A direct comparison measuring time to complete a fixed set of agent tasks, number of successful interventions, and user-reported effort when the same non-expert participants use either the AgentClick web interface or a standard terminal session.
Original abstract
Recent autonomous AI agents such as Codex and Claude Code have made it increasingly practical for users to delegate complex tasks, including writing emails, executing code, issuing shell commands, and carrying out multi-step plans. However, despite these capabilities, human-agent interaction still largely happens through terminal interfaces or remote text-based channels such as Discord. These interaction modes are often inefficient and unfriendly: long text outputs are difficult to read and review, proposed actions lack clear structure and visual context, and users must express feedback by typing detailed corrections, which is cumbersome and often discourages effective collaboration. As a result, non-expert users in particular face a high barrier to working productively with agents. To address this gap, we present AgentClick, an interactive review layer for terminal-based agents. AgentClick is implemented as a localhost npm server paired with a skill-based plugin that connects the running agent to a browser interface, allowing users to supervise and collaborate with agents through a structured web UI rather than raw terminal text alone. The system supports a range of human-in-the-loop workflows, including email drafting and revision, plan review and modification, memory management, trajectory inspection and visualization, and error localization during agent execution. It also turns code generation and execution into a reviewable process, enabling users to inspect and intervene before consequential actions are taken. In addition, AgentClick supports persistent preference capture through editable memory and remote access over HTTP, allowing users to review agents running on servers from their personal devices. Our goal is to lower the barrier for non-expert users and improve the efficiency and quality of human-agent co-work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents AgentClick, a system consisting of a localhost npm server and skill-based plugin that connects terminal-based AI agents to a browser-based UI. It enables human-in-the-loop workflows including plan review and modification, memory editing, trajectory visualization, error localization, and pre-execution inspection of actions such as code execution or email drafting. The stated objective is to reduce barriers for non-expert users and enhance efficiency and quality of collaboration relative to raw terminal text interfaces, with support for persistent preferences and remote HTTP access.
Significance. If the described architecture functions as intended and the usability assumptions hold, the work could offer a practical contribution to HCI research on agent oversight by providing structured visual interfaces for complex agent behaviors. The design choices around skill-based plugins and editable memory represent a concrete implementation that could be extended, but the lack of any empirical validation means the significance is currently prospective rather than established.
major comments (2)
- [Abstract] The claims that AgentClick will 'lower the barrier for non-expert users and improve the efficiency and quality of human-agent co-work' are presented as achieved outcomes of the system, yet the manuscript contains no user studies, task metrics, error rates, NASA-TLX scores, or comparative evaluations against terminal interfaces to substantiate these improvements.
- [Abstract] The central design hypothesis—that the structured web UI (plan review, memory management, trajectory inspection) meaningfully outperforms raw terminal text for non-experts—is load-bearing for the contribution but remains untested; without evidence, the effectiveness claims cannot be evaluated and rest solely on the architectural description.
minor comments (1)
- The manuscript would benefit from explicit discussion of related work in human-in-the-loop AI systems and web-based agent interfaces to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript describing AgentClick. We agree that the abstract phrasing risks overstating the system's benefits as demonstrated outcomes rather than design goals. We will revise the manuscript accordingly to scope claims to the presented architecture and implementation while clarifying the prospective nature of usability improvements.
Point-by-point responses
- Referee: [Abstract] The claims that AgentClick will 'lower the barrier for non-expert users and improve the efficiency and quality of human-agent co-work' are presented as achieved outcomes of the system, yet the manuscript contains no user studies, task metrics, error rates, NASA-TLX scores, or comparative evaluations against terminal interfaces to substantiate these improvements.
  Authors: We agree that the abstract language presents these benefits in a manner that could be read as asserting achieved results. The manuscript is a system and architecture paper focused on the implementation of a localhost npm server with skill-based plugins for structured human review. The referenced statements were meant to describe the intended purpose and rationale behind the design choices (e.g., structured plan review and editable memory), not validated performance claims. We will revise the abstract to state these explicitly as design objectives, supported by the problem analysis in the introduction, and remove any implication of empirical substantiation. (Revision: yes)
- Referee: [Abstract] The central design hypothesis—that the structured web UI (plan review, memory management, trajectory inspection) meaningfully outperforms raw terminal text for non-experts—is load-bearing for the contribution but remains untested; without evidence, the effectiveness claims cannot be evaluated and rest solely on the architectural description.
  Authors: We concur that the comparative effectiveness of the web UI versus terminal interfaces constitutes an untested hypothesis at this stage. The contribution centers on a concrete, extensible implementation enabling human-in-the-loop workflows such as pre-execution inspection and persistent preference capture. In the revision, we will rephrase the abstract and introduction to present the UI features as a proposed approach to the identified interaction barriers, highlighting the technical novelty (skill-based plugins, remote HTTP access, trajectory visualization) without asserting unproven superiority. We will also expand the limitations and future work section to note the need for controlled user studies to evaluate the hypothesis. (Revision: yes)
Circularity Check
No circularity; purely descriptive system architecture with no derivations or fitted predictions
Full rationale
The manuscript is a system description of AgentClick (localhost npm server + browser UI plugin) that outlines supported workflows such as plan review and memory editing. No equations, parameters, or predictive claims appear anywhere in the text. The central goal statement (lowering barriers for non-experts) is presented as a design objective rather than a derived result, and no self-citations or uniqueness theorems are invoked to justify any step. Because there is no derivation chain that could reduce to its own inputs, the paper is self-contained against the circularity criteria.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Terminal and text-based interfaces are inefficient and unfriendly for supervising complex agent actions.
- Domain assumption: A structured browser UI will lower the barrier for non-expert users and improve collaboration quality.
Reference graph
Works this paper leans on
- [1] Anthropic. 2024. Claude Code. https://docs.anthropic.com/en/docs/claude-code. Accessed: 2026-03-14.
- [2] Will Epperson, Gagan Bansal, Victor Dibia, Adam Fourney, Jack Gerrits, Erkang Zhu, and Saleema Amershi. 2025. Interactive Debugging and Steering of Multi-Agent AI Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems.
- [3] K. J. Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S. Weld, Amy X. Zhang, and Joseph Chee Chang. 2026. Cocoa: Co-Planning and Co-Execution with AI Agents. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
- [4] Sven Gronauer and Klaus Diepold. 2022. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review 55, 2 (2022), 895–943.
- [5] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024).
- [6]
- [7]
- [8] OpenAI. 2025. Codex CLI: A Coding Agent for the Terminal. https://developers.openai.com/codex/cli. Accessed: 2026-03-14.
- [9] siteboon. 2026. Cloud CLI (aka Claude Code UI): A desktop and mobile UI for Claude Code, Cursor CLI, Codex, and Gemini-CLI. https://github.com/siteboon/claudecodeui. GitHub repository, accessed 2026-03-14.
- [10] Peter Steinberger. 2025. OpenClaw: Your Own Personal AI Assistant. https://github.com/openclaw/openclaw. Accessed: 2026-03-14.
- [11] Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. 2025. OpenHands: An Open Platform for AI Software Developers as Generalist Agents. In International Conference on Learning Representations.
- [12] Hanwen Xing, Haomin Zhuang, Xuandong Zhao, Yue Huang, Zhenheng Tang, and Xiangliang Zhang. 2026. Recipes for Agents: Understanding Skills and Their Open Questions. Preprint, ResearchGate. doi:10.13140/RG.2.2.11421.99045.