LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability, and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment and profiling, human feedback, interaction types, orchestration, and communication, explores emerging applications, and discusses unique challenges and opportunities arising from human-AI collaboration. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems.
years
2026: 4
verdicts
UNVERDICTED: 4
representative citing papers
PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
citing papers explorer
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
  PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.
- SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
  SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
- GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
  GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.