A Survey of AI Agent Protocols
22 Pith papers cite this work. Polarity classification is still indexing.
roles: background 2
polarities: background 2
representative citing papers
- OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents
  OR-Space is a benchmark for LLM agents performing full-lifecycle optimization tasks across Build, Revise, and Explain modes in executable multi-artifact workspaces.
- Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents
  SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.
- CBCL: Safe Self-Extending Agent Communication
  CBCL is a homoiconic agent communication language constrained to DCFL with three Lean 4 machine-checked invariants that prevent unbounded expansion, enforce resource limits, and preserve core vocabulary.
- From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers
  Presents a component-centric PoC dataset of malicious MCP servers and a two-stage behavioral deviation detector Connor achieving 94.6% F1-score.
- ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
  ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.
- GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing
  GRAIL achieves over 79 times lower latency than LLM-parsing baselines and higher Recall@10 than vector search by combining SLM-enhanced prediction, pseudo-document expansion, and MaxSim resonance on the new AgentTaxo-9K dataset of 9,240 agents.
- CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness
  CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.
- Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
  The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
- The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
  The 2025 AI Agent Index catalogs technical and safety details for 30 deployed AI agents and finds low developer transparency on safety, evaluations, and societal impacts.
- Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
  Holos is a five-layer LLM-based multi-agent system architecture using the Nuwa engine for agent generation, a market-driven Orchestrator for coordination, and an endogenous value cycle for incentive-compatible persistence in the Agentic Web.
- A Technical Taxonomy of LLM Agent Communication Protocols
  Creates a five-dimension taxonomy (counterparty, payload, interaction state, discovery mechanism, schema flexibility) from nine protocols and identifies architectural patterns plus convergence trends.
- Understanding How Enterprises Adopt the Model Context Protocol for LLM-Driven Software Engineering
  Interviews with 20 practitioners show MCP supports cross-system collaboration and task decoupling in LLM workflows but is limited by ecosystem fragmentation, coordination issues, and state management problems.
- Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks
  Pramana defines a typed ClaimAttestation protocol with four variants and verify operations, specifies its lifecycle in TLA+, model-checks it with TLC, and provides a tested Python implementation for auditable agent claims.
- Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study
  Tokenizer fertility varies 1.6x across models on Ukrainian legal text, Qwen uses 60% more tokens than Llama-family models, zero-shot outperforms few-shot by up to 26 points, and pre-war classifiers lose 27.9 points on invasion-era decisions.
- EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development
  EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
  LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
- Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
  The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.
- Towards Multi-Agent Economies: Enhancing the A2A Protocol with Ledger-Anchored Identities and x402 Micropayments for AI Agents
  Proposes a DLT-anchored architecture extending the A2A protocol with on-chain AgentCards and x402 micropayments to enable multi-agent economies.
- Next-Generation Agentic Reinforcement Learning Systems Enable Self-Evolving Agents
  Current agentic RL systems lack three key components needed for self-evolving agents at scale, requiring new co-designed architectures such as AReaL2.0 to enable policy updates from deployed workloads.
- AONA: A Comprehensive Architecture and Workflow Design for Global Agentic Collaboration
  AONA is a proposed four-layer overlay network architecture with dedicated node types and workflows for global, decentralized collaboration among AI agents.
- Foundation Protocol: A Coordination Layer for Agentic Society
  The paper introduces the Foundation Protocol as a unifying coordination layer for heterogeneous agents, humans, and organizations that adds native support for multi-party collaboration, economic primitives, and first-class policy and audit.
- Teaching Software Engineering with LLM and MCP Integration: From Classroom to Industry Practice
  Describes an LLM-and-MCP-integrated collaborative teaching model intended to improve software engineering students' practical skills and industry readiness.