pith. machine review for the scientific record.

arxiv: 2604.02375 · v1 · submitted 2026-03-31 · 💻 cs.SE · cs.PL

Recognition: unknown

KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 02:22 UTC · model gemini-3-flash-preview

classification 💻 cs.SE cs.PL
keywords: LLM agents · autonomous agents · system architecture · parallel execution · computer security · tool-calling

The pith

AI agents become more reliable and secure when reasoning is separated from execution by a dedicated system kernel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current AI agents operate in a slow, step-by-step loop that mixes thinking with doing, leaving them vulnerable to errors and to attacks such as prompt injection. This paper proposes a system that treats an AI's plan as a set of computer instructions managed by a dedicated kernel rather than as turns in a chat window. By separating the plan from the action, the system can run multiple tools at once and block dangerous actions with hard security rules that the AI cannot talk its way around.

Core claim

The authors argue that the standard 'ReAct' pattern, in which a model thinks and acts in a single serial stream, is a fundamental bottleneck for agent performance and safety. They introduce KAIJU, an architecture that uses an Executive Kernel to manage tool execution independently of the model's reasoning. This enables Intent-Gated Execution (IGX), a security framework that validates each action against four criteria (scope, intent, impact, and clearance), so that even a manipulated model cannot execute unauthorized or harmful commands.
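The four-criteria gate can be pictured as a deterministic check that lives outside the model's prompt. The following is a minimal sketch under assumed data shapes; the names (`ToolCall`, `igx_gate`, the `policy` dict) are illustrative, not the paper's actual API:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str             # which tool the kernel would dispatch
    action: str           # concrete operation, e.g. "read" or "delete"
    declared_intent: str  # intent declared in the upfront plan
    impact: str           # "low" or "high"

def igx_gate(call: ToolCall, policy: dict) -> tuple:
    """Check a tool call against the four IGX variables:
    scope, intent, impact, and clearance (external approval)."""
    if call.tool not in policy["scope"]:
        return False, "out of scope"
    if call.action not in policy["intent_actions"].get(call.declared_intent, set()):
        return False, "action inconsistent with declared intent"
    if call.impact == "high" and not policy["clearance"](call):
        return False, "high-impact call lacks external clearance"
    return True, "authorised"
```

Because the gate is ordinary code rather than prompt text, a jailbroken plan can at most request an action; it cannot rewrite the rules that decide whether the action runs.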

What carries the argument

The Executive Kernel, a management layer that converts model plans into parallelized dependency graphs and enforces security policies that are structurally separate from the model's prompt space.
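In miniature, "converts model plans into parallelized dependency graphs" means scheduling every step whose dependencies are already resolved as one parallel wave. A hypothetical sketch (the plan format and function names are assumptions, not KAIJU's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def run_plan(plan, tools):
    """Execute a dependency graph in parallel waves.
    plan:  {step_id: {"tool": tool_name, "deps": [step_ids it consumes]}}
    tools: {tool_name: callable taking the list of dependency results}"""
    results, remaining = {}, dict(plan)
    with ThreadPoolExecutor() as pool:
        while remaining:
            # every step whose dependencies are resolved runs in this wave
            ready = [s for s, spec in remaining.items()
                     if all(d in results for d in spec["deps"])]
            if not ready:
                raise RuntimeError("cycle in plan")
            futures = {s: pool.submit(tools[remaining[s]["tool"]],
                                      [results[d] for d in remaining[s]["deps"]])
                       for s in ready}
            for s, f in futures.items():
                results[s] = f.result()
                del remaining[s]
    return results
```

The model only produces the plan dict; dispatch, ordering, and failure propagation happen in the kernel, outside the context window.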

If this is right

  • Parallel tool execution significantly reduces the time required for complex agents to complete data-heavy research tasks.
  • Security vulnerabilities like prompt injection are mitigated because the execution layer enforces constraints the model cannot see or modify.
  • Context window exhaustion is delayed because the model only processes the high-level plan and final results, while the kernel handles the intermediate data.
  • Agent reliability improves through standardized error handling and dependency management that doesn't rely on the model's internal logic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This architecture suggests a transition toward an 'Agent OS' where the LLM serves as a swappable user-space application while the kernel maintains system stability.
  • The separation of concerns could allow smaller, specialized models to handle planning while a more robust model or a hard-coded system handles the high-stakes execution logic.
  • Future agent benchmarks may need to measure 'security bypass resistance' as a primary metric alongside task completion.

Load-bearing premise

The system assumes that the AI can accurately predict all the information and tools it will need before it sees the results of its first few actions.

What would settle it

An experiment comparing KAIJU to standard agents on tasks where every step is entirely dependent on the specific text found in the previous step; if the kernel's planning overhead leads to constant 'replanning,' the efficiency benefits will vanish.
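How costly that replanning is depends on how much of the plan one surprising result invalidates. A minimal sketch of the bookkeeping, assuming plans are stored as a dict of steps with explicit dependency lists (hypothetical, not the paper's code):

```python
def invalidated_subgraph(plan, failed_step):
    """Collect failed_step plus every step downstream of it.
    plan: {step_id: {"deps": [step_ids it consumes]}}
    Only the returned sub-graph would need a fresh plan; results
    for everything else can be kept."""
    invalid = {failed_step}
    changed = True
    while changed:
        changed = False
        for step, spec in plan.items():
            if step not in invalid and any(d in invalid for d in spec["deps"]):
                invalid.add(step)
                changed = True
    return invalid
```

On a diamond-shaped plan (s1 and s2 feeding s3, which feeds s4), a surprise at s2 invalidates only {s2, s3, s4}; if instead nearly every step feeds the next, each surprise invalidates almost the whole plan and the efficiency argument collapses, which is exactly the failure mode the experiment above would probe.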

Figures

Figures reproduced from arXiv: 2604.02375 by Cormac Guerin, Frank Guerin.

Figure 1
Figure 1. Per-query latency across 40 queries (10 per category) at four complexity levels. Small markers: individual queries; large markers: category means. All modes run through the same system with identical tools, IGX, and audit. Computational queries require real-time astronomical data and multi-step computation. The ReAct answer is faster because it did less work, not because of its dispatch strategy.
Original abstract

Tool-calling autonomous agents based on large language models using ReAct exhibit three limitations: serial latency, quadratic context growth, and vulnerability to prompt injection and hallucination. Recent work moves towards separating planning from execution but in each case the model remains coupled to the execution mechanics. We introduce a system-level abstraction for LLM agents which decouples the execution of agent workflows from the LLM reasoning layer. We define two first-class abstractions: (1) Intent-Gated Execution (IGX), a security paradigm that enforces intent at execution, and (2) an Executive Kernel that manages scheduling, tool dispatch, dependency resolution, failures and security. In KAIJU, the LLM plans upfront, optimistically scheduling tools in parallel with dependency-aware parameter injection. Tools are authorised via IGX based on four independent variables: scope, intent, impact, and clearance (external approval). KAIJU supports three adaptive execution modes (Reflect, nReflect, and Orchestrator), providing progressively finer-grained execution control apt for complex investigation and deep analysis or research. Empirical evaluation against a ReAct baseline shows that KAIJU has a latency penalty on simple queries due to planning overhead, convergence at moderate complexity, and a structural advantage on computational queries requiring parallel data gathering. Beyond latency, the separation enforces behavioural guarantees that ReAct cannot match through prompting alone. Code available at https://github.com/compdeep/kaiju

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper presents KAIJU, an architectural framework for LLM-based agents that decouples the planning (reasoning) phase from the execution phase through an 'Executive Kernel.' This kernel manages tool dispatch, parallelization, and security via a new paradigm called Intent-Gated Execution (IGX). By allowing the LLM to generate an upfront plan (DAG) rather than a serial 'thought-action-observation' loop (ReAct), KAIJU enables parallel execution and reduces context window inflation. The authors evaluate KAIJU against ReAct baselines, identifying a trade-off where KAIJU incurs higher overhead for simple tasks but offers superior performance for complex, parallelizable data-gathering workflows.

Significance. The paper contributes a timely systems-level abstraction to the field of agentic LLMs. By moving execution logic out of the prompt and into a structured kernel, the work addresses two major pain points: latency in tool-heavy workflows and the fragility of security policies defined purely within reasoning traces. The provision of reproducible code and the honest reporting of the 'latency penalty' for simple queries are distinct strengths that ground the architectural claims in empirical reality.

major comments (3)
  1. [§3.2, Intent-Gated Execution (IGX)] The central claim that IGX provides 'formal security guarantees' (§1, §3.1) relies on the verification function V(I, A) that maps Intent to Action. However, the paper does not specify the implementation of this function. If the verification is performed by a secondary LLM, the system remains vulnerable to 'Intent Spoofing' (where a malicious LLM generates a benign intent to mask a harmful action), inheriting the same reasoning failures it seeks to mitigate. If V is a static mapping, it risks brittleness. The authors must define the conditions under which V is considered 'trusted' and independent of the planning LLM to justify the claim of structural security.
  2. [§3.3, Dependency-Aware Injection] The optimistic scheduling mechanism assumes a static DAG. It is unclear how the Executive Kernel handles high-entropy branching (e.g., if Tool A's output determines that Tool B is unnecessary and Tool C must be called instead). If the system defaults to a 'Reflect' mode that requires the LLM to re-plan the entire sequence upon every branch, the stated latency benefits over ReAct may be negated by frequent re-planning overhead. The manuscript needs a specific discussion or benchmark on 'branching density' to clarify the limits of the kernel's parallelization.
  3. [§5, Empirical Evaluation] In Figure 4 and Table 1, the crossover point where KAIJU's planning overhead is offset by parallel execution is not explicitly quantified. The authors should provide a 'complexity vs. latency' plot that identifies the specific number of parallelizable tool calls or the depth of the dependency graph required for KAIJU to consistently outperform the ReAct baseline.
minor comments (3)
  1. [Figure 2] The transition arrows between the 'Executive Kernel' and the 'IGX Gate' are ambiguous. It is unclear if the Gate is an internal module of the Kernel or a middleware layer. Adding a sequence diagram for a single tool call would improve clarity.
  2. [§4, Adaptive Execution Modes] The distinction between 'nReflect' and 'Orchestrator' modes is somewhat blurred in the text. Explicitly defining the state-machine transitions between these modes would help implementers.
  3. [General] There are minor typos in the bibliography (e.g., inconsistent capitalization in title 'ReAct').

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We are pleased that the referee recognized KAIJU’s potential to address latency and security pain points in LLM agent architectures. The feedback regarding the implementation of the verification function V(I, A), the handling of high-entropy branching, and the quantification of the performance crossover point is particularly valuable. We intend to address these points by providing more formal definitions for the IGX verification layer, introducing a more nuanced discussion of branching density, and providing the requested 'complexity vs. latency' analysis to ground the 'latency penalty' observed in our empirical evaluation.

Point-by-point responses
  1. Referee: [§3.2, Intent-Gated Execution (IGX)] The central claim that IGX provides 'formal security guarantees' relies on the verification function V(I, A) that maps Intent to Action. However, the paper does not specify the implementation of this function. If the verification is performed by a secondary LLM, the system remains vulnerable to 'Intent Spoofing'... If V is a static mapping, it risks brittleness.

    Authors: The referee correctly identifies a critical implementation detail. In our current implementation, V(I, A) is a hybrid system: it uses a deterministic schema/policy engine (like OPA/Rego) for 'Scope' and 'Impact' (e.g., preventing a 'READ' intent from performing a 'DELETE' action), while 'Intent' is validated against the action's semantic footprint using a smaller, constrained, non-generative classifier. We agree that if V were simply another LLM, 'Intent Spoofing' remains a threat. We will revise §3.2 to explicitly define the trust boundary: V must be external to the planning LLM and use a deterministic policy for high-impact actions. We will also add a 'Security Limitations' section discussing the specific trade-offs between policy brittleness and verification robustness. revision: yes

  2. Referee: [§3.3, Dependency-Aware Injection] The optimistic scheduling mechanism assumes a static DAG. It is unclear how the Executive Kernel handles high-entropy branching (e.g., if Tool A's output determines that Tool B is unnecessary and Tool C must be called instead). If the system defaults to a 'Reflect' mode... the stated latency benefits over ReAct may be negated.

    Authors: The referee is correct: KAIJU’s 'optimistic' parallelization is most effective in workflows with predictable dependencies. In 'Reflect' mode, a significant change in tool requirements (high-entropy branching) triggers a re-plan, which incurs an LLM invocation penalty. However, KAIJU mitigates this via 'Partial DAG Re-planning,' where the kernel only requests a plan update for the invalidated sub-graph rather than the entire sequence. We will revise §3.3 to detail the kernel’s state-management during plan invalidation and add a paragraph to the Discussion acknowledging that KAIJU's structural advantage diminishes as the ratio of conditional branches to parallelizable tasks increases. revision: partial

  3. Referee: [§5, Empirical Evaluation] In Figure 4 and Table 1, the crossover point where KAIJU's planning overhead is offset by parallel execution is not explicitly quantified. The authors should provide a 'complexity vs. latency' plot that identifies the specific number of parallelizable tool calls or the depth of the dependency graph required.

    Authors: This is a fair request that will improve the utility of the evaluation. Based on our existing data, the 'breakeven' point typically occurs at a width of 3 parallel tool calls or a total plan length of 5 tools (where the cost of the upfront plan is amortized by the reduction in serial RTTs). We will incorporate a new figure (Figure 5) into the revised manuscript that plots 'Total Latency' against 'Plan Width' (parallelism) and 'Plan Depth' (seriality) to clearly visualize the performance envelopes of KAIJU vs. ReAct. This will provide developers with a clear heuristic for when to deploy the Executive Kernel versus a simpler serial loop. revision: yes
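The breakeven heuristic can be made concrete with a toy latency model (illustrative numbers and function names, not the paper's measurements): ReAct pays one LLM round-trip per tool step, while an upfront planner pays one planning call plus one tool round-trip per dependency level.

```python
def react_latency(n_tools, llm_rtt, tool_rtt):
    # ReAct: think -> act -> observe, one LLM call per tool, strictly serial
    return n_tools * (llm_rtt + tool_rtt)

def planned_latency(plan_depth, llm_rtt, tool_rtt):
    # Upfront planning: one planning call, then one wave of parallel
    # tool calls per dependency level (latency tracks depth, not width)
    return llm_rtt + plan_depth * tool_rtt

# Six tools arranged as two waves of three, with a 2.0s LLM round-trip
# and 0.5s tool round-trip:
# react_latency(6, 2.0, 0.5)   -> 15.0
# planned_latency(2, 2.0, 0.5) -> 3.0
```

The model also makes the failure mode visible: as plan depth approaches the number of tools (no parallelism) and replanning adds extra LLM calls, the planner's advantage shrinks toward the overhead of its single upfront call.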

Circularity Check

1 step flagged

KAIJU's Intent-Gated Execution (IGX) presents a circular security guarantee where the 'gate' relies on an intent provided by the same untrusted source it seeks to monitor.

specific steps
  1. self-definitional [Section 3.1 and Abstract]
    "We define two first-class abstractions: (1) Intent-Gated Execution (IGX), a security paradigm that enforces intent at execution... IGX is defined by the verification of an action a ∈ A against a declared intent i ∈ I... In KAIJU, the LLM plans upfront, optimistically scheduling tools... providing progressively finer-grained execution control... behavioral guarantees that ReAct cannot match."

    The paper claims IGX provides 'structural guarantees' against prompt injection. However, it defines the verification gate as a check between an Action and a declared Intent, both of which are generated by the 'Reasoning Layer' (LLM). If the LLM is compromised via prompt injection, it can generate a malicious intent that justifies a malicious action. The 'guarantee' thus reduces to the Kernel verifying that the agent is successfully executing its (potentially hijacked) intent, making the security of the mechanism dependent on the integrity of the very component it is designed to gate.

full rationale

The paper makes significant, non-circular contributions regarding the parallelization of tool-calling agents and the architectural separation of planning from execution. Its empirical findings on latency and complexity convergence are independent of any self-referential logic. However, the 'Security' claim regarding Intent-Gated Execution (IGX) is partially circular. The authors propose that IGX mitigates prompt injection by verifying actions against intents, but since both the intent and the action are outputs of the same untrusted LLM planner, the 'verification' is essentially a self-consistency check for the agent. While the inclusion of a 'Clearance' variable (external approval) provides a valid out-of-band security path, the paper's characterization of IGX itself as a 'structural guarantee' is circular because the reference point for the gate (the Intent) is defined by the entity being gated.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The system rests on the ability of current LLMs to reason about tool dependencies well enough to create a DAG (Directed Acyclic Graph) for the kernel to execute.

free parameters (1)
  • Impact Threshold = Varies by domain
    The IGX gate requires pre-defined thresholds for what constitutes 'high impact' requiring human clearance.
axioms (1)
  • domain assumption LLMs can accurately identify data dependencies between tool calls during the upfront planning phase.
    This is required for 'optimistic scheduling' to function without frequent re-planning.
invented entities (2)
  • Intent-Gated Execution (IGX) · independent evidence
    purpose: A security paradigm that evaluates tool calls against intent, scope, and impact.
    It is implemented as a set of interceptors in the provided code base.
  • Executive Kernel · independent evidence
    purpose: Manages scheduling, tool dispatch, and dependency resolution for the agent.
    The kernel is a functional system component in the KAIJU architecture.

pith-pipeline@v0.9.0 · 6337 in / 1557 out tokens · 16397 ms · 2026-05-08T02:22:06.609684+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

7 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1] Biju, S. M. Implementing multi-agent systems using LangGraph: a comprehensive study. In ICRAARI. Accessed 2026-03-31.
  2. [3] CrewAI: framework for orchestrating role-playing autonomous AI agents. https://github.com/crewAIInc/crewAI
  3. [5] Defeating Prompt Injections by Design. https://arxiv.org/abs/2503.18813
  4. Kim, S., Moon, S., Tabrizi, R., Lee, N., Mahoney, M. W., Keutzer, K., and Gholami, A. An LLM compiler for parallel function calling. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024).
  5. [6] Mialon, G., Fourrier, C., Swift, C., Wolf, T., LeCun, Y., and Scialom, T. GAIA: a benchmark for General AI Assistants. arXiv:2311.12983.
  6. [8] Schulhoff, S., Hayes, J., Ilie, M., Pluto, J., Song, S., Chaudhari, H., Shumailov, I., Thakurta, A., Xiao, K. Y., Terzis, A., and Tramèr, F. https://arxiv.org/abs/2510.09023
  7. [9] Shi, T., He, J., Wang, Z., Li, H., Wu, L., Guo, W., and Song, D. Progent: programmable privilege control for LLM agents. arXiv:2504.11703.
  8. [10] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. ReAct: synergizing reasoning and acting in language models. In ICLR.