pith. sign in

arxiv: 2502.01822 · v7 · pith:RBNWYVTOnew · submitted 2025-02-03 · 💻 cs.CR · cs.CY

Firewalls to Secure Dynamic LLM Agentic Networks

classification 💻 cs.CR cs.CY
keywords taskcontextexternalfirewallsontoagentapplyingappropriate
0
0 comments X
read the original abstract

The emergence of agent-to-agent communication protocols mirrors the early internet: powerful connectivity with minimal security infrastructure. When AI agents communicate on behalf of users, every message crosses a trust boundary where the user's personal data and the external agent's unconstrained language each present distinct risks. We address both through a dual-firewall architecture grounded in a unifying principle: each task defines a context, and both sides of the communication carry information far exceeding what that context requires. Our firewalls act as projections onto the task context, allowing only contextually appropriate content to cross each boundary. The Language Converter Firewall projects incoming messages onto a closed, domain-specific, structured protocol; an external agent's message is converted to validated fields while persuasive framing, urgency tactics, and embedded instructions are structurally eliminated through deterministic verification. This replaces the asymmetric challenge of resisting every possible manipulation with the structural guarantee that manipulation has no channel through which to arrive. The Data Abstraction Firewall projects outgoing information onto the granularity appropriate for the task, rather than applying binary disclose-or-redact filtering, as previous airgapping solutions did. Both firewalls operate in a trusted environment isolated from external input, applying domain-specific rules learned automatically from demonstrations. Across 864 attacks spanning three domains on the ConVerse benchmark, our architecture reduces privacy attack success rates (e.g., from 84% to 10% for GPT-5) and security attacks (from 60% to 3%), while maintaining or even improving task completion quality.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

    cs.CR 2026-04 unverdicted novelty 7.0

    A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.

  2. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.

  3. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.

  4. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.

  5. Alignment Contracts for Agentic Security Systems

    cs.CR 2026-04 conditional novelty 6.0 full

    Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an obse...

  6. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  7. It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

    cs.LG 2026-05 unverdicted novelty 4.0

    SELFCI uses complementary self-distillation with two reverse KL divergences to align LLMs to contextual integrity while preserving utility, outperforming RL baselines like GRPO in agentic settings.

  8. Reinforcement Learning for Scalable and Trustworthy Intelligent Systems

    cs.LG 2026-05 unverdicted novelty 3.0

    Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.

  9. Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    cs.CL 2025-03 accept novelty 3.0

    A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.