
arxiv: 2606.23277 · v1 · pith:LNYPFHUP · submitted 2026-06-22 · 💻 cs.AI

GIF: Locally Sound Geometric Information Flow Control for LLMs

Pith reviewed 2026-06-26 08:22 UTC · model grok-4.3

classification 💻 cs.AI
keywords: information flow control · geometric bounds · large language models · prompt injection · privacy leakage · Jacobian · mutual information · Lean proof

The pith

GIF uses the LLM Jacobian and local output geometry to upper-bound Shannon mutual information between perturbed input spans and model outputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Geometric Information Flow as a framework for tracking how input tokens affect outputs in autoregressive language models. It replaces heuristic attribution methods with a bound derived from the model's Jacobian and local output geometry, which can be computed via automatic differentiation and low-rank approximation. The approach aims to prevent taint explosion while providing a sound semantic foundation for information flow control. A mechanized Lean 4 proof establishes that the bound holds under local regularity assumptions. Evaluation on prompt-injection and privacy-leakage tasks shows high recall and transferability from small surrogate models to larger ones.
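The paragraph above says the bound is computed with automatic differentiation. As a minimal illustration of how a Jacobian is obtained that way (not the authors' implementation, which the page does not describe), the sketch below implements forward-mode AD with dual numbers in plain Python and builds the Jacobian one column per Jacobian-vector product. `Dual`, `jacobian` and the two-input `toy_model` are hypothetical stand-ins; a real LLM would be differentiated with a framework's autodiff with respect to its input embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A dual number val + tan*eps with eps**2 == 0, used for forward-mode AD."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def tanh(x):
    """tanh on dual numbers; d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x.val)
    return Dual(t, x.tan * (1.0 - t * t))


def jacobian(f, x):
    """Jacobian of f at x, one column per Jacobian-vector product (JVP)."""
    n = len(x)
    columns = []
    for j in range(n):
        # Seed the tangent with the unit vector e_j.
        seeded = [Dual(x[i], 1.0 if i == j else 0.0) for i in range(n)]
        columns.append([y.tan for y in f(seeded)])
    # Transpose so that J[i][j] = d f_i / d x_j.
    return [list(row) for row in zip(*columns)]


def toy_model(x):
    """Hypothetical 2-input, 2-output stand-in for a model."""
    return [tanh(2 * x[0] + x[1]), x[0] * x[1]]


J = jacobian(toy_model, [0.0, 3.0])
```

For this input, `J` evaluates to `[[2*(1 - tanh(3)**2), 1 - tanh(3)**2], [3.0, 0.0]]`. Building a full Jacobian this way costs one pass per input dimension, which is one reason to expect low-rank or matrix-free approximations at LLM scale.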

Core claim

GIF is a semantic framework that uses the LLM Jacobian and local output geometry to upper-bound the Shannon mutual information between perturbed input spans and model outputs, yielding a scalable measure that satisfies local geometric soundness and is supported by a fully mechanized Lean 4 proof under local regularity assumptions.

What carries the argument

The GIF upper bound on mutual information, computed from the LLM Jacobian via automatic differentiation and low-rank approximation.
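The page does not reproduce GIF's formula, so the following is only an analogue, under assumptions stated here rather than taken from the paper. For a linear channel y = J·δ + n, with input perturbation δ ~ N(0, σ_δ²I) and independent output noise n ~ N(0, σ_n²I), the mutual information has the closed form I(δ; y) = ½ log det(I + (σ_δ²/σ_n²) J Jᵀ). The sketch evaluates that expression with a hand-written Cholesky factorization; `linear_gaussian_mi` and its parameter names are illustrative.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def linear_gaussian_mi(jac, sigma_in, sigma_noise):
    """I(delta; y) in nats for y = J delta + n (assumed linear-Gaussian channel).

    delta ~ N(0, sigma_in^2 I), n ~ N(0, sigma_noise^2 I), and
    I = 0.5 * logdet(I + (sigma_in / sigma_noise)^2 * J J^T).
    """
    m = len(jac)
    snr = (sigma_in / sigma_noise) ** 2
    # M = I + snr * J J^T is symmetric positive definite.
    mat = [
        [(1.0 if i == k else 0.0) + snr * sum(a * b for a, b in zip(jac[i], jac[k]))
         for k in range(m)]
        for i in range(m)
    ]
    low = cholesky(mat)
    # logdet(M) = 2 * sum(log(diag(L))), so 0.5 * logdet(M) = sum(log(diag(L))).
    return sum(math.log(low[i][i]) for i in range(m))
```

For a nonlinear model the expression only describes the linearization at one point, so it says something about the true information flow only while that linearization stays accurate across the perturbation, which is the role the local regularity assumptions play.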

If this is right

  • GIF achieves near-perfect recall on integrity and confidentiality benchmarks without a downstream declassifier.
  • GIF outperforms attention-based baselines and matches or exceeds the F1 of direct LLM-as-judge methods at up to 81x lower token cost when paired with lightweight declassifiers.
  • Information flows detected using small surrogate models transfer to state-of-the-art models up to 200x larger and across model families.
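The recall and F1 claims above (and the F2 score reported in Figure 3) are standard detection metrics computed from confusion-matrix counts. A small helper makes the relationship explicit; `detection_metrics` is a hypothetical name, and the counts in the usage line are invented for illustration, not taken from the paper.

```python
def detection_metrics(tp, fp, fn, tn, beta=2.0):
    """Precision, recall, F1, F-beta and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    def f_score(b):
        # F_b = (1 + b^2) P R / (b^2 P + R); b > 1 weights recall more heavily.
        denom = b * b * precision + recall
        return (1 + b * b) * precision * recall / denom if denom else 0.0

    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"P": precision, "R": recall, "F1": f_score(1.0),
            f"F{beta:g}": f_score(beta), "Acc": accuracy}
```

`detection_metrics(tp=45, fp=15, fn=5, tn=35)` gives P = 0.75, R = 0.9, F1 ≈ 0.818 and F2 ≈ 0.865. With β = 2 the F-score treats recall as twice as important as precision, which suits a detector whose job is to avoid missing flows and leave false positives to a downstream declassifier.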

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The bound could support runtime enforcement of information flow policies inside agentic LLM systems without requiring full model gradients at inference time.
  • Transferability from surrogates suggests a path to black-box deployment where only query access is available.
  • The geometric approach might extend to measuring flow in other sequence models whose Jacobians can be approximated.

Load-bearing premise

The local regularity assumptions under which the GIF bound is proven to upper-bound the true information flow.

What would settle it

An input-output example on a model satisfying the local regularity assumptions where the actual mutual information between an input span and output exceeds the computed GIF value.
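The test described above can be rehearsed on a one-dimensional toy channel, an assumption chosen for illustration and far simpler than an LLM. The input is perturbed to x0 − ε or x0 + ε with equal probability, the output is f(x) plus Gaussian noise, the true mutual information is integrated numerically, and it is compared with a bound built only from the local slope f'(x0). For tanh at 0 the bound holds because |tanh(x)| ≤ |x|; for x³ the slope at 0 is zero, so the local bound is 0 while the true information is positive. That second case is exactly the kind of counterexample the local regularity assumptions are meant to rule out. All function names here are hypothetical.

```python
import math


def gauss(y, mu, sigma):
    """Normal density N(mu, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def binary_input_mi(mu_a, mu_b, sigma, steps=20000):
    """Exact I(X; Y) in nats for X uniform on {a, b} and Y = mu_X + N(0, sigma^2),
    evaluated with the midpoint rule over y."""
    lo = min(mu_a, mu_b) - 12 * sigma
    hi = max(mu_a, mu_b) + 12 * sigma
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * dy
        pa, pb = gauss(y, mu_a, sigma), gauss(y, mu_b, sigma)
        py = 0.5 * (pa + pb)
        for p in (pa, pb):
            if p > 0.0:
                total += 0.5 * p * math.log(p / py) * dy
    return total


def linearized_bound(slope_at_x0, eps, sigma):
    """0.5 * log(1 + (f'(x0) * eps / sigma)^2). Valid whenever Var f(X) is at most
    (f'(x0) * eps)^2, because a Gaussian maximizes entropy for a fixed variance."""
    return 0.5 * math.log(1.0 + (slope_at_x0 * eps / sigma) ** 2)


def falsification_check(f, slope_at_x0, x0, eps, sigma):
    """Return (true MI, local bound, bound_violated) for inputs x0 -/+ eps."""
    true_mi = binary_input_mi(f(x0 - eps), f(x0 + eps), sigma)
    bound = linearized_bound(slope_at_x0, eps, sigma)
    return true_mi, bound, true_mi > bound + 1e-9
```

`falsification_check(math.tanh, 1.0, 0.0, 0.5, 0.3)` reports no violation, while `falsification_check(lambda x: x ** 3, 0.0, 0.0, 0.5, 0.3)` does, because the slope at the linearization point says nothing about how the cubic behaves over the perturbation.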

Figures

Figures reproduced from arXiv: 2606.23277 by Adam Storek, Nikolaus Holzer, Suman Jana, Zhuo Zhang.

Figure 1: Prompt injection in an HR-agent workflow. The trusted system prompt specifies hiring criteria, but the untrusted CV injects an instruction to …
Figure 2: GIF-guided declassification vs. full-trajectory judging. Each row block is one model acting as both the monitored agent and its own declassifier. Left: detection quality of GIF-k, as a ratio to the same model’s full-trajectory judge (judge row: absolute %); ≥ 1 means the reduced transcript matches or beats the full one. Right: declassifier token cost per benchmark, split into input, output, and reasoning t…
Figure 3: Surrogate analysis models. Detection precision (P), recall (R), F1, F2, and accuracy (Acc) for the three GIF score variants when AgentDojo trajectories from Qwen 3 32B (top) and GPT OSS 120B (bottom) are attributed using the original target model (Original) versus a fixed Qwen 3 0.6B surrogate. The surrogate tracks the original across both targets and all variants, including the cross-family GPT OSS 120B …
Figure 4: Attribution cutoff k. AUROC (left) and AUPRC (right) of GIF attribution as a function of the cutoff k in PSP@k, for six models averaged over AgentDojo, MSB, and AgentDAM. Both metrics saturate: 95% of the AUROC gain is reached by k=34, and 95% of the AUPRC gain by k=29. Moderate attribution cutoffs around k ≈ 30 capture nearly all of the discriminative signal, and the headline results are not an artifact of cho…
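Figure 4 summarizes saturation as the cutoff at which 95% of the AUROC or AUPRC gain is reached. One plausible reading of that statistic is sketched below. Measuring the gain from the smallest cutoff to the best score on the curve is an assumption, since the caption does not define the baseline, and `saturation_k` is a hypothetical name.

```python
def saturation_k(ks, scores, fraction=0.95):
    """Smallest cutoff k whose score recovers `fraction` of the total gain.

    Assumption: gain is measured from the score at the first (smallest) cutoff
    up to the best score anywhere on the curve.
    """
    base = scores[0]
    total_gain = max(scores) - base
    if total_gain <= 0:
        return ks[0]  # flat or decreasing curve: nothing to saturate
    for k, s in zip(ks, scores):
        if s - base >= fraction * total_gain:
            return k
    return ks[-1]
```

With the invented curve `[0.5, 0.7, 0.8, 0.9, 0.9]` at k = 1..5, `saturation_k` returns 4.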
Original abstract

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. These range from prompt injections that manipulate downstream tool use to leakage of confidential information through model outputs. Recent Information Flow Control (IFC)-based defenses show promise but lack a principled semantic foundation for reasoning about information flow through the model itself. Since any input token may influence any output token in an autoregressive LLM, existing approaches suffer from severe taint explosion. We present Geometric Information Flow (GIF), a semantic framework for tracking information flow from input tokens to outputs. GIF uses the LLM Jacobian and local output geometry to upper-bound the Shannon mutual information between perturbed input spans and model outputs, yielding a scalable measure computable on large models via automatic differentiation and low-rank approximation. Unlike attention-based or correlational attribution heuristics, GIF satisfies local geometric soundness, and we provide a fully mechanized Lean 4 proof that it upper-bounds the true information flow induced by a given prompt under local regularity assumptions. We evaluate GIF on integrity and confidentiality tasks across multiple prompt-injection and privacy-leakage benchmarks. GIF achieves near-perfect recall even without a downstream declassifier, outperforming attention-based baselines. Combined with lightweight LLM-based declassifiers, it matches or exceeds the F1 of direct LLM-as-judge baselines such as GPT-5.5 xhigh reasoning while using up to 81x lower token cost. GIF flows detected with small surrogate models transfer to larger state-of-the-art models and other model families, even when the surrogate is up to 200x smaller, suggesting black-box deployment without gradient access.
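The abstract credits low-rank approximation with keeping GIF tractable on large models. One way a low-rank computation can still produce an upper bound (an illustrative construction, not necessarily the authors') is to compute the top-r eigenvalues of JᵀJ and charge every discarded eigenvalue at the r-th one, which is never smaller. The objective here is the linear-Gaussian channel quantity ½ log det(I + c JᵀJ), with c the ratio of input-perturbation variance to output-noise variance; that objective is itself an assumption standing in for GIF's bound, and dense power iteration with deflation stands in for the matrix-free Jacobian-vector products a real system would use. Function names are hypothetical.

```python
import math


def top_eigenpairs(gram, r, iters=500):
    """Top-r eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    n = len(gram)
    a = [row[:] for row in gram]
    pairs = []
    for _ in range(r):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v gives the eigenvalue estimate.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: A <- A - lam * v v^T removes the direction just found.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def mi_upper_bound_low_rank(jac, snr, r):
    """Upper bound on 0.5 * logdet(I + snr * J^T J) from the top-r eigenvalues.

    logdet(I + c J J^T) == logdet(I + c J^T J), so either Gram matrix works.
    Each discarded eigenvalue is at most the r-th, so it is charged at that value.
    Requires 1 <= r <= min(rows, cols) and a converged power iteration.
    """
    m, n = len(jac), len(jac[0])
    gram = [[sum(jac[k][i] * jac[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]
    lams = [max(lam, 0.0) for lam, _ in top_eigenpairs(gram, r)]
    head = sum(0.5 * math.log1p(snr * lam) for lam in lams)
    tail_count = min(m, n) - r
    tail = 0.5 * tail_count * math.log1p(snr * lams[-1]) if tail_count > 0 else 0.0
    return head + tail
```

For J = diag(3, 2, 1) and c = 1 the exact value is ln 10 ≈ 2.303; keeping r = 1 eigenvalue gives 1.5·ln 10 ≈ 3.454, and r = 3 recovers ln 10. If the spectrum is poorly separated, unconverged power iteration can underestimate the r-th eigenvalue, so the tail term is only a valid bound once the iteration has converged.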

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Geometric Information Flow (GIF), a semantic framework that uses the LLM Jacobian and local output geometry to upper-bound Shannon mutual information between perturbed input spans and model outputs. It claims a fully mechanized Lean 4 proof establishing this bound under local regularity assumptions, and reports empirical results on prompt-injection and privacy-leakage benchmarks showing near-perfect recall, outperformance of attention baselines, transfer from small surrogates to large models, and efficiency gains when combined with lightweight declassifiers.

Significance. If the bound and transfer hold, GIF supplies a scalable, geometrically grounded alternative to heuristic attribution methods for information-flow control in LLMs, directly addressing taint explosion. The provision of a mechanized Lean 4 proof is a clear strength that supplies independent verification of the local soundness claim under the stated assumptions.

major comments (2)
  1. [Proof section / abstract paragraph on mechanized proof] The Lean 4 proof (abstract and proof section) establishes the upper bound only under local regularity assumptions on the output geometry and Jacobian. The manuscript provides no diagnostic, check, or verification that these assumptions (Lipschitz constants, differentiability, curvature bounds) hold for the evaluated models, perturbation sizes, or input regimes in the prompt-injection and privacy benchmarks. This is load-bearing for the claim that GIF upper-bounds true information flow in the reported experiments.
  2. [Evaluation sections on benchmarks and transfer] Experimental evaluation sections: the transfer results (surrogate-to-large-model, up to 200x size difference) and benchmark F1/recall claims rest on the assumption that the local geometric bound remains valid across model scales and families, yet no evidence is given that the regularity conditions are preserved or that the low-rank Jacobian approximation does not violate them.
minor comments (1)
  1. [Abstract] Notation for the Jacobian and local geometry quantities is introduced without an explicit equation reference in the abstract; a forward pointer to the defining equation would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and for identifying key points regarding the connection between our theoretical results and empirical evaluations. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Proof section / abstract paragraph on mechanized proof] The Lean 4 proof (abstract and proof section) establishes the upper bound only under local regularity assumptions on the output geometry and Jacobian. The manuscript provides no diagnostic, check, or verification that these assumptions (Lipschitz constants, differentiability, curvature bounds) hold for the evaluated models, perturbation sizes, or input regimes in the prompt-injection and privacy benchmarks. This is load-bearing for the claim that GIF upper-bounds true information flow in the reported experiments.

    Authors: We agree that the mechanized Lean 4 proof is conditional on local regularity assumptions and that the manuscript does not include explicit empirical diagnostics verifying these assumptions for the specific models, perturbation sizes, and input regimes in the benchmarks. This is a valid observation. In the revised version, we will add a new subsection (tentatively in Section 5) that provides supporting evidence for the assumptions. This will include: (i) empirical estimates of local Lipschitz constants computed via finite differences on representative benchmark inputs for the evaluated models; (ii) discussion of the practical impact of the low-rank Jacobian approximation; and (iii) clarification that the assumptions are intended to hold locally for the small perturbations used in GIF. While exhaustive verification of curvature bounds across all inputs is computationally prohibitive at LLM scale, these additions will make the link between theory and experiments more explicit. revision: yes

  2. Referee: [Evaluation sections on benchmarks and transfer] Experimental evaluation sections: the transfer results (surrogate-to-large-model, up to 200x size difference) and benchmark F1/recall claims rest on the assumption that the local geometric bound remains valid across model scales and families, yet no evidence is given that the regularity conditions are preserved or that the low-rank Jacobian approximation does not violate them.

    Authors: The transfer results are presented as empirical evidence that GIF flows computed on small surrogate models can be applied to larger models. We acknowledge, however, that the manuscript does not explicitly demonstrate preservation of the regularity conditions or the validity of the low-rank approximation across scales and families. In the revision, we will expand the transfer discussion (in Section 6) with additional analysis comparing Jacobian rank, spectral norms, and local geometry statistics between the surrogate models and the target models (including the 200x size difference cases). This will provide evidence that the low-rank approximation does not materially violate the bound under the perturbation regimes tested. We will also clarify that the local character of the bound makes scale-invariance more plausible, while noting that the empirical performance stands independently as a practical result. revision: yes
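The rebuttal promises finite-difference estimates of local Lipschitz constants. A minimal sketch of such an estimate follows; `fd_lipschitz_estimate`, the step size `h` and the sample count are illustrative choices, and `f` stands in for a map from input embeddings to output logits. Note which way the result points: random sampling can only find directions where the function stretches a lot, so it yields a lower estimate of the true constant. It is supporting evidence for the assumptions, not the verification the referee asks for.

```python
import math
import random


def fd_lipschitz_estimate(f, x, h=1e-4, samples=2000, seed=0):
    """Largest observed ||f(x + h u) - f(x)|| / h over random unit directions u.

    This is an empirical *lower* estimate of the local Lipschitz constant:
    sampling can miss the worst direction, so it never certifies an upper bound.
    """
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(samples):
        # Uniformly random direction: normalize an isotropic Gaussian vector.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        fxh = f([xi + h * ui for xi, ui in zip(x, u)])
        ratio = math.sqrt(sum((a - b) ** 2 for a, b in zip(fxh, fx))) / h
        best = max(best, ratio)
    return best
```

On the linear map f(v) = (3·v₀, v₁), whose Lipschitz constant is exactly 3, the estimate lands close to 3 and never meaningfully above it.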

Circularity Check

0 steps flagged

No circularity: derivation relies on mechanized Lean 4 proof under stated assumptions, independent of fitted parameters or self-referential definitions

full rationale

The paper's central claim is that the GIF quantity (Jacobian-based local geometry) upper-bounds true Shannon mutual information, with the bound established by a fully mechanized Lean 4 proof under local regularity assumptions. No equations or steps in the provided text reduce the bound to a fitted parameter, self-definition, or self-citation chain. The mechanized proof counts as independent support per the evaluation rules. The unverified status of the regularity assumptions on evaluated models is a correctness/verification gap, not a circularity reduction. No load-bearing self-citations, ansatzes smuggled via citation, or renaming of known results are present. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review limits visibility; only the local regularity assumptions for the proof are identifiable as load-bearing.

axioms (1)
  • domain assumption: local regularity assumptions under which GIF upper-bounds true information flow
    Invoked to support the Lean 4 proof claim in the abstract.

pith-pipeline@v0.9.1-grok · 5835 in / 1098 out tokens · 20786 ms · 2026-06-26T08:22:55.685791+00:00


Reference graph

Works this paper leans on

58 extracted references · 3 canonical work pages

  1. [1] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 79–90.

  2. [2] OWASP GenAI Security Project, “LLM01:2025 prompt injection, OWASP top 10 for LLM applications,” https://genai.owasp.org/llmrisk/llm01-prompt-injection/, 2025, accessed: 2026-06-11.

  3. [3] OWASP GenAI Security Project, “OWASP top 10 for LLM applications 2025,” https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/, 2025, accessed: 2026-06-11.

  4. [4] M. Alizadeh, Z. Samei, D. Stetsenko, and F. Gilardi, “Simple prompt injection attacks can leak personal data observed by llm agents during task execution,” 2025. [Online]. Available: https://arxiv.org/abs/2506.01055

  5. [5] D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976.

  6. [6] A. Sabelfeld and A. C. Myers, “Language-based information-flow security,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 1, pp. 5–19, 2003.

  7. [7] A. C. Myers and B. Liskov, “A decentralized model for information flow control,” ACM SIGOPS Operating Systems Review, vol. 31, no. 5, pp. 129–142, 1997.

  8. [8] D. Volpano, C. Irvine, and G. Smith, “A sound type system for secure flow analysis,” Journal of Computer Security, vol. 4, no. 2-3, pp. 167–187, 1996.

  9. [9] J. A. Goguen and J. Meseguer, “Security policies and security models,” in 1982 IEEE Symposium on Security and Privacy. IEEE, 1982, pp. 11–11.

  10. [10] F. Wu, E. Cecchetti, and C. Xiao, “System-level defense against indirect prompt injection attacks: An information flow control perspective,” CoRR, vol. abs/2409.19091, 2024.

  11. [11] M. Costa, B. Köpf, A. Kolluri, A. Paverd, M. Russinovich, A. Salem, S. Tople, L. Wutschitz, and S. Zanella-Béguelin, “Securing ai agents with information-flow control,” CoRR, vol. abs/2505.23643, 2025.

  12. [12] E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr, “Defeating Prompt Injections by Design,” Jun. 2025, arXiv:2503.18813. [Online]. Available: http://arxiv.org/abs/2503.18813

  13. [13] L. Beurer-Kellner, B. B. A.-M. Crețu, E. Debenedetti, D. Dobos, D. Fabian, M. Fischer, D. Froelicher, K. Grosse, D. Naeff, E. Ozoani, A. Paverd, F. Tramèr, and V. Volhejn, “Design Patterns for Securing LLM Agents against Prompt Injections,” Jun. 2025, arXiv:2506.08837 version: 1. [Online]. Available: http://arxiv.org/abs/2506.08837

  14. [14] P. Y. Zhong, S. Chen, R. Wang, M. McCall, B. L. Titzer, H. Miller, and P. B. Gibbons, “Rtbas: Defending llm agents against prompt injection and privacy leakage,” 2025. [Online]. Available: https://arxiv.org/abs/2502.08966

  15. [15] K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman, “Defending against indirect prompt injection attacks with spotlighting,” 2024. [Online]. Available: https://arxiv.org/abs/2403.14720

  16. [16] T. Shi, K. Zhu, Z. Wang, Y. Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchi, B. Alomair, X. Zhao, W. Y. Wang, N. Gong, W. Guo, and D. Song, “Promptarmor: Simple yet effective prompt injection defenses,” 2025. [Online]. Available: https://arxiv.org/abs/2507.15219

  17. [17] T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, and D. Song, “Progent: Securing ai agents with privilege control,” 2026. [Online]. Available: https://arxiv.org/abs/2504.11703

  18. [18] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,”

  19. [19] [Online]. Available: https://arxiv.org/abs/2306.05685

  20. [20] M. Christodorescu, E. Fernandes, A. Hooda, S. Jha, J. Rehberger, K. Chaudhuri, X. Fu, K. Shams, G. Amir, J. Choi, S. Choudhary, N. Palumbo, A. Labunets, and N. V. Pandya, “Systems security foundations for agentic computing,” IEEE Secure Generative AI (SAGAI) Agents Workshop, Workshop report, 2025.

  21. [21] M. Backes, B. Köpf, and A. Rybalchenko, “Automatic discovery and quantification of information leaks,” in 2009 30th IEEE Symposium on Security and Privacy. IEEE, 2009, pp. 141–153.

  22. [22] T. Chothia and A. Guha, “A statistical test for information leaks using continuous mutual information,” in 2011 IEEE 24th Computer Security Foundations Symposium. IEEE, 2011, pp. 177–190.

  23. [23] K. Chatzikokolakis, T. Chothia, and A. Guha, “Statistical measurement of information leakage,” in International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2010, pp. 390–404.

  24. [24] G. Smith, “On the foundations of quantitative information flow,” in International Conference on Foundations of Software Science and Computational Structures. Springer, 2009, pp. 288–302.

  25. [25] D. Clark, S. Hunt, and P. Malacaria, “A static analysis for quantifying information flow in a simple imperative language,” Journal of Computer Security, vol. 15, no. 3, pp. 321–371, 2007.

  26. [26] “Locally sound geometric information flow control for llms,” https://geomifc.github.io/, 2026, accessed: 2026-06-04.

  27. [27] S.-i. Amari and H. Nagaoka, Methods of information geometry. American Mathematical Soc., 2000, vol. 191.

  28. [28] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 06–11 Aug 2017, pp. 1885–1894. [Online]. Available: https://proceedings.mlr.press/v70/koh17a.html

  29. [29] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” 2014. [Online]. Available: https://arxiv.org/abs/1312.6034

  30. [30] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.

  31. [31] J. Martens, “New insights and perspectives on the natural gradient method,” Journal of Machine Learning Research, vol. 21, no. 146, pp. 1–76, 2020.

  32. [32] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.

  33. [33] T. M. Cover, Elements of information theory. John Wiley & Sons, 1999.

  34. [34] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, “Automatic differentiation in machine learning: a survey,” Journal of Machine Learning Research, vol. 18, no. 153, pp. 1–43, 2018.

  35. [35] OpenAI, “GPT-OSS-120B configuration,” https://huggingface.co/openai/gpt-oss-120b/blob/main/config.json, 2025, accessed 2026-06-07.

  36. [36] Google DeepMind, “Gemma 4 31B configuration,” https://huggingface.co/google/gemma-4-31B-it/blob/main/config.json, 2026, accessed 2026-06-07.

  37. [37] DeepSeek, “DeepSeek-V4-Pro configuration,” https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/config.json, 2026, accessed 2026-06-07.

  38. [38] M. F. Hutchinson, “A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines,” Communications in Statistics-Simulation and Computation, vol. 18, no. 3, pp. 1059–1076, 1989.

  39. [39] R. A. Meyer, C. Musco, C. Musco, and D. P. Woodruff, “Hutch++: Optimal stochastic trace estimation,” in Symposium on Simplicity in Algorithms (SOSA). SIAM, 2021, pp. 142–155.

  40. [40] E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. [Online]. Available: https://openreview.net/forum?id=m1YYAQjO3w

  41. [41] D. Zhang, Z. Li, X. Luo, X. Liu, P. P. Li, and W. Xu, “MCP security bench (MSB): Benchmarking attacks against model context protocol in LLM agents,” in The Fourteenth International Conference on Learning Representations, 2026. [Online]. Available: https://openreview.net/forum?id=irxxkFMrry

  42. [42] A. Zharmagambetov, C. Guo, I. Evtimov, M. Pavlova, R. Salakhutdinov, and K. Chaudhuri, “AgentDAM: Privacy leakage evaluation for autonomous web agents,” in The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2026. [Online]. Available: https://openreview.net/forum?id=qaxf7q41aK

  43. [43] S. Zhou, F. F. Xu, H. Zhu, X. Zhou, R. Lo, A. Sridhar, X. Cheng, T. Ou, Y. Bisk, D. Fried, U. Alon, and G. Neubig, “Webarena: A realistic web environment for building autonomous agents,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=oKn9c6ytLx

  44. [44] J. Y. Koh, R. Lo, L. Jang, V. Duvvur, M. Lim, P.-Y. Huang, G. Neubig, S. Zhou, R. Salakhutdinov, and D. Fried, “VisualWebArena: Evaluating multimodal agents on realistic visual web tasks,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L.-W. Ku, A. Martins, and V. Srikumar, Eds. Bangk...

  45. [45] F. El Yagoubi, G. Badu-Marfo, and R. Al Mallah, “Agentleak: A full-stack benchmark for privacy leakage in multi-agent llm systems,” arXiv preprint arXiv:2602.11510, 2026, submitted to arXiv on 12 Feb 2026. [Online]. Available: https://arxiv.org/abs/2602.11510

  46. [46] A. C. Myers, “Jflow: practical mostly-static information flow control,” in Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL ’99. New York, NY, USA: Association for Computing Machinery, 1999, pp. 228–241. [Online]. Available: https://doi.org/10.1145/292540.292561

  47. [47] A. Sabelfeld and D. Sands, “Declassification: Dimensions and principles,” Journal of Computer Security, vol. 17, no. 5, pp. 517–548, Oct. 2009. [Online]. Available: https://journals.sagepub.com/doi/full/10.3233/JCS-2009-0352

  48. [48] M. R. Clarkson and F. B. Schneider, “Hyperproperties,” J. Comput. Secur., vol. 18, no. 6, pp. 1157–1210, Sep. 2010.

  49. [49] P. Cousot and R. Cousot, “Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints,” in Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, ser. POPL ’77. New York, NY, USA: Association for Computing Machinery, 1977, pp. 238–252. [Online]. Availabl...

  50. [50] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 3319–3328.

  51. [51] S. Jain and B. C. Wallace, “Attention is not Explanation,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds. Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2...

  52. [52] E. Wallace, M. Gardner, and S. Singh, “Interpreting predictions of NLP models,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, 2020, pp. 20–23.

  53. [53] T. J. B. Liu, B. Zadeoğlu, N. Boullé, R. Sarfati, and C. J. Earls, “Jacobian scopes: token-level causal attributions in llms,” 2026. [Online]. Available: https://arxiv.org/abs/2601.16407

  54. [54] J. Chen, Y. Luo, and L. Pan, “Mechanistic data attribution: Tracing the training origins of interpretable llm units,” 2026. [Online]. Available: https://arxiv.org/abs/2601.21996

  55. [55] C. Jiao, Y. Pan, E. Xiao, D. Sheng, N. Jain, H. Zhao, I. Dasgupta, J. W. Ma, and C. Xiong, “Date-lm: Benchmarking data attribution evaluation for large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2507.09424

  56. [56] Q. Zhan, R. Fang, H. S. Panchal, and D. Kang, “Adaptive attacks break defenses against indirect prompt injection attacks on llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00061

  57. [57] M. Kang, C. Xiang, S. Kariyappa, C. Xiao, B. Li, and E. Suh, “Mitigating indirect prompt injection via instruction-following intent analysis,” 2025. [Online]. Available: https://arxiv.org/abs/2512.00966

  58. [58] T. Zhang, Y. Xu, J. Wang, K. Guo, X. Xu, B. Xiao, Q. Guan, J. Fan, J. Liu, Z. Liu, and H. Hu, “Agentsentry: Mitigating indirect prompt injection in llm agents via temporal causal diagnostics and context purification,” 2026. [Online]. Available: https://arxiv.org/abs/2602.22724