pith. sign in

arxiv: 2606.19588 · v1 · pith:RZ2G2CGGnew · submitted 2026-06-17 · 💻 cs.AI · cs.CR· cs.LO

Analyzing the Narration Gap in LLM-Solver Loops

Pith reviewed 2026-06-26 20:40 UTC · model grok-4.3

classification 💻 cs.AI cs.CRcs.LO
keywords LLM-solver loopsnarration gapprompt injectionsoundness guaranteehybrid pipelinesSAT solversSMT solversadversarial attacks
0
0 comments X

The pith

Soundness from formal solvers is lost when LLMs narrate the results in hybrid loops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models LLM-solver loops as a verified decision procedure with three components: formalizing the question, deciding it with a solver, and narrating the result. It shows that the solver provides a sound answer, but the narration step allows prompt injection to invert the verified conclusion in the final user answer. Experiments across five models confirm that certificate gating keeps the solver sound while narration remains vulnerable, and hardened prompts reduce but do not stop adaptive attacks. This matters because hybrid systems are used for safety-critical questions where formal guarantees are expected to hold through to the end. The central finding is that robustness stops before reaching the answer the user reads.

Core claim

We model the LLM-solver loop as a verified decision procedure consisting of formalizing, deciding, and narrating. Formal analysis combined with empirical studies on prompt injection shows that certificate gating preserves the solver verdict while an adversary can invert a verified conclusion across phrasings and channels. Hardened prompts reduce injection attacks significantly but cannot eliminate them and still fail under adaptive attack. Therefore, in the LLM-solver loop, robustness does not reach the answer that the user finally reads.

What carries the argument

The narration step that converts the solver's formal output into the user-facing answer, identified as the component where the soundness guarantee can be lost.

If this is right

  • Certificate gating makes the solver verdict sound.
  • An adversary can invert a verified conclusion in the narration step across different phrasings and channels.
  • Hardened prompts reduce injection success significantly.
  • Adaptive attacks still succeed against hardened prompts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Safety-critical applications may need to output solver results directly without any LLM narration step.
  • Consistency checks between solver output and narrated text could be added as an extra verification layer.
  • The same narration vulnerability could appear in other LLM-formal tool combinations beyond SAT and SMT solvers.

Load-bearing premise

The modeling of the LLM-solver loop as three distinct components accurately isolates narration as the point where soundness can be lost.

What would settle it

An experiment in which prompt injection attempts consistently fail to produce a narrated answer that differs from the solver's verified conclusion.

Figures

Figures reproduced from arXiv: 2606.19588 by Songgaojun Deng, Zunchen Huang.

Figure 1
Figure 1. Figure 1: A solver only makes the verdict (blue) sound in the LLM-solver loop. Our focus is the narration gap (red). formula φ. In decision, a solver decides φ and returns a verdict v together with a certificate, which can be checked independently of the solver, such as a resolution or enumeration proof for an unsatisfiable instance. In the final narration step, LLM turns the verdict into a user received answer. Pre… view at source ↗
Figure 2
Figure 2. Figure 2: Illustrating the three narration behaviors on a certified UNSAT verdict. The verdict stays correct, yet compliant and stealthy narrations deliver the wrong answer. Gemma 0.17) while gpt-oss never flips its answer. This confirms that without the certificate from the solver, the verdict component inside the loop is genuinely corruptible and the certificate precisely removes that weak spot. Note that the vari… view at source ↗
Figure 3
Figure 3. Figure 3: Narration injection success for the four templates. and the user is misled. A hardened narrator prompt lowers how often either flip occurs but a stealthy attacks still remain (RQ3). Under neutral framings, every model performs the narration to final answer reliably. Four models follows strictly with the verdict. Only Llama has 4 cases out of 240 flipped the verdict answer back to user. Note that these 4 fl… view at source ↗
read the original abstract

Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question can be formulated in logic. Unlike chain of thought whose steps are sampled from the model distribution without formal guarantee, a solver produces a sound and independently verifiable answer. However, the soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied the formalization and decision, but not narration, which is the step that turns a formal tool's output into the user answer. To fill the narration gap, we first model the LLM-solver loop as a verified decision procedure. We further evaluate five open-sourced models under prompt injection, and we find certificate gating makes the solver verdict sound, while an adversary can invert a verified conclusion across phrasings and channels. We study the mitigation through hardened prompt that reduces injection significantly but cannot eliminate it and still suffers under adaptive attack. Combining the formal analysis and empirical studies, we show in the LLM-solver loop, robustness does not reach to the answer that the user finally reads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript models LLM-solver loops for safety-critical questions as verified decision procedures with three components (formalizing the question, deciding it via a solver, and narrating the result to the user). Prior work addressed formalization and decision; this work focuses on narration, evaluating five open-source models under prompt injection. It reports that certificate gating preserves solver soundness while adversaries can invert verified conclusions via narration across phrasings and channels; hardened prompts reduce but do not eliminate attacks (especially adaptive ones). The central claim is that robustness does not reach the final user-facing answer.

Significance. If the empirical isolation of narration holds, the result identifies a concrete, previously unexamined attack surface in hybrid LLM-formal pipelines. This is relevant for any deployment where a formally verified solver output is rendered by an LLM for human consumption, and the work supplies both a modeling framework and concrete attack/mitigation data.

major comments (2)
  1. [Abstract and modeling description] The three-component decomposition (formalizing, deciding, narrating) is load-bearing for the claim that soundness loss is localized to narration. The abstract states that certificate gating keeps the solver verdict sound while narration inverts it, yet the described experiments do not report explicit controls that hold the solver certificate fixed independently of the narration prompt; without such controls it remains possible that injection affects earlier components as well.
  2. [Abstract] The abstract describes evaluation of five models and mitigation testing but provides no attack-success metrics, statistical controls, sample sizes, or exact experimental protocols. This absence directly limits the strength of the empirical support for the claim that narration is the isolated failure point.
minor comments (1)
  1. Notation for the three components and the certificate-gating mechanism should be introduced with explicit definitions and a diagram early in the modeling section to make the formal analysis easier to follow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of our results.

read point-by-point responses
  1. Referee: [Abstract and modeling description] The three-component decomposition (formalizing, deciding, narrating) is load-bearing for the claim that soundness loss is localized to narration. The abstract states that certificate gating keeps the solver verdict sound while narration inverts it, yet the described experiments do not report explicit controls that hold the solver certificate fixed independently of the narration prompt; without such controls it remains possible that injection affects earlier components as well.

    Authors: We agree that explicit controls are necessary to isolate the narration component. The modeling framework treats the solver output as fixed once certificate gating is applied, and our experiments were constructed to hold the verified solver certificate constant while injecting only into the narration prompt. To make this isolation fully transparent, we will add a dedicated subsection in the revised manuscript describing the experimental protocol that fixes the solver certificate independently of the narration prompt. revision: yes

  2. Referee: [Abstract] The abstract describes evaluation of five models and mitigation testing but provides no attack-success metrics, statistical controls, sample sizes, or exact experimental protocols. This absence directly limits the strength of the empirical support for the claim that narration is the isolated failure point.

    Authors: We accept that the abstract is currently too high-level and omits quantitative details. In the revised version we will expand the abstract to report attack-success rates across the five models, sample sizes, and a concise statement of the experimental protocols used for both standard and adaptive attacks. revision: yes

Circularity Check

0 steps flagged

No significant circularity; modeling assumption and external empirical tests are independent of the result.

full rationale

The paper introduces a three-component model (formalizing, deciding, narrating) as an explicit modeling choice to analyze the LLM-solver loop, then performs empirical prompt-injection tests on five open-sourced external models. No equations, fitted parameters, or self-citations are shown to reduce the central claim (narration as the soundness-loss point) back to the inputs by construction. The derivation chain is self-contained against external benchmarks and does not rely on load-bearing self-citation or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the standard assumption that solvers produce sound outputs and that LLM prompting behavior can be studied via injection attacks, without introducing new fitted parameters or postulated entities.

axioms (1)
  • domain assumption The LLM-solver loop can be modeled as a verified decision procedure with distinct formalization, decision, and narration steps.
    Stated as the initial modeling step in the abstract.

pith-pipeline@v0.9.1-grok · 5737 in / 1145 out tokens · 21395 ms · 2026-06-26T20:40:08.011740+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 21 canonical work pages · 4 internal anchors

  1. [1]

    Ollama — ollama.com.https://ollama.com/(2026)

  2. [2]

    Aceto, L., Achilleos, A., Francalanza, A., Ingólfsdóttir, A., Lehtinen, K.: Adventures in monitorability: from branching to linear time and back again. Proc. ACM Program. Lang.3(POPL), 52:1–52:29 (2019).https://doi.org/10.1145/3290365, https://doi.org/10.1145/3290365

  3. [3]

    Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

    Agarwal, A., Neamtu, N., Aggarwal, P., Kim, S., Limperg, J., Flamant, C., Shimizu, K., Parno, B., Welleck, S.: Verus-specgym: An agentic environment for evaluating specification autoformalization. CoRRabs/2605.26457(2026). https://doi.org/ 10.48550/ARXIV.2605.26457,https://doi.org/10.48550/arXiv.2605.26457

  4. [4]

    cvc5: A Versatile and Industrial-Strength SMT Solver

    Barbosa, H., Barrett, C.W., Brain, M., Kremer, G., Lachnitt, H., Mann, M., Mohamed, A., Mohamed, M., Niemetz, A., Nötzli, A., Ozdemir, A., Preiner, M., Reynolds, A., Sheng, Y., Tinelli, C., Zohar, Y.: cvc5: A versatile and industrial-strength SMT solver. In: Fisman, D., Rosu, G. (eds.) Tools and Al- gorithms for the Construction and Analysis of Systems - ...

  5. [5]

    Runtime verification for LTL and TLTL.ACM Transactions on Software Engineering and Methodology, 20(4):14:1–14:64, 2011

    Bauer, A., Leucker, M., Schallhart, C.: Runtime verification for LTL and TLTL. ACM Trans. Softw. Eng. Methodol.20(4), 14:1–14:64 (2011).https://doi.org/ 10.1145/2000799.2000800,https://doi.org/10.1145/2000799.2000800

  6. [6]

    Springer Berlin Heidelberg, Berlin, Heidelberg, 1999

    Biere, A., Cimatti, A., Clarke, E.M., Zhu, Y.: Symbolic model checking without bdds. In: Cleaveland, R. (ed.) Tools and Algorithms for Construction and Anal- ysis of Systems, 5th International Conference, TACAS ’99, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS’99, Amsterdam, The Netherlands, March 22-28, 199...

  7. [7]

    Advances in neural information processing systems33, 1877–1901 (2020)

    Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee- lakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems33, 1877–1901 (2020)

  8. [8]

    Cai, Y., Hou, Z., Sanán, D., Luan, X., Lin, Y., Sun, J., Dong, J.S.: Automated program refinement: Guide and verify code large language model with refinement calculus. Proc. ACM Program. Lang.9(POPL), 2057–2089 (2025).https://doi. org/10.1145/3704905,https://doi.org/10.1145/3704905

  9. [9]

    In: Bouamor, H., Pino, J., Bali, K

    Chakraborty, S., Lahiri, S.K., Fakhoury, S., Lal, A., Musuvathi, M., Rastogi, A., Senthilnathan, A., Sharma, R., Swamy, N.: Ranking llm-generated loop invariants for program verification. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10,

  10. [10]

    findings-acl.856/

    Findings of ACL, vol. EMNLP 2023, pp. 9164–9175. Association for Computa- tional Linguistics (2023).https://doi.org/10.18653/V1/2023.FINDINGS-EMNLP. 614,https://doi.org/10.18653/v1/2023.findings-emnlp.614

  11. [11]

    Reasoning Models Don't Always Say What They Think

    Chen, Y., Benton, J., Radhakrishnan, A., Uesato, J., Denison, C., Schulman, J., Somani, A., Hase, P., Wagner, M., Roger, F., Mikulik, V., Bowman, S.R., Leike, J., Kaplan, J., Perez, E.: Reasoning models don’t always say what they think. CoRRabs/2505.05410(2025). https://doi.org/10.48550/ARXIV.2505.05410, https://doi.org/10.48550/arXiv.2505.05410 18 Zunche...

  12. [13]

    Journal of machine learning research24(240), 1–113 (2023)

    Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: Scaling language modeling with pathways. Journal of machine learning research24(240), 1–113 (2023)

  13. [14]

    arXiv preprint arXiv:2110.14168 (2021)

    Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al.: Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)

  14. [15]

    In: Enea, C., Lal, A

    Cosler, M., Hahn, C., Mendoza, D., Schmitt, F., Trippel, C.: nl2spec: Interac- tively translating unstructured natural language to temporal logics with large language models. In: Enea, C., Lal, A. (eds.) Computer Aided Verification - 35th International Conference, CAV 2023, Paris, France, July 17-22, 2023, Pro- ceedings, Part II. Lecture Notes in Computer...

  15. [16]

    Springer (2023).https://doi.org/10.1007/978-3-031-37703-7_18, https: //doi.org/10.1007/978-3-031-37703-7_18

  16. [17]

    Falcone, Y., Fernandez, J., Mounier, L.: What can you verify and enforce at runtime? Int. J. Softw. Tools Technol. Transf.14(3), 349–382 (2012).https://doi.org/10. 1007/S10009-011-0196-8,https://doi.org/10.1007/s10009-011-0196-8

  17. [18]

    Design by contract for deep learning APIs,

    First, E., Rabe, M.N., Ringer, T., Brun, Y.: Baldur: Whole-proof generation and repair with large language models. In: Chandra, S., Blincoe, K., Tonella, P. (eds.) Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco,CA,USA,December3-9,2023.pp.1229...

  18. [19]

    In: Büning, H.K., Zhao, X

    Goldberg, E.: A decision-making procedure for resolution-based sat-solvers. In: Büning, H.K., Zhao, X. (eds.) Theory and Applications of Satisfiability Testing - SAT 2008, 11th International Conference, SAT 2008, Guangzhou, China, May 12-15, 2008. Proceedings. Lecture Notes in Computer Science, vol. 4996, pp. 119–

  19. [20]

    Springer (2008).https://doi.org/10.1007/978-3-540-79719-7_11, https: //doi.org/10.1007/978-3-540-79719-7_11

  20. [21]

    In: Proceedings of the 16th ACM workshop on artificial intelligence and security

    Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In: Proceedings of the 16th ACM workshop on artificial intelligence and security. pp. 79–90 (2023)

  21. [22]

    In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023

    Jiang, A.Q., Welleck, S., Zhou, J.P., Lacroix, T., Liu, J., Li, W., Jamnik, M., Lample, G., Wu, Y.: Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net (2023),https://openreview.net/forum?id=SMa9EAovKMC

  22. [23]

    In: Narodytska, N., Rümmer, P

    Kamath, A., Mohammed, J.N., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S.K., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Leveraging llms for program verification. In: Narodytska, N., Rümmer, P. (eds.) Formal Methods in Computer- Aided Design, FMCAD 2024, Prague, Czech Republic, October 15-18, 2024. pp. 107–

  23. [24]

    https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_ 16,https://doi.org/10.34727/2024/isbn.978-3-85448-065-5_16 Analyzing the Narration Gap in LLM-Solver Loops 19

    IEEE (2024). https://doi.org/10.34727/2024/ISBN.978-3-85448-065-5_ 16,https://doi.org/10.34727/2024/isbn.978-3-85448-065-5_16 Analyzing the Narration Gap in LLM-Solver Loops 19

  24. [25]

    In: ICLR 2026 Workshop: VerifAI-2: The Second Workshop on AI Verification in the Wild

    Kim, K., Poiroux, A., Bosselut, A.: Do llms game formalization? evaluating faithful- ness in logical reasoning. In: ICLR 2026 Workshop: VerifAI-2: The Second Workshop on AI Verification in the Wild

  25. [26]

    Measuring Faithfulness in Chain-of-Thought Reasoning

    Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernan- dez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., Lukosiute, K., Nguyen, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., Larson, R., McCandlish, S., Kundu, S., Kadavath, S., Yang, S., Henighan, T., Maxwell, T., Telleen- Lawton, T., Hume, T., Hatfield-Dodds, Z., Kaplan, J...

  26. [27]

    In: Globersons, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J.M., Zhang, C

    Li, Z., Wu, Y., Li, Z., Wei, X., Zhang, X., Yang, F., Ma, X.: Autoformalize mathemat- ical statements by symbolic equivalence and semantic consistency. In: Globersons, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J.M., Zhang, C. (eds.) Ad- vances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing...

  27. [28]

    Ligatti, J., Bauer, L., Walker, D.: Edit automata: enforcement mechanisms for run-time security policies. Int. J. Inf. Sec.4(1-2), 2–16 (2005).https://doi.org/ 10.1007/S10207-004-0046-8,https://doi.org/10.1007/s10207-004-0046-8

  28. [29]

    In: 33rd USENIX Security Symposium (USENIX Security 24)

    Liu, Y., Jia, Y., Geng, R., Jia, J., Gong, N.Z.: Formalizing and benchmarking prompt injection attacks and defenses. In: 33rd USENIX Security Symposium (USENIX Security 24). pp. 1831–1847 (2024)

  29. [30]

    Leveraging large language models for enhancing the understandability of generated unit tests,

    Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: Automated generation of formal program specifications via large language models. In: 47th IEEE/ACM International Conference on Software Engineering, ICSE 2025, Ottawa, ON, Canada, April 26 - May 6, 2025. pp. 16–28. IEEE (2025).https://doi.org/10.1109/ICSE55347.2025. 00129,https://doi.org/10.1109/ICSE55347...

  30. [31]

    Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing

    Moore, H., Shah, A.: Evaluating autoformalization robustness via semantically sim- ilar paraphrasing. CoRRabs/2511.12784(2025). https://doi.org/10.48550/ ARXIV.2511.12784,https://doi.org/10.48550/arXiv.2511.12784

  31. [32]

    Z3: An Efficient SMT Solver

    de Moura, L.M., Bjørner, N.S.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedi...

  32. [33]

    In: Salakhutdinov, R., Kolter, Z., Heller, K.A., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F

    Murphy, L., Yang, K., Sun, J., Li, Z., Anandkumar, A., Si, X.: Autoformalizing euclidean geometry. In: Salakhutdinov, R., Kolter, Z., Heller, K.A., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F. (eds.) Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Proceedings of Machine Learning Research, vol...

  33. [34]

    NeMo guardrails: A toolkit for controllable and safe LLM applications with pro- grammable rails

    Olausson, T., Gu, A., Lipkin, B., Zhang, C.E., Solar-Lezama, A., Tenenbaum, J.B., Levy, R.: LINC: A neurosymbolic approach for logical reasoning by combining language models with first-order logic provers. In: Bouamor, H., Pino, J., Bali, K. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapor...

  34. [35]

    Findings of the Association for Computational Linguistics:

    Pan, L., Albalak, A., Wang, X., Wang, W.Y.: Logic-lm: Empowering large lan- guage models with symbolic solvers for faithful logical reasoning. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023. Findings of ACL, vol. EMNLP 2023, pp. 3806–3824. Association for Com...

  35. [36]

    arXiv preprint arXiv:2211.09527 (2022)

    Perez, F., Ribeiro, I.: Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527 (2022)

  36. [37]

    In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

    Raina, V., Liusie, A., Gales, M.: Is llm-as-a-judge robust? investigating univer- sal adversarial attacks on zero-shot llm assessment. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. pp. 7499–7517 (2024)

  37. [38]

    Schneider

    Schneider, F.B.: Enforceable security policies. ACM Trans. Inf. Syst. Secur.3(1), 30–50 (2000).https://doi.org/10.1145/353323.353382, https://doi.org/10. 1145/353323.353382

  38. [39]

    In: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security

    Shi, J., Yuan, Z., Liu, Y., Huang, Y., Zhou, P., Sun, L., Gong, N.Z.: Optimization- based prompt injection attack to llm-as-a-judge. In: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. pp. 660–674 (2024)

  39. [40]

    In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

    Talmor, A., Herzig, J., Lourie, N., Berant, J.: Commonsenseqa: A question answering challenge targeting commonsense knowledge. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4149–4158 (2019)

  40. [41]

    Advances in Neural Information Processing Systems36, 74952–74965 (2023)

    Turpin, M., Michael, J., Perez, E., Bowman, S.: Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems36, 74952–74965 (2023)

  41. [42]

    Advances in neural information processing systems35, 24824–24837 (2022)

    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou, D., et al.: Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems35, 24824–24837 (2022)

  42. [43]

    In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A

    Wu, Y., Jiang, A.Q., Li, W., Rabe, M.N., Staats, C., Jamnik, M., Szegedy, C.: Autoformalization with large language models. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Informa- tion Processing Systems 35: Annual Conference on Neural Information Process- ing Systems 2022, NeurIPS 2022, New Orleans, LA, ...

  43. [44]

    In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

    Yang, K., Swope, A.M., Gu, A., Chalamala, R., Song, P., Yu, S., Godil, S., Prenger, R.J., Anandkumar, A.: Leandojo: Theorem proving with retrieval- augmented language models. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Process...

  44. [45]

    In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

    Ye, X., Chen, Q., Dillig, I., Durrett, G.: Satlm: Satisfiability-aided language models using declarative prompting. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Informa- tion Processing Systems 36: Annual Conference on Neural Information Pro- cessing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, D...

  45. [46]

    In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

    Yi, J., Xie, Y., Zhu, B., Kiciman, E., Sun, G., Xie, X., Wu, F.: Benchmarking and defending against indirect prompt injection attacks on large language models. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. pp. 1809–1820 (2025)

  46. [47]

    In: Findings of the Association for Computational Linguistics: ACL 2024

    Zhan, Q., Liang, Z., Ying, Z., Kang, D.: Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In: Findings of the Association for Computational Linguistics: ACL 2024. pp. 10471–10506 (2024)

  47. [48]

    In: International Conference on Learning Representations

    Zhang, H., Huang, J., Mei, K., Yao, Y., Wang, Z., Zhan, C., Wang, H., Zhang, Y.: Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents. In: International Conference on Learning Representations. vol. 2025, pp. 35331–35366 (2025)

  48. [49]

    <property question?>

    Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al.: Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems36, 46595–46623 (2023) 22 Zunchen Huang and Songgaojun Deng A Ethical considerations and responsible disclosure Our study uses synthetic inst...