pith. machine review for the scientific record.

arxiv: 2512.05439 · v2 · submitted 2025-12-05 · 💻 cs.AI · cs.FL

Recognition: 1 theorem link

· Lean Theorem

BEAVER: An Efficient Deterministic LLM Verifier

Authors on Pith · no claims yet

Pith reviewed 2026-05-17 01:54 UTC · model grok-4.3

classification 💻 cs.AI cs.FL
keywords LLM verification · safety properties · deterministic bounds · token trie · probability bounds · AI safety · model risk

The pith

BEAVER computes deterministic probability bounds on whether LLMs meet safety properties by exploring token sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BEAVER as a framework to verify large language models against safety properties using guaranteed probability bounds rather than estimates from sampling. It builds a token trie to represent possible outputs and maintains a frontier of partial sequences, updating sound lower and upper bounds on the probability of satisfying the property at each step. This replaces ad-hoc sampling with systematic search that is provably correct. A reader would care because it turns safety verification from an intuition into a calculation that can surface tail risks missed by looser methods while using far less computation.
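As a rough illustration of the bookkeeping this summary describes (the numbers below are invented, not taken from the paper): complete outputs already explored contribute known probability mass, and everything unexplored is treated conservatively.

```python
# Toy illustration (invented numbers) of the sound-bound bookkeeping.
satisfying_mass = 0.42   # complete sequences found to satisfy the property
violating_mass = 0.13    # complete sequences found to violate it
unexplored_mass = 1.0 - satisfying_mass - violating_mass

p_lb = satisfying_mass                    # only proven-satisfying mass counts
p_ub = satisfying_mass + unexplored_mass  # unexplored mass might all satisfy
print(f"P(property holds) is in [{p_lb:.2f}, {p_ub:.2f}]")  # [0.42, 0.87]
```

Because the two bounds differ only by the unexplored mass, they converge as exploration proceeds and remain valid if the search stops early.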

Core claim

BEAVER is the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt and any safety property, BEAVER systematically explores the model output space using novel Token trie and Frontier data structures, maintaining provably sound bounds at every iteration. The authors formalize the verification problem, prove soundness of the approach, and evaluate it on four safety properties across twelve open-weight LLMs, where it identifies two to three times more risky instances than baselines at one-tenth the compute budget.

What carries the argument

Token trie and Frontier data structures that systematically enumerate output sequences while tracking provably sound probability bounds on property satisfaction.
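A minimal sketch of that idea follows. The data-structure names mirror the paper's terminology, but the concrete fields, the toy three-token model, and the bash-style safety check are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a token-trie / frontier exploration with sound bounds.
from dataclasses import dataclass

EOS = "<eos>"

# Toy "model": conditional next-token distributions over a tiny vocabulary.
TOY_MODEL = {
    (): {"rm": 0.4, "ls": 0.5, EOS: 0.1},
    ("rm",): {"-rf": 0.7, EOS: 0.3},
    ("ls",): {"-l": 0.6, EOS: 0.4},
    ("rm", "-rf"): {EOS: 1.0},
    ("ls", "-l"): {EOS: 1.0},
}

def is_safe(tokens):
    """Terminal safety predicate: flag 'rm -rf' as unsafe."""
    return not ("rm" in tokens and "-rf" in tokens)

@dataclass
class Node:
    prefix: tuple          # token sequence from the root of the trie
    mass: float            # probability of generating this prefix
    complete: bool = False # True once the sequence ends in <eos>

def expand(node):
    """Query the toy model and return child nodes of an incomplete prefix."""
    return [Node(node.prefix + (tok,), node.mass * p, complete=(tok == EOS))
            for tok, p in TOY_MODEL[node.prefix].items()]

def verify(max_iters=10):
    frontier = [Node((), 1.0)]   # incomplete leaves awaiting expansion
    p_lb, p_ub = 0.0, 1.0        # proven-safe mass / conservative upper bound
    unsafe = 0.0                 # proven-unsafe mass (1 - unsafe is also an UB)
    for _ in range(max_iters):
        if not frontier:
            break
        node = max(frontier, key=lambda n: n.mass)  # expand heaviest prefix
        frontier.remove(node)
        for child in expand(node):
            if child.complete:
                if is_safe(child.prefix):
                    p_lb += child.mass
                else:
                    unsafe += child.mass
            else:
                frontier.append(child)
        unexplored = sum(n.mass for n in frontier)
        p_ub = p_lb + unexplored   # unexplored mass might still be safe
        print(f"bounds on P(safe): [{p_lb:.3f}, {p_ub:.3f}]")
    return p_lb, p_ub

if __name__ == "__main__":
    verify()
```

On the toy model the bounds tighten from [0.100, 1.000] to [0.720, 0.720] over five expansions, mirroring the kind of bound evolution shown in the paper's Figure 2.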

If this is right

  • Sound bounds replace sampling estimates for any safety property that can be checked on complete outputs.
  • More risky generations are surfaced than random sampling achieves at the same budget.
  • The bounds remain valid after every exploration step, allowing early stopping with guarantees.
  • The approach applies across multiple open-weight models and different safety properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same exploration structure could be adapted to verify other output properties such as factual correctness or style constraints.
  • When only black-box access is available, combining the method with targeted sampling might still yield useful approximate bounds.
  • Efficiency improvements open the door to running verification inside generation loops rather than after the fact.
  • Scaling to longer outputs would likely require pruning rules on the frontier to keep the search tractable.
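The last point is an editorial extrapolation; one plausible shape for it, under our own assumptions rather than anything in the paper, is a mass-ordered frontier with a pruning threshold whose discarded mass is folded into the upper bound so soundness is preserved.

```python
# Editorial sketch (not from the paper): a mass-ordered frontier with pruning.
import heapq

class PrunedFrontier:
    def __init__(self, prune_below=1e-6):
        self.heap = []              # max-heap via negated mass
        self.slack = 0.0            # mass we gave up on exploring
        self.prune_below = prune_below

    def push(self, mass, prefix):
        if mass < self.prune_below:
            self.slack += mass      # counted toward the upper bound instead
        else:
            heapq.heappush(self.heap, (-mass, prefix))

    def pop(self):
        neg_mass, prefix = heapq.heappop(self.heap)
        return -neg_mass, prefix

    def unexplored_mass(self):
        # Pruned mass plus queued mass: the gap between lower and upper bound.
        return self.slack + sum(-m for m, _ in self.heap)
```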

Load-bearing premise

The LLM must allow token-by-token queries that return logits or probabilities, and safety properties must be expressible as predicates on finite output sequences.
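This premise is easy to check for open-weight models. The snippet below shows the kind of query the verifier needs; the model name and prompt are placeholders, not the paper's setup.

```python
# Illustrative check of the premise: next-token probabilities for a prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # any open-weight causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = "Delete the temporary files with: "
ids = tok(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]        # scores for the next token only
probs = torch.softmax(logits, dim=-1)        # full next-token distribution

top = torch.topk(probs, k=5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i)):>12}  {p.item():.4f}")
```

A black-box API that returns only complete strings gives no access to this distribution, which is exactly the failure mode named under "What would settle it" below.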

What would settle it

Running the method on a black-box API that returns only complete strings without intermediate probabilities, and showing that sound bounds can no longer be maintained.

Figures

Figures reproduced from arXiv: 2512.05439 by Debangshu Banerjee, Gagandeep Singh, Nalin Wadhwa, Tarun Suresh.

Figure 1
Figure 1. BEAVER workflow for computing sound probability bounds. Given a prompt, language model, and a prefix-closed semantic constraint, BEAVER iteratively: (1) selects an incomplete leaf from the frontier, (2) expands it by querying the model and adding valid continuations to the token trie, and (3) updates the sound probability bounds [P_LB, P_UB] based on the new frontier state. view at source ↗
Figure 2
Figure 2. Evolution of the token trie T through three iterations of BEAVER on the bash command safety constraint. Starting from the empty trie, BEAVER expands nodes n0, n1, and n4 in sequence. Green nodes indicate incomplete sequences eligible for expansion, turquoise nodes indicate complete sequences (ending in ⟨eos⟩). Probability bounds tighten from [0.01, 0.9] after iteration 1, to [0.213, 0.815] after iteration … view at source ↗
Figure 3
Figure 3. Comparison of average probability bounds by BEAVER and rejection sampling over forward passes and time, for Qwen2.5-14B-Instruct on the GSM-Symbolic dataset. view at source ↗
read the original abstract

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify model outputs and characterize tail risk for safe deployment. While sampling-based estimates provide an ad-hoc intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt & any safety property, BEAVER systematically explores the model output space using novel Token trie and Frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on 4 safety properties across 12 open-weight LLMs. BEAVER identifies 2-3x more risky instances compared to baselines while taking 1/10 of the compute budget, surfacing tail risks that loose bounds and ad-hoc evaluation misses.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents BEAVER, a framework that uses novel Token Trie and Frontier data structures to systematically explore the token-level output space of LLMs, computing deterministic sound probability bounds on satisfaction of arbitrary safety properties expressed as predicates over finite sequences. It formalizes the verification problem, proves soundness with bounds maintained at every iteration, and reports empirical results on 4 safety properties across 12 open-weight LLMs, claiming 2-3x more risky instance detections than baselines at 1/10 the compute budget.

Significance. If the central claims hold, the work provides a meaningful step toward guaranteed, non-sampling-based characterization of tail risks in LLM outputs, which is valuable for safe deployment. The formalization, soundness proof, and data-structure-driven exploration are strengths that distinguish it from ad-hoc sampling methods. The reported efficiency gains could be impactful if they generalize beyond the evaluated predicates.

major comments (2)
  1. [Algorithm description and predicate integration (around the Frontier update and early-termination rules)] The efficiency and 'first practical framework' claims rest on the ability to prune or bound the frontier using partial-sequence evaluations of the safety predicate. For general predicates (e.g., semantic toxicity classifiers or checks over large forbidden-phrase sets) that are only decidable on complete sequences, the algorithm reduces to exhaustive expansion of all frontier paths to leaves before tightening the remaining-mass bound, restoring exponential cost in output length. No section explicitly analyzes this case or provides a complexity bound conditioned on predicate monotonicity or prefix-decidability.
  2. [Soundness proof and experimental setup] The soundness proof is stated to maintain bounds at every iteration, but the manuscript does not detail how the predicate is encoded to support incremental checks or how early termination preserves the guarantee when the predicate cannot be evaluated on prefixes. This leaves the practical 1/10-compute claim only partially supported for the 'any safety property' scope asserted in the abstract.
minor comments (2)
  1. [Preliminaries] Notation for the probability bounds and remaining mass could be clarified with a single running example that shows bound evolution over iterations.
  2. [Experiments] The baseline comparison would benefit from explicit pseudocode or parameter settings for the sampling methods to ensure reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We value the recognition of the formalization, soundness proof, and data structures as distinguishing strengths. We respond point-by-point to the major comments below, offering clarifications on the framework's design and committing to revisions that will better delineate the conditions for efficiency while preserving the soundness guarantees.

read point-by-point responses
  1. Referee: [Algorithm description and predicate integration (around the Frontier update and early-termination rules)] The efficiency and 'first practical framework' claims rest on the ability to prune or bound the frontier using partial-sequence evaluations of the safety predicate. For general predicates (e.g., semantic toxicity classifiers or checks over large forbidden-phrase sets) that are only decidable on complete sequences, the algorithm reduces to exhaustive expansion of all frontier paths to leaves before tightening the remaining-mass bound, restoring exponential cost in output length. No section explicitly analyzes this case or provides a complexity bound conditioned on predicate monotonicity or prefix-decidability.

    Authors: We thank the referee for this precise observation. Our approach is defined for arbitrary predicates over finite sequences, and the Token Trie and Frontier structures maintain a conservative bound on unexplored probability mass at every step. When a predicate is only decidable on complete sequences, expansion continues until leaves are reached for those paths, after which the predicate is evaluated and the bound is tightened; however, the Frontier still permits early termination of the overall search once the remaining mass is below the target precision, without requiring full enumeration of all possible outputs. We agree that the manuscript would be strengthened by an explicit discussion of this distinction. In revision, we will add a subsection analyzing predicate classes (prefix-decidable, monotonic, or terminal-only) and the resulting complexity implications, including a high-level bound showing that the cost is exponential in sequence length only for the terminal-only case but remains practical due to mass-based pruning. revision: yes

  2. Referee: [Soundness proof and experimental setup] The soundness proof is stated to maintain bounds at every iteration, but the manuscript does not detail how the predicate is encoded to support incremental checks or how early termination preserves the guarantee when the predicate cannot be evaluated on prefixes. This leaves the practical 1/10-compute claim only partially supported for the 'any safety property' scope asserted in the abstract.

    Authors: The soundness result (Theorem 1) holds unconditionally: the upper and lower bounds on the probability of predicate satisfaction are invariants maintained by the Frontier updates, regardless of when or whether the predicate can be evaluated on a given prefix. If the predicate cannot be decided on a prefix, that path's full probability mass is retained in the Frontier and no predicate-based pruning is applied; termination occurs only when the total unexplored mass is sufficiently small, which conservatively over-approximates any possible violations in the remaining space. We will revise the manuscript to include a clearer description of the predicate interface (with examples of incremental vs. non-incremental encodings), additional pseudocode for the update rules, and an explicit statement that soundness is independent of prefix decidability while efficiency gains are largest for predicates admitting early decisions. The reported 1/10-compute results are for the four evaluated properties, which admit varying degrees of prefix information; we will add this qualification to the experimental section. revision: yes
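To make the distinction in this exchange concrete, here is our reading of what such a predicate interface could look like. The verdict enum, function names, and the banned-substring example are illustrative assumptions, not the authors' code.

```python
# Sketch of a predicate interface with prefix-decidable vs terminal-only checks.
from enum import Enum

class Verdict(Enum):
    SAT = 1        # every completion of this prefix satisfies the property
    VIOLATED = 2   # every completion of this prefix violates it
    UNKNOWN = 3    # only a complete sequence can settle it; keep expanding

def forbidden_substring_property(text, complete, banned=("rm -rf",)):
    """Prefix-decidable: a banned substring can never be 'un-said' later."""
    if any(b in text for b in banned):
        return Verdict.VIOLATED
    return Verdict.SAT if complete else Verdict.UNKNOWN

def terminal_only_property(text, complete, classifier=lambda t: True):
    """Terminal-only: e.g. a classifier scored on full outputs.

    `classifier` is a stand-in callable; a real toxicity or correctness
    model would go here. No sound early decision is possible on prefixes.
    """
    if not complete:
        return Verdict.UNKNOWN
    return Verdict.SAT if classifier(text) else Verdict.VIOLATED
```

Under this reading, prefix-decidable properties let the frontier prune or settle paths early, while terminal-only properties force expansion to leaves, which is the efficiency gap the referee's first comment targets.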

Circularity Check

0 steps flagged

No circularity: algorithmic formalization with independent soundness proof

full rationale

The paper describes BEAVER as an algorithmic procedure that systematically explores LLM output space via Token trie and Frontier data structures while maintaining provably sound probability bounds. It explicitly states that the verification problem is formalized and soundness is proven, with evaluation performed on concrete safety properties across models. No step reduces a claimed result to a fitted parameter, self-defined quantity, or self-citation chain by construction; the central claims rest on data-structure invariants and the formalization rather than on any enumerated circular pattern, and the empirical comparisons are made against external baselines such as sampling-based estimates rather than against quantities the method defines for itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the existence of a formalization of the verification problem that admits a sound exploration procedure; the paper introduces two new data structures whose invariants must hold for the bounds to remain valid.

axioms (2)
  • domain assumption The LLM can be queried for next-token probabilities at any prefix, enabling systematic enumeration of the output space.
    Required for the token-trie exploration to be feasible; stated implicitly by the method description.
  • domain assumption Safety properties can be decided on finite output sequences.
    Necessary for the predicate to be evaluable during search.
invented entities (2)
  • Token trie no independent evidence
    purpose: Compact representation of possible output sequences for bound maintenance
    New data structure introduced to support the deterministic search.
  • Frontier no independent evidence
    purpose: Data structure that tracks unexplored branches while preserving sound probability bounds
    New data structure introduced to support the deterministic search.

pith-pipeline@v0.9.0 · 5454 in / 1492 out tokens · 34852 ms · 2026-05-17T01:54:34.219136+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 11 internal anchors

  1. [1]

    Gpt-4 technical report, 2023

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report, 2023

  2. [2]

    Strong mixed-integer programming formulations for trained neural networks.Mathematical Programming, 2020

    Ross Anderson, Joey Huchette, Will Ma, Christian Tjandraatmadja, and Juan Pablo Vielma. Strong mixed-integer programming formulations for trained neural networks.Mathematical Programming, 2020

  3. [3]

    Stanley Bak, Hoang-Dung Tran, Kerianne Hobbs, and Taylor T. Johnson. Improved geometric path enumeration for verifying relu neural networks. In Shuvendu K. Lahiri and Chao Wang, editors,Computer Aided Verification - 32nd International Conference, CA V 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part I, volume 12224 ofLecture Notes in Comput...

  4. [4]

    Certifying geometric robustness of neural networks

    Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. Certifying geometric robustness of neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https: //proceedings.neurips....

  5. [5]

    Interpreting robustness proofs of deep neural networks

    Debangshu Banerjee, Avaljot Singh, and Gagandeep Singh. Interpreting robustness proofs of deep neural networks. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id= Ev10F9TWML

  6. [6]

    Crane: Reasoning with constrained llm generation, 2025

    Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, and Gagandeep Singh. Crane: Reasoning with constrained llm generation, 2025. URL https://arxiv.org/abs/2502.09061

  7. [7]

    Purple llama cyberseceval: A secure coding benchmark for language models, 2023

    Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, and Joshua Saxe. Purple l...

  8. [8]

    Branch and bound for piecewise linear neural network verification.Journal of Machine Learning Research, 21(2020), 2020

    Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Pushmeet Kohli, P Torr, and P Mudigonda. Branch and bound for piecewise linear neural network verification.Journal of Machine Learning Research, 21(2020), 2020

  9. [9]

    An efficient nonconvex re- formulation of stagewise convex optimization problems.Advances in Neural Information Processing Systems, 33, 2020

    Rudy R Bunel, Oliver Hinder, Srinadh Bhojanapalli, and Krishnamurthy Dvijotham. An efficient nonconvex re- formulation of stagewise convex optimization problems.Advances in Neural Information Processing Systems, 33, 2020

  10. [10]

    Nlp verification: towards a general methodology for certifying robustness.European Journal of Applied Mathematics, pages 1–58, 2025

    Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya, Luca Arnaboldi, Matthew L Daggitt, Omri Isac, Guy Katz, Verena Rieser, and Oliver Lemon. Nlp verification: towards a general methodology for certifying robustness.European Journal of Applied Mathematics, pages 1–58, 2025

  11. [11]

    Quantitative certification of knowledge comprehension in llms

    Isha Chaudhary, Vedaant V Jain, and Gagandeep Singh. Quantitative certification of knowledge comprehension in llms. InICLR 2024 Workshop on Secure and Trustworthy Large Language Models

  12. [12]

    Quantitative certification of bias in large language models.arXiv e-prints, pages arXiv–2405, 2024

    Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, and Gagandeep Singh. Quantitative certification of bias in large language models.arXiv e-prints, pages arXiv–2405, 2024

  13. [13]

    Gold-medalist performance in solving olympiad geometry with AlphaGeometry2, 2025

    Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, and Thang Luong. Gold-medalist performance in solving olympiad geometry with alphageometry2, 2025. URL https://arxiv.org/abs/2502.03544

  14. [14]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems, 2021. URL https://arxiv.org/abs/2110.14168

  15. [15]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1310–1320. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.pres...

  16. [16]

    File searching using variable length keys

    Rene De La Briandais. File searching using variable length keys. InPapers Presented at the the March 3-5, 1959, Western Joint Computer Conference, IRE-AIEE-ACM ’59 (Western), page 295–298, New York, NY, USA, 1959. Association for Computing Machinery. ISBN 9781450378659. doi: 10.1145/1457838.1457895. URL https://doi.org/10.1145/1457838. 1457895

  17. [17]

    Z3: an efficient smt solver

    Leonardo De Moura and Nikolaj Bjørner. Z3: an efficient smt solver. InProceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, page 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3540787992

  18. [18]

    Formal verification of piece-wise linear feed-forward neural networks

    Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. InInternational Symposium on Automated Technology for Verification and Analysis, 2017

  19. [19]

    Shh, don’t say that! domain certification in llms. arXiv preprint arXiv:2502.19320, 2025

    Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip HS Torr, and Adel Bibi. Shh, don’t say that! domain certification in llms. arXiv preprint arXiv:2502.19320, 2025

  20. [20]

    Complete verification via multi-neuron relaxation guided branch-and-bound

    Claudio Ferrari, Mark Niklas Mueller, Nikola Jovanović, and Martin Vechev. Complete verification via multi-neuron relaxation guided branch-and-bound. InInternational Conference on Learning Representations, 2022. URL https: //openreview.net/forum?id=l_amHf1oaK

  21. [21]

    Fast geometric projections for local robustness certification

    Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, and Corina Pasareanu. Fast geometric projections for local robustness certification. InInternational Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=zWy1uxjDdZJ

  22. [22]

    Ai2: Safety and robustness certification of neural networks with abstract interpretation

    Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In2018 IEEE Symposium on Security and Privacy (SP), 2018

  23. [23]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava S...

  24. [24]

    The Curious Case of Neural Text Degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration, 2020. URL https://arxiv.org/abs/1904.09751

  25. [25]

    Are large pre-trained language models leaking your personal information?, 2022

    Jie Huang, Hanyin Shao, and Kevin Chen-Chuan Chang. Are large pre-trained language models leaking your personal information?, 2022. URL https://arxiv.org/abs/2205.12628

  26. [26]

    Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. Catastrophic jailbreak of open-source llms via exploiting generation, 2023. URL https://arxiv.org/abs/2310.06987

  27. [27]

    Towards Generalized Certified Robustness with Multi-Norm Training

    Enyi Jiang and Gagandeep Singh. Towards universal certified robustness with multi-norm training, 2024. URL https://arxiv.org/abs/2410.03000

  28. [28]

    Highly accurate protein structure prediction with alphafold.Nature, 596(7873):583–589, 2021

    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunya- suvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold.Nature, 596(7873):583–589, 2021

  29. [29]

    Guy Katz, Derek Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, David Dill, Mykel Kochenderfer, and Clark Barrett.The Marabou Framework for Verification and Analysis of Deep Neural Networks, pages 443–452. 07 2019. ISBN 978-3-030-25539-8

  30. [30]

    Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705, 2023

    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705, 2023

  31. [31]

    Double sampling randomized smoothing

    Linyi Li, Jiawei Zhang, Tao Xie, and Bo Li. Double sampling randomized smoothing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 13163–13208. PMLR, 17–23 Jul

  32. [32]

    URL https://proceedings.mlr.press/v162/li22aa.html

  33. [33]

    Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D Manning, Christopher Re, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu ...

  34. [34]

    GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

    Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. Gsm- symbolic: Understanding the limitations of mathematical reasoning in large language models, 2024. URL https: //arxiv.org/abs/2410.05229

  35. [35]

    Certified training: Small boxes are all you need

    Mark Niklas Mueller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=7oFuxtJtUMH

  36. [36]

    The enron corpus: Where the email bodies are buried?, 2020

    David Noever. The enron corpus: Where the email bodies are buried?, 2020. URL https://arxiv.org/abs/2001.10374

  37. [37]

    Scaling the convex barrier with active sets

    Alessandro De Palma, Harkirat S. Behl, Rudy R. Bunel, Philip H. S. Torr, and M. Pawan Kumar. Scaling the convex barrier with active sets. In9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, 2021

  38. [38]

    Expressive losses for verified robustness via convex combinations

    Alessandro De Palma, Rudy R Bunel, Krishnamurthy Dj Dvijotham, M. Pawan Kumar, Robert Stanforth, and Alessio Lomuscio. Expressive losses for verified robustness via convex combinations. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=mzyZ4wzKlM

  39. [39]

    Red Teaming Language Models with Language Models

    Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models, 2022. URL https://arxiv.org/abs/2202.03286

  40. [40]

    Perplexity ai

    Perplexity. Perplexity ai. AI Chatbot, 2023. URL https://www.perplexity.ai/

  41. [41]

    Qwen2.5 Technical Report

    Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

  42. [42]

    Position: Formal methods are the principled foundation of safe AI

    Gagandeep Singh and Deepika Chawla. Position: Formal methods are the principled foundation of safe AI. InICML Workshop on Technical AI Governance (TAIG), 2025. URL https://openreview.net/forum?id=7V5CDSsjB7

  43. [43]

    Fast and effective robustness certification.Advances in Neural Information Processing Systems, 31, 2018

    Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification.Advances in Neural Information Processing Systems, 31, 2018

  44. [44]

    Beyond the single neuron convex barrier for neural network certification

    Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. InAdvances in Neural Information Processing Systems, 2019

  45. [45]

    An abstract domain for certifying neural networks

    Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL), 2019

  46. [46]

    Safety and trust in artificial intelligence with abstract interpretation.Found

    Gagandeep Singh, Jacob Laurel, Sasa Misailovic, Debangshu Banerjee, Avaljot Singh, Changming Xu, Shubham Ugare, and Huan Zhang. Safety and trust in artificial intelligence with abstract interpretation.Found. Trends Program. Lang., 8(3–4):250–408, June 2025. ISSN 2325-1107. doi: 10.1561/2500000062. URL https://doi.org/10.1561/2500000062

  47. [47]

    Incremental verification of neural networks

    Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, and Gagandeep Singh. Incremental verification of neural networks.Proc. ACM Program. Lang., 7(PLDI), June 2023. doi: 10.1145/3591299. URL https://doi.org/10.1145/3591299

  48. [48]

    Incremental randomized smoothing certification

    Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, and Sasa Misailovic. Incremental randomized smoothing certification. InThe Twelfth International Conference on Learning Representations, 2024. URL https: //openreview.net/forum?id=SdeAPV1irk

  49. [49]

    Itergen: Iterative semantic- aware structured llm generation with backtracking, 2025

    Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, and Sasa Misailovic. Itergen: Iterative semantic- aware structured llm generation with backtracking, 2025. URL https://arxiv.org/abs/2410.07295

  50. [50]

    A review of formal methods applied to machine learning, 2021

    Caterina Urban and Antoine Miné. A review of formal methods applied to machine learning, 2021. URL https: //arxiv.org/abs/2104.02466

  51. [51]

    Decodingtrust: A comprehensive assessment of trustworthiness in gpt models, 2024

    Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, and Bo Li. Decodingtrust: A comprehensive assessment of trustworthiness in gpt models, 2024. URL https://arxiv.org/ab...

  52. [52]

    Quantifying risks in multi-turn conversation with large language models, 2025

    Chengxiao Wang, Isha Chaudhary, Qian Hu, Weitong Ruan, Rahul Gupta, and Gagandeep Singh. Quantifying risks in multi-turn conversation with large language models, 2025. URL https://arxiv.org/abs/2510.03969

  53. [53]

    Efficient formal safety analysis of neural networks

    Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. InAdvances in Neural Information Processing Systems, 2018

  54. [54]

    Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification

    Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021. URL https://openrevi...

  55. [55]

    Automatic perturbation analysis for scalable certified robustness and beyond

    Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red...

  56. [56]

    Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers

    Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers. InInternational Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=nVZtXBI6LNn

  57. [57]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  58. [58]

    Quantifying distributional robustness of agentic tool-selection,

    Jehyeok Yeon, Isha Chaudhary, and Gagandeep Singh. Quantifying distributional robustness of agentic tool-selection,

  59. [59]

    URL https://arxiv.org/abs/2510.03992

  60. [60]

    Efficient neural network robustness certification with general activation functions.Advances in neural information processing systems, 31, 2018

    Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions.Advances in neural information processing systems, 31, 2018

  61. [61]

    General cutting planes for bound-propagation-based neural network verification

    Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. General cutting planes for bound-propagation-based neural network verification. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/ forum?id=5haAJAcofjc

  62. [62]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023