pith. machine review for the scientific record.

arxiv: 2512.05439 · v2 · submitted 2025-12-05 · 💻 cs.AI · cs.FL

Recognition: 1 theorem link

· Lean Theorem

BEAVER: An Efficient Deterministic LLM Verifier

Authors on Pith · no claims yet

Pith reviewed 2026-05-17 01:54 UTC · model grok-4.3

classification 💻 cs.AI cs.FL
keywords LLM verification · safety properties · deterministic bounds · token trie · probability bounds · AI safety · model risk

The pith

BEAVER computes deterministic probability bounds on whether LLMs meet safety properties by exploring token sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BEAVER as a framework to verify large language models against safety properties using guaranteed probability bounds rather than estimates from sampling. It builds a token trie to represent possible outputs and maintains a frontier of partial sequences, updating sound lower and upper bounds on the probability of satisfying the property at each step. This replaces ad-hoc sampling with systematic search that is provably correct. A reader would care because it turns safety verification from an intuition into a calculation that can surface tail risks missed by looser methods while using far less computation.
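As a rough illustration of the bookkeeping this summary describes (the numbers below are invented, not taken from the paper): complete outputs already explored contribute known probability mass, and everything unexplored is treated conservatively.

```python
# Toy illustration (invented numbers) of the sound-bound bookkeeping.
satisfying_mass = 0.42   # complete sequences found to satisfy the property
violating_mass = 0.13    # complete sequences found to violate it
unexplored_mass = 1.0 - satisfying_mass - violating_mass

p_lb = satisfying_mass                    # only proven-satisfying mass counts
p_ub = satisfying_mass + unexplored_mass  # unexplored mass might all satisfy
print(f"P(property holds) is in [{p_lb:.2f}, {p_ub:.2f}]")  # [0.42, 0.87]
```

Because the two bounds differ only by the unexplored mass, they converge as exploration proceeds and remain valid if the search stops early.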

Core claim

BEAVER is the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt and any safety property, BEAVER systematically explores the model output space using novel Token trie and Frontier data structures, maintaining provably sound bounds at every iteration. The authors formalize the verification problem, prove soundness of the approach, and evaluate it on four safety properties across twelve open-weight LLMs, where it identifies two to three times more risky instances than baselines at one-tenth the compute budget.

What carries the argument

Token trie and Frontier data structures that systematically enumerate output sequences while tracking provably sound probability bounds on property satisfaction.
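A minimal sketch of that idea follows. The data-structure names mirror the paper's terminology, but the concrete fields, the toy three-token model, and the bash-style safety check are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a token-trie / frontier exploration with sound bounds.
from dataclasses import dataclass

EOS = "<eos>"

# Toy "model": conditional next-token distributions over a tiny vocabulary.
TOY_MODEL = {
    (): {"rm": 0.4, "ls": 0.5, EOS: 0.1},
    ("rm",): {"-rf": 0.7, EOS: 0.3},
    ("ls",): {"-l": 0.6, EOS: 0.4},
    ("rm", "-rf"): {EOS: 1.0},
    ("ls", "-l"): {EOS: 1.0},
}

def is_safe(tokens):
    """Terminal safety predicate: flag 'rm -rf' as unsafe."""
    return not ("rm" in tokens and "-rf" in tokens)

@dataclass
class Node:
    prefix: tuple          # token sequence from the root of the trie
    mass: float            # probability of generating this prefix
    complete: bool = False # True once the sequence ends in <eos>

def expand(node):
    """Query the toy model and return child nodes of an incomplete prefix."""
    return [Node(node.prefix + (tok,), node.mass * p, complete=(tok == EOS))
            for tok, p in TOY_MODEL[node.prefix].items()]

def verify(max_iters=10):
    frontier = [Node((), 1.0)]   # incomplete leaves awaiting expansion
    p_lb, p_ub = 0.0, 1.0        # proven-safe mass / conservative upper bound
    unsafe = 0.0                 # proven-unsafe mass (1 - unsafe is also an UB)
    for _ in range(max_iters):
        if not frontier:
            break
        node = max(frontier, key=lambda n: n.mass)  # expand heaviest prefix
        frontier.remove(node)
        for child in expand(node):
            if child.complete:
                if is_safe(child.prefix):
                    p_lb += child.mass
                else:
                    unsafe += child.mass
            else:
                frontier.append(child)
        unexplored = sum(n.mass for n in frontier)
        p_ub = p_lb + unexplored   # unexplored mass might still be safe
        print(f"bounds on P(safe): [{p_lb:.3f}, {p_ub:.3f}]")
    return p_lb, p_ub

if __name__ == "__main__":
    verify()
```

On the toy model the bounds tighten from [0.100, 1.000] to [0.720, 0.720] over five expansions, mirroring the kind of bound evolution shown in the paper's Figure 2.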

If this is right

  • Sound bounds replace sampling estimates for any safety property that can be checked on complete outputs.
  • More risky generations are surfaced than random sampling achieves at the same budget.
  • The bounds remain valid after every exploration step, allowing early stopping with guarantees.
  • The approach applies across multiple open-weight models and different safety properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same exploration structure could be adapted to verify other output properties such as factual correctness or style constraints.
  • When only black-box access is available, combining the method with targeted sampling might still yield useful approximate bounds.
  • Efficiency improvements open the door to running verification inside generation loops rather than after the fact.
  • Scaling to longer outputs would likely require pruning rules on the frontier to keep the search tractable.
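The last point is an editorial extrapolation; one plausible shape for it, under our own assumptions rather than anything in the paper, is a mass-ordered frontier with a pruning threshold whose discarded mass is folded into the upper bound so soundness is preserved.

```python
# Editorial sketch (not from the paper): a mass-ordered frontier with pruning.
import heapq

class PrunedFrontier:
    def __init__(self, prune_below=1e-6):
        self.heap = []              # max-heap via negated mass
        self.slack = 0.0            # mass we gave up on exploring
        self.prune_below = prune_below

    def push(self, mass, prefix):
        if mass < self.prune_below:
            self.slack += mass      # counted toward the upper bound instead
        else:
            heapq.heappush(self.heap, (-mass, prefix))

    def pop(self):
        neg_mass, prefix = heapq.heappop(self.heap)
        return -neg_mass, prefix

    def unexplored_mass(self):
        # Pruned mass plus queued mass: the gap between lower and upper bound.
        return self.slack + sum(-m for m, _ in self.heap)
```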

Load-bearing premise

The LLM must allow token-by-token queries that return logits or probabilities, and safety properties must be expressible as predicates on finite output sequences.
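This premise is easy to check for open-weight models. The snippet below shows the kind of query the verifier needs; the model name and prompt are placeholders, not the paper's setup.

```python
# Illustrative check of the premise: next-token probabilities for a prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # any open-weight causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = "Delete the temporary files with: "
ids = tok(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]        # scores for the next token only
probs = torch.softmax(logits, dim=-1)        # full next-token distribution

top = torch.topk(probs, k=5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i)):>12}  {p.item():.4f}")
```

A black-box API that returns only complete strings gives no access to this distribution, which is exactly the failure mode named under "What would settle it" below.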

What would settle it

Running the method on a black-box API that returns only complete strings without intermediate probabilities, and showing that sound bounds can no longer be maintained.

Figures

Figures reproduced from arXiv: 2512.05439 by Debangshu Banerjee, Gagandeep Singh, Nalin Wadhwa, Tarun Suresh.

Figure 1
Figure 1. BEAVER workflow for computing sound probability bounds. Given a prompt, language model, and a prefix-closed semantic constraint, BEAVER iteratively: (1) selects an incomplete leaf from the frontier, (2) expands it by querying the model and adding valid continuations to the token trie, and (3) updates the sound probability bounds [P_LB, P_UB] based on the new frontier state. view at source ↗
Figure 2
Figure 2. Evolution of the token trie T through three iterations of BEAVER on the bash command safety constraint. Starting from the empty trie, BEAVER expands nodes n0, n1, and n4 in sequence. Green nodes indicate incomplete sequences eligible for expansion, turquoise nodes indicate complete sequences (ending in ⟨eos⟩). Probability bounds tighten from [0.01, 0.9] after iteration 1, to [0.213, 0.815] after iteration … view at source ↗
Figure 3
Figure 3. Comparison of average probability bounds by BEAVER and rejection sampling over forward passes and time, for Qwen2.5-14B-Instruct on the GSM-Symbolic dataset. view at source ↗
read the original abstract

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify model outputs and characterize tail risk for safe deployment. While sampling-based estimates provide an ad-hoc intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM satisfaction of safety properties. Given a prompt & any safety property, BEAVER systematically explores the model output space using novel Token trie and Frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on 4 safety properties across 12 open-weight LLMs. BEAVER identifies 2-3x more risky instances compared to baselines while taking 1/10 of the compute budget, surfacing tail risks that loose bounds and ad-hoc evaluation misses.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents BEAVER, a framework that uses novel Token Trie and Frontier data structures to systematically explore the token-level output space of LLMs, computing deterministic sound probability bounds on satisfaction of arbitrary safety properties expressed as predicates over finite sequences. It formalizes the verification problem, proves soundness with bounds maintained at every iteration, and reports empirical results on 4 safety properties across 12 open-weight LLMs, claiming 2-3x more risky instance detections than baselines at 1/10 the compute budget.

Significance. If the central claims hold, the work provides a meaningful step toward guaranteed, non-sampling-based characterization of tail risks in LLM outputs, which is valuable for safe deployment. The formalization, soundness proof, and data-structure-driven exploration are strengths that distinguish it from ad-hoc sampling methods. The reported efficiency gains could be impactful if they generalize beyond the evaluated predicates.

major comments (2)
  1. [Algorithm description and predicate integration (around the Frontier update and early-termination rules)] The efficiency and 'first practical framework' claims rest on the ability to prune or bound the frontier using partial-sequence evaluations of the safety predicate. For general predicates (e.g., semantic toxicity classifiers or checks over large forbidden-phrase sets) that are only decidable on complete sequences, the algorithm reduces to exhaustive expansion of all frontier paths to leaves before tightening the remaining-mass bound, restoring exponential cost in output length. No section explicitly analyzes this case or provides a complexity bound conditioned on predicate monotonicity or prefix-decidability.
  2. [Soundness proof and experimental setup] The soundness proof is stated to maintain bounds at every iteration, but the manuscript does not detail how the predicate is encoded to support incremental checks or how early termination preserves the guarantee when the predicate cannot be evaluated on prefixes. This leaves the practical 1/10-compute claim only partially supported for the 'any safety property' scope asserted in the abstract.
minor comments (2)
  1. [Preliminaries] Notation for the probability bounds and remaining mass could be clarified with a single running example that shows bound evolution over iterations.
  2. [Experiments] The baseline comparison would benefit from explicit pseudocode or parameter settings for the sampling methods to ensure reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We value the recognition of the formalization, soundness proof, and data structures as distinguishing strengths. We respond point-by-point to the major comments below, offering clarifications on the framework's design and committing to revisions that will better delineate the conditions for efficiency while preserving the soundness guarantees.

read point-by-point responses
  1. Referee: [Algorithm description and predicate integration (around the Frontier update and early-termination rules)] The efficiency and 'first practical framework' claims rest on the ability to prune or bound the frontier using partial-sequence evaluations of the safety predicate. For general predicates (e.g., semantic toxicity classifiers or checks over large forbidden-phrase sets) that are only decidable on complete sequences, the algorithm reduces to exhaustive expansion of all frontier paths to leaves before tightening the remaining-mass bound, restoring exponential cost in output length. No section explicitly analyzes this case or provides a complexity bound conditioned on predicate monotonicity or prefix-decidability.

    Authors: We thank the referee for this precise observation. Our approach is defined for arbitrary predicates over finite sequences, and the Token Trie and Frontier structures maintain a conservative bound on unexplored probability mass at every step. When a predicate is only decidable on complete sequences, expansion continues until leaves are reached for those paths, after which the predicate is evaluated and the bound is tightened; however, the Frontier still permits early termination of the overall search once the remaining mass is below the target precision, without requiring full enumeration of all possible outputs. We agree that the manuscript would be strengthened by an explicit discussion of this distinction. In revision, we will add a subsection analyzing predicate classes (prefix-decidable, monotonic, or terminal-only) and the resulting complexity implications, including a high-level bound showing that the cost is exponential in sequence length only for the terminal-only case but remains practical due to mass-based pruning. revision: yes

  2. Referee: [Soundness proof and experimental setup] The soundness proof is stated to maintain bounds at every iteration, but the manuscript does not detail how the predicate is encoded to support incremental checks or how early termination preserves the guarantee when the predicate cannot be evaluated on prefixes. This leaves the practical 1/10-compute claim only partially supported for the 'any safety property' scope asserted in the abstract.

    Authors: The soundness result (Theorem 1) holds unconditionally: the upper and lower bounds on the probability of predicate satisfaction are invariants maintained by the Frontier updates, regardless of when or whether the predicate can be evaluated on a given prefix. If the predicate cannot be decided on a prefix, that path's full probability mass is retained in the Frontier and no predicate-based pruning is applied; termination occurs only when the total unexplored mass is sufficiently small, which conservatively over-approximates any possible violations in the remaining space. We will revise the manuscript to include a clearer description of the predicate interface (with examples of incremental vs. non-incremental encodings), additional pseudocode for the update rules, and an explicit statement that soundness is independent of prefix decidability while efficiency gains are largest for predicates admitting early decisions. The reported 1/10-compute results are for the four evaluated properties, which admit varying degrees of prefix information; we will add this qualification to the experimental section. revision: yes
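To make the distinction in this exchange concrete, here is our reading of what such a predicate interface could look like. The verdict enum, function names, and the banned-substring example are illustrative assumptions, not the authors' code.

```python
# Sketch of a predicate interface with prefix-decidable vs terminal-only checks.
from enum import Enum

class Verdict(Enum):
    SAT = 1        # every completion of this prefix satisfies the property
    VIOLATED = 2   # every completion of this prefix violates it
    UNKNOWN = 3    # only a complete sequence can settle it; keep expanding

def forbidden_substring_property(text, complete, banned=("rm -rf",)):
    """Prefix-decidable: a banned substring can never be 'un-said' later."""
    if any(b in text for b in banned):
        return Verdict.VIOLATED
    return Verdict.SAT if complete else Verdict.UNKNOWN

def terminal_only_property(text, complete, classifier=lambda t: True):
    """Terminal-only: e.g. a classifier scored on full outputs.

    `classifier` is a stand-in callable; a real toxicity or correctness
    model would go here. No sound early decision is possible on prefixes.
    """
    if not complete:
        return Verdict.UNKNOWN
    return Verdict.SAT if classifier(text) else Verdict.VIOLATED
```

Under this reading, prefix-decidable properties let the frontier prune or settle paths early, while terminal-only properties force expansion to leaves, which is the efficiency gap the referee's first comment targets.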

Circularity Check

0 steps flagged

No circularity: algorithmic formalization with independent soundness proof

full rationale

The paper describes BEAVER as an algorithmic procedure that systematically explores LLM output space via Token trie and Frontier data structures while maintaining provably sound probability bounds. It explicitly states that the verification problem is formalized and soundness is proven, with evaluation performed on concrete safety properties across models. No step reduces a claimed result to a fitted parameter, self-defined quantity, or self-citation chain by construction; the central claims rest on data-structure invariants and the formalization rather than on any enumerated circular pattern, and the empirical comparisons are made against external baselines such as sampling-based estimates rather than against quantities the method defines for itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the existence of a formalization of the verification problem that admits a sound exploration procedure; the paper introduces two new data structures whose invariants must hold for the bounds to remain valid.

axioms (2)
  • domain assumption The LLM can be queried for next-token probabilities at any prefix, enabling systematic enumeration of the output space.
    Required for the token-trie exploration to be feasible; stated implicitly by the method description.
  • domain assumption Safety properties can be decided on finite output sequences.
    Necessary for the predicate to be evaluable during search.
invented entities (2)
  • Token trie no independent evidence
    purpose: Compact representation of possible output sequences for bound maintenance
    New data structure introduced to support the deterministic search.
  • Frontier no independent evidence
    purpose: Data structure that tracks unexplored branches while preserving sound probability bounds
    New data structure introduced to support the deterministic search.

pith-pipeline@v0.9.0 · 5454 in / 1492 out tokens · 34852 ms · 2026-05-17T01:54:34.219136+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 11 internal anchors

  1. [1]

    Gpt-4 technical report, 2023

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report, 2023

  2. [2]

    Strong mixed-integer programming formulations for trained neural networks.Mathematical Programming, 2020

    Ross Anderson, Joey Huchette, Will Ma, Christian Tjandraatmadja, and Juan Pablo Vielma. Strong mixed-integer programming formulations for trained neural networks.Mathematical Programming, 2020

  3. [3]

    Stanley Bak, Hoang-Dung Tran, Kerianne Hobbs, and Taylor T. Johnson. Improved geometric path enumeration for verifying relu neural networks. In Shuvendu K. Lahiri and Chao Wang, editors,Computer Aided Verification - 32nd International Conference, CA V 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part I, volume 12224 ofLecture Notes in Comput...

  4. [4]

    Certifying geometric robustness of neural networks

    Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. Certifying geometric robustness of neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https: //proceedings.neurips....

  5. [5]

    Interpreting robustness proofs of deep neural networks

    Debangshu Banerjee, Avaljot Singh, and Gagandeep Singh. Interpreting robustness proofs of deep neural networks. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id= Ev10F9TWML

  6. [6]

    Crane: Reasoning with constrained llm generation, 2025

    Debangshu Banerjee, Tarun Suresh, Shubham Ugare, Sasa Misailovic, and Gagandeep Singh. Crane: Reasoning with constrained llm generation, 2025. URL https://arxiv.org/abs/2502.09061

  7. [7]

    Purple llama cyberseceval: A secure coding benchmark for language models, 2023

    Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, and Joshua Saxe. Purple l...

  8. [8]

    Branch and bound for piecewise linear neural network verification.Journal of Machine Learning Research, 21(2020), 2020

    Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Pushmeet Kohli, P Torr, and P Mudigonda. Branch and bound for piecewise linear neural network verification.Journal of Machine Learning Research, 21(2020), 2020

  9. [9]

    An efficient nonconvex re- formulation of stagewise convex optimization problems.Advances in Neural Information Processing Systems, 33, 2020

    Rudy R Bunel, Oliver Hinder, Srinadh Bhojanapalli, and Krishnamurthy Dvijotham. An efficient nonconvex re- formulation of stagewise convex optimization problems.Advances in Neural Information Processing Systems, 33, 2020

  10. [10]

    Nlp verification: towards a general methodology for certifying robustness.European Journal of Applied Mathematics, pages 1–58, 2025

    Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya, Luca Arnaboldi, Matthew L Daggitt, Omri Isac, Guy Katz, Verena Rieser, and Oliver Lemon. Nlp verification: towards a general methodology for certifying robustness.European Journal of Applied Mathematics, pages 1–58, 2025

  11. [11]

    Quantitative certification of knowledge comprehension in llms

    Isha Chaudhary, Vedaant V Jain, and Gagandeep Singh. Quantitative certification of knowledge comprehension in llms. InICLR 2024 Workshop on Secure and Trustworthy Large Language Models

  12. [12]

    Quantitative certification of bias in large language models.arXiv e-prints, pages arXiv–2405, 2024

    Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, and Gagandeep Singh. Quantitative certification of bias in large language models.arXiv e-prints, pages arXiv–2405, 2024

  13. [13]

    Gold-medalist performance in solving olympiad geometry with AlphaGeometry2, 2025

    Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, and Thang Luong. Gold-medalist performance in solving olympiad geometry with alphageometry2, 2025. URL https://arxiv.org/abs/2502.03544

  14. [14]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems, 2021. URL https://arxiv.org/abs/2110.14168

  15. [15]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 1310–1320. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.pres...

  16. [16]

    File searching using variable length keys

    Rene De La Briandais. File searching using variable length keys. InPapers Presented at the the March 3-5, 1959, Western Joint Computer Conference, IRE-AIEE-ACM ’59 (Western), page 295–298, New York, NY, USA, 1959. Association for Computing Machinery. ISBN 9781450378659. doi: 10.1145/1457838.1457895. URL https://doi.org/10.1145/1457838. 1457895

  17. [17]

    Z3: an efficient smt solver

    Leonardo De Moura and Nikolaj Bjørner. Z3: an efficient smt solver. InProceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, page 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3540787992

  18. [18]

    Formal verification of piece-wise linear feed-forward neural networks

    Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. InInternational Symposium on Automated Technology for Verification and Analysis, 2017

  19. [19]

    Shh, don’t say that! domain certification in llms. arXiv preprint arXiv:2502.19320, 2025

    Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip HS Torr, and Adel Bibi. Shh, don’t say that! domain certification in llms. arXiv preprint arXiv:2502.19320, 2025

  20. [20]

    Complete verification via multi-neuron relaxation guided branch-and-bound

    Claudio Ferrari, Mark Niklas Mueller, Nikola Jovanović, and Martin Vechev. Complete verification via multi-neuron relaxation guided branch-and-bound. InInternational Conference on Learning Representations, 2022. URL https: //openreview.net/forum?id=l_amHf1oaK

  21. [21]

    Fast geometric projections for local robustness certification

    Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, and Corina Pasareanu. Fast geometric projections for local robustness certification. InInternational Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=zWy1uxjDdZJ

  22. [22]

    Ai2: Safety and robustness certification of neural networks with abstract interpretation

    Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In2018 IEEE Symposium on Security and Privacy (SP), 2018

  23. [23]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava S...

  24. [24]

    The Curious Case of Neural Text Degeneration

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration, 2020. URL https://arxiv.org/abs/1904.09751

  25. [25]

    Are large pre-trained language models leaking your personal information?, 2022

    Jie Huang, Hanyin Shao, and Kevin Chen-Chuan Chang. Are large pre-trained language models leaking your personal information?, 2022. URL https://arxiv.org/abs/2205.12628

  26. [26]

    Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. Catastrophic jailbreak of open-source llms via exploiting generation, 2023. URL https://arxiv.org/abs/2310.06987

  27. [27]

    Towards Generalized Certified Robustness with Multi-Norm Training

    Enyi Jiang and Gagandeep Singh. Towards universal certified robustness with multi-norm training, 2024. URL https://arxiv.org/abs/2410.03000

  28. [28]

    Highly accurate protein structure prediction with alphafold.Nature, 596(7873):583–589, 2021

    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunya- suvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold.Nature, 596(7873):583–589, 2021

  29. [29]

    Guy Katz, Derek Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, David Dill, Mykel Kochenderfer, and Clark Barrett.The Marabou Framework for Verification and Analysis of Deep Neural Networks, pages 443–452. 07 2019. ISBN 978-3-030-25539-8

  30. [30]

    Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705, 2023

    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705, 2023

  31. [31]

    Double sampling randomized smoothing

    Linyi Li, Jiawei Zhang, Tao Xie, and Bo Li. Double sampling randomized smoothing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 13163–13208. PMLR, 17–23 Jul

  32. [32]

    URL https://proceedings.mlr.press/v162/li22aa.html

  33. [33]

    Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D Manning, Christopher Re, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu ...

  34. [34]

    GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

    Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. Gsm- symbolic: Understanding the limitations of mathematical reasoning in large language models, 2024. URL https: //arxiv.org/abs/2410.05229

  35. [35]

    Certified training: Small boxes are all you need

    Mark Niklas Mueller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=7oFuxtJtUMH

  36. [36]

    The enron corpus: Where the email bodies are buried?, 2020

    David Noever. The enron corpus: Where the email bodies are buried?, 2020. URL https://arxiv.org/abs/2001.10374

  37. [37]

    Scaling the convex barrier with active sets

    Alessandro De Palma, Harkirat S. Behl, Rudy R. Bunel, Philip H. S. Torr, and M. Pawan Kumar. Scaling the convex barrier with active sets. In9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, 2021

  38. [38]

    Expressive losses for verified robustness via convex combinations

    Alessandro De Palma, Rudy R Bunel, Krishnamurthy Dj Dvijotham, M. Pawan Kumar, Robert Stanforth, and Alessio Lomuscio. Expressive losses for verified robustness via convex combinations. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=mzyZ4wzKlM

  39. [39]

    Red Teaming Language Models with Language Models

    Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models, 2022. URL https://arxiv.org/abs/2202.03286

  40. [40]

    Perplexity ai

    Perplexity. Perplexity ai. AI Chatbot, 2023. URL https://www.perplexity.ai/

  41. [41]

    Qwen2.5 Technical Report

    Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

  42. [42]

    Position: Formal methods are the principled foundation of safe AI

    Gagandeep Singh and Deepika Chawla. Position: Formal methods are the principled foundation of safe AI. InICML Workshop on Technical AI Governance (TAIG), 2025. URL https://openreview.net/forum?id=7V5CDSsjB7

  43. [43]

    Fast and effective robustness certification.Advances in Neural Information Processing Systems, 31, 2018

    Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification.Advances in Neural Information Processing Systems, 31, 2018

  44. [44]

    Beyond the single neuron convex barrier for neural network certification

    Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. InAdvances in Neural Information Processing Systems, 2019

  45. [45]

    An abstract domain for certifying neural networks

    Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL), 2019

  46. [46]

    Safety and trust in artificial intelligence with abstract interpretation.Found

    Gagandeep Singh, Jacob Laurel, Sasa Misailovic, Debangshu Banerjee, Avaljot Singh, Changming Xu, Shubham Ugare, and Huan Zhang. Safety and trust in artificial intelligence with abstract interpretation.Found. Trends Program. Lang., 8(3–4):250–408, June 2025. ISSN 2325-1107. doi: 10.1561/2500000062. URL https://doi.org/10.1561/2500000062

  47. [47]

    Incremental verification of neural networks

    Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, and Gagandeep Singh. Incremental verification of neural networks.Proc. ACM Program. Lang., 7(PLDI), June 2023. doi: 10.1145/3591299. URL https://doi.org/10.1145/3591299

  48. [48]

    Incremental randomized smoothing certification

    Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, and Sasa Misailovic. Incremental randomized smoothing certification. InThe Twelfth International Conference on Learning Representations, 2024. URL https: //openreview.net/forum?id=SdeAPV1irk

  49. [49]

    Itergen: Iterative semantic- aware structured llm generation with backtracking, 2025

    Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, and Sasa Misailovic. Itergen: Iterative semantic- aware structured llm generation with backtracking, 2025. URL https://arxiv.org/abs/2410.07295

  50. [50]

    A review of formal methods applied to machine learning, 2021

    Caterina Urban and Antoine Miné. A review of formal methods applied to machine learning, 2021. URL https: //arxiv.org/abs/2104.02466

  51. [51]

    Decodingtrust: A comprehensive assessment of trustworthiness in gpt models, 2024

    Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, and Bo Li. Decodingtrust: A comprehensive assessment of trustworthiness in gpt models, 2024. URL https://arxiv.org/ab...

  52. [52]

    Quantifying risks in multi-turn conversation with large language models, 2025

    Chengxiao Wang, Isha Chaudhary, Qian Hu, Weitong Ruan, Rahul Gupta, and Gagandeep Singh. Quantifying risks in multi-turn conversation with large language models, 2025. URL https://arxiv.org/abs/2510.03969

  53. [53]

    Efficient formal safety analysis of neural networks

    Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. InAdvances in Neural Information Processing Systems, 2018

  54. [54]

    Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification

    Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021. URL https://openrevi...

  55. [55]

    Automatic perturbation analysis for scalable certified robustness and beyond

    Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red...

  56. [56]

    Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers

    Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers. InInternational Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=nVZtXBI6LNn

  57. [57]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  58. [58]

    Quantifying distributional robustness of agentic tool-selection,

    Jehyeok Yeon, Isha Chaudhary, and Gagandeep Singh. Quantifying distributional robustness of agentic tool-selection,

  59. [59]

    URL https://arxiv.org/abs/2510.03992

  60. [60]

    Efficient neural network robustness certification with general activation functions.Advances in neural information processing systems, 31, 2018

    Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions.Advances in neural information processing systems, 31, 2018

  61. [61]

    General cutting planes for bound-propagation-based neural network verification

    Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. General cutting planes for bound-propagation-based neural network verification. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/ forum?id=5haAJAcofjc

  62. [62]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023