BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

Alireza Taban; Ali Taheri; Ashutosh Trivedi; Sadegh Soudjani

arxiv: 2511.09363 · v2 · submitted 2025-11-12 · cs.AI · cs.SY · eess.SY

Pith reviewed 2026-05-17 22:33 UTC · model grok-4.3

keywords: barrier certificates · dynamical systems · safety verification · large language models · SMT verification · agentic framework · benchmark

The pith

Large language models can synthesize valid barrier certificates for safety verification across diverse dynamical systems with over 90 percent success.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether language models can capture and apply the expert reasoning needed to find mathematical functions that certify safety in dynamical systems. Traditional synthesis relies on manual template choices, exhaustive searches, and deep domain knowledge that does not scale well. The authors respond with an agentic framework in which the model reasons in natural language to propose and refine candidate certificates, then hands them to an SMT solver for formal checks, including joint synthesis of controllers. They evaluate the approach on a new collection of 100 systems that covers linear and nonlinear dynamics in both discrete and continuous time. The results show the framework produces valid certificates in more than 90 percent of cases, while also testing retrieval augmentation and multi-agent coordination to raise reliability.

Core claim

An LLM-based agentic framework that employs natural language reasoning to discover templates, propose barrier certificates, and refine them through iterative interaction, combined with SMT-based formal verification, generates valid certificates for more than 90 percent of the 100 dynamical systems in BarrierBench and supports consistent co-synthesis of barriers and controllers.

What carries the argument

The agentic framework that pairs LLM-driven natural language reasoning for certificate proposal and refinement with SMT solver verification.
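
The propose, verify, refine pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the "LLM" is a hypothetical stub that adjusts one coefficient from feedback, the verifier is a grid-sampling check standing in for the SMT solver (sampling is not sound and proves nothing), and the system (dx/dt = -x, initial set [-1, 1], unsafe set 2 <= |x| <= 3, template B(x) = x^2 - c) is invented for the example.

```python
from typing import Callable, List, Optional, Tuple


def grid(lo: float, hi: float, n: int = 201) -> List[float]:
    """Evenly spaced sample points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def first_violation(c: float) -> Optional[str]:
    """Sampled check of B(x) = x^2 - c for dx/dt = -x (hypothetical example).

    Returns the name of the first condition violated on the grid, or None.
    A real pipeline would send these conditions to an SMT solver instead.
    """
    def barrier(x: float) -> float:
        return x * x - c

    # Condition 1: B(x) <= 0 on the initial set [-1, 1].
    if any(barrier(x) > 0 for x in grid(-1.0, 1.0)):
        return "initial"
    # Condition 2: B(x) > 0 on the unsafe set 2 <= |x| <= 3.
    if any(barrier(x) <= 0 for x in grid(2.0, 3.0) + grid(-3.0, -2.0)):
        return "unsafe"
    # Condition 3: Lie derivative 2x * (-x) = -2x^2 < 0 near the level set B = 0.
    near_level_set = [x for x in grid(-3.0, 3.0) if abs(barrier(x)) < 0.1]
    if any(-2.0 * x * x >= 0 for x in near_level_set):
        return "flow"
    return None


def make_stub_proposer() -> Callable[[Optional[str]], float]:
    """Hypothetical stand-in for the LLM: adjusts c from verifier feedback."""
    state = {"c": 0.5}

    def propose(feedback: Optional[str]) -> float:
        if feedback == "initial":    # B too large on the initial set: raise c
            state["c"] *= 10.0
        elif feedback == "unsafe":   # B not positive on the unsafe set: lower c
            state["c"] *= 0.4
        return state["c"]

    return propose


def synthesize(
    propose: Callable[[Optional[str]], float], max_iters: int = 5
) -> Tuple[Optional[float], List[Tuple[float, Optional[str]]]]:
    """Propose a candidate, check it, feed the failure back, repeat."""
    feedback: Optional[str] = None
    history: List[Tuple[float, Optional[str]]] = []
    for _ in range(max_iters):
        c = propose(feedback)
        feedback = first_violation(c)
        history.append((c, feedback))
        if feedback is None:
            return c, history
    return None, history
```

With the stub above, the loop rejects c = 0.5 on the initial-set condition, rejects c = 5.0 on the unsafe-set condition, and accepts c = 2.0 on the third attempt.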

If this is right

Safety certificate generation becomes far less dependent on manual template design and hyperparameter tuning by specialists.
Barrier and controller synthesis can be performed together to guarantee their mutual consistency.
A public benchmark now exists for systematically comparing language-model approaches against traditional methods on the same set of systems.
Retrieval-augmented generation and agentic coordination measurably raise the reliability of the LLM component.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pattern of language-model proposal plus solver verification could be tested on other certificate problems such as Lyapunov stability or reachability.
Releasing the benchmark creates a shared testbed that lets researchers measure progress in combining informal reasoning with formal tools.
If the approach holds on larger or stochastic systems, it could lower the barrier for engineers to obtain machine-checked safety guarantees without full theoretical training.

Load-bearing premise

That proposals generated through the model's natural language reasoning will be mathematically sound enough for the SMT solver to confirm them correctly without overlooking real safety violations or accepting flawed certificates.

What would settle it

A dynamical system in BarrierBench or a similar new example where a certificate accepted by the SMT solver is later shown by direct simulation or manual analysis to permit the system to reach an unsafe state.
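
A simulation-based cross-check of this kind might look like the sketch below. It integrates trajectories from sampled initial states and reports the first one that enters the unsafe set. The integrator settings, function names, and both example systems are assumptions made for illustration; simulation can only falsify a certificate, never confirm one.

```python
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[float, ...]
VectorField = Callable[[State], State]


def rk4_step(f: VectorField, x: State, dt: float) -> State:
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    def shifted(a: float, k: State) -> State:
        return tuple(xi + a * ki for xi, ki in zip(x, k))

    k1 = f(x)
    k2 = f(shifted(dt / 2.0, k1))
    k3 = f(shifted(dt / 2.0, k2))
    k4 = f(shifted(dt, k3))
    return tuple(
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def find_unsafe_trajectory(
    f: VectorField,
    initial_states: Iterable[State],
    is_unsafe: Callable[[State], bool],
    dt: float = 0.01,
    steps: int = 2000,
) -> Optional[Tuple[State, float]]:
    """Return (initial state, hitting time) of the first simulated trajectory
    that enters the unsafe set, or None if none does within the horizon."""
    for x0 in initial_states:
        x = x0
        for k in range(steps):
            if is_unsafe(x):
                return x0, k * dt
            x = rk4_step(f, x, dt)
    return None


# Hypothetical example 1: damped oscillator, which stays away from x >= 2.
def damped(s: State) -> State:
    x, y = s
    return (y, -x - y)


# Hypothetical example 2: unstable scalar system, which reaches x >= 2.
def unstable(s: State) -> State:
    return (s[0],)


corners = [(a, b) for a in (-0.5, 0.0, 0.5) for b in (-0.5, 0.0, 0.5)]
```

If a certificate had been accepted for the second system with initial state x = 1 and unsafe set x >= 2, the search would return a counterexample trajectory, which is exactly the kind of evidence the paragraph above describes.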

Figures

Figures reproduced from arXiv: 2511.09363 by Alireza Taban, Ali Taheri, Ashutosh Trivedi, Sadegh Soudjani.

Figure 1: Architecture of the agentic framework integrating Retrieval, Synthesis, and Verifier agents for LLM-guided …

Original abstract

Safety verification of dynamical systems via barrier certificates is essential for ensuring correctness in autonomous applications. Synthesizing these certificates involves discovering mathematical functions with current methods suffering from poor scalability, dependence on carefully designed templates, and exhaustive or incremental function-space searches. They also demand substantial manual expertise--selecting templates, solvers, and hyperparameters, and designing sampling strategies--requiring both theoretical and practical knowledge traditionally shared through linguistic reasoning rather than formalized methods. This motivates a key question: can such expert reasoning be captured and operationalized by language models? We address this by introducing an LLM-based agentic framework for barrier certificate synthesis. The framework uses natural language reasoning to propose, refine, and validate candidate certificates, integrating LLM-driven template discovery with SMT-based verification, and supporting barrier-controller co-synthesis to ensure consistency between safety certificates and controllers. To evaluate this capability, we introduce BarrierBench, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the utility of retrieval-augmented generation and agentic coordination strategies in improving its reliability and performance. Across these tasks, the framework achieves more than 90% success in generating valid certificates. By releasing BarrierBench and the accompanying toolchain, we aim to establish a community testbed for advancing the integration of language-based reasoning with formal verification in dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BarrierBench introduces a useful new test set and LLM pipeline for barrier certificate synthesis, but the 90% success claim rests on thin evaluation details and potentially shaky SMT checks for nonlinear cases.

The paper brings two concrete things: BarrierBench, a collection of 100 dynamical systems that covers linear, nonlinear, discrete-time, and continuous-time examples, and an agentic LLM setup that uses natural-language reasoning to suggest templates and candidate barrier functions, then hands them to an SMT solver for verification while also trying co-synthesis of the controller. That combination has not shown up in the earlier literature they cite, so the benchmark and the specific pipeline are the actual additions. Releasing the dataset and toolchain is a practical step that could let other groups test their own ideas against the same systems. The motivation is also reasonable: barrier-certificate work has long required manual template choice and solver tuning, and trying to capture some of that reasoning in language models is a direct response to that bottleneck.

The reported success rate above 90 percent is the headline number, but the abstract gives almost no information on what counts as a valid certificate, what the failure modes were, or how the method compares to standard template-based or optimization-based baselines. Without those pieces it is difficult to judge whether the framework is genuinely moving the needle or mainly succeeding on the easier instances in the set.

The stress-test concern about SMT soundness is worth taking seriously here. For continuous nonlinear systems the key condition involves the Lie derivative being non-positive over a region. Nonlinear real arithmetic is costly for current SMT tools and becomes undecidable once transcendental functions appear; solvers can return unknown, and an encoding can miss subtle violations. If the full paper shows explicit handling of these cases or reports how often the solver was inconclusive, that would strengthen the result. Otherwise it remains a soft spot in the central claim.

This is the kind of paper that belongs in a reading group focused on AI for formal methods or safe control. It has enough new material and a clear release to justify sending it to peer review, provided the authors add clearer metrics, baselines, and discussion of solver limitations in the evaluation section.

Referee Report

2 major / 2 minor

Summary. The paper introduces BarrierBench, a benchmark of 100 dynamical systems spanning linear/nonlinear and discrete/continuous-time settings, to evaluate an LLM-based agentic framework for barrier certificate synthesis. The framework uses natural language reasoning to propose and refine candidate certificates (including templates), integrates this with SMT-based verification, and supports barrier-controller co-synthesis. Experiments assess LLM-guided synthesis along with retrieval-augmented generation and agentic coordination, claiming more than 90% success in generating valid certificates.

Significance. If the >90% success rate is shown to rest on sound verification without false positives from SMT encodings or hallucinations, the work could meaningfully reduce manual expertise required for barrier certificate synthesis and provide a reusable testbed for combining language-model reasoning with formal methods in dynamical systems safety. The public release of BarrierBench and the toolchain is a clear strength for reproducibility.

major comments (2)

[Abstract and §5] Abstract and §5 (Experiments): the central claim of >90% success in generating valid certificates provides no details on the precise success criteria, how SMT solver outcomes (including 'unknown' for nonlinear real arithmetic) are treated as success or failure, failure modes, baseline comparisons, or statistical significance testing. This directly affects assessment of the performance result.
[§4] Verification module (described in §4): for continuous-time nonlinear systems in BarrierBench, the Lie derivative condition Ḃ(x) ≤ 0 over the domain is encoded into SMT; the manuscript must specify the exact encoding, any relaxations or bounds used, and how 'unknown' or inconclusive results are resolved, because solver limitations in nonlinear real arithmetic are a load-bearing risk for false-positive valid certificates.

minor comments (2)

[Tables and Figures] Figure captions and Table 1 could more explicitly list the distribution of linear vs. nonlinear and discrete vs. continuous systems to allow readers to assess coverage.
[Preliminaries] Notation for barrier functions and Lie derivatives should be introduced with a short definitions subsection to aid readers outside control theory.
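
For readers outside control theory, one common continuous-time formulation of the conditions under discussion is written out below. It follows the level-set form of the conditions quoted elsewhere on this page; the symbols X_I (initial set), X_U (unsafe set), and f (vector field of dx/dt = f(x)) are notation chosen here, and the paper's exact statement may differ.

```latex
\begin{aligned}
& B(x) \le 0 && \forall x \in X_I, \\
& B(x) > 0 && \forall x \in X_U, \\
& L_f B(x) = \nabla B(x)^{\top} f(x) < 0 && \forall x \text{ with } B(x) = 0 .
\end{aligned}
```

A trajectory starting in X_I begins with B at most zero and would have to cross the level set B(x) = 0 to reach X_U, which the third condition rules out. The variant cited in the major comments, with the derivative of B non-positive over the whole domain, is a stronger requirement that implies safety by the same argument.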

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity for the success metrics and verification soundness. We address each major comment below and will revise the manuscript accordingly to provide the requested details.

Referee: [Abstract and §5] Abstract and §5 (Experiments): the central claim of >90% success in generating valid certificates provides no details on the precise success criteria, how SMT solver outcomes (including 'unknown' for nonlinear real arithmetic) are treated as success or failure, failure modes, baseline comparisons, or statistical significance testing. This directly affects assessment of the performance result.

Authors: We agree that the current presentation of the >90% figure would benefit from greater precision. Success is defined in our experiments as the SMT solver returning a definitive result confirming that the proposed barrier certificate satisfies all conditions (no counterexamples found for the safety invariants). 'Unknown' outcomes from the solver, particularly for nonlinear real arithmetic, are treated as failures. We will revise §5 to include an explicit definition of these criteria, a breakdown of observed failure modes (such as template invalidity or solver timeouts), comparisons against non-LLM baselines like fixed template libraries, and a note that statistical significance testing was not applied because BarrierBench is a fixed, deterministic collection of 100 systems with per-instance results reported for transparency.

revision: yes
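
The success rule stated in this response can be made concrete with a small sketch. It assumes the usual counterexample-style encoding, in which the solver is asked for a state that violates a condition, so that "unsat" means no violating state exists. The names SolverResult, CONDITIONS, and the helper functions are hypothetical and are not taken from the paper's toolchain.

```python
from enum import Enum
from typing import Dict, List, Optional


class SolverResult(Enum):
    UNSAT = "unsat"      # no counterexample exists: the condition holds
    SAT = "sat"          # a counterexample was found: the condition fails
    UNKNOWN = "unknown"  # solver gave up or timed out: treated as failure


# Hypothetical labels for the three barrier conditions.
CONDITIONS = ("initial", "unsafe", "flow")


def certificate_is_valid(results: Dict[str, SolverResult]) -> bool:
    """Valid only if every condition has a conclusive 'unsat' answer."""
    return all(results.get(name) is SolverResult.UNSAT for name in CONDITIONS)


def failure_reason(results: Dict[str, SolverResult]) -> Optional[str]:
    """Name the first condition without a conclusive confirmation, if any."""
    for name in CONDITIONS:
        outcome = results.get(name)
        if outcome is not SolverResult.UNSAT:
            label = outcome.value if outcome is not None else "missing"
            return f"{name}: {label}"
    return None


def success_rate(per_system: List[Dict[str, SolverResult]]) -> float:
    """Fraction of systems whose final candidate passed all conditions."""
    if not per_system:
        return 0.0
    return sum(certificate_is_valid(r) for r in per_system) / len(per_system)
```

Counting "unknown" as a failure keeps the reported rate conservative: a certificate is never credited unless every condition was conclusively confirmed.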
Referee: [§4] Verification module (described in §4): for continuous-time nonlinear systems in BarrierBench, the Lie derivative condition Ḃ(x) ≤ 0 over the domain is encoded into SMT; the manuscript must specify the exact encoding, any relaxations or bounds used, and how 'unknown' or inconclusive results are resolved, because solver limitations in nonlinear real arithmetic are a load-bearing risk for false-positive valid certificates.

Authors: The referee rightly flags the need for explicit encoding details to support soundness claims. In the framework, the Lie derivative condition for continuous nonlinear systems is encoded by restricting the state domain to compact sets and applying polynomial relaxations or interval bounds to render the query amenable to SMT solvers such as Z3. 'Unknown' or inconclusive solver results are always resolved as invalid certificates, ensuring we report only those cases where the solver provides a conclusive confirmation. We will expand §4 with a dedicated subsection detailing the exact SMT encoding, the specific relaxations and bounds employed, and the conservative handling of inconclusive outcomes to eliminate any ambiguity regarding false positives.

revision: yes
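
To illustrate what interval bounds on a compact set can and cannot establish, the sketch below checks a Lie derivative condition by bisection. It is not the encoding the authors describe and does not use Z3; the system (dx/dt = x - x^3), the barrier B(x) = x^2 - c, and all function names are hypothetical. For this pair the Lie derivative is 2x(x - x^3) = 2x^2 - 2x^4, and the bound used is valid only on intervals with 0 <= lo <= hi. Plain floating point is used without outward rounding, so the sketch is not rigorous.

```python
def lie_derivative(x: float) -> float:
    """Lie derivative of B(x) = x^2 - c along dx/dt = x - x^3."""
    return 2.0 * x * x - 2.0 * x ** 4


def lie_upper_bound(lo: float, hi: float) -> float:
    """Interval upper bound of 2x^2 - 2x^4 on [lo, hi], assuming 0 <= lo <= hi.

    Each term is bounded separately (largest x^2, smallest x^4), so the bound
    overestimates the true maximum. This is the usual dependency problem.
    """
    assert 0.0 <= lo <= hi
    return 2.0 * hi * hi - 2.0 * lo ** 4


def check_nonpositive(lo: float, hi: float, depth: int = 0, max_depth: int = 20) -> str:
    """Decide whether the Lie derivative is <= 0 on [lo, hi].

    Returns "verified" (bound proves it), "violated" (a concrete point with a
    positive value was found), or "unknown" (bisection budget exhausted).
    """
    if lie_upper_bound(lo, hi) <= 0.0:
        return "verified"
    mid = 0.5 * (lo + hi)
    if lie_derivative(mid) > 0.0:
        return "violated"
    if depth == max_depth:
        return "unknown"
    left = check_nonpositive(lo, mid, depth + 1, max_depth)
    if left != "verified":
        return left
    return check_nonpositive(mid, hi, depth + 1, max_depth)


def accept_certificate(lo: float, hi: float) -> bool:
    """Conservative rule: anything other than "verified" counts as invalid."""
    return check_nonpositive(lo, hi) == "verified"
```

On [1.2, 3] the check succeeds after a few bisections, and on [0.5, 3] it finds a violating point. On [1, 3] the condition actually holds, but the bound never closes near x = 1, where the derivative is exactly zero, so the result is "unknown" and the conservative rule rejects the certificate. This is the kind of inconclusive case the response says is counted as a failure.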

Circularity Check

0 steps flagged

No circularity: success measured on independent new benchmark via external SMT verification

The paper introduces BarrierBench as a new benchmark of 100 dynamical systems and reports an empirical >90% success rate for its LLM-agentic framework in producing certificates that pass SMT verification. No derivation chain, equation, or result reduces by construction to fitted parameters, self-defined quantities, or prior self-citations. The central claim rests on external verification of LLM-proposed candidates against the benchmark systems rather than on quantities defined from the same data or method tuning, rendering the evaluation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the premise that large language models can operationalize expert linguistic reasoning for mathematical template discovery and refinement in barrier certificate synthesis; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Natural language reasoning by LLMs can capture and operationalize the theoretical and practical knowledge traditionally required for selecting templates, solvers, and sampling strategies in barrier certificate synthesis.
Invoked in the motivation and framework description as the justification for using LLMs instead of exhaustive search or hand-designed templates.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: LLM-guided template discovery with SMT-based verification... agentic architecture... Barrier Retrieval Agent... Synthesis Agent... Verifier Agent

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Barrier certificate conditions: B(x) ≤ 0 on X_I, B(x) > 0 on X_U, Lie derivative < 0 on level set

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
