Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari; Sheila A. McIlraith; Toryn Q. Klassen

arxiv: 2605.16198 · v1 · pith:J4SFV5DYnew · submitted 2026-05-15 · 💻 cs.AI · cs.CY· cs.LG· cs.LO

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari , Toryn Q. Klassen , Sheila A. McIlraith This is my paper

Pith reviewed 2026-05-20 17:20 UTC · model grok-4.3

classification 💻 cs.AI cs.CYcs.LGcs.LO

keywords AI governanceformal methodslinear temporal logicLLM auditingruntime monitoringbehavioral compliancesafety constraintspredictive intervention

0 comments

The pith

Formal methods using Linear Temporal Logic outperform LLM-based methods for auditing and monitoring compliance in advanced AI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops techniques that merge formal methods and machine learning to audit and monitor AI products for compliance with rules and safety constraints throughout their lifecycle. It focuses on using Linear Temporal Logic to specify and check temporally extended behaviors in black-box LLMs. The approach includes offline auditing, online monitoring, predictive sampling, and intervening actions to prevent violations. Experiments indicate these methods detect violations more effectively than relying on LLMs for judgment, with small models performing comparably to large ones. Predictive and intervening monitors cut down on violations in AI agents while keeping task performance mostly the same.

Core claim

By exploiting the formal syntax and semantics of Linear Temporal Logic (LTL), the proposed auditing and monitoring techniques are superior to LLM baseline methods in detecting violations of temporally extended behavioral constraints. Even small-model labelers match or exceed frontier LLM judges. Predictive and intervening monitors significantly reduce the violation rates of LLM-based agents while largely preserving task performance. LLMs show degraded temporal reasoning accuracy with increasing event distance, number of constraints, and number of propositions.

What carries the argument

Linear Temporal Logic (LTL) for formal specification of behavioral constraints, enabling sampling-based auditing and runtime predictive and intervening monitoring against black-box AI systems.

Load-bearing premise

That formal LTL specifications of behavioral constraints can be reliably checked against black-box LLMs via sampling or external observation without requiring internal model access or complete state information.

What would settle it

Observing a case where an LLM-based agent violates a specified temporal constraint but the LTL monitor does not flag it, or where intervening reduces task performance substantially.

Figures

Figures reproduced from arXiv: 2605.16198 by Parand A. Alamdari, Sheila A. McIlraith, Toryn Q. Klassen.

**Figure 1.** Figure 1: Overview of Temporal Rule Assessment and Compliance (TRAC): This figure depicts the base TRAC algorithm (inner green box) and TRAC with predictive and intervening capabilities (TRACP+I) (outer blue box). An AI agent interacts with an environment over time, producing a sequence of inputs (from the environment) and outputs (from the agent). The Labeler extracts atomic propositions from the sequence of inputs… view at source ↗

**Figure 2.** Figure 2: F1 scores (higher is better) of auditing approaches across environments (ordered by difficulty), reported as mean ±95% confidence intervals (SEM). The LLMs named in the x-axis are used by the auditing approaches (for judging and/or labeling); the logs being audited are the same for all and have been created using multiple LLM agents. Results are shown for multiple models ordered by size. Few-shot judges re… view at source ↗

**Figure 3.** Figure 3: Accuracy of LLMs at judging temporal constraint satisfaction across three scaling dimensions (higher is better). Top row: simple formula, bottom row: complex formula. Temporal elasticity: accuracy as the gap between relevant events grows. Constraint scalability: accuracy as the number of constraints grows. Proposition scalability: accuracy as the number of propositions per step increases. Smaller models de… view at source ↗

**Figure 4.** Figure 4: Impact of interventions on violation rate of constraints (lower is better) across models and intervention strategies in three environments, reported as mean ±95% confidence intervals (SEM). Using TRACP+I consistently reduces constraint violations in different models and environments. Violation reductions are achieved without significant task-performance degradation (see Appendix E). 6 Discussion and Conclu… view at source ↗

**Figure 5.** Figure 5: Tree structure of the complex formula. Each edge represents [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

**Figure 6.** Figure 6: Effect of specification language on LLM accuracy across different temporal patterns (higher is better). Three specification levels: informal NL, precise NL, and precise NL + LTL. No specification level reliably improves accuracy across models and patterns [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗

**Figure 7.** Figure 7: Task performance of approaches in Section 5.3 (higher is better). Cumulative reward achieved by agents under different TRACP+I intervention strategies, compared with a no-intervention baseline. Despite targeting safety rather than reward, the interventions preserve comparable task performance. E.3 Performance of Approaches in Section 5.3 A comparison of task performance across methods in Section 5.3 is sho… view at source ↗

read the original abstract

We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with SoTA machine learning, we propose techniques that enable AI-enabled product and service developers, as well as third party AI developers and evaluators, to perform offline auditing and online (runtime) monitoring of product-specific (temporally extended) behavioral constraints such as safety constraints, norms, rules and regulations with respect to black-box advanced AI systems, notably LLMs. We further provide practical techniques for predictive monitoring, such as sampling-based methods, and we introduce intervening monitors that act at runtime to preempt and potentially mitigate predicted violations. Experimental results show that by exploiting the formal syntax and semantics of Linear Temporal Logic (LTL), our proposed auditing and monitoring techniques are superior to LLM baseline methods in detecting violations of temporally extended behavioral constraints; with our approach, even small-model labelers match or exceed frontier LLM judges. Our predictive and intervening monitors significantly reduce the violation rates of LLM-based agents while largely preserving task performance. We further show through controlled experiments that LLMs' temporal reasoning shows a pronounced degradation in accuracy with increasing event distance, number of constraints, and number of propositions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LTL monitoring gives a workable way to check and steer LLM behavior over time, with experiments showing gains over pure LLM judges and clear drops in LLM temporal accuracy.

read the letter

This paper takes Linear Temporal Logic and applies it to black-box LLMs for offline auditing and online monitoring of rules that stretch across sequences of actions, such as safety or compliance constraints. They add sampling-based prediction and intervening monitors that step in to head off violations before they occur. The experiments indicate their approach catches more violations than LLM baselines, that small labeler models can match or beat larger ones, and that LLM performance on temporal properties falls off noticeably as event distance, constraint count, or proposition count grows. Those degradation results line up with what people see in practice and give a concrete reason to prefer formal structure over pure prompting. The work is useful because it ships actual techniques that developers or auditors could try on deployed systems without needing model internals. The main soft spot is the reliance on accurate proposition labeling from external traces alone. For formulas like G(P → F Q) or nested ones, any systematic slip in labeling or gaps in observation would break the formal guarantees and weaken the reported superiority. The abstract gives little on how they measure or correct labeling error, handle ambiguity, or choose baselines, so the full paper needs to show that the pipeline stays sound under realistic partial views. This is aimed at people working on AI governance, runtime verification, or safety tooling who want something more structured than ad-hoc checks. A reader looking for deployable methods to enforce temporally extended properties would get concrete value here. It deserves peer review because the integration is fresh enough and the claims are testable, even if the experimental section needs more detail to hold up.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a hybrid approach combining Linear Temporal Logic (LTL) formal methods with machine learning for auditing and monitoring compliance of black-box LLMs with temporally extended behavioral constraints. It proposes offline auditing techniques and online runtime monitoring, including sampling-based predictive monitoring and intervening monitors that preempt violations. The paper reports experimental results demonstrating that LTL-exploiting methods outperform LLM-only baselines in violation detection, that small-model labelers can match or exceed frontier LLMs, and that the monitors reduce violation rates while preserving task performance. It also shows degradation in LLMs' temporal reasoning with increased event distance, constraints, and propositions.

Significance. Should the results be confirmed with detailed experimental protocols, this work has potential significance for AI safety and governance by enabling formal verification techniques to be applied to opaque advanced AI systems. The practical techniques for predictive and intervening monitoring could inform regulatory compliance tools. The empirical findings on LLM limitations in handling complex temporal logic add to the literature on LLM capabilities and failure modes.

major comments (3)

[Experimental Evaluation] The central claims of superiority in detecting violations and effectiveness of monitors rest on experimental comparisons, but the abstract provides no details on experimental design, baselines used, statistical tests, number of trials, or potential confounds. The full paper must include these to substantiate that the data supports the reported superiority and violation reductions.
[Runtime Monitoring and Intervention] The LTL-based monitoring assumes reliable labeling of atomic propositions from sampled external observations of black-box LLM behavior. For formulas involving nested temporal operators or implications like G(P → F Q), any inconsistency or error in proposition evaluation due to partial observability or natural language ambiguity could invalidate the formal guarantees and the claimed performance advantages over LLM baselines.
[LLM Temporal Reasoning Experiments] The controlled experiments on degradation with event distance, number of constraints, and propositions are presented as supporting evidence, but it is unclear whether these directly validate the auditing pipeline or if they account for variations in LTL formula complexity and observation completeness.

minor comments (2)

[Abstract] The term 'SoTA machine learning' is used without specifying the particular ML components integrated with LTL; a brief clarification would improve readability.
[Introduction] Ensure that all acronyms such as LTL are defined at first use, and consider adding a reference to prior work on runtime verification for AI systems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and robustness of our presentation. We address each major comment point by point below, indicating planned revisions where appropriate.

read point-by-point responses

Referee: [Experimental Evaluation] The central claims of superiority in detecting violations and effectiveness of monitors rest on experimental comparisons, but the abstract provides no details on experimental design, baselines used, statistical tests, number of trials, or potential confounds. The full paper must include these to substantiate that the data supports the reported superiority and violation reductions.

Authors: We agree that the abstract is high-level and would benefit from additional experimental context to better support the claims. The full manuscript already details the experimental design in Section 4, including baselines (LLM-only judges and random sampling), number of trials (100 per condition across 5 LTL formulas), statistical tests (paired t-tests with reported p-values), and controls for confounds such as prompt phrasing and event ordering. In the revision, we will expand the abstract with a concise summary of these elements (e.g., 'evaluated over 500 trials with statistical significance testing') while preserving the word limit. revision: yes
Referee: [Runtime Monitoring and Intervention] The LTL-based monitoring assumes reliable labeling of atomic propositions from sampled external observations of black-box LLM behavior. For formulas involving nested temporal operators or implications like G(P → F Q), any inconsistency or error in proposition evaluation due to partial observability or natural language ambiguity could invalidate the formal guarantees and the claimed performance advantages over LLM baselines.

Authors: We acknowledge this as a substantive limitation of the formal guarantees, which depend on accurate proposition evaluation. Our empirical results rely on LLM labelers whose accuracy we measure (small models reaching 92% human agreement), and the performance gains are shown experimentally rather than purely formally. We will add a dedicated limitations paragraph in Section 3.2 discussing error propagation for nested operators, including a sensitivity analysis with simulated label noise, and note that conservative intervention thresholds can mitigate risks in practice. revision: partial
Referee: [LLM Temporal Reasoning Experiments] The controlled experiments on degradation with event distance, number of constraints, and propositions are presented as supporting evidence, but it is unclear whether these directly validate the auditing pipeline or if they account for variations in LTL formula complexity and observation completeness.

Authors: These experiments characterize inherent LLM limitations in temporal reasoning to motivate the hybrid approach, rather than serving as direct validation of the auditing pipeline. To strengthen the connection, we will revise the text in the introduction and Section 5 to explicitly link them to the auditing results. We will also incorporate additional controls for LTL formula complexity (varying nesting depth) and observation completeness (partial vs. full traces) in the revised experimental protocol. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of LTL monitoring against LLM baselines

full rationale

The paper's central claims rest on experimental comparisons of LTL-based auditing/monitoring (using sampling for proposition evaluation on black-box traces) to LLM baseline judges, plus controlled tests of temporal reasoning degradation with distance and constraints. These are direct empirical results rather than derivations that reduce by construction to fitted parameters, self-definitions, or self-citation chains. No load-bearing step equates a prediction to its input or imports uniqueness via author overlap. The formal semantics of LTL are applied as a standard tool to external observations, with performance measured against independent baselines, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Approach rests on standard LTL semantics and introduces practical monitoring constructs without evident free parameters or new physical entities.

axioms (1)

standard math Linear Temporal Logic has well-defined syntax and semantics suitable for specifying temporally extended properties such as safety constraints.
Invoked as the foundation for auditing and monitoring behavioral constraints.

invented entities (1)

intervening monitors no independent evidence
purpose: Runtime components that preempt and mitigate predicted violations of constraints.
New practical technique introduced to act on predictions from the monitoring system.

pith-pipeline@v0.9.0 · 5778 in / 1234 out tokens · 45342 ms · 2026-05-20T17:20:50.392269+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a family of Temporal Rule Assessment and Compliance (TRAC) algorithms... leveraging LTL progression... TRAC significantly outperforms LLM-as-a-Judge auditing on temporally extended patterns
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat induction and embed_strictMono unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

LTL progression... prg(φ, σ_i) ... soundness follows directly from that of LTL3 progression

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · 11 internal anchors

[1]

Lama Ahmad, Sandhini Agarwal, Michael Lampe, and Pamela Mishkin. 2025. OpenAI’s Approach to External Red Teaming for AI Models and Systems.arXiv preprint arXiv:2503.16431(2025)

work page arXiv 2025
[2]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, and Sheila A. McIlraith. 2024. Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making. InProceedings of the 41st International Conference on Machine Learning. PMLR, 906–920. https://proceedings.mlr.press/v235/alamdari24a.html

work page 2024
[3]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, and Sheila A. McIlraith. 2024. Being considerate as a pathway towards pluralistic alignment for agentic AI.arXiv preprint arXiv:2411.10613(2024)

work page arXiv 2024
[4]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, and Sheila A. McIlraith. 2022. Be Considerate: Avoiding Negative Side Effects in Reinforcement Learning. InProceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 18–26

work page 2022
[5]

Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, and Chirag Shah. 2024. AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24). Association for Computing Machinery, 5174–5179. doi:10.1145/3627673.3679222

work page doi:10.1145/3627673.3679222 2024
[6]

Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, and Chirag Shah. 2024. Developing a framework for auditing large language models using human-in-the-loop.arXiv preprint arXiv:2402.09346(2024)

work page arXiv 2024
[7]

Anthropic. 2024. Introducing Claude 3. https://www.anthropic.com/news/claude-3-family

work page 2024
[8]

Marco Autili, Lars Grunske, Markus Lumpe, Patrizio Pelliccione, and Antony Tang. 2015. Aligning qualitative, real-time, and probabilistic property specification patterns using a structured English grammar.IEEE Transactions on Software Engineering41, 7 (2015), 620–638

work page 2015
[9]

Fahiem Bacchus and Froduald Kabanza. 2000. Using temporal logics to express search control knowledge for planning.Artificial Intelligence116, 1-2 (2000), 123–191

work page 2000
[10]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[12]

2014.Principles of Model Checking

Christel Baier, Joost-Pieter Katoen, and Kim Guldstrand Larsen. 2014.Principles of Model Checking. MIT Press. Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page 2014
[13]

Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, Melody Y Guan, Aleksander Madry, Wojciech Zaremba, Jakub Pachocki, and David Farhi. 2025. Monitoring reasoning models for misbehavior and the risks of promoting obfuscation.arXiv preprint arXiv:2503.11926 (2025)

work page internal anchor Pith review arXiv 2025
[14]

Howard Barringer, Allen Goldberg, Klaus Havelund, and Koushik Sen. 2004. Rule-based runtime verification. InInternational Workshop on Verification, Model Checking, and Abstract Interpretation. Springer, 44–57

work page 2004
[15]

Andreas Bauer and Yliès Falcone. 2016. Decentralised LTL monitoring.Formal Methods in System Design48, 1-2 (2016), 46–93. doi:10.1007/S10703-016-0253-8

work page doi:10.1007/s10703-016-0253-8 2016
[16]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2006. Monitoring of real-time properties. InInternational Conference on Foundations of Software Technology and Theoretical Computer Science. Springer, 260–272

work page 2006
[17]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2010. Comparing LTL semantics for runtime verification.Journal of Logic and Computation20, 3 (2010), 651–674

work page 2010
[18]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2011. Runtime verification for LTL and TLTL.ACM Transactions on Software Engineering and Methodology (TOSEM)20, 4 (2011), 1–64

work page 2011
[19]

Dietterich, Edward W

Yoshua Bengio, Stephen Clare, Carina Prunkl, Malcolm Murray, Maksym Andriushchenko, Ben Bucknall, Rishi Bommasani, Stephen Casper, Tom Davidson, Raymond Douglas, David Duvenaud, Philip Fox, Usman Gohar, Rose Hadshar, Anson Ho, Tiancheng Hu, Cameron Jones, Sayash Kapoor, Atoosa Kasirzadeh, Sam Manning, Nestor Maslej, Vasilios Mavroudis, Conor McGlynn, Rich...

work page 2026
[20]

Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, Anson Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa T. Okolo, Deborah Raji,...

work page 2025
[21]

Meghyn Bienvenu, Christian Fritz, and Sheila A McIlraith. 2011. Specifying and computing preferred plans.Artificial Intelligence175, 7-8 (2011), 1308–1345

work page 2011
[22]

Abeba Birhane, Ryan Steed, Victor Ojewale, Briana Vecchione, and Inioluwa Deborah Raji. 2024. AI auditing: The broken bus on the road to AI accountability. In2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 612–643

work page 2024
[23]

Andrea Brunello, Angelo Montanari, and Mark Reynolds. 2019. Synthesis of LTL Formulas from Natural Language Texts: State of the Art and Research Directions. In26th International Symposium on Temporal Representation and Reasoning (TIME 2019) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 147). Schloss Dagstuhl – Leibniz-Zentrum für Inform...

work page doi:10.4230/lipics.time 2019
[24]

Klassen, Richard Valenzano, and Sheila A

Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. 2019. LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning. InProceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019. 6065–6073. doi:10.24963/ijcai.2019/840

work page doi:10.24963/ijcai.2019/840 2019
[25]

Zouying Cao, Yifei Yang, and Hai Zhao. 2025. SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering.Proceedings of the AAAI Conference on Artificial Intelligence39 (2025), 23523–23531. doi:10.1609/aaai.v39i22.34521 FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Parand A. Alamdari, Toryn Q. Klassen, and Sheila A. McIlraith

work page doi:10.1609/aaai.v39i22.34521 2025
[26]

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip J. K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Mic...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, and Dylan Hadfield-Menell. 2024. Black-box access i...

work page 2024
[28]

Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, and Dylan Hadfield-Menell. 2023. Explore, establish, exploit: Red teaming language models from scratch.arXiv preprint arXiv:2306.09442(2023)

work page arXiv 2023
[29]

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024. ACM, 958–973. https://doi.org/10.1145/3630106.3658948

work page doi:10.1145/3630106.3658948 2024
[30]

Yongchao Chen, Rujul Gandhi, Yang Zhang, and Chuchu Fan. 2023. NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 15880–15903. doi:10.18653/v1/2023.emnlp-main.985

work page doi:10.18653/v1/2023.emnlp-main.985 2023
[31]

Noam Chomsky. 1956. Three models for the description of language.IRE Transactions on Information Theory2, 3 (1956), 113–124. https://doi.org/10.1109/TIT.1956.1056813

work page doi:10.1109/tit.1956.1056813 1956
[32]

2024 BCCRT 149 (canLII) Civil Resolution Tribunal of British Columbia. 2024. Moffatt v. Air Canada. https://canlii.ca/t/k2spq File Number SC-2023-005609

work page 2024
[33]

Matthias Cosler, Christopher Hahn, Daniel Mendoza, Frederik Schmitt, and Caroline Trippel. 2023. nl2spec: Interactively translating unstructured natural language to temporal logics with large language models. InInternational Conference on Computer Aided Verification. Springer, 383–396

work page 2023
[34]

Sasha Costanza-Chock, Inioluwa Deborah Raji, and Joy Buolamwini. 2022. Who Audits the Auditors? Recommendations from a field scan of the algorithmic auditing ecosystem. InProceedings of the 2022 ACM conference on Fairness, Accountability, and Transparency. 1571–1583

work page 2022
[35]

Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. 2019. Textworld: A learning environment for text-based games. InComputer Games: 7th Workshop, CGW 2018, Held in Conjunction with the 27th International Conference on Artificial...

work page 2019
[36]

Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. 2024. Safe RLHF: Safe Reinforcement Learning from Human Feedback. InThe Twelfth International Conference on Learning Representations

work page 2024
[37]

Ben d’Angelo, Sriram Sankaranarayanan, César Sánchez, Will Robinson, Bernd Finkbeiner, Henny B Sipma, Sandeep Mehrotra, and Zohar Manna. 2005. LOLA: runtime monitoring of synchronous systems. In12th International Symposium on Temporal Representation and Reasoning (TIME’05). IEEE, 166–174

work page 2005
[38]

Google DeepMind. 2023. Gemini: A Family of Highly Capable Multimodal Models.arXiv preprint arXiv:2312.11805(2023). https: //arxiv.org/abs/2312.11805

work page internal anchor Pith review Pith/arXiv arXiv 2023
[39]

Matthew B Dwyer, George S Avrunin, and James C Corbett. 1999. Patterns in property specifications for finite-state verification. In Proceedings of the 21st International Conference on Software Engineering. 411–420

work page 1999
[40]

Henzinger, and Bernhard Kragl

Thomas Ferrère, Thomas A. Henzinger, and Bernhard Kragl. 2020. Monitoring Event Frequencies. In28th EACSL Annual Conference on Computer Science Logic (CSL 2020) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 152), Maribel Fernández and Anca Muscholl (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 20:1–20:16....

work page doi:10.4230/lipics.csl.2020.20 2020
[41]

Francesco Fuggitti and Tathagata Chakraborti. 2023. NL2LTL–a Python package for converting natural language (NL) instructions to linear temporal logic (LTL) formulas. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 16428–16430

work page 2023
[42]

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Joh...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[43]

Alfonso E Gerevini, Patrik Haslum, Derek Long, Alessandro Saetti, and Yannis Dimopoulos. 2009. Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners.Artificial Intelligence173, 5-6 (2009), 619–668. Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems FAccT ’26, Jun...

work page 2009
[44]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The Llama 3 herd of models.arXiv preprint arXiv:2407.21783(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, and Fabien Roger. 2024. AI Control: Improving Safety Despite Intentional Subversion. InProceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 235). PMLR, 16295–16336

work page 2024
[46]

Lara Groves, Jacob Metcalf, Alayna Kennedy, Briana Vecchione, and Andrew Strait. 2024. Auditing work: Exploring the New York City algorithmic bias audit regime. InProceedings of the 2024 ACM conference on Fairness, Accountability, and Transparency. 1107–1120

work page 2024
[47]

Guan, Miles Wang, Micah Carroll, Zehao Dou, Annie Y

Melody Y. Guan, Miles Wang, Micah Carroll, Zehao Dou, Annie Y. Wei, Marcus Williams, Benjamin Arnav, Joost Huizinga, Ian Kivlichan, Mia Glaese, Jakub Pachocki, and Bowen Baker. 2025. Monitoring Monitorability.arXiv preprint arXiv:2512.18311(2025)

work page arXiv 2025
[48]

Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. 2018. Logically-Correct Reinforcement Learning.arXiv preprint arXiv:1801.08099(2018). http://arxiv.org/abs/1801.08099

work page internal anchor Pith review Pith/arXiv arXiv 2018
[49]

Klaus Havelund and Grigore Roşu. 2001. Monitoring programs using rewriting. InProceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001). IEEE, 135–143

work page 2001
[50]

Klaus Havelund and Grigore Roşu. 2004. Efficient monitoring of safety properties.International Journal on Software Tools for Technology Transfer6, 2 (2004), 158–173

work page 2004
[51]

Thomas A Henzinger, Mahyar Karimi, Konstantin Kueffner, and Kaushik Mallik. 2023. Monitoring algorithmic fairness. InInternational Conference on Computer Aided Verification. Springer, 358–382

work page 2023
[52]

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration.arXiv preprint arXiv:1904.09751(2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[53]

Erik Jones, Anca Dragan, Aditi Raghunathan, and Jacob Steinhardt. 2023. Automatically auditing large language models via discrete optimization. InInternational Conference on Machine Learning. PMLR, 15307–15329

work page 2023
[54]

Jeffrey Joyce, Greg Lomow, Konrad Slind, and Brian Unger. 1987. Monitoring distributed systems.ACM Transactions on Computer Systems (TOCS)5, 2 (1987), 121–150

work page 1987
[55]

Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec,...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[56]

Hannes Kallwies, Martin Leucker, César Sánchez, and Torben Scheffel. 2022. Anticipatory recurrent monitoring with uncertainty and assumptions. InInternational Conference on Runtime Verification. Springer, 181–199

work page 2022
[57]

Taesup Kim, Sungjin Ahn, and Yoshua Bengio. 2019. Variational temporal abstraction.Advances in Neural Information Processing Systems 32 (2019)

work page 2019
[58]

Klassen, Parand A

Toryn Q. Klassen, Parand A. Alamdari, and Sheila A. McIlraith. 2023. Epistemic Side Effects: An AI Safety Problem. InProceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 1797–1801

work page 2023
[59]

Klassen, Parand A

Toryn Q. Klassen, Parand A. Alamdari, and Sheila A. McIlraith. 2024. Pluralistic Alignment Over Time.arXiv preprint arXiv:2411.10654 (2024)

work page arXiv 2024
[60]

Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, et al. 2025. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.arXiv preprint arXiv:2507.11473 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[61]

Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, and Shane Legg. 2020. Avoiding Side Effects By Considering Future Tasks. InAdvances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc

work page 2020
[62]

Khoa Lam, Benjamin Lange, Borhane Blili-Hamelin, Jovana Davidovic, Shea Brown, and Ali Hasan. 2024. A framework for assurance audits of algorithmic systems. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1078–1092

work page 2024
[63]

McIlraith, Shirin Sohrabi, and John Mylopoulos

Sotirios Liaskos, Sheila A. McIlraith, Shirin Sohrabi, and John Mylopoulos. 2011. Representing and reasoning about preferences in re- quirements engineering.Requirements Engineering16 (2011), 227–249. Issue 3. http://www.springerlink.com/content/t27624345512v390/

work page 2011
[64]

Jason Xinyu Liu, Ankit Shah, George Konidaris, Stefanie Tellex, and David Paulius. 2024. Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024. IEEE, 2325–2332. doi:10.1109/IROS58592.2024.10802696

work page doi:10.1109/iros58592.2024.10802696 2024
[65]

Fabrizio Maria Maggi, Marco Montali, Michael Westergaard, and Wil MP Van Der Aalst. 2011. Monitoring business constraints with linear temporal logic: An approach based on colored automata. InBusiness Process Management: 9th International Conference, BPM 2011, Clermont-Ferrand, France, August 30-September 2, 2011. Proceedings. Springer, 132–147

work page 2011
[66]

Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, and Luciano Floridi. 2024. Auditing large language models: a three-layered approach. AI and Ethics4, 4 (2024), 1085–1115

work page 2024
[67]

OpenAI. 2023. GPT-4 Technical Report. https://openai.com/research/gpt-4. FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Parand A. Alamdari, Toryn Q. Klassen, and Sheila A. McIlraith

work page 2023
[68]

Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving

work page
[69]

Red teaming language models with language models,

Red Teaming Language Models with Language Models. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 3419–3448. doi:10.18653/v1/2022.emnlp-main.225

work page doi:10.18653/v1/2022.emnlp-main.225 2022
[70]

Amir Pnueli. 1977. The temporal logic of programs. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977). IEEE, 46–57. doi:10.1109/SFCS.1977.32

work page doi:10.1109/sfcs.1977.32 1977
[71]

Amir Pnueli and Roni Rosner. 1989. On the Synthesis of a Reactive Module. InPOPL ’89: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 179–190. doi:10.1145/75277.75293

work page doi:10.1145/75277.75293 1989
[72]

Inioluwa Deborah Raji, Peggy Xu, Colleen Honigsberg, and Daniel Ho. 2022. Outsider oversight: Designing a third party audit ecosystem for ai governance. InProceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. 557–571

work page 2022
[73]

Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, and Saleema Amershi. 2023. Supporting human-AI collaboration in auditing LLMs with LLMs. InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 913–926

work page 2023
[74]

Anne Rozinat and Wil MP Van der Aalst. 2008. Conformance checking of processes based on monitoring real behavior.Information Systems33, 1 (2008), 64–95

work page 2008
[75]

Ajith Sankaran. 2025. How Small And Medium Businesses Can Take Advantage Of The Emerging Agentic AI Era. https://www.forbes.com/councils/forbesbusinesscouncil/2025/04/01/how-small-and-medium-businesses-can-take-advantage-of- the-emerging-agentic-ai-era/ Forbes Business Council, COUNCIL POST

work page 2025
[76]

Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. InProceedings of the Conference on Fairness, Accountability, and Transparency. 59–68

work page 2019
[77]

Niek Tax, Ilya Verenich, Marcello La Rosa, and Marlon Dumas. 2017. Predictive business process monitoring with LSTM neural networks. InAdvanced Information Systems Engineering: 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings 29. Springer, 477–492

work page 2017
[78]

Pashootan Vaezipoor, Andrew C Li, Rodrigo A Toro Icarte, and Sheila A Mcilraith. 2021. LTL2Action: Generalizing LTL instructions for multi-task RL. InInternational Conference on Machine Learning. PMLR, 10497–10508

work page 2021
[79]

Cameron Voloshin, Abhinav Verma, and Yisong Yue. 2023. Eventual Discounting Temporal Logic Counterfactual Experience Replay. InProceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202). PMLR, 35137–35150. https://proceedings.mlr.press/v202/voloshin23a.html

work page 2023
[80]

Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, and Andrei Barbu. 2021. Learning a natural-language to LTL executable semantic parser for grounded robotics. InProceedings of the 2020 Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 155). PMLR, 1706–1718

work page 2021

Showing first 80 references.

[1] [1]

Lama Ahmad, Sandhini Agarwal, Michael Lampe, and Pamela Mishkin. 2025. OpenAI’s Approach to External Red Teaming for AI Models and Systems.arXiv preprint arXiv:2503.16431(2025)

work page arXiv 2025

[2] [2]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, and Sheila A. McIlraith. 2024. Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making. InProceedings of the 41st International Conference on Machine Learning. PMLR, 906–920. https://proceedings.mlr.press/v235/alamdari24a.html

work page 2024

[3] [3]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, and Sheila A. McIlraith. 2024. Being considerate as a pathway towards pluralistic alignment for agentic AI.arXiv preprint arXiv:2411.10613(2024)

work page arXiv 2024

[4] [4]

Alamdari, Toryn Q

Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, and Sheila A. McIlraith. 2022. Be Considerate: Avoiding Negative Side Effects in Reinforcement Learning. InProceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 18–26

work page 2022

[5] [5]

Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, and Chirag Shah. 2024. AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24). Association for Computing Machinery, 5174–5179. doi:10.1145/3627673.3679222

work page doi:10.1145/3627673.3679222 2024

[6] [6]

Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, and Chirag Shah. 2024. Developing a framework for auditing large language models using human-in-the-loop.arXiv preprint arXiv:2402.09346(2024)

work page arXiv 2024

[7] [7]

Anthropic. 2024. Introducing Claude 3. https://www.anthropic.com/news/claude-3-family

work page 2024

[8] [8]

Marco Autili, Lars Grunske, Markus Lumpe, Patrizio Pelliccione, and Antony Tang. 2015. Aligning qualitative, real-time, and probabilistic property specification patterns using a structured English grammar.IEEE Transactions on Software Engineering41, 7 (2015), 620–638

work page 2015

[9] [9]

Fahiem Bacchus and Froduald Kabanza. 2000. Using temporal logics to express search control knowledge for planning.Artificial Intelligence116, 1-2 (2000), 123–191

work page 2000

[10] [10]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[11] [11]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[12] [12]

2014.Principles of Model Checking

Christel Baier, Joost-Pieter Katoen, and Kim Guldstrand Larsen. 2014.Principles of Model Checking. MIT Press. Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page 2014

[13] [13]

Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, Melody Y Guan, Aleksander Madry, Wojciech Zaremba, Jakub Pachocki, and David Farhi. 2025. Monitoring reasoning models for misbehavior and the risks of promoting obfuscation.arXiv preprint arXiv:2503.11926 (2025)

work page internal anchor Pith review arXiv 2025

[14] [14]

Howard Barringer, Allen Goldberg, Klaus Havelund, and Koushik Sen. 2004. Rule-based runtime verification. InInternational Workshop on Verification, Model Checking, and Abstract Interpretation. Springer, 44–57

work page 2004

[15] [15]

Andreas Bauer and Yliès Falcone. 2016. Decentralised LTL monitoring.Formal Methods in System Design48, 1-2 (2016), 46–93. doi:10.1007/S10703-016-0253-8

work page doi:10.1007/s10703-016-0253-8 2016

[16] [16]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2006. Monitoring of real-time properties. InInternational Conference on Foundations of Software Technology and Theoretical Computer Science. Springer, 260–272

work page 2006

[17] [17]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2010. Comparing LTL semantics for runtime verification.Journal of Logic and Computation20, 3 (2010), 651–674

work page 2010

[18] [18]

Andreas Bauer, Martin Leucker, and Christian Schallhart. 2011. Runtime verification for LTL and TLTL.ACM Transactions on Software Engineering and Methodology (TOSEM)20, 4 (2011), 1–64

work page 2011

[19] [19]

Dietterich, Edward W

Yoshua Bengio, Stephen Clare, Carina Prunkl, Malcolm Murray, Maksym Andriushchenko, Ben Bucknall, Rishi Bommasani, Stephen Casper, Tom Davidson, Raymond Douglas, David Duvenaud, Philip Fox, Usman Gohar, Rose Hadshar, Anson Ho, Tiancheng Hu, Cameron Jones, Sayash Kapoor, Atoosa Kasirzadeh, Sam Manning, Nestor Maslej, Vasilios Mavroudis, Conor McGlynn, Rich...

work page 2026

[20] [20]

Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, Anson Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa T. Okolo, Deborah Raji,...

work page 2025

[21] [21]

Meghyn Bienvenu, Christian Fritz, and Sheila A McIlraith. 2011. Specifying and computing preferred plans.Artificial Intelligence175, 7-8 (2011), 1308–1345

work page 2011

[22] [22]

Abeba Birhane, Ryan Steed, Victor Ojewale, Briana Vecchione, and Inioluwa Deborah Raji. 2024. AI auditing: The broken bus on the road to AI accountability. In2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 612–643

work page 2024

[23] [23]

Andrea Brunello, Angelo Montanari, and Mark Reynolds. 2019. Synthesis of LTL Formulas from Natural Language Texts: State of the Art and Research Directions. In26th International Symposium on Temporal Representation and Reasoning (TIME 2019) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 147). Schloss Dagstuhl – Leibniz-Zentrum für Inform...

work page doi:10.4230/lipics.time 2019

[24] [24]

Klassen, Richard Valenzano, and Sheila A

Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, and Sheila A. McIlraith. 2019. LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning. InProceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019. 6065–6073. doi:10.24963/ijcai.2019/840

work page doi:10.24963/ijcai.2019/840 2019

[25] [25]

Zouying Cao, Yifei Yang, and Hai Zhao. 2025. SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering.Proceedings of the AAAI Conference on Artificial Intelligence39 (2025), 23523–23531. doi:10.1609/aaai.v39i22.34521 FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Parand A. Alamdari, Toryn Q. Klassen, and Sheila A. McIlraith

work page doi:10.1609/aaai.v39i22.34521 2025

[26] [26]

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip J. K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Mic...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[27] [27]

Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, and Dylan Hadfield-Menell. 2024. Black-box access i...

work page 2024

[28] [28]

Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, and Dylan Hadfield-Menell. 2023. Explore, establish, exploit: Red teaming language models from scratch.arXiv preprint arXiv:2306.09442(2023)

work page arXiv 2023

[29] [29]

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024. ACM, 958–973. https://doi.org/10.1145/3630106.3658948

work page doi:10.1145/3630106.3658948 2024

[30] [30]

Yongchao Chen, Rujul Gandhi, Yang Zhang, and Chuchu Fan. 2023. NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 15880–15903. doi:10.18653/v1/2023.emnlp-main.985

work page doi:10.18653/v1/2023.emnlp-main.985 2023

[31] [31]

Noam Chomsky. 1956. Three models for the description of language.IRE Transactions on Information Theory2, 3 (1956), 113–124. https://doi.org/10.1109/TIT.1956.1056813

work page doi:10.1109/tit.1956.1056813 1956

[32] [32]

2024 BCCRT 149 (canLII) Civil Resolution Tribunal of British Columbia. 2024. Moffatt v. Air Canada. https://canlii.ca/t/k2spq File Number SC-2023-005609

work page 2024

[33] [33]

Matthias Cosler, Christopher Hahn, Daniel Mendoza, Frederik Schmitt, and Caroline Trippel. 2023. nl2spec: Interactively translating unstructured natural language to temporal logics with large language models. InInternational Conference on Computer Aided Verification. Springer, 383–396

work page 2023

[34] [34]

Sasha Costanza-Chock, Inioluwa Deborah Raji, and Joy Buolamwini. 2022. Who Audits the Auditors? Recommendations from a field scan of the algorithmic auditing ecosystem. InProceedings of the 2022 ACM conference on Fairness, Accountability, and Transparency. 1571–1583

work page 2022

[35] [35]

Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. 2019. Textworld: A learning environment for text-based games. InComputer Games: 7th Workshop, CGW 2018, Held in Conjunction with the 27th International Conference on Artificial...

work page 2019

[36] [36]

Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. 2024. Safe RLHF: Safe Reinforcement Learning from Human Feedback. InThe Twelfth International Conference on Learning Representations

work page 2024

[37] [37]

Ben d’Angelo, Sriram Sankaranarayanan, César Sánchez, Will Robinson, Bernd Finkbeiner, Henny B Sipma, Sandeep Mehrotra, and Zohar Manna. 2005. LOLA: runtime monitoring of synchronous systems. In12th International Symposium on Temporal Representation and Reasoning (TIME’05). IEEE, 166–174

work page 2005

[38] [38]

Google DeepMind. 2023. Gemini: A Family of Highly Capable Multimodal Models.arXiv preprint arXiv:2312.11805(2023). https: //arxiv.org/abs/2312.11805

work page internal anchor Pith review Pith/arXiv arXiv 2023

[39] [39]

Matthew B Dwyer, George S Avrunin, and James C Corbett. 1999. Patterns in property specifications for finite-state verification. In Proceedings of the 21st International Conference on Software Engineering. 411–420

work page 1999

[40] [40]

Henzinger, and Bernhard Kragl

Thomas Ferrère, Thomas A. Henzinger, and Bernhard Kragl. 2020. Monitoring Event Frequencies. In28th EACSL Annual Conference on Computer Science Logic (CSL 2020) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 152), Maribel Fernández and Anca Muscholl (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 20:1–20:16....

work page doi:10.4230/lipics.csl.2020.20 2020

[41] [41]

Francesco Fuggitti and Tathagata Chakraborti. 2023. NL2LTL–a Python package for converting natural language (NL) instructions to linear temporal logic (LTL) formulas. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 16428–16430

work page 2023

[42] [42]

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Joh...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[43] [43]

Alfonso E Gerevini, Patrik Haslum, Derek Long, Alessandro Saetti, and Yannis Dimopoulos. 2009. Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners.Artificial Intelligence173, 5-6 (2009), 619–668. Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems FAccT ’26, Jun...

work page 2009

[44] [44]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The Llama 3 herd of models.arXiv preprint arXiv:2407.21783(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[45] [45]

Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, and Fabien Roger. 2024. AI Control: Improving Safety Despite Intentional Subversion. InProceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 235). PMLR, 16295–16336

work page 2024

[46] [46]

Lara Groves, Jacob Metcalf, Alayna Kennedy, Briana Vecchione, and Andrew Strait. 2024. Auditing work: Exploring the New York City algorithmic bias audit regime. InProceedings of the 2024 ACM conference on Fairness, Accountability, and Transparency. 1107–1120

work page 2024

[47] [47]

Guan, Miles Wang, Micah Carroll, Zehao Dou, Annie Y

Melody Y. Guan, Miles Wang, Micah Carroll, Zehao Dou, Annie Y. Wei, Marcus Williams, Benjamin Arnav, Joost Huizinga, Ian Kivlichan, Mia Glaese, Jakub Pachocki, and Bowen Baker. 2025. Monitoring Monitorability.arXiv preprint arXiv:2512.18311(2025)

work page arXiv 2025

[48] [48]

Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. 2018. Logically-Correct Reinforcement Learning.arXiv preprint arXiv:1801.08099(2018). http://arxiv.org/abs/1801.08099

work page internal anchor Pith review Pith/arXiv arXiv 2018

[49] [49]

Klaus Havelund and Grigore Roşu. 2001. Monitoring programs using rewriting. InProceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001). IEEE, 135–143

work page 2001

[50] [50]

Klaus Havelund and Grigore Roşu. 2004. Efficient monitoring of safety properties.International Journal on Software Tools for Technology Transfer6, 2 (2004), 158–173

work page 2004

[51] [51]

Thomas A Henzinger, Mahyar Karimi, Konstantin Kueffner, and Kaushik Mallik. 2023. Monitoring algorithmic fairness. InInternational Conference on Computer Aided Verification. Springer, 358–382

work page 2023

[52] [52]

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration.arXiv preprint arXiv:1904.09751(2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[53] [53]

Erik Jones, Anca Dragan, Aditi Raghunathan, and Jacob Steinhardt. 2023. Automatically auditing large language models via discrete optimization. InInternational Conference on Machine Learning. PMLR, 15307–15329

work page 2023

[54] [54]

Jeffrey Joyce, Greg Lomow, Konrad Slind, and Brian Unger. 1987. Monitoring distributed systems.ACM Transactions on Computer Systems (TOCS)5, 2 (1987), 121–150

work page 1987

[55] [55]

Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec,...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[56] [56]

Hannes Kallwies, Martin Leucker, César Sánchez, and Torben Scheffel. 2022. Anticipatory recurrent monitoring with uncertainty and assumptions. InInternational Conference on Runtime Verification. Springer, 181–199

work page 2022

[57] [57]

Taesup Kim, Sungjin Ahn, and Yoshua Bengio. 2019. Variational temporal abstraction.Advances in Neural Information Processing Systems 32 (2019)

work page 2019

[58] [58]

Klassen, Parand A

Toryn Q. Klassen, Parand A. Alamdari, and Sheila A. McIlraith. 2023. Epistemic Side Effects: An AI Safety Problem. InProceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 1797–1801

work page 2023

[59] [59]

Klassen, Parand A

Toryn Q. Klassen, Parand A. Alamdari, and Sheila A. McIlraith. 2024. Pluralistic Alignment Over Time.arXiv preprint arXiv:2411.10654 (2024)

work page arXiv 2024

[60] [60]

Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, et al. 2025. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.arXiv preprint arXiv:2507.11473 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[61] [61]

Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, and Shane Legg. 2020. Avoiding Side Effects By Considering Future Tasks. InAdvances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc

work page 2020

[62] [62]

Khoa Lam, Benjamin Lange, Borhane Blili-Hamelin, Jovana Davidovic, Shea Brown, and Ali Hasan. 2024. A framework for assurance audits of algorithmic systems. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1078–1092

work page 2024

[63] [63]

McIlraith, Shirin Sohrabi, and John Mylopoulos

Sotirios Liaskos, Sheila A. McIlraith, Shirin Sohrabi, and John Mylopoulos. 2011. Representing and reasoning about preferences in re- quirements engineering.Requirements Engineering16 (2011), 227–249. Issue 3. http://www.springerlink.com/content/t27624345512v390/

work page 2011

[64] [64]

Jason Xinyu Liu, Ankit Shah, George Konidaris, Stefanie Tellex, and David Paulius. 2024. Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024. IEEE, 2325–2332. doi:10.1109/IROS58592.2024.10802696

work page doi:10.1109/iros58592.2024.10802696 2024

[65] [65]

Fabrizio Maria Maggi, Marco Montali, Michael Westergaard, and Wil MP Van Der Aalst. 2011. Monitoring business constraints with linear temporal logic: An approach based on colored automata. InBusiness Process Management: 9th International Conference, BPM 2011, Clermont-Ferrand, France, August 30-September 2, 2011. Proceedings. Springer, 132–147

work page 2011

[66] [66]

Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, and Luciano Floridi. 2024. Auditing large language models: a three-layered approach. AI and Ethics4, 4 (2024), 1085–1115

work page 2024

[67] [67]

OpenAI. 2023. GPT-4 Technical Report. https://openai.com/research/gpt-4. FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Parand A. Alamdari, Toryn Q. Klassen, and Sheila A. McIlraith

work page 2023

[68] [68]

Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving

work page

[69] [69]

Red teaming language models with language models,

Red Teaming Language Models with Language Models. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 3419–3448. doi:10.18653/v1/2022.emnlp-main.225

work page doi:10.18653/v1/2022.emnlp-main.225 2022

[70] [70]

Amir Pnueli. 1977. The temporal logic of programs. In18th Annual Symposium on Foundations of Computer Science (sfcs 1977). IEEE, 46–57. doi:10.1109/SFCS.1977.32

work page doi:10.1109/sfcs.1977.32 1977

[71] [71]

Amir Pnueli and Roni Rosner. 1989. On the Synthesis of a Reactive Module. InPOPL ’89: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 179–190. doi:10.1145/75277.75293

work page doi:10.1145/75277.75293 1989

[72] [72]

Inioluwa Deborah Raji, Peggy Xu, Colleen Honigsberg, and Daniel Ho. 2022. Outsider oversight: Designing a third party audit ecosystem for ai governance. InProceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. 557–571

work page 2022

[73] [73]

Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, and Saleema Amershi. 2023. Supporting human-AI collaboration in auditing LLMs with LLMs. InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society. 913–926

work page 2023

[74] [74]

Anne Rozinat and Wil MP Van der Aalst. 2008. Conformance checking of processes based on monitoring real behavior.Information Systems33, 1 (2008), 64–95

work page 2008

[75] [75]

Ajith Sankaran. 2025. How Small And Medium Businesses Can Take Advantage Of The Emerging Agentic AI Era. https://www.forbes.com/councils/forbesbusinesscouncil/2025/04/01/how-small-and-medium-businesses-can-take-advantage-of- the-emerging-agentic-ai-era/ Forbes Business Council, COUNCIL POST

work page 2025

[76] [76]

Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. InProceedings of the Conference on Fairness, Accountability, and Transparency. 59–68

work page 2019

[77] [77]

Niek Tax, Ilya Verenich, Marcello La Rosa, and Marlon Dumas. 2017. Predictive business process monitoring with LSTM neural networks. InAdvanced Information Systems Engineering: 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings 29. Springer, 477–492

work page 2017

[78] [78]

Pashootan Vaezipoor, Andrew C Li, Rodrigo A Toro Icarte, and Sheila A Mcilraith. 2021. LTL2Action: Generalizing LTL instructions for multi-task RL. InInternational Conference on Machine Learning. PMLR, 10497–10508

work page 2021

[79] [79]

Cameron Voloshin, Abhinav Verma, and Yisong Yue. 2023. Eventual Discounting Temporal Logic Counterfactual Experience Replay. InProceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202). PMLR, 35137–35150. https://proceedings.mlr.press/v202/voloshin23a.html

work page 2023

[80] [80]

Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, and Andrei Barbu. 2021. Learning a natural-language to LTL executable semantic parser for grounded robotics. InProceedings of the 2020 Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 155). PMLR, 1706–1718

work page 2021