Towards Counterfactual Explanation and Assertion Inference for CPS Debugging

Hadiza Yusuf; Khouloud Gaaloul; Zaid Ghazal

arXiv: 2604.07679 · v1 · submitted 2026-04-09 · cs.SE · cs.LG · cs.SY · eess.SY

Pith reviewed 2026-05-10 18:33 UTC · model grok-4.3

keywords: counterfactual explanation, assertion inference, CPS debugging, cyber-physical systems, simulation failures, input signal changes, causal models, test verification

The pith

DeCaF generates minimal counterfactual changes to input signals that turn failing CPS tests into passing ones and infers generalizable assertions from those changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeCaF to help engineers interpret hard-to-explain failures in large-scale CPS simulations, where violations often arise from specific interactions between continuous signals and discrete events at particular times. Given a failing test, DeCaF produces small, targeted modifications to the input signals that make the test pass, using combinations of counterfactual generators and causal models to ensure the changes are minimal, necessary, and sufficient. It then extracts logical assertions over the input values and timings that capture the conditions for success in a form engineers can read and reason about, without any access to the internal structure of the simulated model. A sympathetic reader would care because current debugging tools can point to faulty components but rarely reveal the precise input conditions that trigger the problem or the smallest fix that would have avoided it.

Core claim

DeCaF combines three counterfactual generators with two causal models to create minimal, necessary, and sufficient changes to the input signals of a failing CPS test so that the test becomes passing, then infers success assertions as logical predicates over those inputs that generalize the recovery conditions in an interpretable way.

What carries the argument

The DeCaF framework pairs three counterfactual generators (the abstract names two of them, KD-Tree Nearest Neighbors and Genetic Algorithm) with two causal models (M5 model tree and Random Forest) to produce precise input-signal corrections and derive assertions from them.
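The nearest-neighbor idea can be illustrated with a minimal sketch. It assumes each test input is flattened into a fixed-length vector of signal control points and that a pool of already-simulated passing inputs exists. A brute-force search stands in for the KD-Tree index, and the names `nearest_passing` and `changed_features` are hypothetical; none of this is taken from the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two input vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_passing(failing: Vector, passing_pool: List[Vector]) -> Vector:
    """Return the passing input closest to the failing one.

    Brute force is used here as a stand-in for a KD-Tree query.
    """
    if not passing_pool:
        raise ValueError("need at least one passing input")
    return min(passing_pool, key=lambda p: distance(failing, p))


def changed_features(failing: Vector, counterfactual: Vector,
                     tol: float = 1e-9) -> List[Tuple[int, float, float]]:
    """List (index, old, new) for every control point the counterfactual alters."""
    return [(i, f, c)
            for i, (f, c) in enumerate(zip(failing, counterfactual))
            if abs(f - c) > tol]


# Hypothetical data: three control points per test input.
failing_input = [50.0, 80.0, 20.0]
passing_inputs = [[10.0, 10.0, 10.0], [50.0, 60.0, 20.0], [90.0, 80.0, 70.0]]

cf = nearest_passing(failing_input, passing_inputs)
print(cf)                                   # [50.0, 60.0, 20.0]
print(changed_features(failing_input, cf))  # [(1, 80.0, 60.0)]
```

In this toy case the closest passing input differs from the failing one in a single control point, which is the kind of small, targeted change the framework aims for. Whether that change is truly minimal still has to be checked against the simulator.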

If this is right

Engineers obtain interpretable logical predicates that describe the exact input values and timings responsible for a violation.
The framework works on black-box models since it requires no internal access to the CPS simulation code.
Different generator-model pairs trade off success rate against causal precision, with KD-Tree Nearest Neighbors plus M5 model tree showing the highest success rate across the evaluated case studies.
The generated assertions characterize recovery conditions that can be checked on future inputs without rerunning full simulations.
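To make the notion of an assertion over input values and timings concrete, here is a small sketch. It assumes a signal is given as a list of `(time, value)` samples and that an assertion is a universally quantified bound over a time window. The predicate and the `throttle` name are hypothetical examples, not assertions reported in the paper.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (time, value)


def always_in_window(signal: List[Sample], t_start: float, t_end: float,
                     predicate: Callable[[float], bool]) -> bool:
    """True if the predicate holds for every sample with t_start <= t <= t_end."""
    return all(predicate(v) for t, v in signal if t_start <= t <= t_end)


def throttle_assertion(signal: List[Sample]) -> bool:
    """Hypothetical assertion: for all t in [2, 4], throttle(t) <= 60."""
    return always_in_window(signal, 2.0, 4.0, lambda v: v <= 60.0)


# Hypothetical failing input and its repaired counterpart.
original = [(0.0, 40.0), (1.0, 55.0), (2.0, 70.0),
            (3.0, 65.0), (4.0, 50.0), (5.0, 45.0)]
repaired = [(0.0, 40.0), (1.0, 55.0), (2.0, 60.0),
            (3.0, 58.0), (4.0, 50.0), (5.0, 45.0)]

print(throttle_assertion(original))  # False
print(throttle_assertion(repaired))  # True
```

Checking such a predicate only needs the input samples, which is why it can be evaluated on future inputs without a simulation run. Note that a window containing no samples makes the check vacuously true, so a practical version would probably also require a minimum sampling density.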

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The assertions could be reused to filter or generate new test inputs that are guaranteed to avoid the identified failure modes.
If the causal models prove accurate on additional CPS examples, the same generator combinations might reduce the total number of simulations needed during verification.
The approach implicitly treats the input-signal space as the primary diagnostic surface rather than the model internals, which may shift debugging effort toward input specification and test design.

Load-bearing premise

That the chosen counterfactual generators and causal models can reliably produce minimal, necessary, and sufficient input changes, and that the resulting assertions accurately generalize the recovery conditions beyond the original failing tests.

What would settle it

Apply the counterfactual changes or the inferred assertions to new, previously unseen failing tests in the same CPS models and observe whether the changes actually make the tests pass or whether the assertions correctly predict success versus failure.
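A held-out check of this kind could be organized as in the sketch below. It assumes each held-out test is a pair of an input vector and the verdict observed by actually running the simulation. The function name `evaluate_assertion` and the example assertion are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Input = Sequence[float]


def evaluate_assertion(assertion: Callable[[Input], bool],
                       held_out: List[Tuple[Input, bool]]) -> Dict[str, float]:
    """Compare the assertion's predictions with observed pass/fail verdicts."""
    tp = fp = tn = fn = 0
    for x, passed in held_out:
        predicted = assertion(x)
        if predicted and passed:
            tp += 1      # assertion holds and the test passes
        elif predicted and not passed:
            fp += 1      # assertion holds but the test fails
        elif not predicted and passed:
            fn += 1      # assertion violated yet the test passes
        else:
            tn += 1      # assertion violated and the test fails
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "accuracy": accuracy}


# Hypothetical assertion: the second control point must not exceed 60.
assertion = lambda x: x[1] <= 60.0

# Hypothetical held-out inputs with verdicts observed from simulation.
held_out = [
    ([50.0, 55.0, 20.0], True),
    ([50.0, 65.0, 20.0], False),
    ([40.0, 58.0, 30.0], True),
    ([45.0, 59.0, 25.0], False),
]

report = evaluate_assertion(assertion, held_out)
print(report)
```

False positives are the interesting cases here: they are inputs the assertion accepts but the simulator rejects, which would indicate that the assertion overfits the original failures.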

Figures

Figures reproduced from arXiv: 2604.07679 by Hadiza Yusuf, Khouloud Gaaloul, Zaid Ghazal.

**Figure 1.** Example illustration of DeCaF and example signals of a counterfactual explanation generated for the AT case study.

**Figure 2.** Overview of the DeCaF framework.

**Figure 3.** Distribution of evaluation metrics of the ML techniques.

Original abstract

Verification and validation of cyber-physical systems (CPS) via large-scale simulation often surface failures that are hard to interpret, especially when triggered by interactions between continuous and discrete behaviors at specific events or times. Existing debugging techniques can localize anomalies to specific model components, but they provide little insight into the input-signal values and timing conditions that trigger violations, or the minimal, precisely timed changes that could have prevented the failure. In this article, we introduce DeCaF, a counterfactual-guided explanation and assertion-based characterization framework for CPS debugging. Given a failing test input, DeCaF generates counterfactual changes to the input signals that transform the test from failing to passing. These changes are designed to be minimal, necessary, and sufficient to precisely restore correctness. Then, it infers assertions as logical predicates over inputs that generalize recovery conditions in an interpretable form engineers can reason about, without requiring access to internal model details. Our approach combines three counterfactual generators with two causal models, and infers success assertions. Across three CPS case studies, DeCaF achieves its best success rate with KD-Tree Nearest Neighbors combined with M5 model tree, while Genetic Algorithm combined with Random Forest provides the strongest balance between success and causal precision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DeCaF combines three counterfactual generators with causal models to explain CPS failures via input changes and then infer assertions, but the three case studies report only success rates and precision without checks that the changes are minimal or that the assertions generalize.

DeCaF takes a failing CPS test input and produces changes to the signals that make the test pass, then turns those changes into logical assertions over the inputs. The approach mixes genetic algorithms, KD-Tree nearest neighbors, and other generators with M5 model trees or random forests to handle the mix of continuous and discrete behaviors without needing the internal model structure. Across the three case studies the best success rate came from KD-Tree plus M5, while genetic algorithm plus random forest gave the best trade-off between success and causal precision. That comparison is the most concrete output the paper offers engineers who run large simulations and need to understand why a test failed. The integration itself is new for this domain and the black-box focus is practical.

The main weakness is that the evaluation does not test the properties the framework claims. There is no report of trying a smaller perturbation to confirm minimality, no boundary check, and no held-out inputs to see whether the inferred assertions still hold or overfit the original failures. The generators are heuristic, so the results stay suggestive until those controls appear.

The work is aimed at software engineers and researchers who debug cyber-physical systems through simulation. Anyone already using search-based or causal methods for test explanation will see the direct connection. It deserves a serious referee because the pipeline is clearly described, the case studies are real, and the open questions about verification are fixable with targeted experiments rather than fundamental flaws. I would send it for review and ask the authors to add the missing checks on minimality and generalization.

Referee Report

2 major / 1 minor

Summary. The paper introduces DeCaF, a counterfactual-guided explanation and assertion-based characterization framework for debugging cyber-physical systems (CPS). Given a failing test input, DeCaF uses three counterfactual generators (including Genetic Algorithm and KD-Tree Nearest Neighbors) combined with causal models (such as M5 model trees and Random Forest) to produce changes to input signals that turn the test from failing to passing; these changes are claimed to be minimal, necessary, and sufficient. It then infers interpretable logical predicates (assertions) over inputs that generalize the recovery conditions. Evaluation across three CPS case studies reports that KD-Tree NN with M5 achieves the highest success rate while GA with Random Forest provides the best balance between success and causal precision.

Significance. If the central claims hold, DeCaF would offer a practical advance for CPS debugging by delivering actionable, minimal input modifications and human-readable assertions without requiring white-box access to the system under test. The multi-generator design and emphasis on causal precision address a real gap between localization techniques and interpretable root-cause analysis. The empirical results on three case studies provide initial evidence of feasibility, though the absence of verification for the minimality/necessity/sufficiency properties and generalization reduces the immediate strength of the contribution.

major comments (2)

[Abstract] Abstract: The central claim that generated counterfactual changes are 'minimal, necessary, and sufficient' to restore correctness and that inferred assertions 'generalize recovery conditions' is load-bearing for the entire contribution, yet the reported evaluation provides only aggregate success rates and a balance metric with no quantitative checks (e.g., whether a strictly smaller perturbation still fails, whether the change lies on the decision boundary, or whether the assertion holds on held-out inputs).
[Evaluation] Evaluation section (implied by the three case studies): The abstract and results summary give no methodology specifics, statistical details, or discussion of limitations for the success-rate and causal-precision numbers; without these, it is impossible to determine whether the heuristic generators plus causal models reliably enforce the required properties or merely produce plausible but non-minimal recoveries.
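The "strictly smaller perturbation" check mentioned in the first major comment could look like the following sketch. It assumes a pass/fail oracle that wraps a simulation run and shrinks the counterfactual change along the straight line back toward the failing input. The oracle shown is a hypothetical stand-in, and the function names are not from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def scale_change(failing: Vector, counterfactual: Vector, alpha: float) -> List[float]:
    """Apply only a fraction alpha of the counterfactual change."""
    return [f + alpha * (c - f) for f, c in zip(failing, counterfactual)]


def smaller_changes_that_pass(failing: Vector, counterfactual: Vector,
                              oracle: Callable[[Vector], bool],
                              steps: int = 10) -> List[float]:
    """Return the fractions alpha in (0, 1) whose shrunken change already passes.

    An empty result is consistent with minimality along this direction only;
    it does not rule out smaller changes in other directions.
    """
    passing = []
    for k in range(1, steps):
        alpha = k / steps
        if oracle(scale_change(failing, counterfactual, alpha)):
            passing.append(alpha)
    return passing


# Hypothetical oracle standing in for a simulation run:
# the test passes when the second control point is at most 60.
oracle = lambda x: x[1] <= 60.0

failing = [50.0, 80.0, 20.0]
tight_cf = [50.0, 60.0, 20.0]   # lands exactly on the pass/fail boundary
loose_cf = [50.0, 40.0, 20.0]   # overshoots the boundary

print(smaller_changes_that_pass(failing, tight_cf, oracle))  # []
print(smaller_changes_that_pass(failing, loose_cf, oracle))  # [0.5, 0.6, 0.7, 0.8, 0.9]
```

A non-empty list shows that a strictly smaller change already repairs the test, so the counterfactual is not minimal. Each oracle call costs one simulation, so the number of steps trades precision against runtime.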

minor comments (1)

[Abstract] The abstract would benefit from at least one concrete quantitative result (e.g., success rate or precision value) rather than only qualitative statements about 'best' and 'strongest balance'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which highlight important aspects of our evaluation that can be strengthened. We provide point-by-point responses to the major comments and outline the revisions we will make to address them.

Referee: [Abstract] Abstract: The central claim that generated counterfactual changes are 'minimal, necessary, and sufficient' to restore correctness and that inferred assertions 'generalize recovery conditions' is load-bearing for the entire contribution, yet the reported evaluation provides only aggregate success rates and a balance metric with no quantitative checks (e.g., whether a strictly smaller perturbation still fails, whether the change lies on the decision boundary, or whether the assertion holds on held-out inputs).

Authors: We acknowledge that the evaluation in the current manuscript focuses on success rates and causal precision without explicit quantitative verification of minimality, necessity, sufficiency, or generalization on held-out data. The success rate indicates that the generated counterfactuals lead to passing tests, and the balance metric considers causal precision, but direct checks such as testing smaller perturbations or boundary conditions were not performed. In the revised version, we will add these validations: we will report the average perturbation size compared to random baselines, verify that the failing test is repaired only by the full change (that is, that partial changes still fail), and evaluate the inferred assertions on a held-out set of test cases to demonstrate generalization. These additions will be incorporated into the Evaluation section.

revision: yes

Referee: [Evaluation] Evaluation section (implied by the three case studies): The abstract and results summary give no methodology specifics, statistical details, or discussion of limitations for the success-rate and causal-precision numbers; without these, it is impossible to determine whether the heuristic generators plus causal models reliably enforce the required properties or merely produce plausible but non-minimal recoveries.

Authors: The manuscript does provide methodology details in the Evaluation section, including descriptions of the three CPS case studies, the counterfactual generators (Genetic Algorithm and KD-Tree Nearest Neighbors), the causal models (M5 model trees and Random Forest), and how success is measured. Statistical details such as the number of experiments and averaging over runs are included. However, we agree that a more explicit discussion of limitations and potential issues with the heuristic nature of the generators is needed to fully address concerns about reliability and minimality. We will revise the Evaluation section to include additional statistical analysis (e.g., standard deviations, significance tests) and a new subsection on limitations, discussing the assumptions of the causal models and the heuristic search for counterfactuals.

revision: partial

Circularity Check

0 steps flagged

No circularity detected; empirical method proposal evaluated on external case studies

The paper introduces DeCaF as a framework that combines three existing counterfactual generators (GA, KD-Tree NN, etc.) with two causal models to produce input changes and infer assertions for CPS debugging. All load-bearing claims are empirical performance results (success rates, causal precision) measured on three separate CPS case studies. No equations, derivations, or self-citations are presented that reduce the central claims to tautological redefinitions or fitted inputs renamed as predictions. The method is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work to force its conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The provided abstract does not detail any free parameters, background axioms, or newly invented entities used in the framework.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

Paper passage: "DeCaF generates counterfactual changes to the input signals that transform the test from failing to passing... infers assertions as logical predicates over inputs"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "Our approach combines three counterfactual generators with two causal models"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] C. Birchler, S. Khatiri, P. Rani, T. Kehrer, and S. Panichella, “A roadmap for simulation-based testing of autonomous cyber-physical systems: Challenges and future direction,” ACM Trans. Softw. Eng. Methodol., vol. 34, no. 5, May 2025. [Online]. Available: https://doi.org/10.1145/3711906

[2] S. Khatiri, S. Panichella, and P. Tonella, “Simulation-based test case generation for unmanned aerial vehicles in the neighborhood of real flights,” in 2023 IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE, 2023, pp. 281–292

[3] A. Arrieta, S. Wang, U. Markiegi, A. Arruabarrena, L. Etxeberria, and G. Sagardui, “Pareto efficient multi-objective black-box test case selection for simulation-based testing,” Information and Software Technology, 2019

[4] S. Woo, “Digital twins could revolutionize planes, cars and hearts,” https://www.wsj.com/articles/digital-twins-could-revolutionize-planes-cars-and-hearts-technology-a8c2bd4e, 2024, Wall Street Journal, July 17, 2024

[5] D. K. Chaturvedi, Modeling and simulation of systems using MATLAB® and Simulink®. CRC press, 2017, ISBN: 978-1439806722

[6] S. D. Wehbe and S. Bak, “Finding unknown unknowns using cyber-physical system simulators,” in Proceedings of the 7th Workshop on Design Automation for CPS and IoT, 2025, pp. 1–6

[7] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. C. Teneketzis, “Failure diagnosis using discrete-event models,” IEEE transactions on control systems technology, vol. 4, no. 2, pp. 105–124, 2002

[8] T. Ferrère, O. Maler, and D. Ničković, “Trace diagnostics using temporal implicants,” in International Symposium on Automated Technology for Verification and Analysis. Springer, 2015, pp. 241–258

[9] E. Bartocci, T. Ferrère, N. Manjunath, and D. Ničković, “Localizing faults in simulink/stateflow models with stl,” in Proceedings of the 21st international conference on hybrid systems: computation and control (part of cps week), 2018, pp. 197–206

[10] B. Liu, L. Lucia, S. Nejati, L. C. Briand, and T. Bruckmann, “Simulink fault localization: an iterative statistical debugging approach,” Software Testing, Verification and Reliability, vol. 26, no. 6, pp. 431–459, 2016

[11] E. Bartocci, N. Manjunath, L. Mariani, C. Mateis, and D. Ničković, “Cpsdebug: Automatic failure explanation in cps models,” International Journal on Software Tools for Technology Transfer, pp. 1–14, 2021

[12] Z. Deng, S. P. Eshima, J. Nabity, and Z. Kong, “Causal signal temporal logic for the environmental control and life support system’s fault analysis and explanation,” IEEE Access, vol. 11, pp. 26 471–26 482, 2023

[13] A. Podgurski and Y. Küçük, “Counterfault: Value-based fault localization by modeling and predicting counterfactual outcomes,” in 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2020, pp. 382–393

[14] B. Johnson, Y. Brun, and A. Meliou, “Causal testing: understanding defects’ root causes,” in Proceedings of the ACM/IEEE 42nd international conference on software engineering, 2020, pp. 87–99

[15] P. Chadbourne and N. U. Eisty, “Applications of causality and causal inference in software engineering,” in 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). IEEE, 2023, pp. 47–52

[16] S. Chakraborty, S. Shah, K. Soltani, and A. Swigart, “Root cause detection among anomalous time series using temporal state alignment,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, 2019, pp. 523–528

[17] Z. Ren, C. Liu, X. Xiao, H. Jiang, and T. Xie, “Root cause localization for unreproducible builds via causality analysis over system call tracing,” in 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019, pp. 527–538

[18] Z. Deng, S. Eshima, J. Nabity, and Z. Kong, “Causal signal temporal logic for the environmental control and life support system’s fault analysis and explanation,” IEEE Access, vol. PP, pp. 1–1, 01 2023

[19] C. G. Kapugama, V.-T. Pham, A. Aleti, and M. Böhme, “Human-in-the-loop oracle learning for semantic bugs in string processing programs,” in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022, pp. 215–226

[20] R. Gopinath, A. Kampmann, N. Havrikov, E. O. Soremekun, and A. Zeller, “Abstracting failure-inducing inputs,” in Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, 2020, pp. 237–248

[21] E. Soremekun, E. Pavese, N. Havrikov, L. Grunske, and A. Zeller, “Inputs from hell,” IEEE Transactions on Software Engineering, vol. 48, no. 4, pp. 1138–1153, 2020

[22] A. Kampmann, N. Havrikov, E. O. Soremekun, and A. Zeller, “When does my program do this? learning circumstances of software behavior,” in Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, 2020, pp. 1228–1239

[23] K. Gaaloul, C. Menghi, S. Nejati, L. C. Briand, and Y. I. Parache, “Combining genetic programming and model checking to generate environment assumptions,” IEEE Transactions on Software Engineering, vol. 48, no. 9, pp. 3664–3685, 2021

[24] B. A. Jodat, S. Nejati, M. Sabetzadeh, and P. Saavedra, “Learning non-robustness using simulation-based testing: a network traffic-shaping case study,” in 2023 IEEE Conference on Software Testing, Verification and Validation (ICST). IEEE, 2023, pp. 386–397

[25] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao, “The daikon system for dynamic detection of likely invariants,” Science of computer programming, vol. 69, no. 1-3, pp. 35–45, 2007

[26] K. Gaaloul, C. Menghi, S. Nejati, L. C. Briand, and D. Wolfe, “Mining assumptions for software components using machine learning,” in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 159–171

[27] W. Pananurak, S. Thanok, and M. Parnichkun, “Adaptive cruise control for an intelligent vehicle,” in 2008 IEEE International Conference on Robotics and Biomimetics. IEEE, 2009, pp. 1794–1799

[28] C. E. Tuncali, G. Fainekos, D. Prokhorov, H. Ito, and J. Kapinski, “Requirements-driven test generation for autonomous vehicles with machine learning components,” IEEE Transactions on Intelligent Vehicles, vol. 5, no. 2, pp. 265–280, 2019

[29] (Accessed: September 2025) Cruise control test generation. [Online]. Available: https://www.mathworks.com/help/sldv/ug/cruise-control-test-generation.html

[30] (Accessed: September 2025) Building a clutch lock-up model. [Online]. Available: https://www.mathworks.com/help/simulink/slref/building-a-clutch-lock-up-model.html

[31] (Accessed: September 2025) Design a guidance system in matlab and simulink. [Online]. Available: https://www.mathworks.com/help/simulink/slref/designing-a-guidance-system-in-matlab-and-simulink.html

[32] (Accessed: September 2025) Dc motor model simulink model. [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/11587-dc-motor-model-simulink

[33] T. Khandait, F. Formica, P. Arcaini, S. Chotaliya, G. Fainekos, A. Hekal, A. Kundu, E. Lew, M. Loreti, C. Menghi et al., “Arch-comp 2024 category report: Falsification,” in Proceedings of the 11th Int. Workshop on Applied, vol. 103, 2024, pp. 122–144

[34] G. Ernst, P. Arcaini, A. Donze, G. Fainekos, L. Mathesen, G. Pedrielli, S. Yaghoubi, Y. Yamagata, and Z. Zhang, “Arch-comp 2019 category report: Falsification.” in ARCH@ CPSIoTWeek, 2019, pp. 129–140

[35] S. Luke, Essentials of Metaheuristics, 2nd ed. Lulu, 2013, available for free at http://cs.gmu.edu/~sean/book/metaheuristics/

[36] O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” in International Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems. Springer, 2004, pp. 152–166

[37] R. J. Quinlan, “Learning with continuous classes,” in 5th Australian Joint Conference on Artificial Intelligence. Singapore: World Scientific, 1992, pp. 343–348

[38] G. Holmes, M. Hall, and E. Frank, “Generating rule sets from model trees,” in Twelfth Australian Joint Conference on Artificial Intelligence. Springer, 1999, pp. 1–12

[39] G. Biau and E. Scornet, “A random forest guided tour,” Test, vol. 25, no. 2, pp. 197–227, 2016

[40] J. Cervantes, F. Garcia-Lamont, L. Rodríguez-Mazahua, and A. Lopez, “A comprehensive survey on support vector machine classification: Applications, challenges and trends,” Neurocomputing, vol. 408, pp. 189–215, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231220307153

[41] W. W. Cohen et al., “Fast effective rule induction,” in Proceedings of the twelfth international conference on machine learning, 1995, pp. 115–123

[42] M. Schleich, Z. Geng, Y. Zhang, and D. Suciu, “Geco: quality counterfactual explanations in real time,” Proc. VLDB Endow., vol. 14, no. 9, p. 1681–1693, May 2021. [Online]. Available: https://doi.org/10.14778/3461535.3461555

[43] A. Van Looveren and J. Klaise, “Interpretable counterfactual explanations guided by prototypes,” in Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part III. Springer, 2021, pp. 201–217

[44] B. Hoxha, H. Abbas, and G. Fainekos, “Benchmarks for temporal logic requirements for automotive systems.” ARCH@ CPSWeek, vol. 34, pp. 25–30, 2014

[45] MathWorks, “Adaptive cruise control system using model predictive control,” https://www.mathworks.com/help/mpc/ug/adaptive-cruise-control-using-model-predictive-controller.html, 2021, accessed: 2024-09-11

[46] J. Hu, J. Lygeros, and S. Sastry, “Towards a theory of stochastic hybrid systems,” in International Workshop on Hybrid Systems: Computation and Control. Springer, 2000, pp. 160–173

[47] G. Ernst, P. Arcaini, G. Fainekos, F. Formica, J. Inoue, T. Khandait, M. M. Mahboob, C. Menghi, G. Pedrielli, M. Waga, Y. Yamagata, and Z. Zhang, “Arch-comp 2022 category report: Falsification with ubounded resources,” in Proceedings of 9th International Workshop on Applied, vol. 90, 2022, pp. 204–221

[48] J. Song, D. Lyu, Z. Zhang, Z. Wang, T. Zhang, and L. Ma, “When cyber-physical systems meet ai: a benchmark, an evaluation, and a way forward,” in Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, 2022, pp. 343–352

[49] H. Abbas, B. Hoxha, G. Fainekos, and K. Ueda, “Robustness-guided temporal logic testing and verification for stochastic cyber-physical systems,” in The 4th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent. IEEE, 2014, pp. 1–6

[50] R. K. Mothilal, A. Sharma, and C. Tan, “Explaining machine learning classifiers through diverse counterfactual explanations,” in Proceedings of the 2020 conference on fairness, accountability, and transparency, 2020, pp. 607–617

[51] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The annals of mathematical statistics, pp. 50–60, 1947

[52] A. Vargha and H. D. Delaney, “A critique and improvement of the cl common language effect size statistics of mcgraw and wong,” Journal of Educational and Behavioral Statistics, vol. 25, no. 2, pp. 101–132, 2000

[53] R. Kommiya Mothilal, D. Mahajan, C. Tan, and A. Sharma, “Towards unifying feature attribution and counterfactual explanations: Different means to the same end,” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 652–663

[54] G. Ernst, P. Arcaini, I. Bennani, A. Chandratre, A. Donzé, G. Fainekos, G. Frehse, K. Gaaloul, J. Inoue, T. Khandait et al., “Arch-comp 2021 category report: Falsification with validation of results.” in ARCH@ ADHS, 2021, pp. 133–152

[55] A. Zeller, “Isolating cause-effect chains from computer programs,” ACM SIGSOFT Software Engineering Notes, vol. 27, no. 6, pp. 1–10, 2002