Automated Test Validators for Flaky Cyber-Physical System Simulators: Approach and Evaluation
Pith reviewed 2026-05-18 20:51 UTC · model grok-4.3
The pith
Genetic programming using the Ochiai formula produces more accurate test validators for filtering ineffective inputs in flaky cyber-physical system simulators than genetic programming with the Tarantula or Naish formulas, or than decision trees and decision rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Test validators generated using genetic programming with the Ochiai spectrum-based fault localization formula are significantly more accurate than those generated using genetic programming with Tarantula and Naish or using decision trees and decision rules. This accuracy advantage remains even when accounting for the flakiness of the simulator. The validators are robust against flakiness, showing only 4 percent average variation in accuracy results across four different network and autonomous-driving systems. On average, 88.7 percent of the assertions inferred by the approach align or overlap with requirements precondition violations, ODD-limit violations, and nominal safe conditions.
What carries the argument
Genetic programming that uses spectrum-based fault localization ranking formulas, especially Ochiai, as fitness functions to evolve boolean expressions classifying test inputs as valid or invalid for simulator execution.
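As a rough illustration of what such an evolved validator might look like, the sketch below represents a boolean expression over test-input features as a small tree and evaluates it on one input. The feature names (`speed`, `fog_density`), the thresholds, the node classes, and the convention that `True` means "flag this input as invalid" are all hypothetical; the paper's actual expression grammar may differ.

```python
import operator
from dataclasses import dataclass
from typing import Mapping, Union

# Comparison operators allowed at the leaves (an assumed, minimal set).
_COMPARATORS = {"<": operator.lt, "<=": operator.le,
                ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Cmp:
    """Leaf node: compares one input feature against a constant."""
    feature: str
    op: str
    const: float


@dataclass(frozen=True)
class BoolOp:
    """Inner node: joins two sub-expressions with "and" or "or"."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, BoolOp]


def evaluate(expr: Expr, test_input: Mapping[str, float]) -> bool:
    """Return the truth value of the expression for one test input."""
    if isinstance(expr, Cmp):
        return _COMPARATORS[expr.op](test_input[expr.feature], expr.const)
    left = evaluate(expr.left, test_input)
    right = evaluate(expr.right, test_input)
    if expr.op == "and":
        return left and right
    if expr.op == "or":
        return left or right
    raise ValueError(f"unknown boolean operator: {expr.op}")


# Hypothetical validator: flag inputs that are too fast or too foggy.
validator = BoolOp("or",
                   Cmp("speed", ">", 130.0),
                   Cmp("fog_density", ">", 0.9))

flagged = evaluate(validator, {"speed": 140.0, "fog_density": 0.2})
kept = evaluate(validator, {"speed": 80.0, "fog_density": 0.1})
```

A tree of this shape is convenient for genetic programming because crossover and mutation can swap or perturb whole sub-expressions while the result stays a well-formed boolean expression.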
If this is right
- Validators can pre-filter test inputs that violate preconditions or exceed ODD limits, avoiding unnecessary simulator runs.
- The accuracy advantage persists despite inconsistent outcomes caused by simulator flakiness.
- Generated assertions align closely with technical standards and empirical results from the literature.
- Robustness is demonstrated with only 4 percent average accuracy variation across multiple flaky systems.
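The first consequence above can be pictured as a filter placed in front of the simulator. The following is a minimal sketch, assuming a validator is any callable that returns `True` for inputs that should be skipped; `run_simulator`, `exceeds_limit`, the `bandwidth_mbps` field, and the 1000.0 limit are hypothetical stand-ins and not the paper's tooling.

```python
from typing import Callable, Iterable, List, Mapping, Tuple

TestInput = Mapping[str, float]


def prefilter(inputs: Iterable[TestInput],
              is_ineffective: Callable[[TestInput], bool]
              ) -> Tuple[List[TestInput], List[TestInput]]:
    """Split inputs into (to_run, skipped) without calling the simulator."""
    to_run: List[TestInput] = []
    skipped: List[TestInput] = []
    for test_input in inputs:
        if is_ineffective(test_input):
            skipped.append(test_input)
        else:
            to_run.append(test_input)
    return to_run, skipped


def run_simulator(test_input: TestInput) -> str:
    """Placeholder for an expensive, possibly flaky simulator call."""
    return "executed"


def exceeds_limit(test_input: TestInput) -> bool:
    """Hypothetical validator: bandwidth above an assumed limit is out of scope."""
    return test_input["bandwidth_mbps"] > 1000.0


candidates = [{"bandwidth_mbps": 200.0},
              {"bandwidth_mbps": 5000.0},
              {"bandwidth_mbps": 800.0}]

to_run, skipped = prefilter(candidates, exceeds_limit)
verdicts = [run_simulator(x) for x in to_run]  # only two simulator calls
```

Because the validator is evaluated on the input alone, the cost of a skipped input is one predicate evaluation rather than one or more simulator executions.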
Where Pith is reading between the lines
- The filtering step could be inserted early in automated testing pipelines to reduce overall simulation time for large input spaces.
- Similar repurposing of fault localization formulas might help other simulation-based domains that face high execution costs.
- If the validators prove stable, they could enable broader sampling of critical scenarios without proportional growth in compute demand.
Load-bearing premise
Spectrum-based fault localization ranking formulas such as Ochiai can be repurposed as effective fitness functions inside genetic programming to evolve validators that correctly identify precondition violations, ODD-limit violations, and inherently safe scenarios without needing to execute the simulator.
What would settle it
A new case study on a different CPS domain where the accuracy of GP with Ochiai is not significantly higher than GP with Tarantula, Naish, or decision trees would falsify the central accuracy claim.
Original abstract
Simulation-based testing of cyber-physical systems (CPS) is costly due to the time-consuming execution of CPS simulators. In addition, CPS simulators may be flaky, leading to inconsistent test outcomes and requiring repeated test re-execution for reliable test verdicts. Many test inputs within the input space of CPS may not effectively exercise the behaviour of the system under test (SUT) -- for instance, those that violate system preconditions, exceed operational design domain (ODD) limits, or represent inherently safe scenarios. In this article, we propose to use test validators to filter out such test inputs before execution. We describe two methods for generating test validators: one using genetic programming (GP) that employs well-known spectrum-based fault localization (SBFL) ranking formulas, namely Ochiai, Tarantula, and Naish, as fitness functions; and the other using decision trees (DT) and decision rules (DR). We evaluate our test validators through case studies in the domains of aerospace, networking and autonomous driving. We show that test validators generated using GP with Ochiai are significantly more accurate than those generated using GP with Tarantula and Naish or using DT or DR. Moreover, this accuracy advantage remains even when accounting for the flakiness of the simulator. We further show that our test validators generated by GP with Ochiai are robust against flakiness with only 4% average variation in their accuracy results across four different network and autonomous-driving systems with flaky behaviours. Finally, we show that, on average, 88.7% of the assertions inferred by our approach align or overlap with requirements precondition violations, ODD-limit violations, and nominal safe conditions extracted from technical standards and empirical results in the literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes automated generation of test validators to filter ineffective inputs (precondition violations, ODD-limit violations, inherently safe scenarios) for flaky CPS simulators, thereby reducing execution costs. Two generation methods are described: genetic programming (GP) that repurposes spectrum-based fault localization (SBFL) formulas (Ochiai, Tarantula, Naish) as fitness functions, and decision trees/rules (DT/DR). Evaluation on aerospace, networking, and autonomous-driving case studies claims that GP with Ochiai yields significantly higher accuracy than the alternatives, that this advantage persists under simulator flakiness, that accuracy varies only 4% on average across four flaky systems, and that 88.7% of inferred assertions align with literature-derived requirements.
Significance. If the central empirical claims hold after rigorous statistical validation and clearer method exposition, the work could meaningfully lower the cost of simulation-based CPS testing in safety-critical domains. The explicit handling of flakiness and the reported alignment with external standards are practical strengths. The approach also offers a novel transfer of SBFL techniques into test-input filtering, which could be extended if the mapping from coverage spectra to validator fitness is shown to be general rather than artifactual.
major comments (3)
- [Abstract and §5 (Evaluation)] The repeated claim that GP-Ochiai validators are 'significantly more accurate' is unsupported by any statistical test, confidence interval, effect size, or raw-data summary. The reported accuracy advantage and the 4% flakiness-variation figure therefore remain descriptive rather than inferential, weakening the central comparative result.
- [§3.2 (GP fitness function definition)] The mapping from SBFL spectra (counts a, b, c, d) to a fitness function over candidate validator predicates or features is not explicitly constructed. Without this definition it is unclear why Ochiai, Tarantula, or Naish remain meaningful outside their original code-coverage setting or whether the reported superiority is an artifact of the particular feature encoding and labeling scheme.
- [§4 and §5] The experimental protocol (number of GP runs, population size, termination criteria, how flakiness is injected and measured, train/test split for validator accuracy) is not fully specified, preventing independent reproduction or assessment of robustness claims.
minor comments (2)
- [Figures 4-7] Captions and axis labels in the accuracy and robustness plots should explicitly state the number of independent runs and the exact accuracy metric (e.g., precision, recall, F1) used.
- [§5.3] The 88.7% alignment figure would benefit from a breakdown by domain and by type of violation (precondition vs. ODD vs. safe scenario) to show whether the result is uniform.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below, indicating where revisions will strengthen the manuscript and where we provide additional clarification.
Point-by-point responses
- Referee: [Abstract and §5] the repeated claim that GP-Ochiai validators are 'significantly more accurate' is unsupported by any statistical test, confidence interval, effect size, or raw-data summary. The reported accuracy advantage and the 4% flakiness-variation figure therefore remain descriptive rather than inferential.
  Authors: We agree that the term 'significantly' was used in a descriptive sense in the current draft. In the revised version we will replace this with inferential statistics: we will report results from 30 independent GP runs, apply the Wilcoxon signed-rank test with p-values, compute Cohen's d effect sizes, and include 95% confidence intervals for the accuracy differences. Raw per-run accuracy tables will be added to an appendix or supplementary material. The 4% variation figure will similarly be accompanied by standard deviation and range across the four systems.
  Revision: yes
- Referee: [§3.2] the mapping from SBFL spectra (counts a, b, c, d) to a fitness function over candidate validator predicates or features is not explicitly constructed. Without this definition it is unclear why Ochiai, Tarantula, or Naish remain meaningful outside their original code-coverage setting.
  Authors: The fitness function is obtained by treating each candidate validator predicate as a binary classifier over the set of executed test inputs: a = number of effective inputs where the predicate evaluates true, b = number of ineffective inputs where it evaluates true, c = number of effective inputs where it evaluates false, d = number of ineffective inputs where it evaluates false. The SBFL formula is then applied directly to these four counts to produce the fitness value. We will insert an explicit equation and a short paragraph in §3.2 that defines this mapping and explains why the formulas remain semantically meaningful when the 'spectrum' is derived from input-effectiveness labels rather than statement coverage.
  Revision: yes
- Referee: [§4 and §5] the experimental protocol (number of GP runs, population size, termination criteria, how flakiness is injected and measured, train/test split for validator accuracy) is not fully specified, preventing independent reproduction or assessment of robustness claims.
  Authors: We acknowledge that several parameter values and procedural steps were described at a high level. In the revision we will expand both sections with the following concrete details: 30 independent GP runs per configuration, population size of 100, tournament selection, 100-generation limit or fitness convergence of 0.01, flakiness injection via Gaussian noise on simulator outputs with variance calibrated to observed real-world flakiness rates, accuracy measured on a held-out 30% test set after training on 70%, and explicit random-seed reporting. A new subsection will tabulate all hyperparameters.
  Revision: yes
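The count-based mapping described in the response to the §3.2 comment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that `a`, `b`, `c`, `d` take the roles that SBFL usually writes as e_f, e_p, n_f, n_p, and it uses the Op2 formula from Naish et al. as the "Naish" variant, since the text does not say which variant the paper uses. The function names, the example data, and the convention of returning 0.0 on a zero denominator are likewise assumptions.

```python
import math
from typing import Callable, Iterable, Mapping, Tuple

TestInput = Mapping[str, float]


def spectrum(predicate: Callable[[TestInput], bool],
             labelled_inputs: Iterable[Tuple[TestInput, bool]]
             ) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for one candidate predicate.

    a: effective inputs where the predicate is true
    b: ineffective inputs where the predicate is true
    c: effective inputs where the predicate is false
    d: ineffective inputs where the predicate is false
    """
    a = b = c = d = 0
    for test_input, is_effective in labelled_inputs:
        holds = predicate(test_input)
        if holds and is_effective:
            a += 1
        elif holds:
            b += 1
        elif is_effective:
            c += 1
        else:
            d += 1
    return a, b, c, d


def ochiai(a: int, b: int, c: int, d: int) -> float:
    denom = math.sqrt((a + c) * (a + b))
    return a / denom if denom > 0 else 0.0  # assumed convention for 0/0


def tarantula(a: int, b: int, c: int, d: int) -> float:
    rate_effective = a / (a + c) if (a + c) > 0 else 0.0
    rate_ineffective = b / (b + d) if (b + d) > 0 else 0.0
    total = rate_effective + rate_ineffective
    return rate_effective / total if total > 0 else 0.0


def naish_op2(a: int, b: int, c: int, d: int) -> float:
    # Op2 variant (assumed): a minus a small penalty for b.
    return a - b / (b + d + 1)


# Hypothetical labelled data: (test input, was it effective?).
labelled = [({"speed": 90.0}, True), ({"speed": 70.0}, True),
            ({"speed": 20.0}, False), ({"speed": 60.0}, False)]

counts = spectrum(lambda x: x["speed"] > 50.0, labelled)
fitness = ochiai(*counts)
```

In a GP loop, a value such as `fitness` would be computed for every candidate predicate in the population, so the three formulas differ only in how they reward `a` and penalize `b` and `c`.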
Circularity Check
No circularity in empirical evaluation of GP-based test validators
Full rationale
The paper's claims rest on comparative experiments across aerospace, networking, and autonomous-driving case studies, measuring validator accuracy against ground-truth labels for precondition/ODD/safe-scenario violations and checking robustness to simulator flakiness. SBFL formulas are adopted as GP fitness functions via an explicit methodological choice, with performance differences reported empirically rather than derived by construction from the evaluation data itself. Alignment with literature requirements (88.7% overlap) serves as an external validation step, not a definitional input. No equations, self-citations, or renamings reduce the reported accuracy advantages or robustness figures to tautological re-expressions of the same fitted quantities or prior author results.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: CPS simulators can produce inconsistent outcomes for the same input due to flakiness, requiring repeated executions for reliable verdicts.
- Domain assumption: A substantial fraction of test inputs violate preconditions, exceed ODD limits, or represent inherently safe scenarios and can therefore be filtered without execution.
Forward citations
Cited by 1 Pith paper
- Grammar-Constrained Refinement of Safety Operational Rules Using Language in the Loop: What Could Go Wrong
  A grammar-constrained counterfactual refinement framework resolves inconsistencies in safety operational rules for an autonomous driving system while staying syntactically valid.