Systematic API Testing Through Model Checking and Executable Contracts

Ana Ribeiro; Carla Ferreira; Margarida Mamede

arxiv: 2604.08633 · v1 · submitted 2026-04-09 · 💻 cs.SE

Systematic API Testing Through Model Checking and Executable Contracts

Ana Ribeiro , Margarida Mamede , Carla Ferreira This is my paper

Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3

classification 💻 cs.SE

keywords API testingmodel checkingTLA+state coverageexecutable contractstest sequence generationsoftware verificationblack-box testing

0 comments

The pith

Model checking of formal API models generates test sequences with complete state coverage and detects interaction faults.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve automated API testing by bridging the gap between interface specs and actual behavior. Standard methods struggle with generating effective stateful sequences and verifying outcomes beyond basic responses. IcePick addresses this by creating a TLA+ model of the API, using a model checker to map out all reachable states, and extracting test sequences via a guided traversal that ensures full coverage. It further adds executable contracts to check semantic properties automatically. A sympathetic reader would care because this promises more reliable testing for complex, state-dependent systems where bugs often hide in sequences of calls.

Core claim

By modeling API state evolution in TLA+ and exhaustively exploring the reachable states with the TLC model checker using coverage-guided breadth-first traversal, IcePick generates test sequences that provably cover the entire behavioural model. The Glacier language then enriches the specifications with first-order logic contracts that are executable for automated verification of behavior during test runs. Evaluation demonstrates that this achieves complete state coverage on benchmark systems and reveals faults specifically in multi-operation interactions, while also identifying scalability boundaries for the technique.

What carries the argument

Coverage-guided breadth-first traversal of the TLC state-space graph derived from a TLA+ model of the API, combined with executable first-order logic contracts for behavioral verification.

If this is right

Reproducible test suites with provable coverage guarantees for critical API systems.
Faults in interactions between multiple operations are detected that black-box testing often misses.
Scalability limits and applicability requirements for the approach are characterized through evaluation.
Automated verification of semantic properties beyond HTTP status codes becomes possible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could reduce the reliance on random or heuristic test generation in practice.
Writing the initial TLA+ model by hand represents a potential bottleneck that future work might address with model inference techniques.
Similar principles might apply to testing stateful components in other software domains, such as microservices or database APIs.

Load-bearing premise

An accurate and sufficiently compact TLA+ model of the target API can be written by hand without the traversal missing critical states.

What would settle it

Observing that the generated test sequences fail to visit a state that the TLA+ model indicates is reachable, or that a known fault in operation interactions goes undetected in the benchmark evaluations.

Figures

Figures reproduced from arXiv: 2604.08633 by Ana Ribeiro, Carla Ferreira, Margarida Mamede.

**Figure 1.** Figure 1: ICEPICK framework overview. As noted, while we illustrate the technique using microservice architectures and RESTful APIs, it should also apply to systems with a well-defined programmatic interface. Following an overview of our solution, we introduce key background concepts and survey related work on the automated testing of RESTful APIs and on model-checking-based test generation. We then present the pa… view at source ↗

**Figure 2.** Figure 2: Model size vs. number of states and transitions generated by TLC for [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Automated black-box testing of APIs typically relies on interface specifications that define available operations and data schemas, but offer limited or no behavioural semantics. This semantic gap amplifies the test-oracle problem and limits the generation of effective, stateful call sequences. We introduce IcePick, a framework that achieves systematic state-space coverage for API testing by leveraging model checking. IcePick uses TLA+ to formally model API state evolution, employs the TLC model checker to exhaustively explore reachable states, and generates test sequences that provably cover the behavioural model. To mitigate state-space explosion and improve sequence extraction, we introduce a coverage-guided breadth-first traversal of the TLC state-space graph. To address oracle limitations beyond HTTP status codes, we propose Glacier, a first-order logic contract language that enriches API specifications with executable semantic contracts, enabling automated behavioural verification during test execution. We evaluate IcePick on EvoMaster Benchmark systems, demonstrating that model-checking-guided exploration achieves complete state coverage and reveals faults in multi-operation interactions. We also analyse scalability to characterise practical limits and applicability requirements. Overall, IcePick provides reproducible test suites with strong coverage guarantees for critical API-based systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

IcePick uses TLA+ and TLC to generate API test sequences with claimed complete model coverage, but the hand-written models are never checked against the real implementations.

read the letter

The paper's main contribution is a framework called IcePick that models API state changes in TLA+, runs the TLC checker to explore the full state space, and extracts test sequences via a coverage-guided breadth-first traversal. It also adds Glacier, a first-order logic language for writing executable contracts that go beyond HTTP status codes as oracles. They apply this to EvoMaster benchmark systems and report that the generated suites cover all model states while exposing faults that involve multiple operations in sequence. The approach is grounded in existing tools rather than new theory, and the traversal technique is presented as a practical way to avoid full enumeration when pulling out concrete call sequences. That part is concrete and worth looking at if you work on stateful API testing. The evaluation shows the method can be applied to real benchmark code, which is better than pure theory papers in this area. The soft spot is exactly the one the stress-test note flags. The models are written by hand for each target API, and nothing in the description validates that those models capture the actual behavior of the implementation. There is no differential execution, no runtime monitoring of invariants on the live system, and no comparison to a reference. So the complete coverage and the faults found are only with respect to the model; they may not transfer. The abstract also gives no numbers on state-space sizes, no error bars, and no breakdown of how much of the coverage came from the guided traversal versus plain BFS. That makes the scalability claims hard to assess. This is for people already comfortable with TLA+ or model checking who want to generate tests for networked services. A reader looking for new ideas on combining formal exploration with black-box API testing would get something out of it. It has enough of a working framework and benchmark results to deserve a serious referee, even though the model-fidelity gap will need to be addressed in revision. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The paper introduces IcePick, a framework that models API state evolution in TLA+, uses the TLC model checker for exhaustive exploration of reachable states, and extracts test sequences via coverage-guided breadth-first traversal to achieve complete state-space coverage. It also proposes Glacier, a first-order logic contract language for executable semantic contracts to improve oracles beyond HTTP status codes. Evaluation on EvoMaster benchmarks claims that the approach achieves complete state coverage and reveals faults in multi-operation interactions, with additional analysis of scalability and practical limits.

Significance. If the central claims hold, the work offers a valuable contribution to stateful API testing by providing formal coverage guarantees through model checking, which addresses limitations in black-box approaches. The reliance on established tools (TLA+ and TLC) rather than ad-hoc constructions is a strength, as is the introduction of Glacier for richer behavioral verification. This could be significant for critical systems where multi-operation faults are hard to detect.

major comments (2)

[Evaluation] Evaluation section: The claims of complete state coverage and fault revelation are reported only with respect to the hand-written TLA+ models of the EvoMaster benchmarks. No validation (e.g., differential execution against the implementation, runtime invariant monitoring, or comparison to a reference) is described to confirm that the models neither omit reachable states nor introduce spurious transitions, so the results do not necessarily transfer to the concrete APIs.
[Abstract] Abstract and Evaluation: No quantitative metrics, state-space sizes, error bars, or details on model construction effort are supplied to support the 'complete coverage' and 'reveals faults' claims, making it difficult to assess the practical strength of the results or the effectiveness of the coverage-guided traversal.

minor comments (2)

The scalability analysis would benefit from explicit discussion of state-space sizes and when explosion occurs, to better characterise the applicability requirements.
Clarify how Glacier contracts are executed during test runs and whether they require additional runtime instrumentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and presentation of our evaluation. We address each major comment below.

read point-by-point responses

Referee: [Evaluation] Evaluation section: The claims of complete state coverage and fault revelation are reported only with respect to the hand-written TLA+ models of the EvoMaster benchmarks. No validation (e.g., differential execution against the implementation, runtime invariant monitoring, or comparison to a reference) is described to confirm that the models neither omit reachable states nor introduce spurious transitions, so the results do not necessarily transfer to the concrete APIs.

Authors: We agree that the reported complete state coverage and fault revelation results apply to the hand-written TLA+ models of the EvoMaster benchmarks rather than being directly validated against the concrete implementations. IcePick's primary contribution is the provision of formal coverage guarantees through exhaustive model checking on a behavioral specification; this is a deliberate design choice that distinguishes it from purely black-box approaches. Model fidelity is indeed a prerequisite for transferring results to real APIs, and we acknowledge that the current manuscript does not describe explicit validation steps such as differential execution or runtime invariant monitoring. In the revised manuscript we will add a new subsection in the Evaluation section that details the model construction process (drawing from benchmark documentation and API schemas), discusses potential discrepancies between model and implementation, and outlines practical validation techniques (e.g., runtime monitoring of extracted invariants during test execution on the live system). We will also note that, for safety-critical deployments, independent model validation remains advisable. revision: partial
Referee: [Abstract] Abstract and Evaluation: No quantitative metrics, state-space sizes, error bars, or details on model construction effort are supplied to support the 'complete coverage' and 'reveals faults' claims, making it difficult to assess the practical strength of the results or the effectiveness of the coverage-guided traversal.

Authors: We accept that additional quantitative detail would strengthen the presentation. Although the manuscript states that complete coverage is achieved and faults are revealed, we will expand the Evaluation section (and update the abstract accordingly) to include a summary table reporting, for each benchmark: the size of the reachable state space, the number of states explored by the coverage-guided traversal, model-checking runtime, test-sequence extraction time, approximate effort for constructing each TLA+ model (e.g., specification lines and person-hours), and the concrete multi-operation faults detected. Because TLC performs deterministic exhaustive exploration, statistical error bars are not applicable; we will explicitly state this deterministic character. These additions will allow readers to better judge scalability and the practical benefit of the coverage-guided breadth-first traversal. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external model-checking tools

full rationale

The paper constructs TLA+ models by hand, applies the external TLC model checker for exhaustive exploration, and extracts test sequences that cover states of that model by construction. This is not circular because the coverage claim is explicitly limited to the hand-written behavioural model (an input assumption), the tools (TLA+, TLC) are independent established artifacts, and no equations, fitted parameters, or self-citations reduce the central result to a tautology. The method's soundness for the real API is a separate validation question, not a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the feasibility of hand-written TLA+ models and the assumption that state-space explosion can be controlled by the proposed traversal without loss of coverage.

axioms (1)

domain assumption API behaviors can be faithfully captured by a TLA+ specification
Required for the model checker to produce meaningful test sequences.

invented entities (2)

IcePick framework no independent evidence
purpose: Model-checking-based API test generation with coverage guarantees
New tool and method introduced by the authors.
Glacier contract language no independent evidence
purpose: Executable first-order logic contracts for API behavioral oracles
New specification language proposed in the paper.

pith-pipeline@v0.9.0 · 5500 in / 1260 out tokens · 71344 ms · 2026-05-10T17:04:18.604720+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Model checking,

E. M. Clarke, “Model checking,” inInternational conference on founda- tions of software technology and theoretical computer science. Springer, 1997, pp. 54–56

work page 1997
[2]

Godefroid and K

P. Godefroid and K. Sen,Combining Model Checking and Testing. Cham: Springer International Publishing, 2018

work page 2018
[3]

Using model checking to generate tests from specifications,

P. E. Ammann, P. E. Black, and W. Majurski, “Using model checking to generate tests from specifications,” inProceedings second international conference on formal engineering methods (Cat. No. 98EX241). IEEE, 1998, pp. 46–54

work page 1998
[4]

Microservices,

M. Fowler and J. Lewis, “Microservices,” http://martinfowler.com/artic les/microservices.html, 2014, accessed in January 2024

work page 2014
[5]

RESTful API automated test case generation with evomas- ter,

A. Arcuri, “RESTful API automated test case generation with evomas- ter,”ACM Trans. Softw. Eng. Methodol., vol. 28, no. 1, 2019

work page 2019
[6]

OpenAPI specification,

Swagger, “OpenAPI specification,” https://swagger.io/specification/, 2021, accessed in January 2024

work page 2021
[7]

Automated black- and white-box testing of RESTful APIs with EvoMaster,

A. Arcuri, “Automated black- and white-box testing of RESTful APIs with EvoMaster,”IEEE Software, vol. 38, no. 3, pp. 72–78, 2021

work page 2021
[8]

RESTTESTGEN: Auto- mated black-box testing of RESTful APIs,

E. Viglianisi, M. Dallago, and M. Ceccato, “RESTTESTGEN: Auto- mated black-box testing of RESTful APIs,” in2020 IEEE 13th Inter- national Conference on Software Testing, Validation and Verification (ICST), 2020, pp. 142–152

work page 2020
[9]

Morest: Model-based RESTful API testing with execution feedback,

Y . Liu, Y . Li, G. Deng, Y . Liu, R. Wan, R. Wu, D. Ji, S. Xu, and M. Bao, “Morest: Model-based RESTful API testing with execution feedback,” inProceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. Association for Computing Machinery, 2022, p. 1406–1417

work page 2022
[10]

RESTler: Stateful REST API fuzzing,

V . Atlidakis, P. Godefroid, and M. Polishchuk, “RESTler: Stateful REST API fuzzing,” in2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019, pp. 748–758

work page 2019
[11]

Restest: automated black-box testing of RESTful web APIs,

A. Martin-Lopez, S. Segura, and A. Ruiz-Cortés, “Restest: automated black-box testing of RESTful web APIs,” inProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2021. New York, NY , USA: Association for Computing Machinery, 2021, p. 682–685. [Online]. Available: https://doi.org/10.1145/3460319.3469082

work page doi:10.1145/3460319.3469082 2021
[12]

Combinatorial testing of RESTful APIs,

H. Wu, L. Xu, X. Niu, and C. Nie, “Combinatorial testing of RESTful APIs,” inProceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 426–437

work page 2022
[13]

AGORA: Automated generation of test oracles for REST APIs,

J. C. Alonso, S. Segura, and A. Ruiz-Cortés, “AGORA: Automated generation of test oracles for REST APIs,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY , USA: Association for Computing Machinery, 2023, p. 1018–1030

work page 2023
[14]

Automatic gener- ation of test cases for REST APIs: A specification-based approach,

H. Ed-douibi, J. L. Cánovas Izquierdo, and J. Cabot, “Automatic gener- ation of test cases for REST APIs: A specification-based approach,” in 2018 IEEE 22nd International Enterprise Distributed Object Computing Conference (EDOC), 2018, pp. 181–190

work page 2018
[15]

Metamorphic testing of RESTful Web APIs,

S. Segura, J. A. Parejo, J. Troya, and A. Ruiz-Cortés, “Metamorphic testing of RESTful Web APIs,” inProceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18. New York, NY , USA: Association for Computing Machinery, 2018, p. 882

work page 2018
[16]

The oracle problem in software testing: A survey,

E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo, “The oracle problem in software testing: A survey,”IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 507–525, 2015

work page 2015
[17]

The temporal logic of actions,

L. Lamport, “The temporal logic of actions,”ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 16, no. 3, pp. 872–923, 1994

work page 1994
[18]

Addison-Wesley, June 2002

——,Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, June 2002. [Online]. Available: https://www.microsoft.com/en-us/research/publica tion/specifying-systems-the-tla-language-and-tools-for-hardware-and-s oftware-engineers/

work page 2002
[19]

Model checking TLA+ specifica- tions,

Y . Yu, P. Manolios, and L. Lamport, “Model checking TLA+ specifica- tions,” inAdvanced research working conference on correct hardware design and verification methods. Springer, 1999, pp. 54–66

work page 1999
[20]

TLA+ model checking made symbolic,

M. A. Kuppe, L. Lamport, and D. Ricketts, “TLA+ model checking made symbolic,” inProceedings of the ACM on Programming Lan- guages (OOPSLA), vol. 3, no. OOPSLA, 2019, pp. 123:1–123:30

work page 2019
[21]

Glacier generator & catalogue,

A. C. Ribeiro, “Glacier generator & catalogue,” https://github.com/acm -ribeiro/glacier-generator, 2025

work page 2025
[22]

Emb: A curated corpus of web/enter- prise applications and library support for software testing research,

A. Arcuri, M. Zhang, A. Golmohammadi, A. Belhadi, J. P. Galeotti, B. Marculescu, and S. Seran, “Emb: A curated corpus of web/enter- prise applications and library support for software testing research,” in 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 2023, pp. 433–442

work page 2023
[23]

Replication package for sys- tematic API testing through model checking and executable contracts,

A. Ribeiro, M. Mamede, and C. Ferreira, “Replication package for sys- tematic API testing through model checking and executable contracts,” https://doi.org/10.5281/zenodo.19404322, Dec. 2025

work page doi:10.5281/zenodo.19404322 2025
[24]

Invariant-driven automated testing,

A. C. Ribeiro, “Invariant-driven automated testing,” M.S. thesis, NOV A School of Science and Technology, February 2021

work page 2021
[25]

Verifying concurrent programs using contracts,

R. J. Dias, C. Ferreira, J. Fiedor, J. M. Lourenço, A. Smrcka, D. G. Sousa, and T. V ojnar, “Verifying concurrent programs using contracts,” in2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017, pp. 196–206

work page 2017
[26]

Towards contract-based testing of web services,

R. Heckel and M. Lohmann, “Towards contract-based testing of web services,”Electronic Notes in Theoretical Computer Science, vol. 116, pp. 145–156, 2005

work page 2005
[27]

Contract-based testing for web services,

G. Dai, X. Bai, Y . Wang, and F. Dai, “Contract-based testing for web services,” in31st Annual International Computer Software and Applications Conference (COMPSAC 2007), vol. 1, 2007, pp. 517–526

work page 2007
[28]

B. K. Aichernig,Contract-Based Testing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 34–48. [Online]. Available: https://doi.org/10.1007/978-3-540-40007-3_3

work page doi:10.1007/978-3-540-40007-3_3 2003
[29]

Model checking guided testing for distributed systems,

D. Wang, W. Dou, Y . Gao, C. Wu, J. Wei, and T. Huang, “Model checking guided testing for distributed systems,” inProceedings of the Eighteenth European Conference on Computer Systems, ser. EuroSys ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 127–143. [Online]. Available: https://doi.org/10.1145/355232 6.3587442

work page doi:10.1145/355232 2023
[30]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,Introduction to algorithms. MIT press, 2022

work page 2022
[31]

Applying ’design by contract’,

B. Meyer, “Applying ’design by contract’,”Computer, vol. 25, no. 10, pp. 40–51, 1992

work page 1992
[32]

Petstore - openapi 3.0,

Swagger, “Petstore - openapi 3.0,” https://petstore3.swagger.io/, 2018, accessed in September 2025

work page 2018
[33]

WFC/WFD: Web fuzzing commons, dataset and guidelines to support experimentation in REST API fuzzing,

O. Sahin, M. Zhang, and A. Arcuri, “WFC/WFD: Web fuzzing commons, dataset and guidelines to support experimentation in REST API fuzzing,” 2025. [Online]. Available: https://arxiv.org/abs/2509.016 12

work page 2025
[34]

HTTP semantics (POST),

J. R. Roy T. Fielding, Mark Nottingham, “HTTP semantics (POST),” https://httpwg.org/specs/rfc9110.html#POST, RFC 9110, 6 2022, accessed in December 2025

work page 2022
[35]

Efficient bounded exhaustive input generation from program apis,

M. Politano, V . Bengolea, F. Molina, N. Aguirre, M. F. Frias, and P. Ponzio, “Efficient bounded exhaustive input generation from program apis,” inInternational Conference on Fundamental Approaches to Soft- ware Engineering. Springer, 2023, pp. 111–132

work page 2023
[36]

How Amazon Web Services uses formal methods,

C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff, “How Amazon Web Services uses formal methods,” Communications of the ACM, vol. 58, no. 4, pp. 66–73, 2015

work page 2015
[37]

Formal verification of a dis- tributed dynamic reconfiguration protocol,

W. Schultz, I. Dardik, and S. Tripakis, “Formal verification of a dis- tributed dynamic reconfiguration protocol,” inInternational Conference on Principles of Distributed Systems (OPODIS), 2019

work page 2019
[38]

JepREST: Functional tests for distributed REST applications,

S. Simoes, A. Ribeiro, C. Ferreira, and N. Preguica, “JepREST: Functional tests for distributed REST applications,” 2023. [Online]. Available: https://arxiv.org/abs/2303.14104

work page arXiv 2023
[39]

Model- guided fuzzing of distributed systems,

E. B. Gulcan, B. K. Ozkan, R. Majumdar, and S. Nagendra, “Model- guided fuzzing of distributed systems,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, 2025

work page 2025
[40]

Adaptive REST API testing with rein- forcement learning,

M. Kim, S. Sinha, and A. Orso, “Adaptive REST API testing with rein- forcement learning,” in2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 446–458

work page 2023

[1] [1]

Model checking,

E. M. Clarke, “Model checking,” inInternational conference on founda- tions of software technology and theoretical computer science. Springer, 1997, pp. 54–56

work page 1997

[2] [2]

Godefroid and K

P. Godefroid and K. Sen,Combining Model Checking and Testing. Cham: Springer International Publishing, 2018

work page 2018

[3] [3]

Using model checking to generate tests from specifications,

P. E. Ammann, P. E. Black, and W. Majurski, “Using model checking to generate tests from specifications,” inProceedings second international conference on formal engineering methods (Cat. No. 98EX241). IEEE, 1998, pp. 46–54

work page 1998

[4] [4]

Microservices,

M. Fowler and J. Lewis, “Microservices,” http://martinfowler.com/artic les/microservices.html, 2014, accessed in January 2024

work page 2014

[5] [5]

RESTful API automated test case generation with evomas- ter,

A. Arcuri, “RESTful API automated test case generation with evomas- ter,”ACM Trans. Softw. Eng. Methodol., vol. 28, no. 1, 2019

work page 2019

[6] [6]

OpenAPI specification,

Swagger, “OpenAPI specification,” https://swagger.io/specification/, 2021, accessed in January 2024

work page 2021

[7] [7]

Automated black- and white-box testing of RESTful APIs with EvoMaster,

A. Arcuri, “Automated black- and white-box testing of RESTful APIs with EvoMaster,”IEEE Software, vol. 38, no. 3, pp. 72–78, 2021

work page 2021

[8] [8]

RESTTESTGEN: Auto- mated black-box testing of RESTful APIs,

E. Viglianisi, M. Dallago, and M. Ceccato, “RESTTESTGEN: Auto- mated black-box testing of RESTful APIs,” in2020 IEEE 13th Inter- national Conference on Software Testing, Validation and Verification (ICST), 2020, pp. 142–152

work page 2020

[9] [9]

Morest: Model-based RESTful API testing with execution feedback,

Y . Liu, Y . Li, G. Deng, Y . Liu, R. Wan, R. Wu, D. Ji, S. Xu, and M. Bao, “Morest: Model-based RESTful API testing with execution feedback,” inProceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. Association for Computing Machinery, 2022, p. 1406–1417

work page 2022

[10] [10]

RESTler: Stateful REST API fuzzing,

V . Atlidakis, P. Godefroid, and M. Polishchuk, “RESTler: Stateful REST API fuzzing,” in2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019, pp. 748–758

work page 2019

[11] [11]

Restest: automated black-box testing of RESTful web APIs,

A. Martin-Lopez, S. Segura, and A. Ruiz-Cortés, “Restest: automated black-box testing of RESTful web APIs,” inProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2021. New York, NY , USA: Association for Computing Machinery, 2021, p. 682–685. [Online]. Available: https://doi.org/10.1145/3460319.3469082

work page doi:10.1145/3460319.3469082 2021

[12] [12]

Combinatorial testing of RESTful APIs,

H. Wu, L. Xu, X. Niu, and C. Nie, “Combinatorial testing of RESTful APIs,” inProceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 426–437

work page 2022

[13] [13]

AGORA: Automated generation of test oracles for REST APIs,

J. C. Alonso, S. Segura, and A. Ruiz-Cortés, “AGORA: Automated generation of test oracles for REST APIs,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY , USA: Association for Computing Machinery, 2023, p. 1018–1030

work page 2023

[14] [14]

Automatic gener- ation of test cases for REST APIs: A specification-based approach,

H. Ed-douibi, J. L. Cánovas Izquierdo, and J. Cabot, “Automatic gener- ation of test cases for REST APIs: A specification-based approach,” in 2018 IEEE 22nd International Enterprise Distributed Object Computing Conference (EDOC), 2018, pp. 181–190

work page 2018

[15] [15]

Metamorphic testing of RESTful Web APIs,

S. Segura, J. A. Parejo, J. Troya, and A. Ruiz-Cortés, “Metamorphic testing of RESTful Web APIs,” inProceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18. New York, NY , USA: Association for Computing Machinery, 2018, p. 882

work page 2018

[16] [16]

The oracle problem in software testing: A survey,

E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo, “The oracle problem in software testing: A survey,”IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 507–525, 2015

work page 2015

[17] [17]

The temporal logic of actions,

L. Lamport, “The temporal logic of actions,”ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 16, no. 3, pp. 872–923, 1994

work page 1994

[18] [18]

Addison-Wesley, June 2002

——,Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, June 2002. [Online]. Available: https://www.microsoft.com/en-us/research/publica tion/specifying-systems-the-tla-language-and-tools-for-hardware-and-s oftware-engineers/

work page 2002

[19] [19]

Model checking TLA+ specifica- tions,

Y . Yu, P. Manolios, and L. Lamport, “Model checking TLA+ specifica- tions,” inAdvanced research working conference on correct hardware design and verification methods. Springer, 1999, pp. 54–66

work page 1999

[20] [20]

TLA+ model checking made symbolic,

M. A. Kuppe, L. Lamport, and D. Ricketts, “TLA+ model checking made symbolic,” inProceedings of the ACM on Programming Lan- guages (OOPSLA), vol. 3, no. OOPSLA, 2019, pp. 123:1–123:30

work page 2019

[21] [21]

Glacier generator & catalogue,

A. C. Ribeiro, “Glacier generator & catalogue,” https://github.com/acm -ribeiro/glacier-generator, 2025

work page 2025

[22] [22]

Emb: A curated corpus of web/enter- prise applications and library support for software testing research,

A. Arcuri, M. Zhang, A. Golmohammadi, A. Belhadi, J. P. Galeotti, B. Marculescu, and S. Seran, “Emb: A curated corpus of web/enter- prise applications and library support for software testing research,” in 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 2023, pp. 433–442

work page 2023

[23] [23]

Replication package for sys- tematic API testing through model checking and executable contracts,

A. Ribeiro, M. Mamede, and C. Ferreira, “Replication package for sys- tematic API testing through model checking and executable contracts,” https://doi.org/10.5281/zenodo.19404322, Dec. 2025

work page doi:10.5281/zenodo.19404322 2025

[24] [24]

Invariant-driven automated testing,

A. C. Ribeiro, “Invariant-driven automated testing,” M.S. thesis, NOV A School of Science and Technology, February 2021

work page 2021

[25] [25]

Verifying concurrent programs using contracts,

R. J. Dias, C. Ferreira, J. Fiedor, J. M. Lourenço, A. Smrcka, D. G. Sousa, and T. V ojnar, “Verifying concurrent programs using contracts,” in2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017, pp. 196–206

work page 2017

[26] [26]

Towards contract-based testing of web services,

R. Heckel and M. Lohmann, “Towards contract-based testing of web services,”Electronic Notes in Theoretical Computer Science, vol. 116, pp. 145–156, 2005

work page 2005

[27] [27]

Contract-based testing for web services,

G. Dai, X. Bai, Y . Wang, and F. Dai, “Contract-based testing for web services,” in31st Annual International Computer Software and Applications Conference (COMPSAC 2007), vol. 1, 2007, pp. 517–526

work page 2007

[28] [28]

B. K. Aichernig,Contract-Based Testing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 34–48. [Online]. Available: https://doi.org/10.1007/978-3-540-40007-3_3

work page doi:10.1007/978-3-540-40007-3_3 2003

[29] [29]

Model checking guided testing for distributed systems,

D. Wang, W. Dou, Y . Gao, C. Wu, J. Wei, and T. Huang, “Model checking guided testing for distributed systems,” inProceedings of the Eighteenth European Conference on Computer Systems, ser. EuroSys ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 127–143. [Online]. Available: https://doi.org/10.1145/355232 6.3587442

work page doi:10.1145/355232 2023

[30] [30]

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,Introduction to algorithms. MIT press, 2022

work page 2022

[31] [31]

Applying ’design by contract’,

B. Meyer, “Applying ’design by contract’,”Computer, vol. 25, no. 10, pp. 40–51, 1992

work page 1992

[32] [32]

Petstore - openapi 3.0,

Swagger, “Petstore - openapi 3.0,” https://petstore3.swagger.io/, 2018, accessed in September 2025

work page 2018

[33] [33]

WFC/WFD: Web fuzzing commons, dataset and guidelines to support experimentation in REST API fuzzing,

O. Sahin, M. Zhang, and A. Arcuri, “WFC/WFD: Web fuzzing commons, dataset and guidelines to support experimentation in REST API fuzzing,” 2025. [Online]. Available: https://arxiv.org/abs/2509.016 12

work page 2025

[34] [34]

HTTP semantics (POST),

J. R. Roy T. Fielding, Mark Nottingham, “HTTP semantics (POST),” https://httpwg.org/specs/rfc9110.html#POST, RFC 9110, 6 2022, accessed in December 2025

work page 2022

[35] [35]

Efficient bounded exhaustive input generation from program apis,

M. Politano, V . Bengolea, F. Molina, N. Aguirre, M. F. Frias, and P. Ponzio, “Efficient bounded exhaustive input generation from program apis,” inInternational Conference on Fundamental Approaches to Soft- ware Engineering. Springer, 2023, pp. 111–132

work page 2023

[36] [36]

How Amazon Web Services uses formal methods,

C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff, “How Amazon Web Services uses formal methods,” Communications of the ACM, vol. 58, no. 4, pp. 66–73, 2015

work page 2015

[37] [37]

Formal verification of a dis- tributed dynamic reconfiguration protocol,

W. Schultz, I. Dardik, and S. Tripakis, “Formal verification of a dis- tributed dynamic reconfiguration protocol,” inInternational Conference on Principles of Distributed Systems (OPODIS), 2019

work page 2019

[38] [38]

JepREST: Functional tests for distributed REST applications,

S. Simoes, A. Ribeiro, C. Ferreira, and N. Preguica, “JepREST: Functional tests for distributed REST applications,” 2023. [Online]. Available: https://arxiv.org/abs/2303.14104

work page arXiv 2023

[39] [39]

Model- guided fuzzing of distributed systems,

E. B. Gulcan, B. K. Ozkan, R. Majumdar, and S. Nagendra, “Model- guided fuzzing of distributed systems,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, 2025

work page 2025

[40] [40]

Adaptive REST API testing with rein- forcement learning,

M. Kim, S. Sinha, and A. Orso, “Adaptive REST API testing with rein- forcement learning,” in2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 446–458

work page 2023