L*-Based Learning of Markov Decision Processes (Extended Version)

Bernhard K. Aichernig; Giovanni Bacci; Kim G. Larsen; Maria Eichlseder; Martin Tappler

arxiv: 1906.12239 · v1 · pith:ZS37PKNQnew · submitted 2019-06-28 · 💻 cs.LG · stat.ML

L*-Based Learning of Markov Decision Processes (Extended Version)

Martin Tappler , Bernhard K. Aichernig , Giovanni Bacci , Maria Eichlseder , Kim G. Larsen This is my paper

Pith reviewed 2026-05-25 13:28 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords automata learningMarkov decision processesactive learningL* algorithmmodel inferencesamplingdeterministic MDPs

0 comments

The pith

An L*-based sampling algorithm learns the complete structure of deterministic Markov decision processes from test traces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an active learning technique for deterministic Markov decision processes that extends Angluin's L* algorithm. In an ideal setting it assumes perfect information, but then relaxes this to a practical sampling method that gathers data by testing the system and observing traces. The method infers both the states and the transition structure without predefined states. Experiments show it achieves higher accuracy than passive learning approaches when given the same volume of test data. This matters because it automates the creation of accurate models for systems whose internal states are not directly observable.

Core claim

We present a novel sampling-based L* algorithm for learning deterministic Markov decision processes that collects information by sampling system traces via testing. It learns the complete model structure including the states and achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data.

What carries the argument

The sampling-based L* algorithm for MDPs, which builds observation tables from traces to identify states as distinguishable future behaviors and infers transitions.

If this is right

Learned MDP models can support automated verification and testing of the original system.
The algorithm enables model learning in settings where the number of states is unknown in advance.
With limited test data, more accurate models become available for decision processes.
Active sampling reduces the data needed compared to passive collection of logs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such methods might be combined with reinforcement learning to learn and optimize policies simultaneously.
Extending the approach to nondeterministic or continuous-state MDPs could broaden its applicability.
Comparing the learned models against ground truth in simulation environments would validate the accuracy gains.

Load-bearing premise

The system under learning is a deterministic Markov decision process whose behavior is fully captured by finite traces sampled through testing.

What would settle it

Apply the algorithm to a small known deterministic MDP with a fixed number of samples and check whether the output model matches the original states, transitions, and probabilities exactly.

Figures

Figures reproduced from arXiv: 1906.12239 by Bernhard K. Aichernig, Giovanni Bacci, Kim G. Larsen, Maria Eichlseder, Martin Tappler.

**Figure 1.** Figure 1: MDP model of a faulty coffee machine We learn deterministic labelled MDPs as learned by passive learning techniques like IoAlergia [30]. Such MDPs define at most one successor state for each source state and input-output pair [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: The first gridworld Models similar to our gridworlds have, e.g., been considered in the context of learning control strategies [20]. Basically, a robot moves around in a world of tiles of different terrains. It may make errors in movement, e.g. move south west instead of south with an error probability depending on the target terrain. Our aim is to learn an environment model, i.e. a map [PITH_FULL_IMAGE:… view at source ↗

**Figure 3.** Figure 3: The second gridworld [PITH_FULL_IMAGE:figures/full_fig_p033_3.png] view at source ↗

read the original abstract

Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient. An influential active learning technique is Angluin's L* algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Adapts L* to deterministic MDPs via sampling that discovers states without pre-specifying their number, with the main open question being how solid the accuracy comparison actually is.

read the letter

This paper adapts Angluin's L* to deterministic MDPs. It first handles an ideal case with perfect information, then gives a sampling version that queries the system through test traces and learns the full structure, states included, without needing the state count upfront. That on-the-fly state discovery is the concrete step beyond prior L* extensions. The progression from ideal to practical sampling is laid out clearly and the connection to model-based verification makes sense for the target area. The work stays grounded in the automata learning literature it cites and does not introduce obvious circular claims or fitted parameters. The experimental summary in the abstract states better accuracy than passive baselines with equal test data, but supplies no numbers, metrics, or setup details, so the strength of that result is hard to judge from the given text. If the full paper includes reproducible tables and a clear accuracy definition, the claim becomes easier to assess; otherwise it remains thin. The assumption that deterministic MDP behavior can be captured by finite traces is stated explicitly and relaxed in the algorithm, so it does not create an internal contradiction. This is for researchers in automata learning or formal testing of probabilistic systems who already know L* and want to see the MDP extension. A reader focused on active learning for verification would get the algorithmic contribution. It deserves peer review because the core adaptation is non-trivial and the application is relevant, even if the experiments will need closer checking by referees.

Referee Report

1 major / 0 minor

Summary. The paper extends Angluin's L* algorithm to deterministic Markov decision processes. It first gives a learning procedure under an ideal perfect-information setting, then relaxes the assumption to present a sampling-based algorithm that collects finite traces via testing. The algorithm is claimed to learn the complete MDP structure (including states) rather than assuming a predefined state space, and experiments with its implementation are said to indicate higher accuracy than state-of-the-art passive learners when given the same volume of test data.

Significance. If the empirical superiority holds under proper quantitative scrutiny, the work would be a useful contribution to automata learning by generalizing L* to MDPs and by automatically inferring the state space. This addresses a practical gap in model inference for systems whose state structure is not known in advance and could support downstream tasks such as verification and testing.

major comments (1)

[Abstract] Abstract: the central claim that the sampling-based algorithm 'achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative results, error bars, number of runs, or definition of the accuracy metric. Without these details the empirical outcome cannot be assessed from the summary alone.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the sampling-based algorithm 'achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data' is load-bearing for the paper's contribution, yet the abstract supplies no quantitative results, error bars, number of runs, or definition of the accuracy metric. Without these details the empirical outcome cannot be assessed from the summary alone.

Authors: The abstract is a high-level summary of the contribution and is not intended to contain the full quantitative details of the evaluation. The accuracy metric is defined in Section 4, and the experimental results—including direct comparisons to passive learners on the same data volume, accuracy values, and supporting figures—are reported in full in Section 5. This follows the standard structure in which the abstract states the outcome at a qualitative level while the body supplies the supporting data and definitions. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central contribution is an algorithmic description of an L*-style active learner for deterministic MDPs together with an empirical claim that the implemented sampler outperforms passive baselines on accuracy for equal test-data volume while also recovering state structure. No equation or theorem in the provided text reduces a reported prediction to a parameter fitted inside the same paper, nor does any load-bearing premise rest on a self-citation chain whose validity is presupposed by the present work. The deterministic-MDP assumption is stated explicitly and then relaxed in the sampling construction, leaving the experimental result independent of internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract supplies insufficient detail to enumerate free parameters or invented entities; the central claim rests on the domain assumption that the target system is a deterministic MDP whose transitions can be resolved by trace sampling.

axioms (1)

domain assumption The system under learning is a deterministic Markov decision process.
Stated when relaxing the ideal setting to sampling.

pith-pipeline@v0.9.0 · 5701 in / 1210 out tokens · 22247 ms · 2026-05-25T13:28:44.397129+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

In: Bennaceur, A., H¨ ahnle, R., Meinke, K

Aichernig, B.K., Mostowski, W., Mousavi, M.R., Tappler, M., Taromirad, M.: Model learning and model-based testing. In: Bennaceur, A., H¨ ahnle, R., Meinke, K. (eds.) Machine Learning for Dynamic Software Analysis: P otentials and Limits - International Dagstuhl Seminar 16172, Dagstuhl Castle, G ermany, April 24-27, 2016, Revised Papers. Lecture Notes in C...

work page doi:10.1007/978-3-319-96562-8_3 2016
[2]

Journal of Automated Reasoning (Oct 2 018), https://doi.org/10.1007/s10817-018-9486-0

Aichernig, B.K., Tappler, M.: Eﬃcient active automata le arning via mutation testing. Journal of Automated Reasoning (Oct 2 018), https://doi.org/10.1007/s10817-018-9486-0

work page doi:10.1007/s10817-018-9486-0
[3]

Formal Methods in System Design (Ma y 2019), https://doi.org/10.1007/s10703-019-00333-0

Aichernig, B.K., Tappler, M.: Probabilistic black-box r eachability check- ing (extended version). Formal Methods in System Design (Ma y 2019), https://doi.org/10.1007/s10703-019-00333-0

work page doi:10.1007/s10703-019-00333-0 2019
[4]

Angluin, D.: Learning regular sets from queries and count erexamples. Inf. Comput. 75(2), 87–106 (1987), https://doi.org/10.1016/0890-5401(87)90052-6

work page doi:10.1016/0890-5401(87)90052-6 1987
[5]

Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algo- rithms 11(3), 441–461 (1990), https://doi.org/10.1016/0196-6774(90)90021-6

work page doi:10.1016/0196-6774(90)90021-6 1990
[6]

http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip, accessed: June 28, 2019

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: MDPDist l ibrary. http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip, accessed: June 28, 2019

work page 2019
[7]

In: Chatterjee, K., Sgall, J

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: Computin g behavioral distances, compositionally. In: Chatterjee, K., Sgall, J. (eds.) Mathemati- cal Foundations of Computer Science 2013 - 38th Internation al Symposium, MFCS 2013, Klosterneuburg, Austria, August 26-30, 2013. Pr oceedings. Lec- ture Notes in Computer Science, vol. 8087, pp. 74–85. Spring ...

work page doi:10.1007/978-3-642-40313-2_9 2013
[8]

In: Joshi, K.R., Siegle, M., Stoelinga, M., D’Argenio, P.R

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: The Bisim Dist library: Eﬃcient computation of bisimilarity distances for Markovian model s. In: Joshi, K.R., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) Quantitative Eva luation of Systems - 10th International Conference, QEST 2013, Buenos Aires, Ar gentina, August 27- 30, 2013. Proceedings. Lecture ...

work page doi:10.1007/978-3-642-40196-1_23 2013
[9]

MIT P ress (2008)

Baier, C., Katoen, J.: Principles of model checking. MIT P ress (2008)

work page 2008
[10]

Bergadano, F., Varricchio, S.: Learning behaviors of au tomata from multi- plicity and equivalence queries. SIAM J. Comput. 25(6), 1268–1280 (1996), https://doi.org/10.1137/S009753979326091X

work page doi:10.1137/s009753979326091x 1996
[11]

In: Carrasco, R.C., Oncina, J

Carrasco, R.C., Oncina, J.: Learning stochastic regula r grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) Grammatical Infer- ence and Applications, Second International Colloquium, I CGI-94, Alicante, Spain, September 21-23, 1994, Proceedings. Lecture Notes in Compu ter Science, vol. 862, pp. 139–152. Springer (1994), htt...

work page doi:10.1007/3-540-58473-0_144 1994
[12]

ITA 33(1), 1–20 (1999), https://doi.org/10.1051/ita:1999102

Carrasco, R.C., Oncina, J.: Learning deterministic reg ular grammars from stochastic samples in polynomial time. ITA 33(1), 1–20 (1999), https://doi.org/10.1051/ita:1999102

work page doi:10.1051/ita:1999102 1999
[13]

Formal Asp

Cassel, S., Howar, F., Jonsson, B., Steﬀen, B.: Active le arning for ex- tended ﬁnite state machines. Formal Asp. Comput. 28(2), 233–263 (2016), https://doi.org/10.1007/s00165-016-0355-5

work page doi:10.1007/s00165-016-0355-5 2016
[14]

Castro, J., Gavald` a, R.: Learning Probability Distrib utions Generated by Finite- State Machines, pp. 113–142. Springer Berlin Heidelberg, B erlin, Heidelberg (2016), https://doi.org/10.1007/978-3-662-48395-4_5

work page doi:10.1007/978-3-662-48395-4_5 2016
[15]

In: 11th International Conference on Machine Learning and Appli- cations, ICMLA, Boca Raton, FL, USA, December 12-15, 2012

Chen, Y., Nielsen, T.D.: Active learning of Markov decis ion processes for system veriﬁcation. In: 11th International Conference on Machine Learning and Appli- cations, ICMLA, Boca Raton, FL, USA, December 12-15, 2012. V olume 2. pp. 289–294. IEEE (2012), https://doi.org/10.1109/ICMLA.2012.158

work page doi:10.1109/icmla.2012.158 2012
[16]

IEEE Trans

Chow, T.S.: Testing software design modeled by ﬁnite-st ate machines. IEEE Trans. Softw. Eng. 4(3), 178–187 (May 1978) L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 41

work page 1978
[17]

In: Bultan, T., Hsi- ung, P

Feng, L., Han, T., Kwiatkowska, M.Z., Parker, D.: Learni ng-based composi- tional veriﬁcation for synchronous probabilistic systems . In: Bultan, T., Hsi- ung, P. (eds.) Automated Technology for Veriﬁcation and Ana lysis, 9th Interna- tional Symposium, ATV A 2011, Taipei, Taiwan, October 11-14 , 2011. Proceed- ings. Lecture Notes in Computer Science, vol....

work page doi:10.1007/978-3-642-24372-1_40 2011
[18]

: Combining model learn- ing and model checking to analyze TCP implementations

Fiter˘ au-Bro¸ stean, P., Janssen, R., Vaandrager, F.W. : Combining model learn- ing and model checking to analyze TCP implementations. In: C haudhuri, S., Farzan, A. (eds.) Computer Aided Veriﬁcation - 28th Interna tional Confer- ence, CA V 2016, Toronto, ON, Canada, July 17-23, 2016, Proce edings, Part II. Lecture Notes in Computer Science, vol. 9780, p...

work page doi:10.1007/978-3-319-41540-6_25 2016
[19]

In: Bernardo, M ., Issarny, V

Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: A utomated veriﬁca- tion techniques for probabilistic systems. In: Bernardo, M ., Issarny, V. (eds.) Formal Methods for Eternal Networked Software Systems - 11t h International School on Formal Methods for the Design of Computer, Communi cation and Software Systems, SFM 2011, Bertinoro, Italy, June 13...

work page doi:10.1007/978-3-642-21455-4_3 2011
[20]

In: Fox, D., Kavraki, L.E

Fu, J., Topcu, U.: Probably approximately correct MDP le arning and control with temporal logic constraints. In: Fox, D., Kavraki, L.E. , Kurniawati, H. (eds.) Robotics: Science and Systems X, University of California, Berkeley, USA, July 12-16, 2014 (2014), http://www.roboticsproceedings.org/rss10/p39.html

work page 2014
[21]

Cambridge University Press, New York, NY, USA (2010)

de la Higuera, C.: Grammatical Inference: Learning Auto mata and Grammars. Cambridge University Press, New York, NY, USA (2010)

work page 2010
[22]

Journal of the American Statistical Association 58(301), 13–30 (1963), http://www.jstor.org/stable/2282952

Hoeﬀding, W.: Probability inequalities for sums of boun ded random vari- ables. Journal of the American Statistical Association 58(301), 13–30 (1963), http://www.jstor.org/stable/2282952

work page arXiv 1963
[23]

In: Bennaceur, A., H¨ ahn le, R., Meinke, K

Howar, F., Steﬀen, B.: Active automata learning in pract ice - an annotated bib- liography of the years 2011 to 2016. In: Bennaceur, A., H¨ ahn le, R., Meinke, K. (eds.) Machine Learning for Dynamic Software Analysis: Pot entials and Limits - International Dagstuhl Seminar 16172, Dagstuhl Castle, G ermany, April 24-27, 2016, Revised Papers. Lecture Notes ...

work page doi:10.1007/978-3-319-96562-8_5 2011
[24]

In: Hunt Jr., W.A., Somenzi, F

Hungar, H., Niese, O., Steﬀen, B.: Domain-speciﬁc optim ization in automata learn- ing. In: Hunt Jr., W.A., Somenzi, F. (eds.) Computer Aided Ve riﬁcation, 15th International Conference, CA V 2003, Boulder, CO, USA, July 8-12, 2003, Pro- ceedings. Lecture Notes in Computer Science, vol. 2725, pp. 315–327. Springer (2003), https://doi.org/10.1007/978-3-540...

work page doi:10.1007/978-3-540-45069-6_31 2003
[25]

In: Bonakdarpour, B., S molka, S.A

Isberner, M., Howar, F., Steﬀen, B.: The TTT algorithm: A redundancy-free ap- proach to active automata learning. In: Bonakdarpour, B., S molka, S.A. (eds.) Run- time Veriﬁcation - 5th International Conference, R V 2014, T oronto, ON, Canada, September 22-25, 2014. Proceedings. Lecture Notes in Compu ter Science, vol. 8734, pp. 307–322. Springer (2014), h...

work page doi:10.1007/978-3-319-11164-3_26 2014
[26]

In: Clark, A., Kanazawa, M., Yoshinaka, R

Khalili, A., Tacchella, A.: Learning nondeterministic Mealy machines. In: Clark, A., Kanazawa, M., Yoshinaka, R. (eds.) Proceedings of the 12th I nternational Confer- ence on Grammatical Inference, ICGI 2014, Kyoto, Japan, Sep tember 17-19, 2014. JMLR Workshop and Conference Proceedings, vol. 34, pp. 109– 123. JMLR.org (2014), http://jmlr.org/proceedings/...

work page 2014
[27]

SIGMETRICS Performance Evaluation Review 36(3), 17–22 (2008), https://doi.org/10.1145/1481506.1481511

Kwiatkowska, M.Z., Norman, G., Parker, D.: Analysis of a gossip protocol in PRISM. SIGMETRICS Performance Evaluation Review 36(3), 17–22 (2008), https://doi.org/10.1145/1481506.1481511

work page doi:10.1145/1481506.1481511 2008
[28]

In: Gopalakrishnan, G., Qadeer, S

Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Ve riﬁcation of probabilis- tic real-time systems. In: Gopalakrishnan, G., Qadeer, S. ( eds.) Computer Aided Veriﬁcation - 23rd International Conference, CA V 2011, Sno wbird, UT, USA, July 14-20, 2011. Proceedings. Lecture Notes in Computer Scienc e, vol. 6806, pp. 585–

work page 2011
[29]

Springer (2011), https://doi.org/10.1007/978-3-642-22110-1

work page doi:10.1007/978-3-642-22110-1 2011
[30]

In: Fahre nberg, U., Legay, A., Thrane, C.R

Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G ., Nielsen, B.: Learn- ing Markov decision processes for model checking. In: Fahre nberg, U., Legay, A., Thrane, C.R. (eds.) Proceedings Quantities in Formal Me thods, QFM 2012, Paris, France, 28 August 2012. EPTCS, vol. 103, pp. 49– 63 (2012), https://doi.org/10.4204/EPTCS.103.6

work page doi:10.4204/eptcs.103.6 2012
[31]

Machine Learning 105(2), 255–299 (2016), https://doi.org/10.1007/s10994-016-5565-9

Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G ., Nielsen, B.: Learning deterministic probabilistic automata from a model checkin g perspective. Machine Learning 105(2), 255–299 (2016), https://doi.org/10.1007/s10994-016-5565-9

work page doi:10.1007/s10994-016-5565-9 2016
[32]

In: Ninth IEEE Internati onal High-Level Design Validation and Test Workshop 2004

Margaria, T., Niese, O., Raﬀelt, H., Steﬀen, B.: Eﬃcient test-based model gener- ation for legacy reactive systems. In: Ninth IEEE Internati onal High-Level Design Validation and Test Workshop 2004. pp. 95–100. IEEE Compute r Society (2004)

work page 2004
[33]

Nerode, A.: Linear automaton transformations. Proc. Am . Math. Soc. 9, 541–544 (1958)

work page 1958
[34]

Journal of Computer Security 14(6), 561–589 (2006)

Norman, G., Shmatikov, V.: Analysis of probabilistic co ntract signing. Journal of Computer Security 14(6), 561–589 (2006)

work page 2006
[35]

Rivest, R.L., Schapire, R.E.: Inference of ﬁnite automa ta using homing sequences. Inf. Comput. 103(2), 299–347 (1993), https://doi.org/10.1006/inco.1993.1021

work page doi:10.1006/inco.1993.1021 1993
[36]

In: Jung, J., Holz, T

de Ruiter, J., Poll, E.: Protocol state fuzzing of TLS imp lementations. In: Jung, J., Holz, T. (eds.) 24th USENIX Security Symposium, USENIX Secu rity 15, Wash- ington, D.C., USA, August 12-14, 2015. pp. 193–206. USENIX A ssociation (2015)

work page 2015
[37]

In: Cav alcanti, A., Dams, D

Shahbaz, M., Groz, R.: Inferring Mealy machines. In: Cav alcanti, A., Dams, D. (eds.) FM 2009: Formal Methods, Second World Congre ss, Eindhoven, The Netherlands, November 2-6, 2009. Proceedin gs. Lecture Notes in Computer Science, vol. 5850, pp. 207–222. Springer (2009), https://doi.org/10.1007/978-3-642-05089-3_14

work page doi:10.1007/978-3-642-05089-3_14 2009
[38]

In: Bernardo, M., Issarny, V

Steﬀen, B., Howar, F., Merten, M.: Introduction to activ e automata learning from a practical perspective. In: Bernardo, M., Issarny, V. (eds.) Formal Meth- ods for Eternal Networked Software Systems - 11th Internati onal School on Formal Methods for the Design of Computer, Communication an d Software Systems, SFM 2011, Bertinoro, Italy, June 13-18, 2011....

work page doi:10.1007/978-3-642-21455-4_8 2011
[39]

https://doi.org/10.6084/m9.figshare.7960928.v1

Tappler, M.: Evaluation material for L∗-based learning of Markov decision pro- cesses. https://doi.org/10.6084/m9.figshare.7960928.v1

work page doi:10.6084/m9.figshare.7960928.v1
[40]

In: 2017 IEEE Intern ational Confer- ence on Software Testing, Veriﬁcation and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017

Tappler, M., Aichernig, B.K., Bloem, R.: Model-based te sting IoT commu- nication via active automata learning. In: 2017 IEEE Intern ational Confer- ence on Software Testing, Veriﬁcation and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017. pp. 276–287. IEEE Computer Societ y (2017), https://doi.org/10.1109/ICST.2017.32

work page doi:10.1109/icst.2017.32 2017
[41]

Soft- ware - Concepts and Tools 17(3), 103–120 (1996)

Tretmans, J.: Test generation with inputs, outputs and r epetitive quiescence. Soft- ware - Concepts and Tools 17(3), 103–120 (1996)

work page 1996
[42]

Ma- chine Learning 8, 151–166 (1992), https://doi.org/10.1007/BF00992862 L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 43

Tzeng, W.: Learning probabilistic automata and Markov c hains via queries. Ma- chine Learning 8, 151–166 (1992), https://doi.org/10.1007/BF00992862 L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 43

work page doi:10.1007/bf00992862 1992
[43]

Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984), https://doi.org/10.1145/1968.1972

work page doi:10.1145/1968.1972 1984
[44]

ECEASST 72 (2015), https://doi.org/10.14279/tuj.eceasst.72.1008

Volpato, M., Tretmans, J.: Approximate active learning of nonde- terministic input output transition systems. ECEASST 72 (2015), https://doi.org/10.14279/tuj.eceasst.72.1008

work page doi:10.14279/tuj.eceasst.72.1008 2015
[45]

In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J

Willemse, T.A.C.: Heuristics for ioco-based test-base d modelling. In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J. (eds.) Formal Me thods: Applications and Technology, 11th International Workshop, FMICS 2006 an d 5th International Workshop PDMC 2006, Bonn, Germany, August 26-27, and August 31, 2006, Re- vised Selected Papers. Lecture Notes in C...

work page doi:10.1007/978-3-540-70952-7_9 2006

[1] [1]

In: Bennaceur, A., H¨ ahnle, R., Meinke, K

Aichernig, B.K., Mostowski, W., Mousavi, M.R., Tappler, M., Taromirad, M.: Model learning and model-based testing. In: Bennaceur, A., H¨ ahnle, R., Meinke, K. (eds.) Machine Learning for Dynamic Software Analysis: P otentials and Limits - International Dagstuhl Seminar 16172, Dagstuhl Castle, G ermany, April 24-27, 2016, Revised Papers. Lecture Notes in C...

work page doi:10.1007/978-3-319-96562-8_3 2016

[2] [2]

Journal of Automated Reasoning (Oct 2 018), https://doi.org/10.1007/s10817-018-9486-0

Aichernig, B.K., Tappler, M.: Eﬃcient active automata le arning via mutation testing. Journal of Automated Reasoning (Oct 2 018), https://doi.org/10.1007/s10817-018-9486-0

work page doi:10.1007/s10817-018-9486-0

[3] [3]

Formal Methods in System Design (Ma y 2019), https://doi.org/10.1007/s10703-019-00333-0

Aichernig, B.K., Tappler, M.: Probabilistic black-box r eachability check- ing (extended version). Formal Methods in System Design (Ma y 2019), https://doi.org/10.1007/s10703-019-00333-0

work page doi:10.1007/s10703-019-00333-0 2019

[4] [4]

Angluin, D.: Learning regular sets from queries and count erexamples. Inf. Comput. 75(2), 87–106 (1987), https://doi.org/10.1016/0890-5401(87)90052-6

work page doi:10.1016/0890-5401(87)90052-6 1987

[5] [5]

Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J. Algo- rithms 11(3), 441–461 (1990), https://doi.org/10.1016/0196-6774(90)90021-6

work page doi:10.1016/0196-6774(90)90021-6 1990

[6] [6]

http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip, accessed: June 28, 2019

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: MDPDist l ibrary. http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip, accessed: June 28, 2019

work page 2019

[7] [7]

In: Chatterjee, K., Sgall, J

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: Computin g behavioral distances, compositionally. In: Chatterjee, K., Sgall, J. (eds.) Mathemati- cal Foundations of Computer Science 2013 - 38th Internation al Symposium, MFCS 2013, Klosterneuburg, Austria, August 26-30, 2013. Pr oceedings. Lec- ture Notes in Computer Science, vol. 8087, pp. 74–85. Spring ...

work page doi:10.1007/978-3-642-40313-2_9 2013

[8] [8]

In: Joshi, K.R., Siegle, M., Stoelinga, M., D’Argenio, P.R

Bacci, G., Bacci, G., Larsen, K.G., Mardare, R.: The Bisim Dist library: Eﬃcient computation of bisimilarity distances for Markovian model s. In: Joshi, K.R., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) Quantitative Eva luation of Systems - 10th International Conference, QEST 2013, Buenos Aires, Ar gentina, August 27- 30, 2013. Proceedings. Lecture ...

work page doi:10.1007/978-3-642-40196-1_23 2013

[9] [9]

MIT P ress (2008)

Baier, C., Katoen, J.: Principles of model checking. MIT P ress (2008)

work page 2008

[10] [10]

Bergadano, F., Varricchio, S.: Learning behaviors of au tomata from multi- plicity and equivalence queries. SIAM J. Comput. 25(6), 1268–1280 (1996), https://doi.org/10.1137/S009753979326091X

work page doi:10.1137/s009753979326091x 1996

[11] [11]

In: Carrasco, R.C., Oncina, J

Carrasco, R.C., Oncina, J.: Learning stochastic regula r grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) Grammatical Infer- ence and Applications, Second International Colloquium, I CGI-94, Alicante, Spain, September 21-23, 1994, Proceedings. Lecture Notes in Compu ter Science, vol. 862, pp. 139–152. Springer (1994), htt...

work page doi:10.1007/3-540-58473-0_144 1994

[12] [12]

ITA 33(1), 1–20 (1999), https://doi.org/10.1051/ita:1999102

Carrasco, R.C., Oncina, J.: Learning deterministic reg ular grammars from stochastic samples in polynomial time. ITA 33(1), 1–20 (1999), https://doi.org/10.1051/ita:1999102

work page doi:10.1051/ita:1999102 1999

[13] [13]

Formal Asp

Cassel, S., Howar, F., Jonsson, B., Steﬀen, B.: Active le arning for ex- tended ﬁnite state machines. Formal Asp. Comput. 28(2), 233–263 (2016), https://doi.org/10.1007/s00165-016-0355-5

work page doi:10.1007/s00165-016-0355-5 2016

[14] [14]

Castro, J., Gavald` a, R.: Learning Probability Distrib utions Generated by Finite- State Machines, pp. 113–142. Springer Berlin Heidelberg, B erlin, Heidelberg (2016), https://doi.org/10.1007/978-3-662-48395-4_5

work page doi:10.1007/978-3-662-48395-4_5 2016

[15] [15]

In: 11th International Conference on Machine Learning and Appli- cations, ICMLA, Boca Raton, FL, USA, December 12-15, 2012

Chen, Y., Nielsen, T.D.: Active learning of Markov decis ion processes for system veriﬁcation. In: 11th International Conference on Machine Learning and Appli- cations, ICMLA, Boca Raton, FL, USA, December 12-15, 2012. V olume 2. pp. 289–294. IEEE (2012), https://doi.org/10.1109/ICMLA.2012.158

work page doi:10.1109/icmla.2012.158 2012

[16] [16]

IEEE Trans

Chow, T.S.: Testing software design modeled by ﬁnite-st ate machines. IEEE Trans. Softw. Eng. 4(3), 178–187 (May 1978) L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 41

work page 1978

[17] [17]

In: Bultan, T., Hsi- ung, P

Feng, L., Han, T., Kwiatkowska, M.Z., Parker, D.: Learni ng-based composi- tional veriﬁcation for synchronous probabilistic systems . In: Bultan, T., Hsi- ung, P. (eds.) Automated Technology for Veriﬁcation and Ana lysis, 9th Interna- tional Symposium, ATV A 2011, Taipei, Taiwan, October 11-14 , 2011. Proceed- ings. Lecture Notes in Computer Science, vol....

work page doi:10.1007/978-3-642-24372-1_40 2011

[18] [18]

: Combining model learn- ing and model checking to analyze TCP implementations

Fiter˘ au-Bro¸ stean, P., Janssen, R., Vaandrager, F.W. : Combining model learn- ing and model checking to analyze TCP implementations. In: C haudhuri, S., Farzan, A. (eds.) Computer Aided Veriﬁcation - 28th Interna tional Confer- ence, CA V 2016, Toronto, ON, Canada, July 17-23, 2016, Proce edings, Part II. Lecture Notes in Computer Science, vol. 9780, p...

work page doi:10.1007/978-3-319-41540-6_25 2016

[19] [19]

In: Bernardo, M ., Issarny, V

Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: A utomated veriﬁca- tion techniques for probabilistic systems. In: Bernardo, M ., Issarny, V. (eds.) Formal Methods for Eternal Networked Software Systems - 11t h International School on Formal Methods for the Design of Computer, Communi cation and Software Systems, SFM 2011, Bertinoro, Italy, June 13...

work page doi:10.1007/978-3-642-21455-4_3 2011

[20] [20]

In: Fox, D., Kavraki, L.E

Fu, J., Topcu, U.: Probably approximately correct MDP le arning and control with temporal logic constraints. In: Fox, D., Kavraki, L.E. , Kurniawati, H. (eds.) Robotics: Science and Systems X, University of California, Berkeley, USA, July 12-16, 2014 (2014), http://www.roboticsproceedings.org/rss10/p39.html

work page 2014

[21] [21]

Cambridge University Press, New York, NY, USA (2010)

de la Higuera, C.: Grammatical Inference: Learning Auto mata and Grammars. Cambridge University Press, New York, NY, USA (2010)

work page 2010

[22] [22]

Journal of the American Statistical Association 58(301), 13–30 (1963), http://www.jstor.org/stable/2282952

Hoeﬀding, W.: Probability inequalities for sums of boun ded random vari- ables. Journal of the American Statistical Association 58(301), 13–30 (1963), http://www.jstor.org/stable/2282952

work page arXiv 1963

[23] [23]

In: Bennaceur, A., H¨ ahn le, R., Meinke, K

Howar, F., Steﬀen, B.: Active automata learning in pract ice - an annotated bib- liography of the years 2011 to 2016. In: Bennaceur, A., H¨ ahn le, R., Meinke, K. (eds.) Machine Learning for Dynamic Software Analysis: Pot entials and Limits - International Dagstuhl Seminar 16172, Dagstuhl Castle, G ermany, April 24-27, 2016, Revised Papers. Lecture Notes ...

work page doi:10.1007/978-3-319-96562-8_5 2011

[24] [24]

In: Hunt Jr., W.A., Somenzi, F

Hungar, H., Niese, O., Steﬀen, B.: Domain-speciﬁc optim ization in automata learn- ing. In: Hunt Jr., W.A., Somenzi, F. (eds.) Computer Aided Ve riﬁcation, 15th International Conference, CA V 2003, Boulder, CO, USA, July 8-12, 2003, Pro- ceedings. Lecture Notes in Computer Science, vol. 2725, pp. 315–327. Springer (2003), https://doi.org/10.1007/978-3-540...

work page doi:10.1007/978-3-540-45069-6_31 2003

[25] [25]

In: Bonakdarpour, B., S molka, S.A

Isberner, M., Howar, F., Steﬀen, B.: The TTT algorithm: A redundancy-free ap- proach to active automata learning. In: Bonakdarpour, B., S molka, S.A. (eds.) Run- time Veriﬁcation - 5th International Conference, R V 2014, T oronto, ON, Canada, September 22-25, 2014. Proceedings. Lecture Notes in Compu ter Science, vol. 8734, pp. 307–322. Springer (2014), h...

work page doi:10.1007/978-3-319-11164-3_26 2014

[26] [26]

In: Clark, A., Kanazawa, M., Yoshinaka, R

Khalili, A., Tacchella, A.: Learning nondeterministic Mealy machines. In: Clark, A., Kanazawa, M., Yoshinaka, R. (eds.) Proceedings of the 12th I nternational Confer- ence on Grammatical Inference, ICGI 2014, Kyoto, Japan, Sep tember 17-19, 2014. JMLR Workshop and Conference Proceedings, vol. 34, pp. 109– 123. JMLR.org (2014), http://jmlr.org/proceedings/...

work page 2014

[27] [27]

SIGMETRICS Performance Evaluation Review 36(3), 17–22 (2008), https://doi.org/10.1145/1481506.1481511

Kwiatkowska, M.Z., Norman, G., Parker, D.: Analysis of a gossip protocol in PRISM. SIGMETRICS Performance Evaluation Review 36(3), 17–22 (2008), https://doi.org/10.1145/1481506.1481511

work page doi:10.1145/1481506.1481511 2008

[28] [28]

In: Gopalakrishnan, G., Qadeer, S

Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Ve riﬁcation of probabilis- tic real-time systems. In: Gopalakrishnan, G., Qadeer, S. ( eds.) Computer Aided Veriﬁcation - 23rd International Conference, CA V 2011, Sno wbird, UT, USA, July 14-20, 2011. Proceedings. Lecture Notes in Computer Scienc e, vol. 6806, pp. 585–

work page 2011

[29] [29]

Springer (2011), https://doi.org/10.1007/978-3-642-22110-1

work page doi:10.1007/978-3-642-22110-1 2011

[30] [30]

In: Fahre nberg, U., Legay, A., Thrane, C.R

Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G ., Nielsen, B.: Learn- ing Markov decision processes for model checking. In: Fahre nberg, U., Legay, A., Thrane, C.R. (eds.) Proceedings Quantities in Formal Me thods, QFM 2012, Paris, France, 28 August 2012. EPTCS, vol. 103, pp. 49– 63 (2012), https://doi.org/10.4204/EPTCS.103.6

work page doi:10.4204/eptcs.103.6 2012

[31] [31]

Machine Learning 105(2), 255–299 (2016), https://doi.org/10.1007/s10994-016-5565-9

Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G ., Nielsen, B.: Learning deterministic probabilistic automata from a model checkin g perspective. Machine Learning 105(2), 255–299 (2016), https://doi.org/10.1007/s10994-016-5565-9

work page doi:10.1007/s10994-016-5565-9 2016

[32] [32]

In: Ninth IEEE Internati onal High-Level Design Validation and Test Workshop 2004

Margaria, T., Niese, O., Raﬀelt, H., Steﬀen, B.: Eﬃcient test-based model gener- ation for legacy reactive systems. In: Ninth IEEE Internati onal High-Level Design Validation and Test Workshop 2004. pp. 95–100. IEEE Compute r Society (2004)

work page 2004

[33] [33]

Nerode, A.: Linear automaton transformations. Proc. Am . Math. Soc. 9, 541–544 (1958)

work page 1958

[34] [34]

Journal of Computer Security 14(6), 561–589 (2006)

Norman, G., Shmatikov, V.: Analysis of probabilistic co ntract signing. Journal of Computer Security 14(6), 561–589 (2006)

work page 2006

[35] [35]

Rivest, R.L., Schapire, R.E.: Inference of ﬁnite automa ta using homing sequences. Inf. Comput. 103(2), 299–347 (1993), https://doi.org/10.1006/inco.1993.1021

work page doi:10.1006/inco.1993.1021 1993

[36] [36]

In: Jung, J., Holz, T

de Ruiter, J., Poll, E.: Protocol state fuzzing of TLS imp lementations. In: Jung, J., Holz, T. (eds.) 24th USENIX Security Symposium, USENIX Secu rity 15, Wash- ington, D.C., USA, August 12-14, 2015. pp. 193–206. USENIX A ssociation (2015)

work page 2015

[37] [37]

In: Cav alcanti, A., Dams, D

Shahbaz, M., Groz, R.: Inferring Mealy machines. In: Cav alcanti, A., Dams, D. (eds.) FM 2009: Formal Methods, Second World Congre ss, Eindhoven, The Netherlands, November 2-6, 2009. Proceedin gs. Lecture Notes in Computer Science, vol. 5850, pp. 207–222. Springer (2009), https://doi.org/10.1007/978-3-642-05089-3_14

work page doi:10.1007/978-3-642-05089-3_14 2009

[38] [38]

In: Bernardo, M., Issarny, V

Steﬀen, B., Howar, F., Merten, M.: Introduction to activ e automata learning from a practical perspective. In: Bernardo, M., Issarny, V. (eds.) Formal Meth- ods for Eternal Networked Software Systems - 11th Internati onal School on Formal Methods for the Design of Computer, Communication an d Software Systems, SFM 2011, Bertinoro, Italy, June 13-18, 2011....

work page doi:10.1007/978-3-642-21455-4_8 2011

[39] [39]

https://doi.org/10.6084/m9.figshare.7960928.v1

Tappler, M.: Evaluation material for L∗-based learning of Markov decision pro- cesses. https://doi.org/10.6084/m9.figshare.7960928.v1

work page doi:10.6084/m9.figshare.7960928.v1

[40] [40]

In: 2017 IEEE Intern ational Confer- ence on Software Testing, Veriﬁcation and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017

Tappler, M., Aichernig, B.K., Bloem, R.: Model-based te sting IoT commu- nication via active automata learning. In: 2017 IEEE Intern ational Confer- ence on Software Testing, Veriﬁcation and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017. pp. 276–287. IEEE Computer Societ y (2017), https://doi.org/10.1109/ICST.2017.32

work page doi:10.1109/icst.2017.32 2017

[41] [41]

Soft- ware - Concepts and Tools 17(3), 103–120 (1996)

Tretmans, J.: Test generation with inputs, outputs and r epetitive quiescence. Soft- ware - Concepts and Tools 17(3), 103–120 (1996)

work page 1996

[42] [42]

Ma- chine Learning 8, 151–166 (1992), https://doi.org/10.1007/BF00992862 L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 43

Tzeng, W.: Learning probabilistic automata and Markov c hains via queries. Ma- chine Learning 8, 151–166 (1992), https://doi.org/10.1007/BF00992862 L∗-Based Learning of Markov Decision Processes (Extended Ver sion) 43

work page doi:10.1007/bf00992862 1992

[43] [43]

Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984), https://doi.org/10.1145/1968.1972

work page doi:10.1145/1968.1972 1984

[44] [44]

ECEASST 72 (2015), https://doi.org/10.14279/tuj.eceasst.72.1008

Volpato, M., Tretmans, J.: Approximate active learning of nonde- terministic input output transition systems. ECEASST 72 (2015), https://doi.org/10.14279/tuj.eceasst.72.1008

work page doi:10.14279/tuj.eceasst.72.1008 2015

[45] [45]

In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J

Willemse, T.A.C.: Heuristics for ioco-based test-base d modelling. In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J. (eds.) Formal Me thods: Applications and Technology, 11th International Workshop, FMICS 2006 an d 5th International Workshop PDMC 2006, Bonn, Germany, August 26-27, and August 31, 2006, Re- vised Selected Papers. Lecture Notes in C...

work page doi:10.1007/978-3-540-70952-7_9 2006