Efficient Autonomy Validation in Simulation with Adaptive Stress Testing

Mark Koren; Mykel Kochenderfer

arXiv: 1907.06795 · v1 · pith:OR5HO7WBnew · submitted 2019-07-16 · cs.LG · cs.RO · cs.SE · cs.SY · eess.SY · stat.ML

Pith reviewed 2026-05-24 21:13 UTC · model grok-4.3

keywords: adaptive stress testing, autonomous systems, reinforcement learning, recurrent neural networks, failure scenario detection, Markov decision processes, simulation validation, continuous state spaces

The pith

A recurrent neural network lets one adaptive stress testing solver handle any initial condition across a continuous space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses limitations in using deep reinforcement learning to solve Adaptive Stress Testing as a Markov decision process for finding likely failure scenarios in autonomous systems. Earlier solvers relied on feed-forward networks over discretized initial conditions, which forced separate runs for each condition and required inspecting internal system states. Switching to a recurrent neural network that accepts sets of initial conditions from a continuous space allows the solver to generalize across the full range in a single run. This change also removes the need to access internal states, treating the system as a black box. The method is shown on an autonomous car and crossing pedestrian example, where it solves problems that were previously intractable.

Core claim

The central claim is that a recurrent neural network policy, trained to take sets of initial conditions drawn from a continuous space as input, produces a single solver instance for Adaptive Stress Testing that generalizes across the space, identifies the most likely failure scenarios, and eliminates both the discretization requirement and the internal-state dependency of prior feed-forward approaches.

What carries the argument

Recurrent neural network policy that processes sets of initial conditions from a continuous space as input to the deep reinforcement learning solver for the Markov decision process.
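To make the black-box idea concrete, here is a minimal sketch of a recurrent policy whose only inputs are the initial condition and its own previous action. It is not the authors' architecture: the Elman-style cell, the layer sizes, the random untrained weights, and the class name `TinyRecurrentPolicy` are all assumptions chosen for illustration.

```python
import numpy as np


class TinyRecurrentPolicy:
    """Illustrative Elman-style recurrent policy (hypothetical, untrained)."""

    def __init__(self, ic_dim, action_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = ic_dim + action_dim  # input = initial condition + previous action
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (action_dim, hidden_dim))
        self.hidden_dim = hidden_dim
        self.action_dim = action_dim

    def rollout(self, initial_condition, horizon):
        """Return a (horizon, action_dim) array of disturbance actions.

        The simulator's internal state is never read: the hidden state h
        is the only memory carried between steps.
        """
        ic = np.asarray(initial_condition, dtype=float)
        h = np.zeros(self.hidden_dim)
        prev_action = np.zeros(self.action_dim)
        actions = []
        for _ in range(horizon):
            x = np.concatenate([ic, prev_action])
            h = np.tanh(self.W_in @ x + self.W_h @ h)
            prev_action = self.W_out @ h
            actions.append(prev_action)
        return np.stack(actions)


# One set of weights serves any point in the continuous initial-condition space.
policy = TinyRecurrentPolicy(ic_dim=3, action_dim=2)
traj_a = policy.rollout([0.0, 1.5, -2.0], horizon=5)
traj_b = policy.rollout([0.3, 1.1, -2.4], horizon=5)
```

In a real solver the weights would be trained by a reinforcement learning algorithm; the sketch only shows how a single network can be queried at arbitrary initial conditions without discretizing them.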

If this is right

The autonomous system under test can be treated as a black box without analyzing its internal state.
A single solver instance covers the full continuous space of initial conditions rather than requiring a new instance for each discrete case.
Problems involving continuous initial conditions that were previously intractable due to repeated solver runs become solvable.
The underlying relationship between similar initial conditions is captured through the network's generalization instead of being ignored by discretization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same recurrent-network approach could be applied to other reinforcement learning problems that involve continuous parameter spaces instead of discrete grids.
Computational cost for large-scale autonomy validation would drop because only one training run is needed instead of many.
If the learned policy transfers to real sensor data, the method could support validation pipelines that mix simulation with limited physical testing.

Load-bearing premise

The recurrent neural network can learn a single policy that generalizes reliably across the entire continuous space of initial conditions without separate training or loss of accuracy compared with per-condition solvers.

What would settle it

Run the recurrent solver on a held-out initial condition and compare the found failure scenario and its likelihood against the output of a dedicated feed-forward solver trained only on that same condition; any systematic mismatch falsifies the generalization claim.
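The comparison described above could be summarized along the following lines. This is a sketch under assumptions: it presumes each solver reports a log-likelihood for the failure path it found at each held-out initial condition, and the function name `paired_gap_summary` and its `tolerance` parameter are hypothetical.

```python
import numpy as np


def paired_gap_summary(recurrent_loglik, dedicated_loglik, tolerance=0.0):
    """Compare per-condition failure log-likelihoods from two solvers.

    A positive gap means the dedicated per-condition solver found a more
    likely failure than the shared recurrent solver at that condition.
    """
    r = np.asarray(recurrent_loglik, dtype=float)
    d = np.asarray(dedicated_loglik, dtype=float)
    if r.shape != d.shape:
        raise ValueError("both solvers must be evaluated on the same conditions")
    gap = d - r
    return {
        "mean_gap": float(gap.mean()),
        "max_gap": float(gap.max()),
        "fraction_worse": float(np.mean(gap > tolerance)),
    }


# Placeholder numbers that only exercise the function; they are NOT results
# from the paper or from any experiment.
summary = paired_gap_summary([-10.0, -12.0, -9.0], [-10.0, -11.0, -9.5])
```

A mean gap that stays clearly positive across many held-out conditions would be the kind of systematic mismatch that counts against the generalization claim; isolated gaps in either direction would not.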

Figures

Figures reproduced from arXiv: 1907.06795 by Mark Koren, Mykel Kochenderfer.

**Figure 2.** The AST methodology. The simulator is treated as a black …

**Figure 3.** Contrasting the new and old AST architectures. The new solver uses a recurrent architecture and is able to generalize across a …

**Figure 4.** The DRL solver architecture. The recurrent neural network …

**Figure 5.** The Mahalanobis distance of the most-likely failure found …
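Since the Figure 5 caption refers to the Mahalanobis distance, a short sketch of that quantity may help. The function below computes the standard definition for one vector; which vectors, means, and covariances the paper actually uses is not stated on this page, so the inputs here are arbitrary illustrations.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solve cov @ y = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# With an identity covariance the distance reduces to the Euclidean norm.
d_identity = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2))
# Larger variances shrink the distance along the corresponding axes.
d_scaled = mahalanobis([3.0, 4.0], [0.0, 0.0], np.diag([9.0, 16.0]))
```

Under a Gaussian model, a smaller Mahalanobis distance from the mean corresponds to a more likely sample, which is why the distance can serve as a proxy for how likely a found failure is.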

Original abstract

During the development of autonomous systems such as driverless cars, it is important to characterize the scenarios that are most likely to result in failure. Adaptive Stress Testing (AST) provides a way to search for the most-likely failure scenario as a Markov decision process (MDP). Our previous work used a deep reinforcement learning (DRL) solver to identify likely failure scenarios. However, the solver's use of a feed-forward neural network with a discretized space of possible initial conditions poses two major problems. First, the system is not treated as a black box, in that it requires analyzing the internal state of the system, which leads to considerable implementation complexities. Second, in order to simulate realistic settings, a new instance of the solver needs to be run for each initial condition. Running a new solver for each initial condition not only significantly increases the computational complexity, but also disregards the underlying relationship between similar initial conditions. We provide a solution to both problems by employing a recurrent neural network that takes a set of initial conditions from a continuous space as input. This approach enables robust and efficient detection of failures because the solution generalizes across the entire space of initial conditions. By simulating an instance where an autonomous car drives while a pedestrian is crossing a road, we demonstrate the solver is now capable of finding solutions for problems that would have previously been intractable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that replacing the feed-forward network in Adaptive Stress Testing (AST) with a recurrent neural network (RNN) conditioned on initial conditions drawn from a continuous space solves two problems: it treats the system under test as a black box and eliminates the need to retrain a separate solver for each discretized initial condition. The RNN is asserted to generalize across the full continuous space, enabling detection of failure scenarios that were previously intractable, as shown in a single pedestrian-crossing example with an autonomous vehicle.

Significance. If the generalization claim holds with no loss in failure-finding effectiveness relative to per-condition solvers, the method would reduce computational cost for black-box validation of autonomous systems and allow a single trained policy to cover ranges of initial conditions. The work builds on prior AST + DRL results but supplies no quantitative evidence (failure likelihood, success rate, sample efficiency, or baseline comparisons) to support the generalization or intractability claims.

major comments (2)

[Abstract] The central claim that the RNN 'generalizes across the entire space of initial conditions' and solves 'previously intractable' problems is presented without any quantitative metrics, baseline comparisons to per-condition feed-forward solvers, or details on training stability and accuracy retention. This leaves the load-bearing assertion that a single conditioned RNN matches or exceeds the effectiveness of separate solvers unevaluated.
The manuscript asserts that the RNN input structure allows conditioning on continuous initial conditions without discretization, yet provides no description of the input encoding, loss function, or how the recurrent state is used to produce actions in the MDP; without these, it is impossible to assess whether the claimed generalization is achieved by construction or by empirical performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and indicate the revisions that will be incorporated into the next version of the manuscript.

Referee: [Abstract] The central claim that the RNN 'generalizes across the entire space of initial conditions' and solves 'previously intractable' problems is presented without any quantitative metrics, baseline comparisons to per-condition feed-forward solvers, or details on training stability and accuracy retention. This leaves the load-bearing assertion that a single conditioned RNN matches or exceeds the effectiveness of separate solvers unevaluated.

Authors: We agree that the abstract and main text would be strengthened by explicit quantitative support. The pedestrian-crossing example illustrates the approach on a continuous initial-condition space, but we will revise the abstract and add a dedicated results subsection providing failure likelihood, success rate, sample efficiency, training stability metrics, and direct comparisons against per-condition feed-forward solvers. These additions will allow readers to evaluate the generalization claim quantitatively. (Revision: yes)

Referee: The manuscript asserts that the RNN input structure allows conditioning on continuous initial conditions without discretization, yet provides no description of the input encoding, loss function, or how the recurrent state is used to produce actions in the MDP; without these, it is impossible to assess whether the claimed generalization is achieved by construction or by empirical performance.

Authors: We will expand the methods section to include a precise description of the input encoding used for continuous initial conditions, the loss function applied during RNN training, and the manner in which the recurrent hidden state is mapped to actions within the AST MDP. These details will clarify the architectural basis for generalization. (Revision: yes)

Circularity Check

0 steps flagged

No circularity: the architectural replacement of the feed-forward network by an RNN for continuous initial conditions is presented as an independent engineering change.

The paper's core contribution is the substitution of a per-condition feed-forward network (from prior work) with a single RNN that ingests a set of initial conditions drawn from a continuous space. No equations, fitted parameters, or derived quantities are shown that reduce the claimed generalization or intractability solution back to the inputs by construction. The reference to 'our previous work' merely identifies the baseline solver being improved; it does not supply a uniqueness theorem or ansatz that the new method is forced to adopt. The demonstration on the car-pedestrian scenario is offered as external validation rather than a self-referential fit. This is a standard non-circular methodological extension.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the standard MDP formulation of AST and the untested assumption that an RNN policy will generalize across initial conditions; no free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: Failure scenario search in autonomous systems can be formulated as finding the most likely path in a Markov decision process.
This is the foundational modeling choice inherited from prior AST work and invoked to justify the RL solver.
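The modeling choice can be written as a constrained optimization. The block below is a sketch in notation assumed for this page, not copied from the paper: $s_t$ is the simulator state, $a_t$ the environment disturbance chosen by the solver at step $t$, $T$ the episode length, and $E$ the set of failure states.

```latex
\begin{aligned}
\max_{a_0, \dots, a_{T-1}} \quad & \sum_{t=0}^{T-1} \log P(a_t \mid s_t) \\
\text{subject to} \quad & s_T \in E
\end{aligned}
```

Maximizing the sum of log-probabilities is equivalent to maximizing the product of the per-step probabilities, so a solution is a most likely disturbance sequence among those that end in failure. A reinforcement learning solver typically handles the constraint through the reward, for example by penalizing episodes that terminate without reaching $E$.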

