Safe Interactions via Monte Carlo Linear-Quadratic Games

Benjamin A. Christie; Dylan P. Losey

arxiv: 2504.06124 · v3 · submitted 2025-04-08 · 💻 cs.RO

Pith reviewed 2026-05-22 20:11 UTC · model grok-4.3

classification 💻 cs.RO

keywords human-robot interaction · zero-sum games · Monte Carlo search · Nash equilibrium · safety · linear-quadratic games · robot planning

The pith

Robots find safe policies for unpredictable humans by starting with a linear-quadratic game solution and refining it via Monte Carlo search toward the Nash equilibrium.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to generate robot behaviors that remain safe even when people act in unexpected ways. It models the interaction as a zero-sum game in which the human's actions oppose the robot's goals, then solves for the Nash equilibrium policy that works well across many possible human choices. Rather than using exact but intractable Hamilton-Jacobi methods or relying solely on approximate linear-quadratic solutions, the approach begins with the linear-quadratic policy as an initial guess and iteratively improves it with Monte Carlo search. This produces real-time adjustable policies whose level of conservatism the designer can tune. Simulations and a user study indicate gains in both speed and safety performance over prior methods.

Core claim

Formulating human-robot interaction as a zero-sum game and solving for its Nash equilibrium yields robot policies that maximize safety and performance against a wide range of human actions. The MCLQ method obtains an initial policy from the linear-quadratic approximation of this game and refines it through Monte Carlo search to converge toward the equilibrium, delivering both computational efficiency and the ability to control conservatism without focusing on unrealistic human behaviors.
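
One way to write the zero-sum formulation in symbols is sketched below. The notation is assumed here for illustration and is not taken from the paper: state $x_t$, robot action $u_t$, human action $w_t$, dynamics $f$, stage cost $c$, horizon $T$, and policies $\pi_R$ and $\pi_H$. The robot minimizes the cost $J$ and, in the worst case, the human maximizes it.

```latex
\begin{aligned}
J(\pi_R, \pi_H) &= \sum_{t=0}^{T-1} c(x_t, u_t, w_t),
\qquad x_{t+1} = f(x_t, u_t, w_t), \quad u_t = \pi_R(x_t), \quad w_t = \pi_H(x_t), \\
J(\pi_R^*, \pi_H) &\le J(\pi_R^*, \pi_H^*) \le J(\pi_R, \pi_H^*)
\qquad \text{for all } \pi_R, \pi_H .
\end{aligned}
```

The second line is the saddle-point condition that defines a Nash equilibrium of a zero-sum game: against $\pi_R^*$ the human cannot raise the cost, and against $\pi_H^*$ the robot cannot lower it.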

What carries the argument

MCLQ, the method that takes the solution of a linear-quadratic game as an initial guess at safe robot behavior and iteratively improves it with Monte Carlo search to approach the Nash equilibrium of the underlying zero-sum game.
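
The warm-start-then-refine idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: the toy dynamics, the cost, the gain of 0.8 used as a stand-in for the LQ guess, and the helper names `rollout_cost`, `worst_case_cost`, and `refine_plan` are all hypothetical. A generic random-perturbation search stands in for the paper's Monte Carlo procedure, which this page does not spell out, and the robot uses an open-loop plan rather than a feedback policy to keep the example short.

```python
import math
import random

HORIZON = 10

def rollout_cost(x0, robot_plan, human_plan):
    """Cost of one trajectory under toy nonlinear dynamics (illustrative only)."""
    x, cost = x0, 0.0
    for u, w in zip(robot_plan, human_plan):
        cost += x * x + 0.1 * u * u                    # stay near 0 with little effort
        x = x + 0.5 * u + 0.3 * w + 0.2 * math.sin(x)  # sin term breaks the LQ structure
    return cost + x * x

def worst_case_cost(x0, robot_plan, human_samples):
    """Largest cost over a fixed batch of sampled human action sequences."""
    return max(rollout_cost(x0, robot_plan, h) for h in human_samples)

def refine_plan(x0, initial_plan, human_samples, iterations=200, step=0.2, seed=0):
    """Perturb the plan at random and keep a candidate only if it lowers the
    sampled worst-case cost. The initial plan plays the role of the LQ guess."""
    rng = random.Random(seed)
    best_plan = list(initial_plan)
    best_cost = worst_case_cost(x0, best_plan, human_samples)
    for _ in range(iterations):
        candidate = [u + rng.gauss(0.0, step) for u in best_plan]
        cost = worst_case_cost(x0, candidate, human_samples)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
    return best_plan, best_cost

# Fixed batch of human behaviors, sampled once and reused for every evaluation.
sample_rng = random.Random(1)
human_samples = [[sample_rng.uniform(-1.0, 1.0) for _ in range(HORIZON)]
                 for _ in range(64)]

# Stand-in for the LQ initial guess: roll out u = -0.8 * x on the linear part only.
initial_plan, x = [], 1.0
for _ in range(HORIZON):
    initial_plan.append(-0.8 * x)
    x = x + 0.5 * (-0.8 * x)

initial_cost = worst_case_cost(1.0, initial_plan, human_samples)
refined_plan, refined_cost = refine_plan(1.0, initial_plan, human_samples)
print(initial_cost, refined_cost)
```

Because every candidate is scored against the same batch of human samples, the accepted cost can never increase. The result is a worst case over the sampled batch only, so it says nothing about human behaviors outside that batch.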

If this is right

The robot can make real-time safety adjustments during interaction.
Designers can tune the robot's conservatism to avoid over-preparation for unrealistic human behaviors.
Expected performance improves over the pure linear-quadratic solution, while computation stays tractable where exact methods are not.
The same framework applies across varied human-robot tasks without requiring precise prediction of human intent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may extend to other multi-agent settings where one agent must remain safe against an adversarial or unpredictable counterpart.
It reduces reliance on detailed models of typical human behavior by focusing instead on worst-case robustness.
Further experiments could test whether the Monte Carlo refinement step scales to higher-dimensional state spaces or longer horizons.

Load-bearing premise

The zero-sum game formulation correctly captures the worst-case human behavior the robot must guard against, and the linear-quadratic approximation is close enough that Monte Carlo search converges to useful policies inside real-time limits.
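
For readers unfamiliar with the linear-quadratic step this premise depends on, here is a minimal sketch of a scalar, finite-horizon zero-sum LQ game solved by backward Riccati recursion. It is a textbook-style illustration, not code from the paper: the scalar model, the parameter names (`a`, `b`, `d`, `q`, `r`, `s`), and the numbers in the usage example are assumptions. The human penalty `s` is a hypothetical knob, where a larger value makes adversarial human actions more costly for the human.

```python
def solve_scalar_lq_game(a, b, d, q, r, s, q_final, horizon):
    """Backward Riccati recursion for a scalar zero-sum LQ game.

    Dynamics: x[t+1] = a*x[t] + b*u[t] + d*w[t]
    Cost:     sum_t (q*x^2 + r*u^2 - s*w^2) + q_final*x[T]^2
    The robot picks u to minimize the cost, the human picks w to maximize it.
    Returns time-ordered gains (ku, kw) with u = -ku*x and w = -kw*x, plus p0
    such that the saddle-point cost from x0 is p0 * x0^2.
    """
    p = q_final
    gains = []
    for _ in range(horizon):
        # A saddle point needs the stage problem convex in u and concave in w.
        if r + b * b * p <= 0 or s - d * d * p <= 0:
            raise ValueError("no saddle point: increase the human penalty s")
        # Joint stationarity in (u, w):  M [u, w]^T = -[n1, n2]^T x
        m11, m12, m22 = r + b * b * p, b * d * p, d * d * p - s
        n1, n2 = a * b * p, a * d * p
        det = m11 * m22 - m12 * m12
        ku = (m22 * n1 - m12 * n2) / det
        kw = (m11 * n2 - m12 * n1) / det
        gains.append((ku, kw))
        p = q + a * a * p - (n1 * ku + n2 * kw)  # Riccati update
    gains.reverse()
    return gains, p

def rollout_cost(x0, gains, a, b, d, q, r, s, q_final, human_on=True):
    """Simulate the closed loop; with human_on=False the human plays w = 0."""
    x, cost = x0, 0.0
    for ku, kw in gains:
        u = -ku * x
        w = -kw * x if human_on else 0.0
        cost += q * x * x + r * u * u - s * w * w
        x = a * x + b * u + d * w
    return cost + q_final * x * x

params = dict(a=1.0, b=0.5, d=0.3, q=1.0, r=1.0, s=2.0, q_final=1.0)
gains, p0 = solve_scalar_lq_game(horizon=20, **params)
saddle_cost = rollout_cost(2.0, gains, **params)
passive_cost = rollout_cost(2.0, gains, human_on=False, **params)
print(p0 * 2.0 ** 2, saddle_cost, passive_cost)
```

Two properties can be checked on the output. The simulated saddle-point cost matches `p0 * x0**2`, and a human who deviates from the adversarial gain (here by doing nothing) cannot push the cost above that value. For nonlinear dynamics or non-quadratic costs this solution is only an approximation, which is the gap the premise above is concerned with.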

What would settle it

Deploy the computed policies on a physical robot in live human interactions and measure whether collision rates or safety violations exceed those of baseline methods when humans deviate from the modeled worst-case actions.

Figures

Figures reproduced from arXiv: 2504.06124 by Benjamin A. Christie, Dylan P. Losey.

**Figure 2.** Simulation results across point-mass, driving, and manipulator environments. (Left) We plot the cost and (Right) computation time averaged over 100 simulations. Computation time is the number of milliseconds per robot action (normalized by the number of timesteps per trajectory). In non-LQ settings the computation time for NE is prohibitively high; e.g., in driving the NE computation time exceeded one hour…

**Figure 3.** Simulation results for a modified point-mass environment where we adjust the safety margin…

**Figure 4.** Generally, when the human model aligns with the actual human behavior, MCLQ avoids worst…

**Figure 5.** Results from our user study in Section 6. Participants walked around a room to assemble a tower; a drone completed revolutions around the same workspace to monitor the human’s progress (also see…

Original abstract

Safety is critical during human-robot interaction. But -- because people are inherently unpredictable -- it is often difficult for robots to plan safe behaviors. Instead of relying on our ability to anticipate humans, here we identify robot policies that are robust to unexpected human decisions. We achieve this by formulating human-robot interaction as a zero-sum game, where (in the worst case) the human's actions directly conflict with the robot's objective. Solving for the Nash Equilibrium of this game provides robot policies that maximize safety and performance across a wide range of human actions. Existing approaches attempt to find these optimal policies by leveraging Hamilton-Jacobi analysis (which is intractable) or linear-quadratic approximations (which are inexact). By contrast, in this work we propose a computationally efficient and theoretically justified method that converges towards the Nash Equilibrium policy. Our approach (which we call MCLQ) leverages linear-quadratic games to obtain an initial guess at safe robot behavior, and then iteratively refines that guess with a Monte Carlo search. Not only does MCLQ provide real-time safety adjustments, but it also enables the designer to tune how conservative the robot is -- preventing the system from focusing on unrealistic human behaviors. Our simulations and user study suggest that this approach advances safety in terms of both computation time and expected performance. See videos of our experiments here: https://youtu.be/KJuHeiWVuWY.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MCLQ blends LQ initialization with Monte Carlo refinement for real-time safe robot policies, but the refinement step lacks a formal convergence argument.

The main point is that this work gives a method called MCLQ that uses a linear-quadratic game to get a quick initial policy for safe robot behavior and then applies Monte Carlo search to improve it toward the Nash equilibrium of a zero-sum game with the human.

What works here is the practicality. Pure game theory solutions like Hamilton-Jacobi are too slow for real robots, while basic LQ is fast but not accurate enough for safety. This hybrid lets them adjust how conservative the robot is, which helps avoid planning for impossible human moves. The simulations show it runs in real time with better performance, and the user study suggests it feels safer in practice. The new part is this specific iterative refinement for the safety setting.

The soft spot is the Monte Carlo step. It is supposed to converge to the equilibrium, but without a proof or convergence rate for the continuous case, we have to take the empirical results on faith. If the initial LQ guess is poor, the search might not close the gap reliably. The zero-sum model is also a strong assumption that may not match typical human actions.

This is worth looking at for anyone in robotics working on human interaction safety. It sits between theory and implementation in a useful way. I would recommend sending it for peer review. The idea is sound enough and the results are promising, so referees can sort out the details on convergence and validation.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MCLQ, a hybrid method for safe human-robot interaction that formulates the problem as a zero-sum game between robot and human. It obtains an initial policy via linear-quadratic game solution and iteratively refines it with Monte Carlo search to approach the Nash equilibrium, claiming real-time computation, tunable conservatism, theoretical justification for convergence, and improved safety/performance in simulations and a user study.

Significance. If the Monte Carlo refinement step reliably improves safety metrics while approaching equilibrium policies, the work would provide a practical bridge between intractable Hamilton-Jacobi reachability and inexact LQ approximations, with the added benefit of explicit conservatism tuning. The combination of an LQ warm-start with sampling-based refinement is a concrete strength that could influence real-time HRI controllers.

major comments (2)

[§4] (Monte Carlo Refinement): The central claim that MCLQ 'converges towards the Nash Equilibrium policy' is load-bearing, yet no convergence rate, contraction argument, or regret bound is provided for the Monte Carlo procedure in the continuous state-action space of the robot-human dynamics; the termination criterion appears empirical rather than tied to equilibrium approximation.
[§5.1] (Simulation Results): The reported improvements in expected performance and safety are presented without ablation isolating the contribution of the Monte Carlo iterations versus the LQ initial guess alone; this weakens the claim that the refinement step is what enables the observed gains.
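
The first major comment asks for a termination criterion tied to equilibrium approximation. One generic option, sketched here as an assumption and not something the paper is known to use, is a sampled Nash gap: how much the human could gain plus how much the robot could save by deviating unilaterally from the current pair of actions. The toy cost function, the function name `sampled_nash_gap`, and the sampling range are hypothetical.

```python
import random

def cost(u, w):
    """Toy one-shot zero-sum cost: the robot minimizes over u, the human maximizes over w."""
    return u * u - w * w + u * w

def sampled_nash_gap(u, w, n_samples=500, bound=2.0, seed=0):
    """Estimate max_w' cost(u, w') - min_u' cost(u', w) from sampled deviations.

    The gap is zero at a saddle point. Because only sampled deviations are
    tried, the estimate is a lower bound on the true gap.
    """
    rng = random.Random(seed)
    deviations = [rng.uniform(-bound, bound) for _ in range(n_samples)]
    # Include the current actions so that "no deviation" is always an option.
    best_human = max(cost(u, w_dev) for w_dev in deviations + [w])
    best_robot = min(cost(u_dev, w) for u_dev in deviations + [u])
    return best_human - best_robot

gap_at_saddle = sampled_nash_gap(0.0, 0.0)   # (0, 0) is the saddle point of this cost
gap_elsewhere = sampled_nash_gap(1.0, 0.5)   # not an equilibrium
print(gap_at_saddle, gap_elsewhere)
```

A refinement loop could stop once this estimate drops below a tolerance. Since it is only a lower bound, a small value is evidence of being near equilibrium against the sampled deviations rather than a guarantee.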

minor comments (2)

The abstract states that MCLQ 'enables the designer to tune how conservative the robot is,' but the manuscript does not specify the exact mechanism (e.g., sampling distribution or cost weighting) used to achieve this tuning.
Figure captions and experimental setup descriptions should include the precise state dimension, control bounds, and number of Monte Carlo samples per iteration to allow reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments identify key areas where additional clarity and evidence would strengthen the manuscript. We address each major comment below and indicate the revisions we will make.

Referee: [§4] (Monte Carlo Refinement): The central claim that MCLQ 'converges towards the Nash Equilibrium policy' is load-bearing, yet no convergence rate, contraction argument, or regret bound is provided for the Monte Carlo procedure in the continuous state-action space of the robot-human dynamics; the termination criterion appears empirical rather than tied to equilibrium approximation.

Authors: We agree that a formal convergence rate or contraction argument for the Monte Carlo refinement step in continuous state-action spaces is not provided. The manuscript offers a theoretical justification based on the fact that the LQ solution provides a conservative initial policy and that Monte Carlo sampling can improve the value estimate toward the true Nash equilibrium in expectation, but this falls short of a rigorous rate or regret bound. The termination criterion is indeed driven by practical metrics such as policy improvement and safety margins observed in simulation. In the revision we will add an expanded discussion section that explicitly states these limitations, clarifies the nature of the existing justification, and outlines directions for future analysis (e.g., discretization arguments or regret bounds under Lipschitz assumptions). We believe the current empirical evidence still supports the practical utility of the approach.

Revision: partial

Referee: [§5.1] (Simulation Results): The reported improvements in expected performance and safety are presented without ablation isolating the contribution of the Monte Carlo iterations versus the LQ initial guess alone; this weakens the claim that the refinement step is what enables the observed gains.

Authors: We concur that an ablation isolating the Monte Carlo refinement from the LQ warm-start would strengthen the empirical claims. We will add a new subsection (or expanded table) in §5.1 that reports performance and safety metrics for (i) the pure LQ policy, (ii) MCLQ after a small number of Monte Carlo iterations, and (iii) MCLQ after the full iteration budget. This will directly quantify the incremental benefit of the refinement step.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; MCLQ derivation combines standard LQ initialization with independent Monte Carlo refinement

The paper's chain begins with the standard zero-sum game formulation of human-robot interaction and uses established linear-quadratic approximations solely for an initial policy guess. The Monte Carlo search is introduced as an additional iterative refinement step whose claimed convergence toward Nash is not shown (via any quoted equation or self-citation) to be equivalent to the LQ input by construction. No self-definitional mappings, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems imported from the authors, smuggled ansatzes, or renamings of known results appear in the provided abstract or description. The central claim therefore retains independent algorithmic content and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions from differential game theory and the practical utility of LQ approximations; no new entities are introduced.

axioms (2)

domain assumption: Human-robot interaction can be usefully modeled as a zero-sum game in which the human's actions directly oppose the robot's objective.
Explicitly stated in the abstract as the chosen formulation.

domain assumption: The linear-quadratic approximation yields an initial policy sufficiently close to the true Nash equilibrium for Monte Carlo refinement to be effective.
Implicit in the description of MCLQ as an iterative refinement procedure.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "MCLQ leverages linear-quadratic games to obtain an initial guess... then iteratively refines that guess with a Monte Carlo search"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Solving for the Nash Equilibrium of this game provides robot policies that maximize safety"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

