AI Coaching for Accelerating Human Skill Development with Reinforcement Learning
Pith reviewed 2026-06-25 21:26 UTC · model grok-4.3
The pith
An AI coach trained via reinforcement learning accelerates human motor-skill development by strategically scaffolding then withdrawing assistance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formalize the interactive AI coaching process as a non-cooperative dynamic game in which the learner optimizes task performance while the coach targets the learner's independent competence. Building on this formalism, we develop a reinforcement learning framework combining adaptive shared control with probabilistic models of the coach's causal influence on skill evolution, enabling tractable training of coaching policies.
What carries the argument
Reinforcement learning framework that pairs adaptive shared control with probabilistic models of the coach's causal influence on skill evolution.
If this is right
- Coaching policies become trainable in a tractable way once the game and probabilistic influence models are in place.
- Human learners achieve measurable gains in independent task performance after training with the coach.
- Over-reliance and skill atrophy are reduced because assistance is withdrawn when the learner can succeed alone.
- The same formalism applies to other embodied motor tasks beyond drone racing.
Where Pith is reading between the lines
- The game-theoretic separation of objectives may extend to non-motor coaching domains such as decision training or language-skill practice.
- If the probabilistic models capture real causal effects, they could be used to audit whether an AI system is truly promoting independence rather than dependence.
- The approach suggests a design pattern for any shared-control system: optimize the human's future autonomy instead of joint performance alone.
Load-bearing premise
Effective coaching requires strategic scaffolding and stepping back aligned with the learner's capability, allowing productive failures that drive learning.
What would settle it
A replication of the N=33 drone-racing study in which participants trained by the RL coach show no faster gains in independent lap times or success rates than participants trained by the state-of-the-art baselines.
Figures
read the original abstract
AI copilots can substantially boost human performance through shared control, but excessive assistance can induce over-reliance and skill atrophy. This paper studies how an embodied AI agent can act as a coach that accelerates human motor-skill development. We argue that effective coaching requires strategic scaffolding and stepping back that are aligned with the learner's capability, allowing productive failures that drive learning. We formalize the interactive AI coaching process as a non-cooperative dynamic game in which the learner optimizes task performance while the coach targets the learner's independent competence. Building on this formalism, we develop a reinforcement learning framework combining adaptive shared control with probabilistic models of the coach's causal influence on skill evolution, enabling tractable training of coaching policies. A comprehensive user study (N=33) on first-person-view drone racing shows significant gains in human learning outcomes over state-of-the-art AI coaching baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes the AI coaching process as a non-cooperative dynamic game in which the learner optimizes task performance while the coach targets the learner's independent competence. Building on this, it develops an RL framework that combines adaptive shared control with probabilistic models of the coach's causal influence on skill evolution. A user study with N=33 participants on first-person-view drone racing is reported to show significant gains in human learning outcomes over state-of-the-art AI coaching baselines.
Significance. If the empirical results are substantiated, the work offers a principled game-theoretic and RL-based approach to embodied coaching that could reduce over-reliance while accelerating motor skill acquisition. The integration of adaptive shared control with causal skill-evolution models is a technical contribution that aligns with established ideas in motor learning and human-AI interaction.
major comments (1)
- [Abstract and User Study section] Abstract and User Study section: the central empirical claim of 'significant gains' from the N=33 drone-racing study is load-bearing, yet the manuscript provides no description of study design (randomization, within/between-subjects structure), statistical methods, error bars or confidence intervals, baseline implementations, or exact RL policy training details. This prevents evaluation of whether the data support the claimed superiority of the proposed framework.
minor comments (2)
- Clarify the precise definition of 'independent competence' used as the coach's objective and how it is measured in the user study.
- Ensure the probabilistic causal model of skill evolution is accompanied by an explicit statement of its assumptions and identifiability conditions.
Simulated Author's Rebuttal
We thank the referee for the thorough review and for highlighting the need for greater transparency in the empirical evaluation. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract and User Study section] Abstract and User Study section: the central empirical claim of 'significant gains' from the N=33 drone-racing study is load-bearing, yet the manuscript provides no description of study design (randomization, within/between-subjects structure), statistical methods, error bars or confidence intervals, baseline implementations, or exact RL policy training details. This prevents evaluation of whether the data support the claimed superiority of the proposed framework.
Authors: We agree that the current manuscript omits critical methodological details required to evaluate the user-study results. In the revised version we will expand the User Study section (and update the abstract if space permits) to report: (i) the randomized between-subjects design with three conditions and the randomization procedure; (ii) the full statistical pipeline, including the mixed-effects model, post-hoc tests, and multiple-comparison correction; (iii) error bars or confidence intervals on all reported figures; (iv) precise implementation details of the two state-of-the-art baselines (including any hyper-parameter matching); and (v) the exact RL training protocol for the coaching policies (environment, reward shaping, network architecture, and training hyperparameters). These additions will allow readers to assess whether the reported gains are supported by the data. revision: yes
Circularity Check
No significant circularity
full rationale
The paper formalizes the coaching process as a non-cooperative dynamic game between learner and coach, then builds an RL framework with adaptive shared control and probabilistic causal models of skill evolution. These steps rely on standard game theory and RL techniques without any visible reduction of predictions to fitted parameters by construction, self-definitional loops, or load-bearing self-citations that collapse the central claim. The N=33 user study on drone racing provides external empirical grounding independent of the formalism. No steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The interactive AI coaching process can be formalized as a non-cooperative dynamic game in which the learner optimizes task performance while the coach targets the learner's independent competence.
Reference graph
Works this paper leans on
-
[1]
M. Kapur. Productive failure.Cognition and instruction, 26(3):379–424, 2008
2008
-
[2]
Metcalfe
J. Metcalfe. Learning from errors.Annual review of psychology, 68(1):465–489, 2017
2017
-
[3]
P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, et al. Outracing champion Gran Tur- ismo drivers with deep reinforcement learning.Nature, 602(7896):223–228, 2022. doi: 10.1038/s41586-021-04357-7
-
[4]
Kaufmann, L
E. Kaufmann, L. Bauersfeld, A. Loquercio, M. M ¨uller, V . Koltun, and D. Scaramuzza. Champion-level drone racing using deep reinforcement learning.Nature, 620(7976):982–987,
-
[5]
doi:10.1038/s41586-023-06419-4
-
[6]
Learning complex dexterous manipulation with deep reinforcement learning and demonstrations
S. Reddy, A. D. Dragan, and S. Levine. Shared autonomy via deep reinforcement learning. In Proc. Robotics: Science and Systems, 2018. doi:10.15607/RSS.2018.XIV .005
-
[7]
DeCastro, A
J. DeCastro, A. Silva, D. Gopinath, E. Sumner, T. M. Balch, L. Dees, and G. Rosman. Dream- ing to assist: Learning to align with human objectives for shared control in high-speed rac- ing. InConf. Robot Learning, 2024. URLhttps://proceedings.mlr.press/v270/ decastro25a.html
2024
-
[8]
Srivastava, R
M. Srivastava, R. Iranmanesh, Y . Cui, D. Gopinath, E. S. Sumner, A. Silva, L. Dees, G. Ros- man, and D. Sadigh. Shared autonomy for proximal teaching. In2025 20th ACM/IEEE Inter- national Conference on Human-Robot Interaction (HRI), pages 232–241. IEEE, 2025
2025
-
[9]
D. D. Oh, J. Lidard, H. Hu, H. Sinhmar, E. Lazarski, D. Gopinath, E. S. Sumner, J. A. De- Castro, G. Rosman, N. E. Leonard, et al. Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports.Proc. Robotics: Science and Systems, 2025. doi:10.15607/RSS.2025.XXI.093
-
[10]
S. Sha, Y . Wang, B. Huang, A. Loquercio, and Y . Li. Efficient and reliable teleoperation through real-to-sim-to-real shared autonomy.arXiv preprint arXiv:2603.17016, 2026
arXiv 2026
-
[11]
Bastani, O
H. Bastani, O. Bastani, A. Sungu, H. Ge, ¨O. Kabakcı, and R. Mariman. Generative AI can harm learning.The Wharton School Research Paper, 2024
2024
-
[12]
B. N. Macnamara, I. Berber, M. C. C ¸ avus ¸o˘glu, E. A. Krupinski, N. Nallapareddy, N. E. Nelson, P. J. Smith, A. L. Wilson-Delfosse, and S. Ray. Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers’ awareness?Cognitive Research: Principles and Implications, 9(1):46, 2024
2024
-
[13]
J. Kulveit, R. Douglas, N. Ammann, D. Turan, D. Krueger, and D. Duvenaud. Gradual dis- empowerment: Systemic existential risks from incremental AI development.arXiv preprint arXiv:2501.16946, 2025
arXiv 2025
-
[14]
Backman, D
K. Backman, D. Kuli ´c, and H. Chung. Reinforcement learning for shared autonomy drone landings.Autonomous Robots, 47(8):1419–1438, 2023
2023
-
[15]
C. Shen, S. Yu, Y . Weng, H. Ma, C. Li, H. Yasuda, J. Dallas, M. Thompson, J. Subosits, and T. Ersal. Cyber racing coach: A haptic shared control framework for teaching advanced driving skills.arXiv preprint arXiv:2509.20653, 2025
arXiv 2025
-
[16]
L. S. Vygotsky, M. Cole, V . John-Steiner, S. Scribner, and E. Souberman. The development of higher psychological processes, 1978. 9
1978
-
[17]
D. Sadigh, N. Landolfi, S. S. Sastry, S. A. Seshia, and A. D. Dragan. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active infor- mation gathering over human internal state.Autonomous Robots, 42(7):1405–1426, 2018. doi:10.1007/s10514-018-9746-1
-
[18]
W. Schwarting, A. Pierson, S. Karaman, and D. Rus. Stochastic dynamic games in belief space. IEEE Transactions on Robotics, 37(6):2157–2172, 2021. doi:10.1109/TRO.2021.3075376
-
[19]
H. Hu, Z. Zhang, K. Nakamura, A. Bajcsy, and J. F. Fisac. Deception game: Closing the safety-learning loop in interactive robot autonomy. InConf. Robot Learning, volume 229 ofProceedings of Machine Learning Research, pages 3830–3850, 11 2023. URLhttps: //proceedings.mlr.press/v229/hu23b.html
2023
-
[20]
A. Fern, S. Natarajan, K. Judah, and P. Tadepalli. A decision-theoretic model of assistance. Journal of Artificial Intelligence Research, 50:71–104, 2014. doi:https://doi.org/10.1613/jair. 4213
-
[21]
Hadfield-Menell, S
D. Hadfield-Menell, S. J. Russell, P. Abbeel, and A. Dragan. Cooperative inverse reinforcement learning. InAdvances in Neural Information Processing Systems, pages 3909–3917, 2016
2016
-
[22]
J. F. Fisac, M. A. Gates, J. B. Hamrick, C. Liu, D. Hadfield-Menell, M. Palaniappan, D. Malik, S. S. Sastry, T. L. Griffiths, and A. D. Dragan. Pragmatic-pedagogic value alignment. In Robotics Research, pages 49–57. Springer, 2020
2020
-
[23]
C. Laidlaw, E. Bronstein, T. Guo, D. Feng, L. Berglund, J. Svegliato, S. Russell, and A. Dragan. Assistancezero: Scalably solving assistance games.arXiv preprint arXiv:2504.07091, 2025
arXiv 2025
-
[24]
E. A. Hansen, D. S. Bernstein, and S. Zilberstein. Dynamic programming for partially observ- able stochastic games. InProc. AAAI Conf. Artificial Intelligence, volume 4, pages 709–715,
-
[25]
URLhttps://dl.acm.org/doi/10.5555/1597148.1597262
-
[26]
T. Basar and G. J. Olsder.Dynamic Noncooperative Game Theory. SIAM, London, 1988. URLhttps://epubs.siam.org/doi/book/10.1137/1.9781611971132
-
[27]
H. A. Simon. Bounded rationality.Utility and probability, pages 15–18, 1990
1990
-
[28]
D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of markov decision processes.Mathematics of operations research, 27(4):819–840, 2002
2002
-
[29]
V . Pasumarti, L. Bianchi, and A. Loquercio. Agile flight emerges from multi-agent competitive racing.arXiv preprint arXiv:2512.11781, 2025
arXiv 2025
-
[30]
R. D. Luce.Individual Choice Behavior. John Wiley, Oxford, England, 1959. URLhttps: //psycnet.apa.org/fulltext/2013-44649-000-FRM.pdf
1959
-
[31]
C. M. Bishop.Pattern Recognition and Machine Learning. Springer, 2006. URLhttps: //link.springer.com/book/9780387310732
arXiv 2006
-
[32]
Gopinath, X
D. Gopinath, X. Cui, J. DeCastro, E. Sumner, J. Costa, H. Yasuda, A. Morgan, L. Dees, S. Chau, J. Leonard, et al. Computational teaching for driving via multi-task imitation learn- ing. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 7019–
-
[33]
Mazumdar, K
E. Mazumdar, K. Panaganti, and L. Shi. Tractable multi-agent reinforcement learning through behavioral economics. InThe Thirteenth International Conference on Learning Representa- tions, 2025. 10
2025
-
[34]
X. Liu, L. Peters, and J. Alonso-Mora. Learning to play trajectory games against opponents with unknown objectives.IEEE Robotics and Automation Letters, 2023. doi:10.1109/LRA. 2023.3280809
work page doi:10.1109/lra 2023
-
[35]
H. Hu, J. F. Fisac, N. E. Leonard, D. Gopinath, J. DeCastro, and G. Rosman. Think deep and fast: Learning Neural NOD from inverse dynamic games for split-second interactions. InProc. IEEE Conf. Robotics and Automation, 2025. doi:10.48550/arXiv.2406.09810
-
[36]
A. P. Jacob, D. J. Wu, G. Farina, A. Lerer, H. Hu, A. Bakhtin, J. Andreas, and N. Brown. Modeling strong and human-like gameplay with KL-regularized search. InInternational Con- ference on Machine Learning, pages 9695–9728. PMLR, 2022
2022
-
[37]
Nikolaidis, D
S. Nikolaidis, D. Hsu, and S. Srinivasa. Human-robot mutual adaptation in collaborative tasks: Models and experiments.Int. Journal of Robotics Research, 36(5-7):618–634, 2017
2017
-
[38]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms, 2017. URLhttps://arxiv.org/abs/1707.06347
Pith/arXiv arXiv 2017
-
[39]
A. T. Corbett and J. R. Anderson. Knowledge tracing: Modeling the acquisition of procedural knowledge.User modeling and user-adapted interaction, 4(4):253–278, 1994. doi:10.1007/ BF01099821
1994
-
[40]
Piech, J
C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl- Dickstein. Deep knowledge tracing.Advances in neural information processing systems, 28, 2015. URLhttps://proceedings.neurips.cc/paper/2015/hash/ bac9162b47c56fc8a4d2a519803d51b3-Abstract.html
2015
-
[41]
yaw slightly left
M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y . Guo, H. Mazhar, et al. Orbit: A unified simulation framework for interactive robot learning environments.IEEE Robotics and Automation Letters, 8(6):3740–3747, 2023. A Coaching Policy Training Details A.1 Drone Racing Simulation We implement the FPV drone-racing task as a vect...
2023
-
[42]
No Experience: Have not operated a drone; have not played drone racing games, flight simula- tion games, or similar games using a controller
-
[43]
Casual Experience: Have occasionally operated consumer drones in low-speed scenarios (e.g., photography), or have played flight simulation games with a controller
-
[44]
Regular Experience: Have regularly operated consumer drones, or have regularly played flight simulator games using a controller
-
[45]
Has not competed in organized races
Extensive Experience: Have regularly flown FPV drones, or have regularly practiced drone racing simulators to a proficient level (e.g., completing technical tracks cleanly at pace). Has not competed in organized races
-
[46]
∞X t=0 γt ¯rC(θt) # ≥E ˜πC
Competitive Experience: Have competed in organized drone racing events, or regularly trains on drone racing simulators. Based on the participant feedback, our pool consisted primarily of novices: 61.1% reported No Experience, 36.1% reported Casual Experience, and 2.8% reported Regular Experience, with no participant reporting Extensive or Competitive Expe...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.