pith. sign in

arxiv: 2603.12583 · v2 · submitted 2026-03-13 · 💻 cs.RO · cs.HC· cs.SY· eess.SY· math.OC

Skill-informed Data-driven Haptic Nudges for High-dimensional Human Motor Learning

Pith reviewed 2026-05-15 12:30 UTC · model grok-4.3

classification 💻 cs.RO cs.HCcs.SYeess.SYmath.OC
keywords haptic nudgesmotor learningIOHMMPOMDPexoskeletonskill modelingvibrotactile feedbackhigh-dimensional tasks
0
0 comments X

The pith

An IOHMM of latent skill evolution feeds a POMDP that computes haptic nudge policies, producing faster gains in movement efficiency and accuracy than heuristic feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a data-driven method to deliver vibrotactile nudges that adapt to a learner's hidden skill level while performing a novel high-dimensional motor task. It first fits an input-output hidden Markov model to separate unobservable skill changes from measurable movement outcomes under nudge inputs. This model then supplies the belief state for a POMDP whose optimal policy selects nudges to minimize long-term performance cost. In a thirty-person study with a hand exoskeleton, the resulting policy produced statistically faster improvements in task efficiency and endpoint precision than either fixed heuristic nudges or no feedback. The same group also converged more quickly on efficient low-dimensional movement synergies.

Core claim

By treating skill as a latent Markov state whose transitions are driven by haptic inputs and whose emissions are observed performance metrics, the input-output hidden Markov model supplies a predictive distribution over future skill. Solving the corresponding POMDP yields a policy that delivers nudges only when they are expected to reduce cumulative cost and advance the learner to higher-skill regimes. Human experiments confirm that this policy yields accelerated movement efficiency, improved endpoint accuracy, and quicker emergence of low-dimensional synergies relative to heuristic or null feedback conditions.

What carries the argument

The input-output hidden Markov model that decouples latent skill evolution from observable performance under haptic inputs, which parametrizes the POMDP solved for the optimal nudging policy.

If this is right

  • The POMDP policy minimizes long-term performance cost while guiding learners toward higher-skill states.
  • Trained participants exhibit accelerated movement efficiency and endpoint accuracy compared with heuristic or null feedback.
  • Synergy analysis indicates faster discovery of efficient low-dimensional motor representations.
  • The framework is demonstrated on a high-dimensional redundant task delivered through a hand exoskeleton.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modeling approach could be applied to other sensory channels such as visual or force feedback for rehabilitation settings.
  • Online re-estimation of the IOHMM parameters during a session could produce individualized nudge policies without retraining from scratch.
  • Tasks with still higher dimensionality would test whether the observed rapid synergy compression generalizes or saturates.

Load-bearing premise

The input-output hidden Markov model accurately decouples and predicts how latent skill evolves separately from directly observed performance when haptic nudges are applied in the target task.

What would settle it

A replication study in which participants trained with the POMDP policy show no significant advantage in movement efficiency or endpoint accuracy over the heuristic-feedback control group would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2603.12583 by Ankur Kamboj, Rajiv Ranganathan, Vaibhav Srivastava, Xiaobo Tan.

Figure 1
Figure 1. Figure 1: Target Capture Task: (a) shows a participant performing our target capture task with the SenseGlove DK1 exoskeleton strapped to their right hand. (b) and (c) show the cursor trajectories during the first and eighth blocks of the game with the red dots representing the target locations. The trajectories get straighter as the participant learns to control the cursor by the end of the experiment session. (d) … view at source ↗
Figure 2
Figure 2. Figure 2: Simplified IOHMM: Human motor learning behavior modeled as an IOHMM shows how the hidden motor skill state hk transitions to hk+1 under the effect of the input xk = {uk : slopek , ak : nudgek }, leading to the emissions ok = {REk, SoTk} under the influence of the same input xk. The evolution of the latent motor skill state from trial k to k + 1 is governed by the transition probability distribution P(hk+1|… view at source ↗
Figure 3
Figure 3. Figure 3: Estimated Behavioral Model: (a) Predicted RE and SoT means out of the emission model for the ordered skill states, and (b) State Transition Matrices extracted from the estimated human motor learning behavioral model show how participants tend to transition from low latent skill states to high latent skill states under the effect of the input slope angles uk = −1.57, −1.11 (corresponding to target pairs {0,… view at source ↗
Figure 4
Figure 4. Figure 4: Movement Efficiency: (a) Mean SoT curves (with 95% confidence intervals) of the three group participants, and (b) the mean number of trials (along with the scatter plot taken across participants) to converge to a specific SoT threshold under the three nudging policies shows fastest convergence for participants trained on POMDP-based optimal nudging policy, followed by the heuristic policy. The low p-values… view at source ↗
Figure 5
Figure 5. Figure 5: Endpoint Accuracy: (a) Mean RE curves (with 95% confidence intervals) of the three group participants, and (b) the mean number of trials (along with the scatter plot taken across participants) to converge to different RE thresholds under the three nudging policies shows fastest convergence for participants trained on POMDP-based optimal nudging policy. The low two-tailed test p-values from the linear mixed… view at source ↗
Figure 6
Figure 6. Figure 6: Synergy Analysis: (a) Variance accounted for (VAF) by the first four principal components (PC), and (b) the number of PCs required to explain more than 90% variance in joint data across the three participant groups reveal that the POMDP group achieved a higher VAF by top-PCs (especially in later phases) and required fewer PCs to explain > 90% variance in the data on average, signifying a more rapid discove… view at source ↗
Figure 7
Figure 7. Figure 7: Learning Dynamics and Convergence Analysis: Expected latent state evolution across trials and the number of trials to reach a steady state mastery level (E[sk] < 3) reveals expedited convergence of POMDP group to better performing states and a statistically significantly lower number of trials to mastery, compared to heuristic and control groups. that these participants not only performed better but had se… view at source ↗
read the original abstract

In this work, we propose a data-driven framework to design optimal haptic nudge feedback leveraging the learner's estimated skill to address the challenge of learning a novel motor task in a high-dimensional, redundant motor space. A nudge is a series of vibrotactile feedback delivered to the learner to encourage motor movements that aid in task completion. We first model the stochastic dynamics of human motor learning under haptic nudges using an Input-Output Hidden Markov Model (IOHMM), which explicitly decouples latent skill evolution from observable performance measures. Leveraging this predictive model, we formulate the haptic nudge feedback design problem as a Partially Observable Markov Decision Process (POMDP). This allows us to derive an optimal nudging policy that minimizes long-term performance cost and implicitly guides the learner toward superior skill states. We validate our approach through a human participant study (N=30) involving a high-dimensional motor task rendered through a hand exoskeleton. Results demonstrate that participants trained with the POMDP-derived policy exhibit significantly accelerated movement efficiency and endpoint accuracy compared to groups receiving heuristic-based feedback or no feedback. Furthermore, synergy analysis reveals that the POMDP group discovers efficient low-dimensional motor representations more rapidly.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a data-driven framework for haptic nudge feedback in high-dimensional motor learning tasks. It models stochastic skill dynamics under nudges via an Input-Output Hidden Markov Model (IOHMM) that decouples latent skill states from observables, then casts policy design as a POMDP to minimize long-term cost and guide learners to better skill states. Validation is provided by a 30-participant human study with a hand-exoskeleton task, claiming statistically significant gains in movement efficiency and endpoint accuracy for the POMDP-derived policy versus heuristic or no-feedback controls, plus faster emergence of efficient motor synergies.

Significance. If the results hold after addressing the noted gaps, the work would provide a principled, model-based method for adaptive haptic assistance that integrates predictive skill modeling with decision-theoretic planning. This could advance personalized feedback systems in rehabilitation robotics and motor training, particularly for redundant high-dimensional spaces where heuristic approaches are common.

major comments (2)
  1. [Abstract and Results] Abstract and Results: The central claim of statistically significant acceleration in efficiency and accuracy for the POMDP group is reported without specifying the exact statistical tests, effect sizes, confidence intervals, baseline comparisons, or participant exclusion criteria. These details are load-bearing for evaluating whether the observed differences support the claimed mechanism rather than unmodeled factors.
  2. [Methods (IOHMM formulation)] Methods (IOHMM formulation): The IOHMM is presented as accurately decoupling latent skill evolution from observable performance measures in a high-dimensional redundant space, yet no validation is provided (e.g., recovery of known ground-truth states in simulation, comparison to alternative latent-variable models, or sensitivity analysis). Because the POMDP policy is solved on the resulting belief space, misspecification here directly undermines the optimality guarantee and the interpretation of the human-study gains.
minor comments (1)
  1. [Synergy analysis] Synergy analysis section: The claim that the POMDP group discovers efficient low-dimensional motor representations more rapidly would be strengthened by reporting explicit quantitative metrics (e.g., synergy dimensionality or variance explained) together with statistical comparisons across the three groups.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of statistical reporting and model validation. We address each major comment below and will revise the manuscript to strengthen these elements while preserving the core contributions.

read point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results: The central claim of statistically significant acceleration in efficiency and accuracy for the POMDP group is reported without specifying the exact statistical tests, effect sizes, confidence intervals, baseline comparisons, or participant exclusion criteria. These details are load-bearing for evaluating whether the observed differences support the claimed mechanism rather than unmodeled factors.

    Authors: We agree these details are necessary for full evaluation. In the revised manuscript we will explicitly state the statistical tests (independent-samples t-tests with Bonferroni correction for multiple comparisons), report effect sizes (Cohen's d), 95% confidence intervals for all key metrics, confirm the baselines as the heuristic-feedback and no-feedback groups described in the methods, and detail participant exclusion criteria (no participants were excluded beyond pre-registered criteria for technical failures). These additions will clarify that the reported gains align with the proposed skill-modeling mechanism. revision: yes

  2. Referee: [Methods (IOHMM formulation)] Methods (IOHMM formulation): The IOHMM is presented as accurately decoupling latent skill evolution from observable performance measures in a high-dimensional redundant space, yet no validation is provided (e.g., recovery of known ground-truth states in simulation, comparison to alternative latent-variable models, or sensitivity analysis). Because the POMDP policy is solved on the resulting belief space, misspecification here directly undermines the optimality guarantee and the interpretation of the human-study gains.

    Authors: We recognize that explicit validation of the IOHMM would strengthen confidence in the latent-state decoupling and downstream POMDP policy. The current manuscript prioritizes the human-study results, but we will add a dedicated simulation subsection in the revision. This will include: (i) recovery of known ground-truth skill states from synthetic trajectories generated under the fitted IOHMM, (ii) quantitative comparison against a standard HMM baseline using log-likelihood and state-recovery accuracy, and (iii) sensitivity analysis on emission and transition parameters. These results will support the model's suitability for the POMDP formulation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper first fits an IOHMM to observed performance data under nudges to obtain a predictive model of latent skill evolution, then uses that fitted model as the transition and observation dynamics inside a POMDP whose solution yields the nudging policy. The policy is subsequently tested in an independent N=30 human study whose outcome (faster efficiency and accuracy gains) is reported as an empirical result rather than an algebraic identity. No equation in the provided derivation reduces the claimed policy optimality or the study outcome directly back to the input data by construction, and no load-bearing step relies on a self-citation whose content is itself unverified. The chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework depends on the domain assumption that human motor learning can be usefully represented as a latent Markov process whose observations are performance metrics, plus standard POMDP optimality under partial observability; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Human motor skill evolves as a latent stochastic process that can be decoupled from observable performance via an input-output hidden Markov model
    Core modeling premise stated in the abstract
  • standard math A POMDP formulation yields an optimal nudging policy that minimizes long-term performance cost
    Standard decision-theoretic assumption invoked to derive the policy

pith-pipeline@v0.9.0 · 5530 in / 1328 out tokens · 45925 ms · 2026-05-15T12:30:27.071769+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1]

    Principles of sensorimotor learning,

    D. M. Wolpert, J. Diedrichsen, and J. R. Flanagan, “Principles of sensorimotor learning,” Nature Reviews Neuroscience, vol. 12, no. 12, p. 739, 2011

  2. [2]

    Theoretical models of motor control and motor learning,

    A. M. Haith and J. W. Krakauer, “Theoretical models of motor control and motor learning,” in Routledge handbook of motor control and motor learning . Routledge, 2013, pp. 16–37

  3. [3]

    Human sensorimotor learning: adaptation, skill, and beyond,

    J. W. Krakauer and P. Mazzoni, “Human sensorimotor learning: adaptation, skill, and beyond,” Current Opinion in Neurobiology , vol. 21, no. 4, pp. 636–644, 2011

  4. [4]

    Learning to be lazy: Exploiting redundancy in a novel task to minimize movement-related effort,

    R. Ranganathan, A. Adewuyi, and F. A. Mussa-Ivaldi, “Learning to be lazy: Exploiting redundancy in a novel task to minimize movement-related effort,” Journal of Neuroscience, vol. 33, no. 7, pp. 2754–2760, 2013

  5. [5]

    Review of control strategies for robotic movement training after neurologic injury,

    L. Marchal-Crespo and D. J. Reinkensmeyer, “Review of control strategies for robotic movement training after neurologic injury,” Journal of Neuroengineering and Rehabilitation , vol. 6, no. 1, p. 20, 2009

  6. [6]

    Motor learning perspectives on haptic training for the upper ex- tremities,

    C. K. Williams and H. Carnahan, “Motor learning perspectives on haptic training for the upper ex- tremities,” IEEE Transactions on Haptics , vol. 7, no. 2, pp. 240–250, 2014

  7. [7]

    Haptic-based neurorehabilitation in poststroke patients: A feasibility prospective mul- ticentre trial for robotics hand rehabilitation,

    A. Turolla, O. A. Daud Albasini, R. Oboe, M. Agostini, P. Tonin, S. Paolucci, G. Sandrini, A. Venneri, and L. Piron, “Haptic-based neurorehabilitation in poststroke patients: A feasibility prospective mul- ticentre trial for robotics hand rehabilitation,” Computational and Mathematical Methods in Medicine , vol. 2013, no. 1, p. 895492, 2013

  8. [8]

    Development of a novel haptic glove for improving finger dexterity in poststroke rehabilitation,

    C.-Y. Lin, C.-M. Tsai, P.-C. Shih, and H.-C. Wu, “Development of a novel haptic glove for improving finger dexterity in poststroke rehabilitation,” Technology and Health Care , vol. 24, no. 1_suppl, pp. S97–S103, 2016

  9. [9]

    Fingertip position and force control for dexterous manipulation through model-based control of hand-exoskeleton-environment,

    P. Esmatloo and A. D. Deshpande, “Fingertip position and force control for dexterous manipulation through model-based control of hand-exoskeleton-environment,” in 2020 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) . IEEE, 2020, pp. 994–1001

  10. [10]

    Haptic training: Which types facilitate (re)learning of which motor task and for whom? Answers by a review,

    E. Basalp, P. Wolf, and L. Marchal-Crespo, “Haptic training: Which types facilitate (re)learning of which motor task and for whom? Answers by a review,” IEEE Transactions on Haptics , vol. 14, no. 4, pp. 722–739, 2021

  11. [11]

    Augmented visual, auditory, haptic, and multimodal feedback in motor learning: A review,

    R. Sigrist, G. Rauter, R. Riener, and P. Wolf, “Augmented visual, auditory, haptic, and multimodal feedback in motor learning: A review,” Psychonomic Bulletin & Review , vol. 20, no. 1, pp. 21–53, 2013

  12. [12]

    Haptic feedback based on movement smoothness improves performance in a perceptual-motor task,

    J. L. Sullivan, S. Pandey, M. D. Byrne, and M. K. O’Malley, “Haptic feedback based on movement smoothness improves performance in a perceptual-motor task,” IEEE Transactions on Haptics , vol. 15, no. 2, pp. 382–391, 2021

  13. [13]

    Robotic assistance for training finger movement using a Hebbian model: A randomized controlled trial,

    J. B. Rowe, V. Chan, M. L. Ingemanson, S. C. Cramer, E. T. Wolbrecht, and D. J. Reinkensmeyer, “Robotic assistance for training finger movement using a Hebbian model: A randomized controlled trial,” Neurorehabilitation and Neural Repair , vol. 31, no. 8, pp. 769–780, 2017

  14. [14]

    On the role of knowledge of results in motor learning: Exploring the guidance hypothesis,

    T. D. Lee, M. A. White, and H. Carnahan, “On the role of knowledge of results in motor learning: Exploring the guidance hypothesis,” Journal of Motor Behavior , vol. 22, no. 2, pp. 191–208, 1990

  15. [15]

    Static and dynamic proprioceptive recognition through vibrotactile stimulation,

    L. Vargas, H. Huang, Y. Zhu, and X. Hu, “Static and dynamic proprioceptive recognition through vibrotactile stimulation,” Journal of Neural Engineering , vol. 18, no. 4, p. 046093, 2021

  16. [16]

    Exploration of vibrotactile biofeedback strategies to induce stance time asymmetries,

    R. Escamilla-Nunez, H. Sivasambu, and J. Andrysek, “Exploration of vibrotactile biofeedback strategies to induce stance time asymmetries,” Canadian Prosthetics & Orthotics Journal , vol. 5, no. 1, p. 36744, 2021. 18

  17. [17]

    Haptic nudging using a wearable device to promote upper limb activity during stroke rehabilitation: Exploring diurnal variation, repetition, and duration of effect,

    N. Signal, S. Olsen, U. Rashid, R. McLaren, A. Vandal, M. King, and D. Taylor, “Haptic nudging using a wearable device to promote upper limb activity during stroke rehabilitation: Exploring diurnal variation, repetition, and duration of effect,” Behavioral Sciences, vol. 13, no. 12, p. 995, 2023

  18. [18]

    Modeling the distinct phases of skill acquisition

    C. Tenison and J. R. Anderson, “Modeling the distinct phases of skill acquisition. ” Journal of Experi- mental Psychology: Learning, Memory, and Cognition , vol. 42, no. 5, p. 749, 2016

  19. [19]

    Hidden Markov model approach to skill learning and its application to telerobotics,

    J. Yang, Y. Xu, and C. S. Chen, “Hidden Markov model approach to skill learning and its application to telerobotics,” IEEE Transactions on Robotics and Automation , vol. 10, no. 5, pp. 621–631, 1994

  20. [20]

    A multilevel logistic hidden Markov model for learning under cognitive diagnosis,

    S. Zhang and H.-H. Chang, “A multilevel logistic hidden Markov model for learning under cognitive diagnosis,” Behavior Research Methods , vol. 52, no. 1, pp. 408–421, 2020

  21. [21]

    Classification of human learning stages via kernel distribution embeddings,

    M. S.-T. Yuh, K. R. Ortiz, K. S. Sommer-Kohrt, M. Oishi, and N. Jain, “Classification of human learning stages via kernel distribution embeddings,” IEEE Open Journal of Control Systems , vol. 3, pp. 102–117, 2024

  22. [22]

    Human Performance,

    P. M. Fitts and M. I. Posner, “Human Performance,” 1967

  23. [23]

    Modelling and evaluation of surgical performance using hidden markov models,

    G. Megali, S. Sinigaglia, O. Tonet, and P. Dario, “Modelling and evaluation of surgical performance using hidden markov models,” IEEE Transactions on Biomedical Engineering , vol. 53, no. 10, pp. 1911–1919, 2006

  24. [24]

    Determining novice and expert status in human– automation interaction through hidden Markov models,

    A. French, M. L. Cummings, H. Zhu, and M. Pajic, “Determining novice and expert status in human– automation interaction through hidden Markov models,” Applied Artificial Intelligence , vol. 38, no. 1, p. 2402174, 2024

  25. [25]

    Capturing skill state in curriculum learning for human skill acquisition,

    K. Ghonasgi, R. Mirsky, S. Narvekar, B. Masetty, A. M. Haith, P. Stone, and A. D. Deshpande, “Capturing skill state in curriculum learning for human skill acquisition,” in IEEE/RSJ International Conference on Intelligent Robots and Systems , 2021, pp. 771–776

  26. [26]

    De- tecting neural-state transitions using hidden Markov models for motor cortical prostheses,

    C. Kemere, G. Santhanam, B. M. Yu, A. Afshar, S. I. Ryu, T. H. Meng, and K. V. Shenoy, “De- tecting neural-state transitions using hidden Markov models for motor cortical prostheses,” Journal of Neurophysiology, vol. 100, no. 4, pp. 2441–2452, 2008

  27. [27]

    Bayesian multilevel hidden Markov models identify stable state dynamics in longitudinal recordings from macaque primary motor cortex,

    S. Kirchherr, S. Mildiner Moraga, G. Coudé, M. Bimbi, P. F. Ferrari, E. Aarts, and J. J. Bonaiuto, “Bayesian multilevel hidden Markov models identify stable state dynamics in longitudinal recordings from macaque primary motor cortex,” European Journal of Neuroscience , vol. 58, no. 3, pp. 2787–2806, 2023

  28. [28]

    Interacting adaptive processes with different timescales underlie short-term motor learning,

    M. A. Smith, A. Ghazizadeh, and R. Shadmehr, “Interacting adaptive processes with different timescales underlie short-term motor learning,” PLoS Biology, vol. 4, no. 6, p. e179, 2006

  29. [29]

    The dynamics of motor learning through the formation of internal models,

    C. Pierella, M. Casadio, F. A. Mussa-Ivaldi, and S. A. Solla, “The dynamics of motor learning through the formation of internal models,” PLoS Computational Biology , vol. 15, no. 12, p. e1007118, 2019

  30. [30]

    Human motor learning dynamics in high- dimensional tasks,

    A. Kamboj, R. Ranganathan, X. Tan, and V. Srivastava, “Human motor learning dynamics in high- dimensional tasks,” PLOS Computational Biology , vol. 20, no. 10, p. e1012455, 2024

  31. [31]

    A decision-theoretic approach in the design of an adaptive upper-limb stroke rehabilitation robot,

    R. Huq, P. Kan, R. Goetschalckx, D. Hébert, J. Hoey, and A. Mihailidis, “A decision-theoretic approach in the design of an adaptive upper-limb stroke rehabilitation robot,” in IEEE International Conference on Rehabilitation Robotics , 2011, pp. 1–8

  32. [32]

    Toward skill-informed haptic feedback for hu- man motor learning in high-dimensional de-novo tasks,

    A. Kamboj, R. Ranganathan, X. Tan, and V. Srivastava, “Toward skill-informed haptic feedback for hu- man motor learning in high-dimensional de-novo tasks,” in American Control Conference, New Orleans, LA, 2026, accepted for publication. 19

  33. [33]

    Remapping hand movements in a novel geometrical environment,

    K. M. Mosier, R. A. Scheidt, S. Acosta, and F. A. Mussa-Ivaldi, “Remapping hand movements in a novel geometrical environment,” Journal of Neurophysiology , vol. 94, no. 6, pp. 4362–4372, 2005

  34. [34]

    Input-output HMMs for sequence processing,

    Y. Bengio and P. Frasconi, “Input-output HMMs for sequence processing,” IEEE Transactions on Neural Networks, vol. 7, no. 5, pp. 1231–1249, 1996

  35. [35]

    IOHMM: Input Output Hidden Markov Model in Python,

    M. Yin, T. Silva, E. Denovellis, and A. Pozdnukhov, “IOHMM: Input Output Hidden Markov Model in Python,” https://github.com/Mogeng/IOHMM, 2017, version 0.0.3, BSD-3-Clause License

  36. [36]

    Multinomial logistic regression,

    C. Kwak and A. Clayton-Matthews, “Multinomial logistic regression,” Nursing Research, vol. 51, no. 6, pp. 404–410, 2002

  37. [37]

    A simple weight decay can improve generalization,

    A. Krogh and J. A. Hertz, “A simple weight decay can improve generalization,” in Proceedings of the 5th International Conference on Neural Information Processing Systems , 1991, pp. 950––957

  38. [38]

    Regularization and variable selection via the elastic net,

    H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society Series B: Statistical Methodology , vol. 67, no. 2, pp. 301–320, 2005

  39. [39]

    Learning policies for partially observable envi- ronments: Scaling up,

    M. L. Littman, A. R. Cassandra, and L. P. Kaelbling, “Learning policies for partially observable envi- ronments: Scaling up,” in Machine Learning Proceedings 1995 . Elsevier, 1995, pp. 362–370

  40. [40]

    Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,

    T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning . PMLR, 2018, pp. 1861–1870

  41. [41]

    Restricted maximum likelihood (REML) estimation of variance com- ponents in the mixed model,

    R. R. Corbeil and S. R. Searle, “Restricted maximum likelihood (REML) estimation of variance com- ponents in the mixed model,” Technometrics, vol. 18, no. 1, pp. 31–38, 1976. 20