Recoverable Identifier
advisory
doi_compliance
recoverable_identifier
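The DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.1109/TCYB.2020.2988820.Appendix) was visible in the surrounding text but could not be confirmed against doi.org as printed. Its trailing ".Appendix" coincides with the heading "Appendix A." that directly follows the reference in the evidence text.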
Evidence text
Zhang, H., Yang, Y., and Jiang, Y. (2021). Reinforcement learning-based control with communication delays: Theory and applications. IEEE Transactions on Cybernetics, 51(9), 4368–4381. doi:10.1109/TCYB.2020.2988820. Appendix A. TRAINING PROCEDURE The proposed framework is trained in two stages, both conducted entirely within a MuJoCo simulation of the Franka Panda manipulator. No real-robot data is used during training; the physical experiments described in Section 4.2 therefore also serve as a sim-to-real evaluation. Training and validation environments are instantiated from distinct random seeds to prevent data leakage. Training data and collection. The training data consists of figure-8 reference trajectories generated on-the-fly by the leader simulator, with their geometric and temporal parameters randomized at each episode reset (center c_x ∈ [0.3, 0.4] m, c_y ∈ [−0.1, 0.1] m, scale s_{x,y} ∈ [0.1, 0.3] m, s_z ∈ [0.01, 0.03] m, frequency f ∈ [0.05, 0.15] Hz). At every control step, the leader joint state, the corresponding stochastic delay ω t s, and the future ground-truth trajectory used as the autoregressive target are written into a circular replay buffer. This online collection scheme is chosen over a pre-collected fixed dataset for two reasons. First, the joint distribution of trajectory shape and delay realization is too high-dimensional to enumerate offline; sampling on-the-fly ensures uniform coverage of the operating envelope used at deployment. Second, the autoregressive ta
Evidence payload
{
"printed_excerpt": "Zhang, H., Yang, Y., and Jiang, Y. (2021). Reinforcement learning-based control with communication delays: The- ory and applications.IEEE Transactions on Cybernet- ics, 51(9), 4368\u20134381. doi:10.1109/TCYB.2020.2988820. Appendix A. TRAINING P",
"reconstructed_doi": "10.1109/TCYB.2020.2988820.Appendix",
"ref_index": 14,
"resolved_title": null,
"verdict_class": "incontrovertible"
}
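Illustrative sketch
The following is a minimal sketch of how a short and a long DOI candidate could both arise from the printed excerpt. It assumes a simple regex-based extractor: the pattern `DOI_RE`, the function name `doi_candidates`, and the rule of gluing the next word onto a token that ends in a period are all hypothetical and are not taken from the tool that produced this report.

```python
import re

# Hypothetical DOI pattern: "10.", a 4-9 digit registrant code, "/", then a
# run of non-space characters. This is an assumption, not the report's rule.
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")


def doi_candidates(printed: str) -> list[str]:
    """Return [short, long] DOI candidates found in a printed reference excerpt."""
    match = DOI_RE.search(printed)
    if match is None:
        return []
    token = match.group(0)
    # Short candidate: the matched token without trailing sentence punctuation.
    candidates = [token.rstrip(".,;")]
    # Long candidate: if the token ends with "." and a word follows, treat the
    # whitespace as a possible line-break artifact and join the next word on.
    following = re.match(r"\s+(\w+)", printed[match.end():])
    if token.endswith(".") and following:
        candidates.append(token + following.group(1))
    return candidates


excerpt = (
    "ics, 51(9), 4368\u20134381. doi:10.1109/TCYB.2020.2988820. "
    "Appendix A. TRAINING P"
)
print(doi_candidates(excerpt))
```

Under these assumptions the second candidate matches the `reconstructed_doi` value in the payload, while the first is the same string with the trailing ".Appendix" removed. Neither candidate is checked against doi.org here.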