Recoverable Identifier

arXiv:2605.15480 · detector doi_compliance · incontrovertible · 2026-05-19 14:37:41.651882+00:00

advisory doi_compliance recoverable_identifier

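The DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.1109/TCYB.2020.2988820.Appendix) was visible in the surrounding text, but it could not be confirmed against doi.org as printed. The candidate appears to join the printed DOI, 10.1109/TCYB.2020.2988820, to the word "Appendix" from the heading that immediately follows the reference.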
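One way to recover a usable identifier from an over-long candidate like this is to strip trailing dot-separated segments one at a time and check each shorter form in turn. The sketch below is illustrative only: `shorter_candidates` is a hypothetical helper, not part of the doi_compliance detector, and the prefix pattern `10.NNNN` is an assumption about typical DOI shape. It performs no lookup against doi.org; each candidate would still need to be resolved separately.

```python
import re

# Hypothetical helper for illustration; not the detector's own logic.
# Assumes a DOI of the form "10.<4-9 digits>/<suffix>".
PREFIX_PATTERN = re.compile(r"10\.\d{4,9}")


def shorter_candidates(reconstructed):
    """Return the candidate followed by versions with trailing
    dot-separated suffix segments removed, longest first."""
    prefix, slash, suffix = reconstructed.partition("/")
    if not slash or not suffix or not PREFIX_PATTERN.fullmatch(prefix):
        return []
    # Only the suffix is trimmed; the registrant prefix is kept intact.
    parts = suffix.rstrip(".").split(".")
    return [prefix + "/" + ".".join(parts[:n]) for n in range(len(parts), 0, -1)]


candidates = shorter_candidates("10.1109/TCYB.2020.2988820.Appendix")
for candidate in candidates:
    print(candidate)
```

The second entry in the list is the form that appears before the full stop in the printed reference. The shorter entries are produced mechanically and are not expected to be valid.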

Evidence text

Zhang, H., Yang, Y., and Jiang, Y. (2021). Reinforcement learning-based control with communication delays: The- ory and applications.IEEE Transactions on Cybernet- ics, 51(9), 4368–4381. doi:10.1109/TCYB.2020.2988820. Appendix A. TRAINING PROCEDURE The proposed framework is trained in two stages, both conducted entirely within a MuJoCo simulation of the Franka Panda manipulator. No real-robot data is used during training; the physical experiments described in Section 4.2 therefore also serve as a sim-to-real evaluation. Training and validation environments are instantiated from distinct random seeds to prevent data leakage. T raining data and collection.The training data con- sists of figure-8 reference trajectories generated on-the-fly by the leader simulator, with their geometric and temporal parameters randomized at each episode reset (centerc x ∈ [0.3,0.4] m,c y ∈[−0.1,0.1] m, scales x,y ∈[0.1,0.3] m, sz ∈[0.01,0.03] m, frequencyf∈[0.05,0.15] Hz). At every control step, the leader joint state, the corresponding stochastic delayω t s, and the future ground-truth trajectory used as the autoregressive target are written into a circular replay buffer. This online collection scheme is chosen over a pre-collected fixed dataset for two reasons. First, the joint distribution of trajectory shape and delay realization is too high-dimensional to enumerate offline; sampling on-the-fly ensures uniform coverage of the operating envelope used at deployment. Second, the autoregressive ta

Evidence payload

{
  "printed_excerpt": "Zhang, H., Yang, Y., and Jiang, Y. (2021). Reinforcement learning-based control with communication delays: The- ory and applications.IEEE Transactions on Cybernet- ics, 51(9), 4368\u20134381. doi:10.1109/TCYB.2020.2988820. Appendix A. TRAINING P",
  "reconstructed_doi": "10.1109/TCYB.2020.2988820.Appendix",
  "ref_index": 14,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
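
The payload can also be checked directly: extracting the DOI as printed in `printed_excerpt` and comparing it with `reconstructed_doi` shows which characters the reconstruction added. This is a minimal sketch under stated assumptions. The `printed_doi` function and its regular expression are hypothetical, the excerpt is abbreviated to its tail, and nothing here reflects how the detector itself builds the candidate.

```python
import json
import re

# Abbreviated copy of the payload above; only the tail of printed_excerpt is kept.
PAYLOAD_TEXT = r'''
{
  "printed_excerpt": "ics, 51(9), 4368\u20134381. doi:10.1109/TCYB.2020.2988820. Appendix A. TRAINING P",
  "reconstructed_doi": "10.1109/TCYB.2020.2988820.Appendix",
  "ref_index": 14,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
'''


def printed_doi(excerpt):
    """Hypothetical extractor: take the non-space run after 'doi:' and
    drop trailing sentence punctuation."""
    match = re.search(r"doi:\s*(10\.\d{4,9}/\S+)", excerpt)
    if match is None:
        return None
    return match.group(1).rstrip(".,;")


payload = json.loads(PAYLOAD_TEXT)
printed = printed_doi(payload["printed_excerpt"])
reconstructed = payload["reconstructed_doi"]

# If the reconstruction merely extends the printed DOI, report the extra tail.
if printed and reconstructed.startswith(printed):
    extra = reconstructed[len(printed):]
else:
    extra = None

print(printed)
print(extra)
```

For this payload the extra tail is the word from the appendix heading, which is consistent with `resolved_title` being null for the longer candidate.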