Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation

Caolu Xu; Feng Yang; Li Song; Meixia Tao; Wenjun Zhang; Zhiyong Chen

arXiv: 2605.19293 · v1 · pith:5RZ3LLVLnew · submitted 2026-05-19 · 💻 cs.IT · cs.LG · cs.RO · math.IT

Pith reviewed 2026-05-20 03:11 UTC · model grok-4.3

classification 💻 cs.IT · cs.LG · cs.RO · math.IT

keywords domain adaptation · communication optimization · humanoid robots · teleoperation · PAC-Bayes bounds · proximal policy optimization · sim-to-real transfer · wireless XR

The pith

A PAC-Bayes guided PPO method improves the reconstruction error versus communication energy tradeoff for humanoid robot wireless XR teleoperation under sim-to-real shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework for optimizing communication rates in wireless XR teleoperation of humanoid robots to minimize energy use while preserving motion trajectory reconstruction accuracy. It controls sampling rates dimension by dimension and addresses the challenge of limited real-world feedback by using simulator interactions corrected with offline real data. A PAC-Bayes generalization bound is derived to characterize how latent density-ratio estimation, finite sample effects, and encoder bias influence performance, which then directs a proximal policy optimization algorithm incorporating density-ratio weighting and trust-region regularization. Experiments on a public humanoid teleoperation dataset confirm that this approach enhances the error-energy tradeoff when facing distribution shifts from simulation to reality, with further tests across wireless channels and motion types.
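The dimension-wise sampling and reconstruction loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes integer per-dimension strides, linear interpolation at the receiver, and mean squared error as the reconstruction metric. The function name `reconstruct` and the toy two-joint trajectory are hypothetical.

```python
import numpy as np

def reconstruct(trajectory, strides):
    """Subsample each dimension with its own stride, then interpolate back.

    trajectory: array of shape (T, D), one column per motion dimension.
    strides:    length-D sequence of positive ints; stride k keeps every k-th sample.
    Returns the reconstructed trajectory and the number of transmitted samples.
    """
    T, D = trajectory.shape
    t_full = np.arange(T)
    recon = np.empty_like(trajectory, dtype=float)
    sent = 0
    for d, k in enumerate(strides):
        kept = np.arange(0, T, k)
        if kept[-1] != T - 1:
            # Always keep the last sample so interpolation covers the whole window.
            kept = np.append(kept, T - 1)
        recon[:, d] = np.interp(t_full, kept, trajectory[kept, d])
        sent += kept.size
    return recon, sent

# Toy trajectory (assumption): one fast-moving joint and one slow-moving joint.
t = np.linspace(0.0, 1.0, 101)
traj = np.stack([np.sin(2 * np.pi * 5 * t), 0.3 * t], axis=1)

# The slow joint tolerates a much larger stride than the fast one.
recon, sent = reconstruct(traj, strides=[2, 20])
mse_per_dim = np.mean((recon - traj) ** 2, axis=0)
print(sent, mse_per_dim)
```

In this toy case the slowly varying dimension is linear in time, so a large stride costs essentially nothing in reconstruction error, which is the intuition behind setting rates per dimension rather than globally.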

Core claim

The central discovery is a domain-adaptive communication-rate optimization framework that integrates sampling, transmission, interpolation, and reconstruction, using a PAC-Bayes characterization to guide sim-to-real adaptation through a PPO method with density-ratio weighting, resulting in better maintenance of reconstruction accuracy at lower communication energy under distribution shifts.
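A rough sketch of what a PPO objective with density-ratio weighting and a trust-region term might look like is given below. It is an assumption-laden illustration rather than the paper's loss: the self-normalised and truncated weights, the KL penalty form of the trust region, and the names `weighted_ppo_loss`, `max_weight`, and `kl_coef` are all hypothetical choices made here.

```python
import numpy as np

def weighted_ppo_loss(logp_new, logp_old, advantages, density_ratio,
                      clip_eps=0.2, kl_coef=0.1, max_weight=10.0):
    """Clipped PPO surrogate, reweighted by an estimated real/sim density ratio.

    All inputs are 1-D arrays over simulator samples. Returns a scalar loss
    (to be minimised).
    """
    # Probability ratio between the updated policy and the data-collecting policy.
    log_ratio = logp_new - logp_old
    ratio = np.exp(log_ratio)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantages, clipped * advantages)

    # Truncate and self-normalise the density-ratio weights (assumed variance control).
    w = np.minimum(density_ratio, max_weight)
    w = w / w.sum()

    # Non-negative sample estimate of KL(old || new), used as a trust-region penalty.
    approx_kl = np.mean((ratio - 1.0) - log_ratio)

    return -(w * surrogate).sum() + kl_coef * approx_kl
```

When the new and old policies coincide, the KL term vanishes and the loss reduces to the negative weighted mean of the advantages, so samples that look more like real-domain data contribute more to the update.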

What carries the argument

The PAC-Bayes generalization characterization that accounts for latent density-ratio estimation, finite-sample deviation, and encoder bias to inform the PPO-based sim-to-real adaptation.
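For orientation, the classical McAllester-style PAC-Bayes bound (in Maurer's form) for losses in [0, 1] states that, with probability at least 1 − δ over n i.i.d. samples, the expected risk under a posterior Q is at most the empirical risk plus sqrt((KL(Q‖P) + ln(2√n/δ)) / (2n)). The sketch below evaluates that generic bound only. The paper's characterization is described as adding terms for density-ratio estimation and encoder bias, and those terms are not reproduced here; the function name `mcallester_bound` is hypothetical.

```python
import math

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """Generic PAC-Bayes upper bound on expected risk for losses in [0, 1].

    emp_risk: empirical risk of the posterior Q on n samples.
    kl:       KL(Q || P) between posterior and prior, in nats.
    n:        number of i.i.d. samples.
    delta:    failure probability of the bound.
    """
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return emp_risk + slack

# The slack shrinks as n grows and widens as the posterior moves away from the prior.
print(mcallester_bound(0.1, kl=5.0, n=1000))
print(mcallester_bound(0.1, kl=5.0, n=10000))
```

The KL term is what motivates a trust-region style penalty in this family of bounds: keeping the learned policy close to a reference keeps the slack small.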

Load-bearing premise

The PAC-Bayes generalization characterization must accurately capture the influences of density-ratio estimation, sample deviation, and bias to effectively guide the adaptation process.

What would settle it

Running the method on a new set of real-world humanoid motion data not used in training and observing no significant improvement in the reconstruction error versus energy curve compared to baseline methods would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.19293 by Caolu Xu, Feng Yang, Li Song, Meixia Tao, Wenjun Zhang, Zhiyong Chen.

Figure 2: Timeline of the proposed wireless XR teleoperation framework.

Figure 3: Illustration of proposed sim-to-real domain-adaptive policy optimization algorithm architecture.

Figure 5: Performance comparison under different channel gains.

Figure 6: Dimension-wise sampling-rate decisions generated by

Original abstract

Wireless extended reality (XR) teleoperation provides embodied interaction capability for collecting humanoid robot demonstrations, but the large-scale adoption is restricted by the overhead of high-frequency motion transmission. This paper develops a system framework that integrates sampling, transmission, interpolation, and reconstruction and formulates a communication-rate optimization that aims to minimize the communication energy while maintaining the reconstruction accuracy of robot motion trajectories through dimension-wise sampling-rate control. Since acquiring real-time feedback from physical robots is limited by hardware costs, it is necessary to solve the problem through simulator interaction with offline real-domain data correction. To guide sim-to-real adaptation, we provide a PAC-Bayes generalization characterization that reveals the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias. Building on this analysis, we propose a proximal policy optimization (PPO) method with density-ratio weighting and trust-region regularization. Experiments on public humanoid teleoperation dataset show that the proposed method improves the tradeoff between reconstruction error and communication energy consumption under sim-to-real distribution shift. We further analyze the effectiveness of the proposed algorithm across various wireless channels and dynamic motion trajectories.
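To see why fewer transmitted samples translate into lower communication energy, and why the channel gain matters, a back-of-the-envelope model helps. The sketch below is an assumption, not the paper's energy model: it takes the achievable rate to be the Shannon capacity of an AWGN link and charges transmit power for the resulting airtime. The function name, the 1 MHz bandwidth, and the thermal noise density of about 4e-21 W/Hz (roughly −174 dBm/Hz) are illustrative defaults.

```python
import math

def tx_energy_joules(num_samples, bits_per_sample, tx_power_w, channel_gain,
                     bandwidth_hz=1e6, noise_psd_w_per_hz=4e-21):
    """Transmit energy for one window under an assumed Shannon-rate link."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    rate_bps = bandwidth_hz * math.log2(1.0 + snr)        # assumed achievable rate
    airtime_s = num_samples * bits_per_sample / rate_bps  # time spent transmitting
    return tx_power_w * airtime_s

# Halving the number of samples halves the energy; a weaker channel raises it.
full = tx_energy_joules(100, 32, tx_power_w=0.1, channel_gain=1e-10)
half = tx_energy_joules(50, 32, tx_power_w=0.1, channel_gain=1e-10)
print(full, half)
```

Under this model energy is linear in the number of samples sent, so any reduction in per-dimension sampling rate shows up directly on the energy axis of the error-energy tradeoff.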

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper combines PAC-Bayes characterization of density-ratio effects with density-weighted PPO to set per-dimension sampling rates for lower-energy humanoid XR teleoperation under sim-to-real shift.

The main point is a practical optimization that cuts communication energy for wireless humanoid robot teleoperation by choosing sampling rates dimension by dimension. It trains in simulation but corrects with offline real data, using a PAC-Bayes bound to shape the PPO objective through density-ratio weighting and trust-region regularization. Experiments on a public teleoperation dataset reportedly improve the error-versus-energy tradeoff when the real distribution differs from the simulator, and the authors check behavior across wireless channels and motion types.

That combination of dimension-wise control, PAC-Bayes guidance, and PPO regularization for this exact setting looks like the new piece rather than a direct copy of earlier results. The pipeline that ties sampling, transmission, interpolation, and reconstruction together is laid out clearly and the offline correction step is a sensible response to hardware limits. The reported gains under distribution shift give at least initial evidence that the approach can work in practice.

The softer spot is the strength of the PAC-Bayes part for high-dimensional motion trajectories. The abstract says the bound reveals effects from latent density-ratio estimation, finite-sample deviation, and encoder bias, but without the derivations or numerical checks it is difficult to judge how tight the bound stays when the shift hits the latent space. If the density-ratio estimator variance is large, the weighting may not add much beyond the trust-region term, and the claimed adaptation benefit could be smaller than presented. The stress-test concern about bound tightness therefore seems worth a close look in the full text.

This paper is for people working on wireless robotics or XR teleoperation who need concrete ways to reduce bandwidth while preserving motion fidelity. A reader already familiar with PAC-Bayes or sim-to-real policy optimization might extract the PPO weighting trick even if the bounds are not the tightest.

I would send it to peer review. The framework is coherent, the application is timely, and the experiments provide a starting point, though revisions would be needed for clearer metrics, baselines, and bound validation.

Referee Report

2 major / 2 minor

Summary. The paper develops a communication-rate optimization framework for wireless XR teleoperation of humanoid robots that integrates dimension-wise sampling, transmission, interpolation, and reconstruction to minimize energy consumption subject to reconstruction accuracy constraints. It derives a PAC-Bayes generalization characterization to analyze the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias under sim-to-real shifts, then uses this to inform a PPO algorithm with density-ratio weighting and trust-region regularization. Experiments on a public humanoid teleoperation dataset are reported to show improved error-energy tradeoffs, with additional analysis across wireless channels and motion trajectories.

Significance. If the central claims hold, the work offers a theoretically grounded approach to reducing communication overhead in embodied XR systems while handling domain shift, which could support scalable humanoid robot teleoperation. The combination of PAC-Bayes bounds with PPO for adaptive sampling-rate control is a distinctive contribution at the intersection of information theory and robotics control. Explicit analysis over channels and trajectories strengthens applicability, and the use of a public dataset aids reproducibility.

major comments (2)

[§3] §3 (PAC-Bayes generalization characterization): The claim that this bound reliably guides sim-to-real PPO adaptation rests on the characterization remaining informative despite high-dimensional latent motion spaces. No explicit evaluation of bound tightness, density-ratio estimator variance, or looseness induced by trajectory dynamics is provided, leaving open whether the weighting and regularization produce the claimed improvement or function primarily as heuristics.
[§5] §5 (Experiments): The reported improvement in the reconstruction-error vs. communication-energy tradeoff under distribution shift lacks concrete metrics, error bars, statistical significance tests, or tabulated comparisons against baselines such as fixed-rate sampling or vanilla PPO. This omission makes it impossible to assess whether the gains are load-bearing or sensitive to post-hoc choices.
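One inexpensive way to probe the first major comment is the effective sample size (ESS) of the importance weights, a standard diagnostic for how much a few large density-ratio values dominate a weighted estimate. The sketch below is offered as a possible check, not something the paper is stated to report; the function name is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2.

    Equals the number of samples for uniform weights and approaches 1 when a
    single weight dominates, which signals a high-variance weighted estimate.
    """
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(effective_sample_size(np.ones(100)))           # uniform weights
print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))   # one dominant weight
```

An ESS far below the batch size would suggest that the density-ratio weighting is contributing mostly noise, in which case the trust-region term is likely doing most of the work.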

minor comments (2)

[Abstract] Abstract: The phrase 'public humanoid teleoperation dataset' should include the exact dataset name and citation for immediate reproducibility.
[Notation] Notation: Define 'latent density-ratio estimation' and 'encoder bias' at first use with explicit symbols before invoking them in the PAC-Bayes statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough and constructive review. The comments highlight important aspects that can strengthen the presentation of our theoretical analysis and experimental results. We address each major comment below and indicate the corresponding revisions.

Referee: [§3] §3 (PAC-Bayes generalization characterization): The claim that this bound reliably guides sim-to-real PPO adaptation rests on the characterization remaining informative despite high-dimensional latent motion spaces. No explicit evaluation of bound tightness, density-ratio estimator variance, or looseness induced by trajectory dynamics is provided, leaving open whether the weighting and regularization produce the claimed improvement or function primarily as heuristics.

Authors: We agree that an explicit assessment of bound tightness would strengthen the connection between the PAC-Bayes characterization and the observed PPO improvements. Section 3 derives the generalization bound to characterize the combined effects of latent density-ratio estimation error, finite-sample deviation, and encoder bias under sim-to-real shifts. While the bound is used to motivate the density-ratio weighting and trust-region regularization in the PPO objective, we did not include numerical tightness evaluations or variance analysis of the estimator in the original submission. In the revision we will add a dedicated subsection reporting (i) empirical bound tightness on the public humanoid dataset, (ii) variance of the density-ratio estimator across trajectory segments, and (iii) sensitivity of the bound to motion dynamics. This will clarify that the weighting and regularization are informed by the theoretical characterization rather than operating purely as heuristics.

Revision: yes

Referee: [§5] §5 (Experiments): The reported improvement in the reconstruction-error vs. communication-energy tradeoff under distribution shift lacks concrete metrics, error bars, statistical significance tests, or tabulated comparisons against baselines such as fixed-rate sampling or vanilla PPO. This omission makes it impossible to assess whether the gains are load-bearing or sensitive to post-hoc choices.

Authors: We acknowledge that the experimental section would benefit from more quantitative detail. The original manuscript reports results via figures that illustrate improved error-energy tradeoffs on the public humanoid teleoperation dataset, together with additional sweeps over wireless channels and motion trajectories. To enable rigorous evaluation, we will revise §5 to include: tabulated mean reconstruction error and communication energy values with standard deviations computed over multiple independent runs; statistical significance tests (e.g., paired t-tests) against the reported baselines; and explicit side-by-side comparisons with fixed-rate sampling and vanilla PPO (without density-ratio weighting). These additions will substantiate the load-bearing nature of the gains and their robustness to design choices.

Revision: yes
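The paired comparison mentioned in this response could be computed along the following lines. This is a minimal standard-library sketch: it returns only the t statistic and degrees of freedom (a p-value would need a t-distribution CDF from a statistics package), and the numbers in the usage example are made-up placeholders, not results from the paper.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for matched runs (e.g. same seed, two methods).

    Returns (t, degrees_of_freedom). A negative t means method A has the
    lower mean score, which is better when the score is an error.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std. dev. of differences
    return mean_diff / std_err, n - 1

# Placeholder reconstruction errors over four matched runs (illustrative only).
proposed = [0.10, 0.12, 0.11, 0.13]
baseline = [0.14, 0.15, 0.16, 0.15]
t_stat, dof = paired_t_statistic(proposed, baseline)
print(t_stat, dof)
```

Pairing by run removes seed-to-seed variation from the comparison, which matters when only a handful of independent runs are available.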

Circularity Check

0 steps flagged

PAC-Bayes characterization supplies independent analysis for PPO design

The derivation proceeds by first stating a communication-rate optimization objective, then deriving a PAC-Bayes generalization characterization that explicitly accounts for latent density-ratio estimation, finite-sample deviation, and encoder bias. This characterization is presented as an analytical tool that reveals effects under sim-to-real shift. The PPO method with density-ratio weighting and trust-region regularization is then constructed on top of the characterization. Because the bound is not defined in terms of the final PPO performance metric and the experimental results on the public dataset constitute an external check, the chain does not reduce to a self-referential fit or self-citation. No load-bearing step matches any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard PAC-Bayes theory applied to domain adaptation and standard assumptions in reinforcement learning and wireless channel modeling; no new free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: PAC-Bayes generalization bounds apply to the latent density-ratio estimation, finite-sample deviation, and encoder bias in this teleoperation setting.
Invoked to characterize effects and guide the PPO adaptation method.

pith-pipeline@v0.9.0 · 5746 in / 1342 out tokens · 41410 ms · 2026-05-20T03:11:23.246000+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: PAC-Bayes generalization characterization that reveals the effects of latent density-ratio estimation, finite-sample deviation, and encoder bias... weighted PPO loss with density-ratio weighting and trust-region regularization

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: dimension-wise sampling-rate control... reconstruction error and communication energy consumption

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
