X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Apoorva Sharma; Boris Ivanovic; Edward Schmerling; Han Qi; Heng Yang; Marco Pavone; Michael Watson; Rachel Luo; Sushant Veer

arxiv: 2606.05159 · v1 · pith:UOZ4XMVZnew · submitted 2026-06-03 · 💻 cs.RO

Pith reviewed 2026-06-28 05:43 UTC · model grok-4.3

classification 💻 cs.RO

keywords variance reduction, control variates, policy evaluation, robotic systems, neural surrogates, multi-domain data, autonomous driving, robot manipulation

The pith

X4Val learns a neural predictor from auxiliary data to cut variance in real-world robotic policy evaluation without paired samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces X4Val to make rigorous evaluation of robotic policies more efficient when real-world test data is scarce and expensive. It embeds samples from real and auxiliary domains such as simulation or historical logs into a shared space, then trains a transferable predictor of real metrics. This predictor feeds into a control-variates estimator that subtracts predictable components from the evaluation, lowering variance even without matched pairs across domains. A sympathetic reader cares because abundant mismatched data could then support high-confidence estimates, reducing the real-world samples needed for deployment validation. If correct, the approach directly improves sample efficiency in iterative policy development.

Core claim

X4Val embeds samples from real and auxiliary domains into a shared representation space and learns a transferable predictor of real-world metrics; this learned predictor is then incorporated into a control-variates estimator, enabling variance reduction even when paired samples are unavailable. The framework supplies theoretical analysis and achieves up to 38.4 percent variance reduction with consistent gains over baselines on autonomous driving and real-world robot manipulation tasks.

What carries the argument

The neural surrogate predictor trained in a shared embedding space and inserted into the control-variates estimator.
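As a rough illustration of the mechanism, the sketch below runs a textbook control-variates estimator on synthetic data. It is not the paper's implementation: the function names `cv_estimate`, `optimal_beta`, and `simulate`, the Gaussian data model, and the assumption that the surrogate's mean on the real domain is known exactly are all hypothetical choices made here for illustration.

```python
import random
from statistics import fmean, variance


def cv_estimate(metric, surrogate, surrogate_mean, beta):
    """Control-variates estimate of E[metric].

    metric, surrogate: equal-length lists from the labeled real-world runs.
    surrogate_mean: centering constant, ASSUMED to equal the surrogate's
        true expectation on the real domain.
    """
    return fmean(metric) - beta * (fmean(surrogate) - surrogate_mean)


def optimal_beta(metric, surrogate):
    """Sample version of the variance-minimizing coefficient Cov(M, P) / Var(P)."""
    m_bar, p_bar = fmean(metric), fmean(surrogate)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(metric, surrogate))
    cov /= len(metric) - 1
    return cov / variance(surrogate)


def simulate(n_real=50, n_trials=2000, seed=0):
    """Compare plain Monte Carlo with control variates over repeated trials."""
    rng = random.Random(seed)
    plain, cv = [], []
    for _ in range(n_trials):
        # Synthetic stand-in: surrogate P ~ N(0, 1), metric M = 0.5 + 0.8 P + noise.
        p = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        m = [0.5 + 0.8 * pi + rng.gauss(0.0, 0.6) for pi in p]
        beta = optimal_beta(m, p)
        plain.append(fmean(m))
        cv.append(cv_estimate(m, p, surrogate_mean=0.0, beta=beta))
    return variance(plain), variance(cv)


var_plain, var_cv = simulate()
print(f"plain Monte Carlo variance: {var_plain:.5f}")
print(f"control-variates variance:  {var_cv:.5f}")
```

The stronger the correlation between the surrogate and the real metric, the larger the reduction; a surrogate that is uncorrelated with the metric gives essentially none.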

If this is right

Non-paired heterogeneous data sources become usable for high-confidence real-world metric estimation.
Variance reduction reaches up to 38.4 percent on autonomous driving and robot manipulation tasks.
Empirical results show consistent improvements over strong baselines.
Theoretical analysis backs the variance reduction property of the estimator.
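To put the headline figure in sample-size terms, here is a small worked calculation. It assumes the estimator's variance scales as 1/n and that confidence-interval width scales with the standard deviation; both are standard assumptions made here, not statements taken from the paper, and the function names are hypothetical.

```python
import math


def effective_sample_multiplier(variance_reduction):
    """Factor by which plain Monte Carlo would need more samples to match."""
    return 1.0 / (1.0 - variance_reduction)


def ci_width_ratio(variance_reduction):
    """Ratio of reduced-variance interval width to the plain interval width."""
    return math.sqrt(1.0 - variance_reduction)


reduction = 0.384  # the "up to 38.4 percent" figure quoted above
print(f"equivalent sample multiplier: {effective_sample_multiplier(reduction):.2f}")
print(f"interval width ratio:         {ci_width_ratio(reduction):.3f}")
```

Under these assumptions, a 38.4 percent variance reduction is worth roughly 1.62 times as many real-world samples, or an interval about 78 percent as wide at the same sample count.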

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The shared embedding might support evaluation of policies in entirely new environments if the representation proves policy-invariant.
The method could shorten iteration cycles in robotics by lowering the real-world data volume required per policy update.
Similar surrogate-augmented control-variates setups might apply to other domains that rely on abundant but non-representative auxiliary data, such as simulation-heavy engineering validation.

Load-bearing premise

The predictor learned from auxiliary domains must remain accurate and unbiased enough when plugged into the control-variates estimator for the target real metric.

What would settle it

Collect a large held-out set of real-world samples, compute the estimator variance both with and without the learned predictor, and check whether the reduction matches or exceeds the reported levels or whether predictor bias nullifies the gain.
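The check described above could be prototyped along the following lines. This is a sketch on synthetic data, not the paper's protocol: the pool generator, the fixed coefficient `beta`, and the `shift` parameter (a stand-in for transfer error in the predictor's centering constant) are all hypothetical.

```python
import random
from statistics import fmean, variance


def held_out_check(shift, n=40, n_resamples=3000, pool_size=5000, beta=0.8, seed=1):
    """Resample a large held-out pool and compare estimators with and without the predictor."""
    rng = random.Random(seed)
    # Synthetic held-out pool: predictor output P and real metric M.
    pool_p = [rng.gauss(0.0, 1.0) for _ in range(pool_size)]
    pool_m = [0.5 + 0.8 * p + rng.gauss(0.0, 0.6) for p in pool_p]
    truth = fmean(pool_m)
    # Centering constant; a nonzero shift models an imperfectly transferred predictor.
    center = fmean(pool_p) + shift
    plain, cv = [], []
    for _ in range(n_resamples):
        idx = [rng.randrange(pool_size) for _ in range(n)]
        m = [pool_m[i] for i in idx]
        p = [pool_p[i] for i in idx]
        plain.append(fmean(m))
        cv.append(fmean(m) - beta * (fmean(p) - center))
    return {
        "variance_reduction": 1.0 - variance(cv) / variance(plain),
        "cv_bias": fmean(cv) - truth,
    }


for shift in (0.0, 0.25):
    result = held_out_check(shift)
    print(f"shift={shift}: reduction={result['variance_reduction']:.3f}, "
          f"bias={result['cv_bias']:.3f}")
```

In this toy setup the variance reduction is the same for both settings while the bias grows with the shift, which is why a variance figure on its own cannot settle the question.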

Figures

Figures reproduced from arXiv: 2606.05159 by Apoorva Sharma, Boris Ivanovic, Edward Schmerling, Han Qi, Heng Yang, Marco Pavone, Michael Watson, Rachel Luo, Sushant Veer.

**Figure 1.** Comparison of X4Val with standard control variates-based estimation. Standard approaches (bottom) can only use real world data with its log-replay simulations enforcing a strict pairing between the real and simulated data and limiting the amount of data that can be used. X4Val (top) on the other hand, can use a diverse range of data sources by projecting them to a shared embedding space and learning a tr…

**Figure 2.** AV deployment to a new region. With limited target-domain evaluation data from a new region (Germany), leveraging auxiliary data from past evaluations in another region (United States) can reduce variance of performance estimation in the target region. X4Val most efficiently combines data from tests in Germany with auxiliary data to yield consistent variance reduction relative to baselines. Error bars sh…

**Figure 3.** Driving examples for the US and Germany geographical regions. Differences include lane-marker types, signage, architecture, typical road features/junctions, etc.

5.1 Case Study 1: Autonomous Vehicle Deployment to a New Geographic Region. In this case study, we consider a scenario in which an AV policy has been trained and validated in one geographic region, and must now be validated for deployment in a new…

**Figure 4.** Iterative AV policy development. When validating a newly trained policy with limited evaluation data, historical evaluation data from earlier policy versions can serve as auxiliary information to reduce uncertainty in estimating current performance. X4Val most efficiently combines the limited current-policy evaluation data with historical data from earlier policies, achieving the largest variance reductio…

**Figure 5.** X4Val for policy evaluation in a block-stacking manipulation task. (a) Example evaluation in the ManiSkill simulator. (b) Example evaluation on a real robot. (c) Variance reduction achieved by X4Val compared to Monte Carlo when estimating the policy’s mean success rate (each boxplot summarizes 20 random seeds).

In this section, we demonstrate that X4Val enables leveraging robot manipulation policies train…

**Figure 6.** Effect of control-weight optimization on CV_MCF variance reduction relative to Simple Monte Carlo. (a) In the geographic-transfer case study, enabling optimization yields modest but consistent gains across MCF train fractions, with the largest gain (11% → 14%) occurring at fraction = 0.0. (b) In the iterative-policy-development case study, enabling optimization is essential: it transforms an estimator that…

Original abstract

Rigorous evaluation of learning-based robotic systems is an essential prerequisite for deployment. However, real-world test data is expensive to gather; moreover, in a typical iterative development context, data gathered from the latest policy is necessarily limited in scale. This motivates evaluation methodologies that make use of heterogeneous data sources, including simulation, historical policy logs, and data collected from related platforms or environments. While such auxiliary data are abundant and inexpensive, they are generally not directly representative of real-world outcomes -- for example, performance in simulation may differ substantially from performance in the real world -- making their principled use for high-confidence performance estimation challenging. In this paper, we introduce X4Val, a general framework for variance-reduced real-world metric estimation in the presence of non-paired, multi-domain data. X4Val embeds samples from real and auxiliary domains into a shared representation space and learns a transferable predictor of real-world metrics; this learned predictor is then incorporated into a control-variates estimator, enabling variance reduction even when paired samples are unavailable. We provide theoretical analysis and empirical evaluations on autonomous driving and real-world robot manipulation tasks, domains across which X4Val achieves up to 38.4% variance reduction and demonstrates consistent improvements over strong baselines. These results show that non-paired, heterogeneous data can be leveraged to substantially improve the sample efficiency of rigorous robotic system validation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

X4Val uses a learned neural predictor from unpaired auxiliary data as a control variate for real-world robotics metrics, but the unbiasedness claim under domain shift needs close checking.

The core idea is to embed real and auxiliary samples in a shared space, train a predictor of the target metric on the auxiliary side, and plug that predictor into a control-variates estimator. This lets them claim variance reduction without needing paired samples.

The paper does a few things right. It targets a genuine bottleneck in robotics: real test data is expensive and limited, while simulation and historical logs are plentiful but mismatched. The empirical numbers (up to 38.4% variance reduction on driving and manipulation tasks) are concrete, and the authors report consistent gains over baselines. They also state they supply theoretical analysis, which is better than pure empirical claims.

The soft spot is exactly the one the stress test flags. Control variates stay unbiased only when the control variate has a known expectation or the bias term cancels. Here the predictor is trained on non-representative auxiliary domains and then transferred; any residual gap means E[P(real)] ≠ E[M], which injects a bias that the variance-reduction figures do not bound. The abstract does not say how the theory handles this offset or whether the authors prove the estimator remains unbiased. Without the actual proof or bias bounds, the central guarantee is not yet convincing.
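To make the bias concern concrete, here is the textbook control-variates identity, offered as a sketch rather than as the paper's own derivation. The symbols c (the centering constant actually plugged in for the predictor's mean), n (the number of labeled real samples), and ρ (the correlation between M and P) are notation introduced here for illustration; β and c are treated as fixed and the samples as i.i.d. draws from the real domain.

```latex
\begin{align}
\hat{\mu}_{\mathrm{CV}} &= \frac{1}{n}\sum_{i=1}^{n}\bigl[M_i - \beta\,(P_i - c)\bigr], \\
\mathbb{E}\bigl[\hat{\mu}_{\mathrm{CV}}\bigr] &= \mathbb{E}[M] - \beta\,\bigl(\mathbb{E}[P] - c\bigr), \\
\mathrm{Bias}\bigl(\hat{\mu}_{\mathrm{CV}}\bigr) &= \beta\,\bigl(c - \mathbb{E}[P]\bigr), \\
\mathrm{Var}\bigl(\hat{\mu}_{\mathrm{CV}}\bigr) &= \frac{1}{n}\Bigl(\mathrm{Var}(M) - 2\beta\,\mathrm{Cov}(M,P) + \beta^{2}\,\mathrm{Var}(P)\Bigr), \\
\beta^{\star} &= \frac{\mathrm{Cov}(M,P)}{\mathrm{Var}(P)}
\quad\Longrightarrow\quad
\mathrm{Var}\bigl(\hat{\mu}_{\mathrm{CV}}\bigr) = \frac{(1-\rho^{2})\,\mathrm{Var}(M)}{n}.
\end{align}
```

Under these assumptions the variance does not depend on c at all, so a mis-specified centering constant leaves the variance reduction intact while shifting the estimate by β(c − E[P]). A reported variance reduction therefore says nothing by itself about bias.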

This is a paper for robotics groups that already work on sim-to-real evaluation and variance reduction. A reader who needs practical sample-efficiency tricks might get something usable from the experiments. It is coherent enough and grounded enough in a real problem to deserve referee time rather than a desk reject, even if the theory section turns out to need tightening.

Referee Report

1 major / 2 minor

Summary. The paper introduces X4Val, a framework for variance-reduced real-world metric estimation in robotics using non-paired multi-domain data. Samples from real and auxiliary domains (e.g., simulation, historical logs) are embedded into a shared representation space; a transferable neural predictor of the target real-world metric is learned from this space and then plugged into a control-variates estimator. The method is supported by theoretical analysis and evaluated on autonomous driving and real-world robot manipulation tasks, where it reports up to 38.4% variance reduction over strong baselines while remaining consistent across domains.

Significance. If the unbiasedness of the control-variates estimator is preserved under domain shift, the approach would meaningfully improve sample efficiency for rigorous policy evaluation in robotics, where real-world data collection is costly and auxiliary data sources are abundant but non-representative. The explicit use of learned surrogates inside control variates, together with the reported empirical gains, would constitute a practical advance over standard Monte-Carlo or paired-sample methods.

major comments (1)

[Theoretical Analysis] The central claim that the estimator remains unbiased (and therefore that reported variance reductions are meaningful for high-confidence estimation) rests on the learned predictor satisfying E[P(real)] = E[M] (or a known offset) after transfer from auxiliary domains. The skeptic note correctly identifies this as the least secure assumption; the theoretical analysis section must therefore contain an explicit derivation or bound showing that any residual domain gap does not introduce a non-zero bias term in the control-variates estimator E[M - β(P - E[P])]. Without such a derivation or a sensitivity analysis, the variance-reduction numbers alone do not establish that the estimator is suitable for rigorous validation.

minor comments (2)

[Introduction] The abstract and introduction would benefit from a short statement clarifying whether the control-variates coefficient β is estimated from the same data or held out, as this choice directly affects both bias and variance.
[Experiments] Figure captions should explicitly state the number of independent runs and whether error bars represent standard error or standard deviation.
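On the question of whether β is fitted on the same data it is applied to, cross-fitting is one common answer: each fold's coefficient is fitted on the other folds only. The sketch below is a generic two-fold version on synthetic data and does not claim to match the paper's procedure; `fit_beta`, `cross_fitted_cv`, and the `center` argument (assumed to equal the predictor's true mean on the real domain) are hypothetical names.

```python
import random
from statistics import fmean, variance


def fit_beta(m, p):
    """Sample estimate of Cov(M, P) / Var(P)."""
    m_bar, p_bar = fmean(m), fmean(p)
    cov = sum((a - m_bar) * (b - p_bar) for a, b in zip(m, p)) / (len(m) - 1)
    return cov / variance(p)


def cross_fitted_cv(m, p, center, n_folds=2):
    """Control-variates estimate where each fold uses a beta fitted on the other folds."""
    n = len(m)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    total = 0.0
    for held in folds:
        held_set = set(held)
        train = [i for i in range(n) if i not in held_set]
        # beta never sees the samples it is applied to.
        beta = fit_beta([m[i] for i in train], [p[i] for i in train])
        total += sum(m[i] - beta * (p[i] - center) for i in held)
    return total / n


rng = random.Random(2)
p = [rng.gauss(0.0, 1.0) for _ in range(200)]
m = [0.5 + 0.8 * pi + rng.gauss(0.0, 0.6) for pi in p]
print(f"cross-fitted estimate: {cross_fitted_cv(m, p, center=0.0):.3f}")
```

Because each sample's β is independent of that sample, every summand has expectation E[M] when `center` is correct, at the cost of a slightly noisier coefficient per fold.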

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for emphasizing the need to rigorously address potential bias under domain shift. We respond to the major comment below and will revise the manuscript to include the requested derivation and sensitivity analysis.

Referee: [Theoretical Analysis] The central claim that the estimator remains unbiased (and therefore that reported variance reductions are meaningful for high-confidence estimation) rests on the learned predictor satisfying E[P(real)] = E[M] (or a known offset) after transfer from auxiliary domains. The skeptic note correctly identifies this as the least secure assumption; the theoretical analysis section must therefore contain an explicit derivation or bound showing that any residual domain gap does not introduce a non-zero bias term in the control-variates estimator E[M - β(P - E[P])]. Without such a derivation or a sensitivity analysis, the variance-reduction numbers alone do not establish that the estimator is suitable for rigorous validation.

Authors: We agree that the unbiasedness claim requires explicit handling of residual domain gap after transfer. Section 3 derives that the control-variates estimator E[M - β(P - E[P])] is unbiased whenever E[P] = E[M] holds in the target (real) domain; the analysis treats the learned predictor as satisfying this equality after embedding into the shared space. However, the current write-up does not provide a quantitative bound on the bias that would arise if transfer is imperfect. In the revision we will add (i) a derivation bounding the absolute bias |E[M - β(P - E[P])]| by the product of the control-variate coefficient and an integral probability metric (e.g., Wasserstein-1) between the embedded real and auxiliary distributions, and (ii) a sensitivity study in the experimental section that injects controlled domain discrepancies and reports both the resulting bias and the observed variance reduction. These additions will make the theoretical guarantees and empirical claims directly comparable. revision: yes
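A bound of the kind promised in (i) can be illustrated in one dimension. The sketch below simplifies heavily: it compares scalar predictor outputs rather than embedded distributions, uses equal-size samples, and takes the centering constant to be the auxiliary-domain mean. The names `wasserstein_1d` and `bias_bound` and the synthetic data are hypothetical.

```python
import random
from statistics import fmean


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(u) != len(v):
        raise ValueError("this sketch only handles equal-size samples")
    return fmean([abs(a - b) for a, b in zip(sorted(u), sorted(v))])


def bias_bound(beta, p_real, p_aux):
    """Upper bound |beta| * W1 on the bias from centering at the auxiliary mean."""
    return abs(beta) * wasserstein_1d(p_real, p_aux)


rng = random.Random(3)
# Synthetic predictor outputs: the auxiliary domain is shifted and more spread out.
p_real = [rng.gauss(0.0, 1.0) for _ in range(1000)]
p_aux = [rng.gauss(0.3, 1.2) for _ in range(1000)]

beta = 0.8
mean_gap = abs(fmean(p_real) - fmean(p_aux))
print(f"|beta| * mean gap (actual bias size): {abs(beta) * mean_gap:.3f}")
print(f"|beta| * W1 (bound):                  {bias_bound(beta, p_real, p_aux):.3f}")
```

The bound holds because the gap between two means can never exceed the Wasserstein-1 distance between the corresponding distributions; it is loose whenever the domains differ in shape as well as in mean.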

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

The provided abstract and description present X4Val as a new framework that learns a transferable neural predictor from multi-domain embeddings and plugs it into a standard control-variates estimator. No equations, self-citations, or fitted quantities are shown that reduce the claimed variance reduction to a tautology or to inputs by construction. The method relies on independent statistical theory (control variates) and standard supervised learning, with the domain-transfer assumption stated explicitly rather than smuggled in via prior self-work. This is the common honest case of a self-contained proposal against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no equations or implementation details, so no free parameters, axioms, or invented entities can be identified.
