pith. machine review for the scientific record.

arxiv: 2605.07799 · v2 · submitted 2026-05-08 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 06:17 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: privileged information · tabular foundation models · LUPI · learning acceleration · generalization · PIQL · data efficiency · convergence

The pith

PIQL integrates train-only privileged information to speed convergence and improve generalization in tabular foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PIQL as a framework that supplies two kinds of privileged information only at training time: dataset-level aggregate statistics and encodings of the underlying data-generating program. These signals reduce the load on in-context learning and give tabular foundation models knowledge beyond what raw observations provide. The architecture is designed so the model learns to reconstruct the privileged signals from the regular context alone once the information is no longer available at inference. Theoretical analysis identifies conditions under which the added information shrinks the population approximation gap and accelerates finite-sample convergence. Experiments indicate faster training, lower final loss, and stronger generalization, which together reduce the data and compute needed to reach target performance.
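As a concrete reading of that design, here is a minimal PyTorch sketch of how train-time-only privileged information could be wired into a tabular in-context learner: a privileged encoder ingests the PI during training, while a reconstruction head learns to predict the same embedding from the ordinary context, which is all the model has at inference. Module names, shapes, and the loss weighting are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumptions, not the paper's code): a tabular in-context learner
# that receives privileged information (PI) only at training time and learns to
# reconstruct it from the observed context for use at inference.
import torch
import torch.nn as nn

class PIQLSketch(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, d_pi: int = 16):
        super().__init__()
        self.row_embed = nn.Linear(n_features + 1, d_model)        # embeds (x, y) rows of the context
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.pi_encoder = nn.Linear(d_pi, d_model)                  # sees privileged info (train time only)
        self.pi_reconstructor = nn.Linear(d_model, d_model)         # predicts the PI embedding from context
        self.head = nn.Linear(d_model, 1)                           # per-query prediction

    def forward(self, context_xy, query_x, pi=None):
        # context_xy: (B, n_ctx, n_features+1); query_x: (B, n_q, n_features); pi: (B, d_pi) or None
        h_ctx = self.backbone(self.row_embed(context_xy))            # (B, n_ctx, d_model)
        pi_hat = self.pi_reconstructor(h_ctx.mean(dim=1))            # reconstructed PI embedding
        pi_emb = self.pi_encoder(pi) if pi is not None else pi_hat   # fall back to reconstruction at inference
        pad = torch.zeros_like(query_x[..., :1])                     # dummy label slot for query rows
        h_q = self.row_embed(torch.cat([query_x, pad], dim=-1))      # (B, n_q, d_model)
        logits = self.head(h_q + pi_emb.unsqueeze(1)).squeeze(-1)    # condition queries on the PI embedding
        return logits, pi_hat, pi_emb

# One training step: prediction loss on queries plus a loss tying pi_hat to the true PI embedding.
model = PIQLSketch(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ctx = torch.randn(4, 32, 9)                      # 4 tasks, 32 labelled context rows each
qx, qy = torch.randn(4, 8, 8), torch.randint(0, 2, (4, 8)).float()
pi = torch.randn(4, 16)                          # stand-in for aggregate stats / program encoding
logits, pi_hat, pi_emb = model(ctx, qx, pi=pi)
loss = nn.functional.binary_cross_entropy_with_logits(logits, qy) \
       + 0.1 * nn.functional.mse_loss(pi_hat, pi_emb.detach())
opt.zero_grad(); loss.backward(); opt.step()
```

At inference pi is None, so predictions are conditioned on the reconstructed pi_hat alone; that fallback is exactly the behavior the paper's claims hinge on.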

Core claim

PIQL is the first systematic method to embed privileged information into tabular foundation models: it supplies aggregate statistics and data-generating-program encodings during training, then trains the model to recover those signals from observable inputs at test time. Theory shows reduced approximation error and faster convergence under stated conditions, and experiments confirm quicker training, lower loss, and better generalization.

What carries the argument

The PIQL architecture that learns to reconstruct train-time-only privileged information from observed context at inference time.
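For readers new to learning using privileged information, the standard setup (following Vapnik and Vashist [35]) is worth stating; the notation below is generic rather than the paper's.

```latex
% LUPI: privileged features x* accompany training examples but are absent at test time.
\text{Training data: } \{(x_i,\; x_i^{*},\; y_i)\}_{i=1}^{n}
\qquad\Longrightarrow\qquad
\text{Inference: } \hat{y} = f(x), \quad x^{*} \text{ unavailable.}
```

PIQL's variant, as described above, is to learn an estimate of the privileged signal from the observable context, so an approximation of x* can be regenerated when it is missing.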

If this is right

  • Tabular foundation models reach target performance with fewer training examples and less compute.
  • Final loss decreases and out-of-sample accuracy rises under the same data budget.
  • The pretraining stage can be guided by domain knowledge encoded as privileged signals rather than raw data volume alone.
  • The same reconstruction mechanism can be applied to other modalities where auxiliary signals exist only during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may allow smaller tabular models to match the performance of larger ones trained without privileged information.
  • Combining PIQL with existing efficiency methods such as parameter-efficient fine-tuning could compound resource savings.
  • If reconstruction quality can be monitored, the framework might adaptively decide how much privileged information to supply per batch.
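The third point is easy to phrase as a control loop. A hypothetical sketch, with thresholds and function names invented purely for illustration:

```python
# Hypothetical schedule: teacher-force the true PI more often while the model's
# reconstruction of it is still poor, and anneal it away as reconstruction improves.
def pi_forcing_probability(recon_error: float,
                           good_error: float = 0.05,
                           bad_error: float = 0.50) -> float:
    """Map recent PI-reconstruction error to the probability of feeding true PI this batch."""
    t = (recon_error - good_error) / (bad_error - good_error)
    return min(1.0, max(0.0, t))

# Early in training reconstruction is poor, so PI is almost always supplied;
# as the error drops, the model is increasingly forced to rely on its own reconstruction.
for err in (0.60, 0.30, 0.10, 0.04):
    print(f"recon_error={err:.2f} -> p(feed true PI)={pi_forcing_probability(err):.2f}")
```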

Load-bearing premise

The model can reliably reconstruct the privileged information from regular inputs at inference without adding instability or hidden error.

What would settle it

Run the same training schedule with and without the reconstruction module active; if the version that cannot recover privileged signals shows no convergence speedup or generalization gain, the central claim does not hold.
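Framed as an experiment, this is a two-arm ablation under an identical schedule. The sketch below is hypothetical: train_piql stands in for whatever training loop the paper actually uses, and the 0.90 AUROC target is an arbitrary placeholder.

```python
# Sketch of the decisive ablation: identical schedules with and without the
# reconstruction module, compared on epochs-to-target and final test AUROC.
# `train_piql` is a placeholder for the paper's actual training loop and is
# assumed to return a per-epoch list of test AUROC values.

def epochs_to_target(auroc_history, target=0.90):
    """First epoch at which test AUROC reaches the target, or None if it never does."""
    for epoch, auroc in enumerate(auroc_history, start=1):
        if auroc >= target:
            return epoch
    return None

def run_ablation(train_piql, seeds=(0, 1, 2), epochs=1500):
    results = {}
    for arm in ("with_reconstruction", "without_reconstruction"):
        histories = [
            train_piql(seed=s, epochs=epochs,
                       reconstruct_pi=(arm == "with_reconstruction"))
            for s in seeds
        ]
        results[arm] = {
            "epochs_to_target": [epochs_to_target(h) for h in histories],
            "final_auroc": [h[-1] for h in histories],
        }
    return results
```

If the two arms are statistically indistinguishable on both measures, the reconstruction module is not what drives the reported speedup, and the central claim fails the test.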

Figures

Figures reproduced from arXiv: 2605.07799 by Leman Akoglu, Xueying Ding.

Figure 1. Privileged TFMs pretrained on a GMM-only data prior and equipped with bronze- (empirical …
Figure 2. Training loss (left) and average test AUROC (right) for the proposed PIQL model (red) versus the no-PI baseline (blue). PIQL exhibits accelerated learning, achieving lower loss and higher AUROC in fewer epochs. Despite minor fluctuations during teacher-forcing annealing, PIQL achieves similar loss and performance to the no-transfer variant (gray) at convergence. (best in color)
Figure 3. (left) LLM-as-Program Encoder; (middle) GeneratorPI + MetaPI; (right) MetaPI++
Figure 4. Illustration of our PIQL diagram: the teacher-student TFM and training to transfer PI.
Figure 5. Pairwise permutation test results for the 10-layer and 2-layer models.
read the original abstract

Training foundation models is computationally intensive and often slow to converge. We introduce PIQL, Privileged Information for Quick and Quality Learning, the first framework to systematically integrate privileged information (PI) to simultaneously accelerate learning and improve generalization in tabular foundation models (TFMs). We construct two complementary forms of PI: (i) aggregate dataset-level statistics that reduce the burden on in-context learning, and (ii) encodings of the underlying data-generating program, providing knowledge beyond observable data. We further design an architecture that effectively transfers the train-time-only PI by learning to reconstruct it from observed context at inference. We provide a theoretical analysis characterizing conditions under which PI reduces the population-level approximation gap and accelerates convergence in finite-data regimes. Empirical evidence shows that PIQL enables TFMs to achieve faster convergence, lower final loss, and better generalization, in effect, reducing data and compute requirements. Our work establishes PI-guided pretraining as a principled and practical paradigm for improving the efficiency and performance of foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PIQL, a framework for integrating privileged information (PI) into tabular foundation models (TFMs) to accelerate learning and improve generalization. Two forms of PI are constructed: aggregate dataset-level statistics and encodings of the data-generating program. An architecture is proposed to reconstruct this PI from observed context at inference time. Theoretical analysis characterizes conditions for reducing approximation gap and accelerating convergence, with empirical results showing faster convergence, lower loss, and better generalization, thereby reducing data and compute needs.

Significance. If the proposed reconstruction mechanism reliably infers the PI and the theoretical conditions translate to practice without instabilities, this work could establish a principled approach to improve efficiency of foundation models using LUPI. The theoretical analysis and empirical evidence are positive aspects, but the practical feasibility of PI transfer is key to the significance.

major comments (2)
  1. The architecture's ability to learn to reconstruct train-time-only PI (aggregate statistics and data-generating program encodings) from observed context at inference is central to the claims, yet no quantitative measures of reconstruction fidelity (e.g., MSE or mutual information) or ablation studies on reconstruction quality are referenced, leaving the link between theory and observed speed-ups unverified.
  2. The conditions under which PI reduces the population-level approximation gap and accelerates convergence in finite-data regimes need to be explicitly stated with any assumptions on the reconstruction error; if reconstruction is imperfect, the finite-data acceleration may not hold as claimed.
minor comments (2)
  1. The title appears to have a missing space: 'Models:LUPI' should be 'Models: LUPI'.
  2. The acronym PIQL is introduced but its expansion 'Privileged Information for Quick and Quality Learning' could be clarified earlier for readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript accordingly to strengthen the presentation of the reconstruction mechanism and its theoretical grounding.

read point-by-point responses
  1. Referee: The architecture's ability to learn to reconstruct train-time-only PI (aggregate statistics and data-generating program encodings) from observed context at inference is central to the claims, yet no quantitative measures of reconstruction fidelity (e.g., MSE or mutual information) or ablation studies on reconstruction quality are referenced, leaving the link between theory and observed speed-ups unverified.

    Authors: We agree that explicit quantitative evaluation of reconstruction fidelity is necessary to connect the theoretical claims to the observed empirical speed-ups. In the revised manuscript we will report MSE for reconstruction of the aggregate dataset-level statistics and mutual information for the data-generating program encodings. We will also add ablation studies that systematically vary reconstruction quality (via controlled noise injection or reduced context length) and measure the resulting effects on convergence rate, final loss, and generalization error. revision: yes

  2. Referee: The conditions under which PI reduces the population-level approximation gap and accelerates convergence in finite-data regimes need to be explicitly stated with any assumptions on the reconstruction error; if reconstruction is imperfect, the finite-data acceleration may not hold as claimed.

    Authors: We appreciate this observation. While the theoretical section already derives bounds on the approximation gap under privileged information, the dependence on reconstruction error was stated only implicitly. In the revision we will explicitly list the assumptions: (i) the reconstruction error is bounded by a term that vanishes as the length of observed context grows, and (ii) this error term appears additively in the finite-sample convergence rate. We will then show that the acceleration result continues to hold whenever the reconstruction error is o(1/sqrt(n)) in the finite-data regime, thereby clarifying the conditions under imperfect reconstruction. revision: yes
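Response 1 promises reconstruction-fidelity measurements (MSE for the aggregate statistics, mutual information for the program encodings). A minimal, self-contained sketch of such metrics follows; the histogram-based MI estimator and all variable names are illustrative choices here, not the authors' reported methodology.

```python
# Illustrative reconstruction-fidelity metrics: MSE for reconstructed aggregate
# statistics and a plug-in mutual information estimate between true and
# reconstructed program-encoding dimensions.
import numpy as np

def reconstruction_mse(pi_true: np.ndarray, pi_hat: np.ndarray) -> float:
    """Mean squared error between true and reconstructed privileged vectors."""
    return float(np.mean((pi_true - pi_hat) ** 2))

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """Plug-in MI estimate (in nats) from a 2D histogram; crude but dependency-free."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

# Synthetic stand-ins for true vs. reconstructed encodings, with imperfect reconstruction.
rng = np.random.default_rng(0)
pi_true = rng.normal(size=(1000, 8))
pi_hat = pi_true + 0.3 * rng.normal(size=(1000, 8))
print("reconstruction MSE:", reconstruction_mse(pi_true, pi_hat))
print("MI, dimension 0:", mutual_information(pi_true[:, 0], pi_hat[:, 0]))
```

Under the assumption stated in response 2, the acceleration result would survive as long as the measured reconstruction error decays faster than 1/sqrt(n) as the observed context grows.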

Circularity Check

0 steps flagged

No circularity: new framework elements and theory are independent of inputs

full rationale

The paper introduces PIQL as a novel framework, defines two new forms of privileged information (aggregate statistics and data-generating program encodings), designs a reconstruction architecture, and supplies a theoretical characterization of approximation-gap reduction. No equations, self-citations, or fitted parameters are shown to reduce any claimed prediction or gain to the inputs by construction. The derivation chain remains self-contained; empirical gains are presented as external validation rather than tautological outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework is presented at a high level without detailing any fitted constants or unproven assumptions beyond standard ML practice.

pith-pipeline@v0.9.0 · 5466 in / 1043 out tokens · 46045 ms · 2026-05-15T06:17:44.334259+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
  2. [2] Tarek Ait Baha, Mohamed El Hajji, Youssef Es-Saady, and Hammou Fadili. The power of personalization: A systematic review of personality-adaptive chatbots. SN Computer Science, 4(5):661, 2023.
  3. [3] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
  4. [4] Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, and Shengxin Zhu. Unleashing the potential of prompt engineering for large language models. Patterns, 6(6), 2025.
  5. [5] Eric Chu, Prashanth Vijayaraghavan, and Deb Roy. Learning personas from dialogue with attentive memory networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2638–2646, 2018.
  6. [6] Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In ACL, 2018.
  7. [7] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1):1–22, 1977.
  8. [8] Xueying Ding, Haomin Wen, Simon Klütterman, and Leman Akoglu. From zero to hero: Advancing zero-shot foundation models for tabular outlier detection. arXiv preprint arXiv:2602.03018, 2026.
  9. [9] Fabian Falck, Ziyu Wang, and Christopher C. Holmes. Is in-context learning in large language models Bayesian? A martingale perspective. In Proceedings of the 41st International Conference on Machine Learning, pages 12784–12805. PMLR, 2024.
  10. [10] Shivam Garg, Dimitris Tsipras, Percy S. Liang, and Gregory Valiant. What can transformers learn in-context? A case study of simple function classes. Advances in Neural Information Processing Systems, 35:30583–30598, 2022.
  11. [11] Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, and Shirley Ho. xVal: A continuous number encoding for large language models. In NeurIPS 2023 AI for Science Workshop, 2023.
  12. [12] Songqiao Han, Xiyang Hu, Hailiang Huang, Minqi Jiang, and Yue Zhao. ADBench: Anomaly detection benchmark. Advances in Neural Information Processing Systems, 35, 2022.
  13. [13] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
  14. [14] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025.
  15. [15] Bowen Jiang, Yuan Yuan, Maohao Shen, Zhuoqun Hao, Zhangchen Xu, Zichen Chen, Ziyi Liu, Anvesh Rao Vijjini, Jiashu He, Hanchao Yu, et al. Personamem-v2: Towards personalized intelligence via learning implicit user personas and agentic memory. arXiv preprint arXiv:2512.06688, 2025.
  16. [16] Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.
  17. [17] Wonjae Kim, Bokyung Son, and Ildoo Kim. ViLT: Vision-and-language transformer without convolution or region supervision. In International Conference on Machine Learning, pages 5583–5594. PMLR, 2021.
  18. [18] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, 2021.
  19. [19] Guanrong Li, Xinyu Liu, Zhen Wu, and Xinyu Dai. Persona-aware alignment framework for personalized dialogue generation. Transactions of the Association for Computational Linguistics, 13:1722–1742, 2025.
  20. [20] Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning, pages 19730–19742. PMLR, 2023.
  21. [21] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, 2021.
  22. [22] Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021.
  23. [23] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  24. [24] Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. Cross-task generalization via natural language crowdsourcing instructions. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3470–3487, 2022.
  25. [25] Samuel Müller, Noah Hollmann, Sebastian Pineda-Arango, Josif Grabocka, and Frank Hutter. Transformers can do Bayesian inference. In International Conference on Learning Representations (ICLR), 2022.
  26. [26] Dmitry Pechyony and Vladimir Vapnik. On the theory of learning with privileged information. In Advances in Neural Information Processing Systems (NIPS), volume 23, pages 1894–1902, 2010.
  27. [27] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  28. [28] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
  29. [29] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to recommender systems handbook. In Recommender Systems Handbook, pages 1–35. Springer, 2010.
  30. [30] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
  31. [31] Yuchen Shen, Haomin Wen, and Leman Akoglu. A foundation model for zero-shot tabular outlier detection. Transactions on Machine Learning Research, 2025.
  32. [32] Aad W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
  33. [33] Vladimir Vapnik and Rauf Izmailov. Learning using privileged information: Similarity control and knowledge transfer. Journal of Machine Learning Research, 16(61):2023–2049, 2015.
  34. [34] Vladimir Vapnik and Rauf Izmailov. Knowledge transfer in SVM and neural networks. Annals of Mathematics and Artificial Intelligence, 81(1):3–19, 2017.
  35. [35] Vladimir Vapnik and Akshay Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6):544–557, 2009.
  36. [36] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), pages 5998–6008, 2017.
  37. [37] Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151–35174. PMLR, 2023.
  38. [38] Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, 2021.
  39. [39] Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022.
  40. [40] Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit Bayesian inference. arXiv preprint arXiv:2111.02080, 2021.
  41. [41] Xiyuan Zhang, Boran Han, Haoyang Fang, Abdul Fatir Ansari, Shuai Zhang, Danielle C. Maddix, Cuixiong Hu, Andrew Gordon Wilson, Michael W. Mahoney, Hao Wang, et al. When does multimodality lead to better time series forecasting? arXiv preprint arXiv:2506.21611, 2025.
  42. [42] Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, and Bernie Wang. Mitra: Mixed synthetic priors for enhancing tabular foundation models. In The Thirty-ninth Annual Conference on Neural Information Proc...
  43. [43] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 Embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176, 2025.
  44. [44] Internal anchor (body-text excerpt, not a bibliographic entry): "Specifically, PIQL (transferred) and Silver Enc. (no transfer) reach No-PI's loss at convergence @1500 epochs, respectively @900 and @750 epochs. Figure 5a presents the corresponding permutation test results, showing that Silver Enc@750 outperforms No-PI@1500, and that PIQL@900 significantly outperforms both. Similarly, we compare Gold@250, Silver@450..."
    Specifically, PIQL (transferred) and Silver Enc. (no transfer) reach No-PI’s loss at convergence @1500 epochs respectively @900 and @750 epochs. Figure 5a presents the corresponding permu- tation test results, showing that Silver Enc@750 outperforms No-PI@1500, and that PIQL@900 significantly outperforms both. 25 Similarly, we compare Gold@250, Silver@450...