pith. machine review for the scientific record.

arxiv: 2604.17257 · v2 · submitted 2026-04-19 · 💻 cs.CL · cs.AI

Recognition: unknown

REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:10 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords representation regularization · text embeddings · pre-finetuning · domain adaptation · contrastive learning · eigenspace analysis · task bias · embedding geometry

The pith

REZE controls representation shifts during text embedding pre-finetuning by shrinking task-variant directions in the eigenspace while preserving semantic structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that contrastive pre-finetuning on scattered heterogeneous tasks injects task-induced bias into text embeddings, distorting their geometry and degrading downstream performance. REZE counters this by decomposing relations between anchor and positive pairs into eigencomponents, measuring dispersion per task along each direction, and applying adaptive soft-shrinkage to suppress only the variant noise. The approach keeps shifts aligned with the original pretrained manifold and requires no extra computation at inference. A sympathetic reader would care because pre-finetuning is widely used to adapt embeddings to specialized domains, yet uncontrolled shifts make the resulting models unreliable across benchmarks.

Core claim

REZE is a representation regularization framework that explicitly controls representation shift during embedding pre-finetuning. It operates on the relations of anchor-positive pairs and decomposes them in an eigenspace. It then measures task-wise dispersion along each eigencomponent to identify task-variant directions and applies adaptive soft-shrinkage to suppress task-induced noise while preserving task-invariant semantic structure, without inference-time overhead.

What carries the argument

Eigenspace decomposition of anchor-positive pair relations combined with task-wise dispersion measurement and adaptive soft-shrinkage to separate task-variant noise from invariant semantic structure.
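
The full mechanics are only summarized here, but the named pieces fit a compact shape. Below is a minimal NumPy sketch under stated assumptions: the anchor-positive "relation" is taken as the difference vector, task-wise dispersion as the variance of per-task component means, and the adaptive soft-shrinkage as a linear per-component keep-ratio. The names `eta` (shrink strength) and `alpha` (regularization weight) mirror the hyperparameters shown in Figure 2, but the formulas are illustrative, not the paper's exact formulation.

```python
import numpy as np

def reze_penalty(anchors, positives, task_ids, eta=0.7, alpha=1.0, eps=1e-8):
    """Scalar penalty discouraging shifts along task-variant eigendirections (hypothetical sketch)."""
    relations = anchors - positives                      # (N, d) anchor-positive relations (assumed form)
    centered = relations - relations.mean(axis=0)
    cov = centered.T @ centered / max(len(relations) - 1, 1)
    _, eigvecs = np.linalg.eigh(cov)                     # eigenspace of the relation covariance
    coords = centered @ eigvecs                          # (N, d) eigencomponent coordinates

    # Task-wise dispersion: spread of per-task means along each eigencomponent.
    tasks = np.unique(task_ids)
    task_means = np.stack([coords[task_ids == t].mean(axis=0) for t in tasks])  # (T, d)
    dispersion = task_means.var(axis=0)
    dispersion = dispersion / (dispersion.max() + eps)   # normalize to [0, 1]

    # Adaptive soft-shrinkage: pull strongly task-variant components toward zero.
    keep = np.clip(1.0 - eta * dispersion, 0.0, 1.0)     # per-component keep-ratio
    shrunk = coords * keep

    # Penalize whatever part of each relation the shrinkage removed.
    return alpha * np.mean((coords - shrunk) ** 2)

# Toy usage: 3 tasks, 300 pairs, 64-dimensional embeddings.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(300, 64))
positives = anchors + 0.1 * rng.normal(size=(300, 64))
task_ids = np.repeat([0, 1, 2], 100)
print(reze_penalty(anchors, positives, task_ids))
```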

If this is right

  • REZE outperforms standard pre-finetuning and isotropy-oriented post-hoc regularization on most embedding backbones and specialized benchmarks.
  • It remains stable in settings where existing PFT variants collapse under heterogeneous supervision.
  • Embedding space analyses show that REZE produces controlled shifts aligned with the original pretrained manifold.
  • The regularization adds no overhead during inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Dispersion measurement in eigencomponents could serve as a diagnostic tool for bias in other contrastive adaptation pipelines facing mixed data sources.
  • Applying similar shrinkage during the initial pretraining stage itself might limit bias accumulation before any domain adaptation occurs.
  • The approach could extend to vision or multimodal embeddings where heterogeneous supervision also distorts representation geometry.

Load-bearing premise

That task-induced bias from heterogeneous supervision is the dominant driver of harmful representation shifts and that eigenspace dispersion reliably separates that noise from useful semantic information.

What would settle it

Run REZE on a pre-finetuning dataset where all tasks come from a single homogeneous distribution; if gains over standard PFT disappear, the dispersion-based separation is not capturing generalizable task variance.
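
A minimal harness for that control experiment is sketched below. `train_pft` and `evaluate` are hypothetical callables standing in for the paper's pre-finetuning and benchmark-evaluation pipelines; only the comparison logic is shown.

```python
def settle_it(train_pft, evaluate, heterogeneous_tasks, homogeneous_tasks):
    """Compare REZE's gain over standard PFT under heterogeneous vs. homogeneous supervision."""
    gains = {}
    for pool_name, tasks in [("heterogeneous", heterogeneous_tasks),
                             ("homogeneous", homogeneous_tasks)]:
        baseline_score = evaluate(train_pft(tasks, use_reze=False))
        reze_score = evaluate(train_pft(tasks, use_reze=True))
        gains[pool_name] = reze_score - baseline_score
    # If the gain survives only under heterogeneous supervision, the dispersion
    # mechanism is doing task-specific work; comparable gains in both pools would
    # suggest a generic regularization effect instead.
    return gains
```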

Figures

Figures reproduced from arXiv: 2604.17257 by Hyunkuk Lim, Jeonghwan Lee, Mingi Sung, Sejoon Kim, Seungmin Lee.

Figure 1: Visualization of REZE's effect on represen… [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2: Effect of the regularization weight α on domain-average performance and their overall mean, across different training sample sizes (# samples = 100/500/1000), with the shrink strength fixed to η = 0.7. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]
Figure 3: IsoScore comparison between PFT and REZE. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]
Figure 4: Embedding space visualization across three benchmarks. All datasets within each benchmark are encoded. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]
original abstract

Recent text embedding models are often adapted to specialized domains via contrastive pre-finetuning (PFT) on a naive collection of scattered, heterogeneous tasks. However, this approach often introduces task-induced bias alongside domain knowledge, leading to uncontrolled representation shifts that distort the pretrained embedding geometry and cause substantial performance degradation. To address this issue, we propose REZE, a representation regularization framework that explicitly controls representation shift during embedding pre-finetuning. REZE operates on the relations of anchor-positive pairs and decomposes them in an eigenspace. It then measures task-wise dispersion along each eigencomponent to identify task-variant directions and applies adaptive soft-shrinkage to suppress task-induced noise while preserving task-invariant semantic structure, without inference-time overhead. Experiments across multiple embedding backbones and specialized benchmarks show that REZE outperforms standard pre-finetuning and isotropy-oriented post-hoc regularization in most settings, remaining stable where existing PFT variants collapse. Embedding space analyses further confirm that REZE induces controlled shifts aligned with the original embedding manifold, underscoring representation shift control as a key principle for robust embedding pre-finetuning under heterogeneous supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes REZE, a representation regularization framework for domain-adaptive text embedding pre-finetuning on heterogeneous tasks. It decomposes anchor-positive pair relations in an eigenspace, measures task-wise dispersion along eigencomponents to identify task-variant directions, and applies adaptive soft-shrinkage to suppress task-induced noise while preserving task-invariant semantic structure, without inference-time overhead. Experiments across multiple embedding backbones and specialized benchmarks claim that REZE outperforms standard pre-finetuning and isotropy-oriented post-hoc regularization in most settings, remains stable where existing PFT variants collapse, and induces controlled shifts aligned with the original embedding manifold.

Significance. If the core mechanism and experimental claims hold, this could meaningfully advance robust domain adaptation for text embeddings by treating representation shift control as a first-class principle under heterogeneous supervision. The no-inference-overhead design and focus on eigenspace-based adaptive regularization are practical strengths. However, the significance is limited by the absence of direct validation that dispersion isolates task-induced bias rather than generic factors, which weakens the mechanistic interpretation of the stability gains.

major comments (2)
  1. [Method (eigenspace decomposition and dispersion measurement)] The central claim that task-wise dispersion along eigencomponents of anchor-positive relations reliably separates task-variant noise from task-invariant semantics (enabling targeted soft-shrinkage) is load-bearing but insufficiently validated. Embedding space analyses show controlled shifts, yet there is no direct test (e.g., correlation of dispersion scores with task labels, ablation removing dispersion-based selection, or comparison against batch-statistic controls) to rule out that dispersion instead reflects pair-construction artifacts or low-variance noise unrelated to heterogeneous supervision. A sketch of one such dispersion-vs-task-label diagnostic follows the minor comments below.
  2. [Experiments and results] Experimental claims of outperformance and stability across backbones and benchmarks lack reported error bars, statistical significance tests, or full ablation tables on the adaptive shrinkage hyperparameters and dispersion thresholds. This makes it impossible to determine whether superiority is consistent or attributable to the proposed mechanism versus generic regularization effects.
minor comments (2)
  1. [Abstract] The abstract uses vague qualifiers such as 'most settings' and 'substantial performance degradation' without any quantitative anchors; adding specific deltas or ranges would improve clarity.
  2. [Method] Notation for eigencomponents and dispersion metrics should be introduced with explicit equations early in the method section to aid reproducibility.
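
One concrete form the dispersion-vs-task-label test proposed in major comment 1 could take is an ANOVA-style effect size per eigencomponent. This is a hedged illustration, not the paper's procedure: `coords` are eigencomponent coordinates as in the sketch above, and `flagged` is a boolean mask standing in for whatever dispersion-based selection REZE actually applies.

```python
import numpy as np

def task_eta_squared(coords, task_ids):
    """Fraction of each eigencomponent's variance explained by task membership."""
    grand_mean = coords.mean(axis=0)
    total = ((coords - grand_mean) ** 2).sum(axis=0)
    between = np.zeros(coords.shape[1])
    for t in np.unique(task_ids):
        group = coords[task_ids == t]
        between += len(group) * (group.mean(axis=0) - grand_mean) ** 2
    return between / np.maximum(total, 1e-12)

def dispersion_diagnostic(coords, task_ids, flagged):
    """Mean task-variance-explained for flagged (task-variant) vs. unflagged components."""
    eta2 = task_eta_squared(coords, task_ids)
    return eta2[flagged].mean(), eta2[~flagged].mean()
```

If the flagged components do not carry clearly more task information than the unflagged ones, the shrinkage is likely acting on something other than supervision-induced variation.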

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We appreciate the acknowledgment of REZE's practical design and have addressed the major comments on mechanistic validation and experimental rigor below, with revisions incorporated where the concerns are valid.

point-by-point responses
  1. Referee: [Method (eigenspace decomposition and dispersion measurement)] The central claim that task-wise dispersion along eigencomponents of anchor-positive relations reliably separates task-variant noise from task-invariant semantics (enabling targeted soft-shrinkage) is load-bearing but insufficiently validated. Embedding space analyses show controlled shifts, yet there is no direct test (e.g., correlation of dispersion scores with task labels, ablation removing dispersion-based selection, or comparison against batch-statistic controls) to rule out that dispersion instead reflects pair-construction artifacts or low-variance noise unrelated to heterogeneous supervision.

    Authors: We agree that the current embedding-space analyses provide only indirect support and that direct tests would strengthen the interpretation. In the revised manuscript we add (i) a correlation analysis between per-component dispersion scores and task labels across the heterogeneous collection, (ii) an ablation that disables dispersion-based direction selection and substitutes random or batch-statistic controls, and (iii) a brief clarification in Section 3.2 explaining why the task-wise formulation inherently isolates supervision-induced variation. These additions are reported in new Tables 4–5 and Figure 3. revision: yes

  2. Referee: [Experiments and results] Experimental claims of outperformance and stability across backbones and benchmarks lack reported error bars, statistical significance tests, or full ablation tables on the adaptive shrinkage hyperparameters and dispersion thresholds. This makes it impossible to determine whether superiority is consistent or attributable to the proposed mechanism versus generic regularization effects.

    Authors: We accept this observation. The revised version now reports standard-deviation error bars over five random seeds for every main result, includes paired t-tests (p < 0.05) against all baselines, and supplies complete ablation tables for the shrinkage coefficient and dispersion threshold (new Appendix C). These tables confirm that gains remain consistent and are driven by the adaptive component rather than generic regularization. revision: yes
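
A minimal sketch of the seed-paired significance protocol described in this response, assuming matched per-seed benchmark scores have already been collected; the arrays below are illustrative placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical matched per-seed scores on one benchmark (illustrative placeholders only).
reze_scores = np.array([61.2, 60.8, 61.5, 61.0, 61.3])
baseline_scores = np.array([59.9, 60.1, 60.4, 59.7, 60.2])

gains = reze_scores - baseline_scores
t_stat, p_value = ttest_rel(reze_scores, baseline_scores)   # paired t-test over seeds
print(f"gain = {gains.mean():.2f} ± {gains.std(ddof=1):.2f}, p = {p_value:.4f}")
```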

Circularity Check

0 steps flagged

No significant circularity; derivation relies on explicit eigenspace decomposition and adaptive shrinkage without reducing to fitted inputs or self-citation chains.

full rationale

The paper's core derivation decomposes anchor-positive relations into an eigenspace, computes task-wise dispersion per eigencomponent to flag variant directions, and applies adaptive soft-shrinkage to suppress noise while preserving invariant structure. No quoted equations or steps reduce any claimed prediction or result to the inputs by construction (e.g., no dispersion metric defined circularly in terms of the shrinkage it enables, and no load-bearing uniqueness theorem imported from the authors' prior work). The method introduces independent regularization mechanics on top of standard contrastive PFT, with experimental validation on external benchmarks providing falsifiable content outside any internal fits. This is the common case of a self-contained proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only access prevents identification of specific free parameters, axioms, or invented entities. The method appears to build on standard contrastive pairs and linear algebra for eigenspace analysis, with adaptive shrinkage likely requiring at least one tunable strength parameter not detailed here.

pith-pipeline@v0.9.0 · 5507 in / 1152 out tokens · 55320 ms · 2026-05-10T06:10:31.799656+00:00 · methodology

discussion (0)

