Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-14 21:45 UTC · model grok-4.3
The pith
Layer-wise dynamics in language models reveal performance signals beyond final representations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying LRD to 31 models on 30 MTEB tasks reveals architectural and task-level differences that are not apparent from final-layer representations alone. Model-level scores correlate positively with downstream MTEB performance, with end-to-end subspace displacement the strongest predictor. For inference-time pruning, GFMI is the only measurement-guided rule that beats random selection at the 15 percent and 20 percent budgets and shows the best median change at every budget tested.
What carries the argument
Layer-wise Representation Dynamics (LRD), a framework using Frenet measurements for subspace speed and curvature, Neighborhood Retention Score for local neighbor preservation, and Graph Filtration Mutual Information for final-layer alignment.
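To make the Frenet family concrete, here is a minimal sketch, assuming the layer-l subspace is the span of the top-k right singular vectors of the centered token-representation matrix and that speed and end-to-end displacement are Grassmann geodesic distances built from principal angles. The function names, k=16, and these conventions are assumptions; the paper's exact definitions (its Appendix A) may differ.

```python
import numpy as np
from scipy.linalg import subspace_angles

def layer_subspace(H, k=16):
    """Orthonormal basis (d x k) of the top-k right singular subspace of a centered
    (n_tokens x d) hidden-state matrix; requires k <= min(n_tokens, d)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:k].T

def grassmann_distance(U, V):
    """Geodesic distance on the Grassmann manifold: square root of the sum of
    squared principal angles between the two column spaces."""
    theta = subspace_angles(U, V)
    return float(np.sqrt(np.sum(theta ** 2)))

def frenet_speed_and_displacement(hidden_states, k=16):
    """hidden_states: list of (n_tokens x d) matrices for layers 0..L.
    Returns per-step Grassmann speeds d_{l,l+1} and end-to-end displacement d_{0,L}."""
    bases = [layer_subspace(H, k) for H in hidden_states]
    speeds = [grassmann_distance(bases[l], bases[l + 1]) for l in range(len(bases) - 1)]
    d_0L = grassmann_distance(bases[0], bases[-1])
    return speeds, d_0L
```

Curvature in the Frenet sense would then be a second difference of this motion (how the step direction turns from layer to layer); the sketch stops at speed and displacement since those are the quantities the review leans on.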
If this is right
- End-to-end subspace displacement serves as the strongest single predictor of downstream MTEB performance for label-free model ranking.
- GFMI-based layer selection outperforms random pruning at moderate compute budgets, while other LRD scores do not transfer as reliably (see the budgeted-selection sketch after this list).
- Encoder-based and decoder-based models exhibit distinct layer-wise motion patterns that final embeddings obscure.
- Task categories such as retrieval versus classification show different rates of representation change across layers.
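The following is a hypothetical illustration of how a measurement-guided pruning rule could use per-layer GFMI scores at a fixed budget: drop the layers least aligned with the final layer until the budget is met. The function name, the greedy drop-lowest-score heuristic, and the never-drop-the-final-layer guard are assumptions, not the paper's algorithm.

```python
import numpy as np

def budgeted_layer_drop(gfmi_per_layer, budget=0.15):
    """Given per-layer GFMI scores (alignment with the final layer), return the sorted
    indices of layers to drop under a compute budget, removing the least-aligned
    layers first while always keeping the final layer."""
    scores = np.asarray(gfmi_per_layer, dtype=float)
    n_layers = len(scores)
    n_drop = int(round(budget * n_layers))
    final = n_layers - 1
    candidates = [l for l in np.argsort(scores) if l != final]  # lowest GFMI first
    return sorted(candidates[:n_drop])

# Example: 24 layers, 15% budget (about 4 layers dropped).
rng = np.random.default_rng(0)
print(budgeted_layer_drop(rng.uniform(size=24), budget=0.15))
```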
Where Pith is reading between the lines
- If the correlations generalize, LRD scores could be used to design training objectives that encourage or suppress specific layer-wise behaviors.
- The same measurements might help diagnose why certain models perform well on narrow tasks but degrade on others by pinpointing the layers where useful structure is lost.
- Extending the framework to very large models could test whether pruning guided by GFMI scales to reduce inference cost without retraining.
Load-bearing premise
The three proposed measurements capture dynamics that are causally relevant to downstream performance rather than merely correlated on the tested set of models and tasks.
What would settle it
Re-running the model-selection and pruning experiments on a fresh collection of models and tasks where the LRD scores no longer correlate with performance or fail to beat random pruning would falsify the claim that layer-wise structure supplies useful deployment signals.
Original abstract
Hidden states change substantially across the layers of modern language models, but most layer-wise analyses focus on one aspect of that change. We propose Layer-wise Representation Dynamics (LRD), a framework with three layer-wise measurement families: Frenet (Grassmann speed and curvature) for global subspace motion, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer. Applying LRD to 31 models (encoder-based and decoder-based embedders, plus base LLMs) on 30 MTEB tasks reveals architectural and task-level differences that are not apparent from final-layer representations alone. We then use LRD for two applications: label-free model selection and inference-time layer pruning. For selection, all three model-level scores correlate positively with downstream MTEB performance, with end-to-end subspace displacement (d_{0,L}) the strongest, and the same direction holds on a smaller base-LLM MMLU panel. For pruning, GFMI is the only measurement-guided rule that beats Random at the 15% and 20% budgets and has the best median change at every budget. Frenet is effective only at the lightest budget, while NRS does not transfer from model selection to pruning. These results show that layer-wise structure provides signal for both interpretation and deployment decisions.
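Since the abstract names NRS and GFMI without formulas, here is a hedged sketch of one way each could be computed from two layers' hidden states. The k-NN construction for NRS, the epsilon-ball filtration at pairwise-distance quantiles for GFMI, and the use of connected-component labels in the mutual-information step are assumptions rather than the paper's definitions.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist
from sklearn.metrics import mutual_info_score
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

def neighborhood_retention(H_a, H_b, k=10):
    """Average fraction of each point's k nearest neighbors at layer A that remain
    among its k nearest neighbors at layer B (a plausible reading of NRS)."""
    A = kneighbors_graph(H_a, k, mode="connectivity").toarray().astype(bool)
    B = kneighbors_graph(H_b, k, mode="connectivity").toarray().astype(bool)
    return float((A & B).sum(axis=1).mean() / k)

def graph_filtration_mi(H_layer, H_final, quantiles=(0.05, 0.10, 0.25)):
    """Mutual information between connected-component labelings of epsilon-ball graphs
    built at a layer and at the final layer, averaged over a filtration of radii set
    at pairwise-distance quantiles (a plausible reading of GFMI)."""
    mis = []
    for q in quantiles:
        r_layer = np.quantile(pdist(H_layer), q)
        r_final = np.quantile(pdist(H_final), q)
        _, la = connected_components(radius_neighbors_graph(H_layer, r_layer), directed=False)
        _, lb = connected_components(radius_neighbors_graph(H_final, r_final), directed=False)
        mis.append(mutual_info_score(la, lb))
    return float(np.mean(mis))
```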
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Layer-wise Representation Dynamics (LRD) framework with three measurement families: Frenet for global subspace motion using Grassmann speed and curvature, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer. Applying LRD to 31 models on 30 MTEB tasks reveals architectural and task-level differences not apparent from final-layer representations. The framework is applied to label-free model selection, where all three scores correlate positively with MTEB performance (d_{0,L} strongest), and to inference-time layer pruning, where GFMI outperforms random at 15% and 20% budgets.
Significance. If the LRD measurements capture dynamics independent of scale and architecture, the work could supply useful tools for interpreting LLM internals and guiding deployment choices such as label-free selection and pruning. The breadth of the evaluation, spanning 31 models and two applications, is a strength, though the empirical correlations require stronger controls to support the claimed utility.
Major comments (2)
- [Model Selection Application] The positive correlations between the three LRD scores (including d_{0,L}) and MTEB performance are reported without regression controls or matching for model family, parameter count, or embedding dimension. The 31 models comprise three distinct families that differ systematically in both average performance and representation evolution, so the correlations may be driven by these confounders rather than by independent signal from the proposed layer-wise measurements.
- [Pruning Experiments] GFMI is stated to beat random pruning at the 15% and 20% budgets with the best median change at every budget, yet no error bars, number of runs, task-level variance, or statistical significance tests are provided. This omission prevents assessment of whether the reported improvements are robust or merely within noise.
Minor comments (2)
- [Abstract] The term 'end-to-end subspace displacement (d_{0,L})' is used without a definition or equation reference, reducing immediate clarity for readers unfamiliar with the Frenet family.
- [Methodology] The exact algorithmic definitions and hyperparameters for computing Frenet, NRS, and GFMI are not fully detailed in the text; including them would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and describe the revisions that will be incorporated.
Point-by-point responses
- Referee: [Model Selection Application] The positive correlations between the three LRD scores (including d_{0,L}) and MTEB performance are reported without regression controls or matching for model family, parameter count, or embedding dimension. The 31 models comprise three distinct families that differ systematically in both average performance and representation evolution, so the correlations may be driven by these confounders rather than by independent signal from the proposed layer-wise measurements.
Authors: We agree that the reported correlations would be strengthened by explicit controls for model family, parameter count, and embedding dimension. In the revised manuscript we will add a multiple linear regression that includes dummy variables for the three model families, log(parameter count), and embedding dimension as covariates. We will report the partial coefficients and p-values for each LRD score (with particular emphasis on d_{0,L}). In addition, we will include within-family correlation tables to demonstrate that the positive relationship is not solely an artifact of cross-family differences. These controls will clarify the independent contribution of the layer-wise measurements. Revision: yes.
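As a companion to the promised revision, a minimal sketch of the control regression, assuming a per-model table with hypothetical columns mteb_score, d0L, family, n_params, and embed_dim; the file name and column names are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-model table: one row per model with its MTEB score, LRD score,
# family label, parameter count, and embedding dimension.
df = pd.read_csv("model_level_scores.csv")

# OLS with family dummies, log parameter count, and embedding dimension as covariates,
# so the coefficient on d0L is a partial effect rather than a raw correlation.
fit = smf.ols("mteb_score ~ d0L + C(family) + np.log(n_params) + embed_dim", data=df).fit()
print(fit.summary())
print("partial coefficient for d0L:", fit.params["d0L"], "p =", fit.pvalues["d0L"])
```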
- Referee: [Pruning Experiments] GFMI is stated to beat random pruning at the 15% and 20% budgets with the best median change at every budget, yet no error bars, number of runs, task-level variance, or statistical significance tests are provided. This omission prevents assessment of whether the reported improvements are robust or merely within noise.
Authors: We acknowledge that the pruning results lack the statistical detail needed to evaluate robustness. In the revision we will report: (i) mean and standard error across five independent runs that differ only in the random seed for layer selection; (ii) per-task performance deltas with inter-task variance; and (iii) p-values from paired Wilcoxon signed-rank tests comparing each guided rule against the random baseline at every budget. These additions will allow readers to judge whether the observed median improvements exceed noise. Revision: yes.
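A sketch of the promised significance check, assuming per-task score deltas (pruned minus unpruned) are available as paired arrays for the GFMI rule and the random baseline at one budget; the arrays below are placeholder data, not results from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_tasks = 30
# Placeholder per-task deltas, paired by task; in practice these come from the
# pruning runs at a given budget.
delta_gfmi = rng.normal(loc=-0.01, scale=0.02, size=n_tasks)
delta_random = rng.normal(loc=-0.03, scale=0.02, size=n_tasks)

# Paired two-sided Wilcoxon signed-rank test on the per-task differences.
stat, p_value = wilcoxon(delta_gfmi, delta_random)
print(f"median GFMI delta = {np.median(delta_gfmi):.3f}, "
      f"median Random delta = {np.median(delta_random):.3f}, p = {p_value:.4f}")
```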
Circularity Check
No significant circularity detected
Full rationale
The paper defines three independent measurement families (Frenet subspace dynamics, Neighborhood Retention Score, and Graph Filtration Mutual Information) via explicit geometric and information-theoretic constructions on hidden-state matrices. These are computed directly from layer activations without reference to downstream MTEB or MMLU labels. Observed positive correlations between the resulting model-level aggregates (e.g., end-to-end displacement d_{0,L}) and task performance are reported as empirical findings across 31 models, not as quantities obtained by fitting parameters to the target metrics. No equations reduce the reported scores to fitted inputs by construction, no self-citations supply load-bearing uniqueness theorems, and no ansatzes are smuggled via prior work. The model-selection and pruning applications are post-hoc uses of the pre-defined measurements rather than derivations that collapse to the inputs.
Axiom & Free-Parameter Ledger
Invented entities (3)
- Frenet (Grassmann speed and curvature): no independent evidence
- Neighborhood Retention Score (NRS): no independent evidence
- Graph Filtration Mutual Information (GFMI): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (relevance unclear). Matched text: "We propose Layer-wise Representation Dynamics (LRD), a framework with three layer-wise measurement families: Frenet (Grassmann speed and curvature) for global subspace motion, Neighborhood Retention Score (NRS) for local nearest-neighbor retention, and Graph Filtration Mutual Information (GFMI) for alignment with the final layer."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (relevance unclear). Matched text: "Applying LRD to 31 models on 30 MTEB tasks reveals architectural and task-level differences..."