Layer by Layer: Uncovering Hidden Representations in Language Models
Pith reviewed 2026-05-15 16:25 UTC · model grok-4.3
The pith
Intermediate layers in language models often encode richer representations than the final layer for downstream tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that intermediate layers balance information compression and signal preservation more effectively than the final layer, leading to stronger representations that improve results on downstream tasks. Their unified metrics quantify these properties layer by layer and confirm the pattern holds across architectures and domains through extensive testing on 32 tasks.
What carries the argument
A unified framework of representation-quality metrics grounded in information theory, geometry, and invariance to input perturbations, which tracks the compression-preservation trade-off at each depth.
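As a concrete illustration of such metrics, the sketch below computes a matrix-based Rényi entropy (information-theoretic) and the effective rank (geometric) of a layer's activation matrix. The function names and normalization choices are illustrative assumptions, not the paper's exact definitions.

```python
# Sketch of two spectrum-based layer diagnostics in the spirit of the
# framework; normalization choices here are illustrative assumptions.
import numpy as np

def matrix_entropy(X: np.ndarray, alpha: float = 2.0) -> float:
    """Matrix-based Renyi entropy of activations X with shape (n, d)."""
    Z = X - X.mean(axis=0)                       # center the activations
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    K = (Z @ Z.T) / Z.shape[0]                   # Gram matrix with unit trace
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]
    lam = lam / lam.sum()
    if np.isclose(alpha, 1.0):                   # Shannon / von Neumann limit
        return float(-(lam * np.log(lam)).sum())
    return float(np.log((lam ** alpha).sum()) / (1.0 - alpha))

def effective_rank(X: np.ndarray) -> float:
    """Effective rank (Roy & Vetterli, 2007): exp of singular-value entropy."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 1e-12]
    return float(np.exp(-(p * np.log(p)).sum()))
```

Tracked across depth, falling entropy indicates compression, while a collapsing effective rank warns that signal is being discarded; the paper's invariance metrics would add perturbed inputs to this picture.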
If this is right
- Mid-layer embeddings improve accuracy on text and vision embedding tasks compared with final-layer use.
- The same layer-wise pattern appears in both transformer and state-space model families.
- Final-layer embeddings are not reliably optimal for feature extraction across tasks.
- Selecting representations from intermediate depths becomes a viable direction for more robust embeddings.
Where Pith is reading between the lines
- Users could routinely extract and compare activations from several layers before choosing the best one for a given task (a layer-sweep sketch follows this list).
- The compression-preservation balance observed here may appear in other neural architectures beyond language models.
- Task-specific layer selection might allow lighter inference by skipping deeper computations in some applications.
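A minimal sketch of that layer-sweep workflow, assuming a Hugging Face transformers model; the model name, mean pooling, and the downstream probe are placeholder choices, not the paper's exact MTEB protocol.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "gpt2"                                      # placeholder model choice
tok = AutoTokenizer.from_pretrained(name)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token                  # gpt2 defines no pad token
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

@torch.no_grad()
def layer_embeddings(texts: list[str]) -> torch.Tensor:
    """Mean-pooled embeddings at every depth: (num_layers+1, batch, hidden)."""
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    hidden = torch.stack(model(**batch).hidden_states)   # (L+1, B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(dim=2) / mask.sum(dim=1).clamp(min=1)

embs = layer_embeddings(["an example sentence", "another one"])
# Fit a cheap linear probe per depth on a labeled split; keep the best layer.
```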
Load-bearing premise
The metrics from information theory, geometry, and perturbation invariance accurately reflect the qualities that determine usefulness for real downstream tasks.
What would settle it
An experiment on a new collection of tasks where final-layer embeddings match or exceed every intermediate layer on all metrics and actual task performance.
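Given a per-layer, per-task score matrix from such an experiment, the decisive check is mechanical; the sketch below uses a placeholder `scores` matrix purely to fix shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((13, 32))       # placeholder: rows = layers, cols = tasks

# The conventional view survives only if the final layer matches or beats
# every other depth on every task (and, ideally, on every metric).
final_never_beaten = bool((scores[-1] >= scores[:-1]).all())
best_depth_per_task = scores.argmax(axis=0)
print(final_never_beaten, best_depth_per_task)
```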
Original abstract
From extracting features to generating text, the outputs of large language models (LLMs) typically rely on the final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that intermediate layers can encode even richer representations, often improving performance on a range of downstream tasks. To explain and quantify these hidden-layer properties, we propose a unified framework of representation quality metrics based on information theory, geometry, and invariance to input perturbations. Our framework highlights how each layer balances information compression and signal preservation, revealing why mid-depth embeddings can exceed the last layer's performance. Through extensive experiments on 32 text-embedding tasks across various architectures (transformers, state-space models) and domains (language, vision), we demonstrate that intermediate layers consistently provide stronger features, challenging the standard view on final-layer embeddings and opening new directions on using mid-layer representations for more robust and accurate representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that intermediate layers in LLMs and related architectures often encode richer representations than final layers for downstream tasks. It introduces a unified framework of representation-quality metrics grounded in information theory, geometry, and invariance to input perturbations, and supports the claim with experiments across 32 text-embedding tasks, multiple model families (transformers, state-space models), and domains (language and vision).
Significance. If the empirical findings and metric framework hold under scrutiny, the work would meaningfully challenge the default practice of relying on final-layer embeddings and could shift feature-extraction pipelines toward mid-layer representations, with potential gains in robustness and accuracy on embedding-based tasks.
Minor comments (3)
- [§3] Abstract and §3: the claim that the metrics are 'independently motivated' and 'parameter-free' should be supported by explicit definitions; any implicit hyperparameters or normalization choices must be stated so readers can verify independence from downstream-task fitting.
- [§4] §4 and Table 2: the reported 'consistent' outperformance of intermediate layers requires per-task statistical significance tests (e.g., paired t-tests or Wilcoxon with correction) and effect-size reporting; aggregate win rates alone are insufficient to support the strong claim (a testing sketch follows this list).
- [§5] §5: the cross-architecture and cross-domain generalization (transformers vs. state-space models, language vs. vision) is central; ensure that layer-indexing conventions and token-aggregation methods are identical across families so the comparison is apples-to-apples.
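A sketch of the testing regime the second comment asks for, assuming per-task scores for the best intermediate layer and the final layer in several model families; all numbers are simulated placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
pvals, effects = [], []
for _ in range(4):                                  # 4 hypothetical families
    final = rng.uniform(0.5, 0.8, size=32)          # final-layer task scores
    mid = final + rng.normal(0.03, 0.02, size=32)   # best-intermediate scores
    diff = mid - final
    pvals.append(wilcoxon(mid, final).pvalue)       # paired test per family
    effects.append(diff.mean() / diff.std(ddof=1))  # paired Cohen's d

# Holm correction across the family-level comparisons.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for d, p, r in zip(effects, p_adj, reject):
    print(f"d={d:.2f}  p_adj={p:.3g}  significant={r}")
```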
Simulated Author's Rebuttal
We thank the referee for the positive summary and significance assessment, and for the recommendation of minor revision. The report raises no major comments requiring point-by-point rebuttal. We will address the three minor comments in the revised version: stating all metric definitions and any implicit hyperparameters or normalization choices explicitly, reporting per-task significance tests and effect sizes alongside aggregate win rates, and documenting that layer-indexing and token-aggregation conventions are identical across model families.
Circularity Check
No significant circularity identified
Full rationale
The paper proposes a framework of representation quality metrics motivated independently by information theory, geometry, and invariance to perturbations. No equations, derivations, or fitted parameters are described that reduce predictions to inputs by construction. Experiments across 32 tasks on multiple architectures provide external validation of the claim that intermediate layers can outperform final layers. No self-citation chains, uniqueness theorems, or ansatz smuggling are referenced in the abstract or context. The central claim rests on empirical results rather than definitional equivalence.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 22 Pith papers
- Inference-Time Machine Unlearning via Gated Activation Redirection. GUARD-IT performs machine unlearning in LLMs via inference-time gated activation redirection, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.
- UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs. UniVLR unifies textual and visual reasoning in multimodal LLMs by compressing reasoning traces and auxiliary images into visual latent tokens for direct inference without interleaved text CoT.
- Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models. Intermediate layers in single-cell foundation models encode optimal representations for biological tasks, outperforming final layers in a task- and context-dependent manner.
- Instruction Data Selection via Answer Divergence. ADG selects 10K instruction examples by scoring the geometric divergence of multiple high-temperature model outputs in embedding space, outperforming prior selectors on reasoning, knowledge, and coding benchmarks acro...
- Overcoming the Modality Gap in Context-Aided Forecasting. A semi-synthetic augmentation creates the CAF-7M dataset and demonstrates that improved context data enables multimodal models to outperform unimodal baselines in context-aided forecasting.
- A Comparative Analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs. Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.
- Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs. The LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.
- Mitigating Action-Relation Hallucinations in LVLMs via Relation-aware Visual Enhancement. A new attention-enhancement method using ARS scores and RVE reduces action-relation hallucinations in LVLMs while generalizing to spatial and object hallucinations.
- Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation. On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.
- Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation. On-policy distillation gains efficiency from early foresight in module focus and update directions, enabling EffOPD to accelerate training 3x with comparable performance.
- FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation. FlashAR achieves up to 22.9x speedup in 512x512 autoregressive image generation by post-training a pre-trained model with a complementary vertical head and dynamic fusion using only 0.05% of original training data.
- FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation. FlashAR accelerates autoregressive image generation up to 22.9x by post-training a pre-trained raster-scan model with a complementary vertical head and dynamic fusion for two-way next-token prediction.
- Large Vision-Language Models Get Lost in Attention. In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
- Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions. LLMs encode accurate but brittle internal beliefs about latent game states and convert them poorly into actions, creating systematic gaps that explain strategic failures.
- LLM Safety From Within: Detecting Harmful Content with Internal Representations. SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.
- Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models. Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.
- The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment. The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...
- From Words to Amino Acids: Does the Curse of Depth Persist? Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.
- Semantic Structure of Feature Space in Large Language Models. LLM hidden states encode semantic features whose geometric relations, including axis projections, cosine similarities, low-dimensional subspaces, and steering spillovers, closely mirror human psychological associations.
- Do Vision Language Models Need to Process Image Tokens? Visual representations in VLMs converge quickly to stable low-complexity forms while text continues evolving, with task-dependent needs for sustained image token access.
- LTX-2: Efficient Joint Audio-Visual Foundation Model. LTX-2 generates high-quality synchronized audiovisual content from text prompts via an asymmetric 14B-video / 5B-audio dual-stream transformer with cross-attention and modality-aware guidance.
- Adaptive Forensic Feature Refinement via Intrinsic Importance Perception. I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harmi...
Reference graph
Works this paper leans on
- [1] Agrawal, K. K., Mondal, A. K., Ghosh, A., and Richards, B. ReQ: Assessing representation quality in self-supervised learning by measuring eigenspectrum decay. NeurIPS, 2022.
- [2] Alain, G. and Bengio, Y. Understanding intermediate layers using linear classifier probes. ICLR, 2017.
- [3] Arefin, M. R., Subbaraj, G., Gontier, N., LeCun, Y., Rish, I., Shwartz-Ziv, R., and Pal, C. Seq-VCR: Preventing collapse in intermediate transformer representations for enhanced reasoning. ICLR, 2025.
- [4] Bach, F. Information theory with kernel methods. IEEE Transactions on Information Theory, 2022.
- [5] Bao, H., Dong, L., Piao, S., and Wei, F. BEiT: BERT pre-training of image transformers. ICLR, 2022.
- [6] Barbero, F., Arroyo, A., Gu, X., Perivolaropoulos, C., Bronstein, M., Veličković, P., and Pascanu, R. Why do LLMs attend to the first token? arXiv, 2025.
- [7] BehnamGhader, P., Adlakha, V., Mosbach, M., Bahdanau, D., Chapados, N., and Reddy, S. LLM2Vec: Large language models are secretly powerful text encoders. COLM, 2024.
- [8] Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O'Brien, K., Hallahan, E., Khan, M. A., Purohit, S., Prashanth, U. S., Raff, E., et al. Pythia: A suite for analyzing large language models across training and scaling. ICML, 2023.
- [9] Boes, P., Eisert, J., Gallego, R., Müller, M. P., and Wilming, H. Von Neumann entropy from unitarity. Physical Review Letters, 2019.
- [10] Bordes, F., Balestriero, R., Garrido, Q., Bardes, A., and Vincent, P. Guillotine regularization: Why removing layers is needed to improve generalization in self-supervised learning. TMLR, 2023.
- [11] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. NeurIPS, 2020.
- [12] Brunner, G., Liu, Y., Pascual, D., Richter, O., Ciaramita, M., and Wattenhofer, R. On identifiability in transformers. ICLR, 2020.
- [13] Burns, C., Ye, H., Klein, D., and Steinhardt, J. Discovering latent knowledge in language models without supervision. ICLR, 2023.
- [14] Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., and Sutskever, I. Generative pretraining from pixels. ICML, 2020.
- [15] Cheng, E., Doimo, D., Kervadec, C., Macocco, I., Yu, J., Laio, A., and Baroni, M. Emergence of a high-dimensional abstraction phase in language transformers. ICLR, 2025.
- [16] Csordás, R., Manning, C. D., and Potts, C. Do language models use their depth efficiently? arXiv, 2025.
- [17] DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv, 2025.
- [18] Deletang, G., Ruoss, A., Duquenne, P.-A., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., et al. Language modeling is compression. ICLR, 2024.
- [19] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL, 2019.
- [20] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
- [21] Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al. The Llama 3 herd of models. arXiv, 2024.
- [22] El-Nouby, A., Klein, M., Zhai, S., Bautista, M. A., Toshev, A., Shankar, V., Susskind, J. M., and Joulin, A. Scalable pre-training of large autoregressive image models. ICML, 2024.
- [23] Fan, S., Jiang, X., Li, X., Meng, X., Han, P., Shang, S., Sun, A., Wang, Y., and Wang, Z. Not all layers of LLMs are necessary during inference. arXiv, 2024.
- [24] Fini, E., Shukor, M., Li, X., Dufter, P., Klein, M., Haldimann, D., Aitharaju, S., da Costa, V. G. T., Béthune, L., Gan, Z., et al. Multimodal autoregressive pre-training of large vision encoders. CVPR, 2025.
- [25] Garrido, Q., Balestriero, R., Najman, L., and LeCun, Y. RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank. ICML, 2023.
- [26] Giraldo, L. G. S., Rao, M., and Principe, J. C. Measures of entropy from data using infinitely divisible kernels. IEEE Transactions on Information Theory, 2014.
- [27] Gu, A. and Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. COLM, 2024.
- [28] Gu, X., Pang, T., Du, C., Liu, Q., Zhang, F., Du, C., Wang, Y., and Lin, M. When attention sink emerges in language models: An empirical view. ICLR, 2025.
- [29] Gurnee, W. and Tegmark, M. Language models represent space and time. arXiv, 2023.
- [30] Hao, S., Sukhbaatar, S., Su, D., Li, X., Hu, Z., Weston, J., and Tian, Y. Training large language models to reason in a continuous latent space. arXiv, 2024.
- [31] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. Masked autoencoders are scalable vision learners. CVPR, 2022.
- [32] Hosseini, E. and Fedorenko, E. Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language. NeurIPS, 2023.
- [33] Jin, M., Yu, Q., Huang, J., Zeng, Q., Wang, Z., Hua, W., Zhao, H., Mei, K., Meng, Y., Ding, K., et al. Exploring concept depth: How large language models acquire knowledge at different layers? arXiv, 2024.
- [34] Lad, V., Gurnee, W., and Tegmark, M. The remarkable robustness of LLMs: Stages of inference? arXiv, 2024.
- [35] Li, Y., Choi, D., Chung, J., Kushman, N., Schrittwieser, J., Leblond, R., Eccles, T., Keeling, J., Gimeno, F., Dal Lago, A., et al. Competition-level code generation with AlphaCode. Science, 2022.
- [36] Liu, N. F., Gardner, M., Belinkov, Y., Peters, M. E., and Smith, N. A. Linguistic knowledge and transferability of contextual representations. NAACL, 2019.
- [37] Ma, E. NLP Augmentation, 2019. URL https://github.com/makcedward/nlpaug.
- [38] Mallen, A. T. and Belrose, N. Eliciting latent knowledge from quirky language models. ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2024.
- [39] Mamou, J., Le, H., Del Rio, M. A., Stephenson, C., Tang, H., Kim, Y., and Chung, S. Emergence of separable manifolds in deep language representations. ICML, 2020.
- [40] Marion, P., Wu, Y.-H., Sander, M. E., and Biau, G. Implicit regularization of deep residual networks towards neural ODEs. ICLR, 2024.
- [41] Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models. ICLR, 2017.
- [42] Muennighoff, N., Tazi, N., Magne, L., and Reimers, N. MTEB: Massive text embedding benchmark. EACL, 2022.
- [43] Oord, A. v. d., Li, Y., and Vinyals, O. Representation learning with contrastive predictive coding. ICLR, 2018.
- [44] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al. DINOv2: Learning robust visual features without supervision. TMLR, 2024.
- [45] Park, K., Choe, Y. J., Jiang, Y., and Veitch, V. The geometry of categorical and hierarchical concepts in large language models. ICML 2024 Workshop on Mechanistic Interpretability, 2024a.
- [46] Park, K., Choe, Y. J., and Veitch, V. The linear representation hypothesis and the geometry of large language models. ICML, 2024b.
- [47] Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. ICML, 2021.
- [48] Raghu, M., Gilmer, J., Yosinski, J., and Sohl-Dickstein, J. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. NeurIPS, 2017.
- [49] Razzhigaev, A., Mikhalchuk, M., Goncharova, E., Oseledets, I., Dimitrov, D., and Kuznetsov, A. The shape of learning: Anisotropy and intrinsic dimensions in transformer-based models. EACL, 2024.
- [50] Rényi, A. On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961.
- [51] Roy, O. and Vetterli, M. The effective rank: A measure of effective dimensionality. European Signal Processing Conference, 2007.
- [52] Saponati, M., Sager, P., Aceituno, P. V., Stadelmann, T., and Grewe, B. The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in transformer training. arXiv, 2025.
- [53] Schölkopf, B. and Smola, A. J. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, 2018.
- [54] Shwartz-Ziv, R. Information flow in deep neural networks. PhD thesis, Hebrew University, 2022.
- [55] Shwartz-Ziv, R. and Tishby, N. Opening the black box of deep neural networks via information. Entropy, 2019.
- [56] Shwartz-Ziv, R., Balestriero, R., Kawaguchi, K., Rudner, T. G., and LeCun, Y. An information theory perspective on variance-invariance-covariance regularization. NeurIPS, 2023.
- [57] Skean, O., Osorio, J. K. H., Brockmeier, A. J., and Giraldo, L. G. S. DiME: Maximizing mutual information by a difference of matrix-based entropies. arXiv, 2023.
- [58] Skean, O., Dhakal, A., Jacobs, N., and Giraldo, L. G. S. FroSSL: Frobenius norm minimization for self-supervised learning. ECCV, 2024.
- [59] Sorscher, B., Ganguli, S., and Sompolinsky, H. Neural representational geometry underlies few-shot concept learning. Proceedings of the National Academy of Sciences, 2022.
- [60] Tenney, I., Das, D., and Pavlick, E. BERT rediscovers the classical NLP pipeline. NAACL, 2019.
- [61] Thilak, V., Huang, C., Saremi, O., Dinh, L., Goh, H., Nakkiran, P., Susskind, J. M., and Littwin, E. LiDAR: Sensing linear probing performance in joint embedding SSL architectures. ICLR, 2024.
- [62] Tian, Y., Krishnan, D., and Isola, P. Contrastive multiview coding. ECCV, 2020.
- [63] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv, 2023.
- [64] Valeriani, L., Doimo, D., Cuturello, F., Laio, A., Ansuini, A., and Cazzaniga, A. The geometry of hidden representations of large transformer models. NeurIPS, 2023.
- [65] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. NeurIPS, 2017.
- [66] Voita, E., Sennrich, R., and Titov, I. The bottom-up evolution of representations in the transformer: A study with machine translation and language modeling objectives. EMNLP-IJCNLP, 2019.
- [67] Wei, L., Tan, Z., Li, C., Wang, J., and Huang, W. Diff-eRank: A novel rank-based metric for evaluating large language models. NeurIPS, 2024.
- [68] Xiao, G., Tian, Y., Chen, B., Han, S., and Lewis, M. Efficient streaming language models with attention sinks. ICLR, 2024.
- [69] Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., et al. Qwen2.5 technical report. arXiv, 2024.
- [70] Zhao, Z., Ziser, Y., and Cohen, S. B. Layer by layer: Uncovering where multi-task learning happens in instruction-tuned large language models. EMNLP-IJCNLP, 2024.
- [71] Zhouyin, Z. and Liu, D. Understanding neural networks with logarithm determinant entropy estimator. arXiv, 2021.
- [72] Székely, G. J., Rizzo, M. L., and Bakirov, N. K. 2008.
- [73] Training large language models to reason in a continuous latent space (see [30]).
- [74] Qwen2.5 technical report (see [69]).
- [75]
- [76] Wainwright, M. J. High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press, 2019.
- [77] Attention is all you need (see [65]).
- [78] AI Medical Chatbot dataset.
- [79] Pythia: A suite for analyzing large language models across training and scaling (see [8]).
- [80] BERT: Pre-training of deep bidirectional transformers for language understanding (see [19]).