Layer by Layer: Uncovering Hidden Representations in Language Models

32 Pith papers cite this work. Polarity classification is still indexing.

abstract

From extracting features to generating text, the outputs of large language models (LLMs) typically rely on the final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that intermediate layers can encode even richer representations, often improving performance on a range of downstream tasks. To explain and quantify these hidden-layer properties, we propose a unified framework of representation quality metrics based on information theory, geometry, and invariance to input perturbations. Our framework highlights how each layer balances information compression and signal preservation, revealing why mid-depth embeddings can exceed the last layer's performance. Through extensive experiments on 32 text-embedding tasks across various architectures (transformers, state-space models) and domains (language, vision), we demonstrate that intermediate layers consistently provide stronger features, challenging the standard view on final-layer embeddings and opening new directions on using mid-layer representations for more robust and accurate representations.
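The abstract names invariance to input perturbations as one family of representation-quality metrics. As a rough illustration only, the sketch below scores each layer by the cosine similarity between the mean-pooled embedding of an input and that of a perturbed copy. The helper names (`mean_pool`, `cosine`, `layer_invariance`) and the toy hidden states are hypothetical, and this proxy is an assumption rather than the metric the paper actually defines.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into a single embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def layer_invariance(original_layers, perturbed_layers):
    """Per-layer similarity between an input and its perturbed copy.

    Each argument is a list over layers; each layer is a list of token vectors.
    This is a toy proxy (assumption), not the paper's metric.
    """
    return [
        cosine(mean_pool(orig), mean_pool(pert))
        for orig, pert in zip(original_layers, perturbed_layers)
    ]

# Hypothetical hidden states: 3 layers, 2 tokens, 2 dimensions.
original = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
perturbed = [
    [[0.0, 1.0], [0.0, 1.0]],  # layer 0: orthogonal to the original
    [[1.0, 1.0], [1.0, 1.0]],  # layer 1: unchanged by the perturbation
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2: 45 degrees from the original
]

scores = layer_invariance(original, perturbed)
# Index of the layer whose pooled embedding moved least under perturbation.
best_layer = max(range(len(scores)), key=scores.__getitem__)
```

In this toy data the middle layer is the most stable, so `best_layer` is 1. With a real model, the per-layer token vectors would come from its hidden states, and invariance alone would not be enough to pick a layer: the abstract describes a balance between compression and signal preservation.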

citation summary

years: 2026 (30), 2025 (2)

roles: background (4)

polarities: background (3), support (1)

representative citing papers

Instruction Data Selection via Answer Divergence

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

ADG selects 10K instruction examples by scoring the geometric divergence of multiple high-temperature model outputs in embedding space, outperforming prior selectors on reasoning, knowledge, and coding benchmarks across two model backbones.

Overcoming the Modality Gap in Context-Aided Forecasting

cs.LG · 2026-03-12 · unverdicted · novelty 7.0

A semi-synthetic augmentation creates the CAF-7M dataset and demonstrates that improved context data enables multimodal models to outperform unimodal baselines in context-aided forecasting.

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.

Automatic Layer Selection for Hallucination Detection

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

FEPoID automatically selects optimal or near-optimal intermediate layers for hallucination detection across LLM architectures and tasks, outperforming prior criteria and baselines, with an added truncation step that further improves performance.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Inference-Time Machine Unlearning via Gated Activation Redirection

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GUARD-IT performs machine unlearning in LLMs via input-dependent activation steering at inference time, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.

Large Vision-Language Models Get Lost in Attention

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.

Fast Wireless Foundation Models with Early-Exits

eess.SP · 2026-06-28 · unverdicted · novelty 5.0

Early-exit framework for wireless FMs attaches per-task heads to a frozen encoder, achieving up to 93% fewer FLOPs and superior OOD performance via fixed per-task exits rather than dynamic routing.

NITP: Next Implicit Token Prediction for LLM Pre-training

cs.CL · 2026-05-24 · unverdicted · novelty 5.0

NITP adds dense supervision from shallow model layers to predict implicit next-token semantics, yielding consistent downstream gains on 0.5B-9B models with ~2% extra training FLOPs.

citing papers explorer


  • Fast Wireless Foundation Models with Early-Exits eess.SP · 2026-06-28 · unverdicted · none · ref 16 · internal anchor
