arXiv preprint arXiv:2403.02181 , year=

Not all layers of llms are necessary during inference , author= · 2024 · arXiv 2403.02181

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving

cs.DC · 2026-04-22 · unverdicted · novelty 7.0

FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, early pruning of rejects, and overlapping draft/verification phases via frontiers.

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

The Generalization Ridge: Information Flow in Natural Language Generation

cs.CL · 2025-07-07 · unverdicted · novelty 6.0

InfoRidge reveals a non-monotonic pattern in which predictive mutual information between hidden states and outputs peaks in intermediate layers before declining in final layers.

ART: Attention Replacement Technique to Improve Factuality in LLMs

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

ART replaces uniform attention in shallow LLM layers with local attention patterns to reduce hallucinations across multiple model architectures.

citing papers explorer

Showing 5 of 5 citing papers.

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving cs.DC · 2026-04-22 · unverdicted · none · ref 13
FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, early pruning of rejects, and overlapping draft/verification phases via frontiers.
Two-dimensional early exit optimisation of LLM inference cs.CL · 2026-03-27 · unverdicted · none · ref 7
Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.
Uncovering the Latent Potential of Deep Intermediate Representations cs.LG · 2026-05-21 · unverdicted · none · ref 43
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
The Generalization Ridge: Information Flow in Natural Language Generation cs.CL · 2025-07-07 · unverdicted · none · ref 8
InfoRidge reveals a non-monotonic pattern in which predictive mutual information between hidden states and outputs peaks in intermediate layers before declining in final layers.
ART: Attention Replacement Technique to Improve Factuality in LLMs cs.CL · 2026-04-07 · unverdicted · none · ref 3
ART replaces uniform attention in shallow LLM layers with local attention patterns to reduce hallucinations across multiple model architectures.

arXiv preprint arXiv:2403.02181 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer