Title resolution pending

2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG
Verdicts: 2 unverdicted

Representative citing papers:
- Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
  Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.
- Layer by Layer: Uncovering Hidden Representations in Language Models
  Intermediate layers in LLMs consistently provide stronger features than final layers across tasks and architectures, as quantified by a new framework of information-theoretic, geometric, and invariance metrics.