Computer Speech & Language

On the effect of dropping layers of pre-trained transformer models · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.

citing papers explorer

Showing 2 of 2 citing papers.

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · none · ref 77

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity

cs.LG · 2026-05-13 · unverdicted · none · ref 51

Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.
