Advances in neural information processing systems , volume=

Are sixteen heads really better than one? , author=

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

representative citing papers

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.

FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01

citing papers explorer

Showing 6 of 6 citing papers.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022-11-01 · conditional · none · ref 48
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads cs.IR · 2026-04-19 · unverdicted · none · ref 18
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing cs.CL · 2026-04-21 · unverdicted · none · ref 40
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 42
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion cs.LG · 2026-04-21 · unverdicted · none · ref 142
FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unreviewed · ref 8

Advances in neural information processing systems , volume=

fields

years

verdicts

representative citing papers

citing papers explorer