
arxiv: 2607.01002 · v1 · pith:YVQVQXSNnew · submitted 2026-07-01 · 💻 cs.CL · cs.AI · cs.LG

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

Pith reviewed 2026-07-02 12:53 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords attention heads · non-literal retrieval · logit contribution · output-value circuit · long-context models · model ablation · mechanistic interpretability

The pith

Logit-Contribution Scoring detects the attention heads that synthesize non-literal answers from context meaning via their output-value circuits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing detectors for retrieval heads in long-context models reward literal token matches at attended positions, missing the synthesis performed by output-value circuits. Logit-Contribution Scoring instead projects each head's OV output onto the answer-token unembedding direction and contrasts needle versus off-needle source positions in one forward pass. Mean-ablating the highest-scoring heads on the NoLiMa benchmark reduces ROUGE-L more sharply and at lower head counts than attention-based baselines across Qwen3, Gemma-3, and OLMo-3.1 models. The same heads prove retrieval-specific, leaving parametric recall and arithmetic tasks intact under identical ablation.

Core claim

Logit-Contribution Scoring identifies non-literal retrieval heads by scoring each attention head according to the projection of its OV-circuit output onto the answer-token unembedding direction, contrasting needle and off-needle positions. Ablating the top-scoring heads collapses ROUGE-L on NoLiMa at lower head counts than prior methods, substantially lowers MuSiQue and BABILong scores, and leaves parametric recall and arithmetic reasoning at baseline.

What carries the argument

Logit-Contribution Scoring (LOCOS), which scores each head by the projection of its OV-circuit output onto the answer-token unembedding direction while contrasting needle and off-needle positions.
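As a rough illustration of this scoring idea, the Python sketch below computes a per-position logit contribution for one head and contrasts needle with off-needle source positions. The function names, the toy dimensions, and in particular the aggregation (sum over needle positions minus sum over off-needle positions) are assumptions made for illustration; this page does not state the paper's exact aggregation or normalization.

```python
# Minimal sketch of a LOCOS-style score for ONE attention head (assumed form).
# alpha[j]  : attention weight from the answer position to source position j
# values[j] : the head's value vector at source position j (length d_head)
# W_O       : the head's output projection, d_head rows x d_model columns
# u         : unembedding direction of the answer token (length d_model)

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def ov_output(v, W_O):
    """Map a value vector into the residual stream: v @ W_O."""
    d_model = len(W_O[0])
    return [sum(v[i] * W_O[i][c] for i in range(len(v))) for c in range(d_model)]

def per_position_contribution(alpha, values, W_O, u):
    """phi_j = u . (alpha_j * W_O v_j) for every source position j."""
    return [a * dot(u, ov_output(v, W_O)) for a, v in zip(alpha, values)]

def locos_score(alpha, values, W_O, u, needle_positions):
    """Assumed contrast: needle contribution minus off-needle contribution."""
    phi = per_position_contribution(alpha, values, W_O, u)
    needle = set(needle_positions)
    on_needle = sum(p for j, p in enumerate(phi) if j in needle)
    off_needle = sum(p for j, p in enumerate(phi) if j not in needle)
    return on_needle - off_needle

if __name__ == "__main__":
    # Toy example: 3 source positions, d_head = d_model = 2, needle at position 2.
    W_O = [[1.0, 0.0], [0.0, 1.0]]
    u = [1.0, 0.0]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    alpha = [0.25, 0.25, 0.5]
    print(locos_score(alpha, values, W_O, u, needle_positions=[2]))
```

Under this assumed contrast, a head whose answer-aligned contribution comes mostly from off-needle positions receives a negative score, which matches the sign convention suggested by the Figure 10 caption.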

If this is right

  • Ablating 50 top LOCOS heads on Qwen3-8B drops ROUGE-L from 0.401 to 0.000 on NoLiMa while the strongest baseline retains 0.292.
  • The selected heads are retrieval-specific, leaving parametric recall and arithmetic reasoning at baseline levels.
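  • On Qwen3-8B, the same ablation drops MuSiQue from 0.55 to 0.08 and BABILong from 0.62 to 0.20, while a random-heads control stays within 0.05 of baseline.
  • Under mean-ablation on NoLiMa, LOCOS degrades ROUGE-L at lower head counts than attention-based detectors across three model families (Qwen3, Gemma-3, OLMo-3.1).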
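The ablation protocol behind these numbers can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' code: heads are keyed by `(layer, head)` tuples, head outputs are plain lists, and the reference means are assumed to be computed beforehand over some reference set, the choice of which is not described on this page.

```python
import random

def top_k_heads(scores, k):
    """Return the k (layer, head) keys with the largest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def bottom_k_heads(scores, k):
    """Return the k (layer, head) keys with the smallest scores."""
    return sorted(scores, key=scores.get)[:k]

def random_k_heads(scores, k, seed=0):
    """Return k heads sampled uniformly without replacement (control condition)."""
    rng = random.Random(seed)
    return rng.sample(sorted(scores), k)

def mean_output(reference_outputs):
    """Element-wise mean of a list of equal-length head-output vectors."""
    n = len(reference_outputs)
    return [sum(col) / n for col in zip(*reference_outputs)]

def mean_ablate(head_outputs, reference_means, ablated):
    """Replace each ablated head's output with its reference mean; keep the rest."""
    ablated = set(ablated)
    return {
        head: list(reference_means[head]) if head in ablated else list(out)
        for head, out in head_outputs.items()
    }
```

In a real model the replacement would be applied inside the forward pass (for example through a hook on the attention output), and the task metric would be re-evaluated for each ablation depth k and each selection rule (top-k, bottom-k, random).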

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support more precise circuit-level interventions for long-context synthesis behaviors.
  • Similar projection-based scoring might extend to identifying heads involved in other non-copying operations such as multi-hop inference.
  • The heads isolated by LOCOS may participate in broader circuits whose structure could be tested by tracing their downstream effects.

Load-bearing premise

The projection of a head's OV-circuit output onto the answer-token unembedding direction isolates the non-literal synthesis contribution rather than other logit effects or correlations in the forward pass.

What would settle it

The claim would be undermined if ablating the top LOCOS heads on NoLiMa failed to reduce ROUGE-L more than ablating the same number of attention-selected heads or random heads.
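Since the test is stated in terms of ROUGE-L, a small reference implementation may help make the comparison concrete. The sketch below computes a ROUGE-L F1 from the longest common subsequence of two token sequences. Lowercased whitespace tokenization and the F1 variant are assumptions; the paper's exact ROUGE-L configuration is not given on this page.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (assumed: lowercase, whitespace tokens)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

With such a metric in hand, the falsification test amounts to comparing the ROUGE-L curves of LOCOS-selected, attention-selected, and random heads at matched ablation depth k.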

Figures

Figures reproduced from arXiv: 2607.01002 by Aryo Pradipta Gema, Beatrice Alex, Pasquale Minervini.

Figure 1. Non-literal retrieval requires synthesis. The same context answers two questions differently: a literal question requires reading “Eiffel Tower” directly from the needle, while a non-literal question must produce “Yuki” after synthesizing the context. Our method, Logit-Contribution Scoring (LOCOS), measures how each attention head contributes to the correct answer token in the unembedding space (See …
Figure 2. An attention head has two circuits: where it reads (QK) and what it writes (OV). Logit-Contribution Scoring uses the OV circuit to identify non-literal retrieval heads. (a) Anatomy of a head’s per-position output: the QK circuit produces attention weight $\alpha_{t,j}$; the OV circuit produces $W_O v_j$. Attention-based methods measure only $\alpha$. Logit-contribution scoring (LOCOS) measures $\phi = u_{y_t}^{\top}(\alpha \cdot W_O v_j)$, capturin…
Figure 3. LOCOS heads produce steeper ROUGE-L degradation under mean-ablation across all six models. Each panel shows NoLiMa ROUGE-L (800 trials) as a function of the number of ablated heads k for four scoring methods across three model families at two scales each: Qwen3 (8B, 14B, 32B), OLMo-3.1 (32B), and Gemma-3 (12B, 27B). LOCOS (blue) produces the steepest degradation curve in every model, reaching near-zero ROU…
Figure 4. OV projections improve causal head selection on most models. Each panel shows NoLiMa ROUGE-L (800 held-out trials) under mean-ablation of the top-k heads ranked by LOCOS (blue) and the attention-only control (cyan). Both scorers use identical spatial-contrast aggregation; only the per-position observable differs. LOCOS is stronger on Qwen3-8B, Qwen3-32B, and Gemma-3-12B, comparable on Qwen3-14B and OLMo-3.…
Figure 5. Bottom-k ablation does not degrade retrieval. Each panel shows NoLiMa ROUGE-L as a function of ablation depth k for top-k (blue), bottom-k (cyan), and random heads (orange) for three representative models (one per family); the full six-model version is in Appx. L. Top-k heads produce steep degradation; bottom-k heads track the random baseline despite having equally large absolute logit contribution, ruling…
Figure 6. LOCOS heads are more concentrated in late layers than Wu/NIAH-scored scores. Layer × Head heatmaps on NoLiMa for Gemma-3-27B (left) and Qwen3-32B (right). The left-hand panel of each model shows LOCOS; the right shows Wu/NIAH-scored token-matching. Red squares mark top-10 heads. Both LOCOS and Wu/NIAH-scored assign high scores predominantly to late layers, but Wu/NIAH-scored additionally identifies heads i…
Figure 7. LOCOS heads exhibit the strongest functional dissociation between retrieval and parametric capabilities. Each panel shows DS(k) (lines, right axis) and parametric accuracy (bars, left axis) as a function of ablation depth k for four scoring methods, on three representative models (one per family); the full six-model version is in Appx. L. Higher DS indicates that ablation degrades retrieval far more than p…
Figure 8. Ablating LOCOS heads damages non-literal retrieval more than literal retrieval. Each panel shows ROUGE-L on NoLiMa (solid) and standard NIAH (dashed) under mean-ablation of the same top-k LOCOS heads, with the NoLiMa and NIAH baselines marked by solid and dashed gray lines. Three representative models are shown here; the full six-model version is in Appx. L. The NoLiMa curve declines more steeply in every …
Figure 9. Mean-ablating top-50 LOCOS heads degrades downstream long-context performance, most strongly on the Qwen3 family. Accuracy on MuSiQue (top) and BABILong qa2+qa3 (bottom) for six models. Bars show the unablated baseline (gray) and the three ablation conditions: random heads (orange), Wu/NIAH-scored heads (pink), and LOCOS (blue). Error bars are standard deviations across three independent runs. LOCOS produc…
Figure 10. Distribution of LOCOS scores across all heads for each model. Heads are sorted by $S_{l,h}$; the top-50 (blue, left) and bottom-50 (red, right) are highlighted. In every model, the bottom-50 heads have strictly negative scores, confirming that the bottom-k experiments (§ 4.4) exclusively ablate heads whose answer-aligned logit contribution originates from off-needle positions.
Figure 11. Bottom-k ablation produces near-zero dissociation. Dissociation score DS(k) and parametric accuracy as a function of ablation depth k for bottom-k heads across six models. Unlike top-k ablation (…
Figure 12. Top-10 LOCOS cells concentrate in late layers in the Qwen3 family on NoLiMa, but span broader layer ranges in Gemma-3-12B and OLMo-3.1-32B. Per-(layer, KV-group) mean LOCOS score on NoLiMa for Qwen3-8B, Qwen3-14B, Qwen3-32B, OLMo-3.1-32B, Gemma-3-12B, and Gemma-3-27B. Layer is on the x-axis, KV group on the y-axis, color encodes the mean score across passing trials. Red boxes mark the top-10 (layer, KV-g…
Figure 13. Bottom-k ablation does not degrade retrieval (six-model version of …
Figure 14. Functional dissociation between retrieval and parametric capabilities across all six models (six-model version of …
Figure 15. Non-literal vs. literal retrieval damage across all six models (six-model version of …
Figure 16. Late-layer concentration persists under tuned-lens projection. Heatmaps for Gemma-3-27B: direct-path LOCOS (left) vs. tuned-lens variant (right). Both methods concentrate high-scoring heads in layers 35–60; the layer-marginal distributions peak in the same band. The tuned-lens variant surfaces two additional heads at layer 11 (heads 26 and 27) that do not appear in the direct-path top-k set, but does not…
Figure 17. Tuned-lens correction only partly resolves the Gemma-3-27B inversion. NoLiMa ROUGE-L under mean-ablation of top-k heads ranked by direct LOCOS, the attention-only spatial-contrast control, and the tuned-lens-corrected LOCOS variant on Gemma-3-27B. The tuned-lens variant closes much of the gap with attention-only scoring at large k, but direct LOCOS selects the most damaging heads at small k. Replacing t…
Figure 18. Causal attribution vs. LOCOS top-10 heads on Qwen3-8B and Gemma-3-12B. Per-(layer, head) score heatmaps with red boxes marking each method’s top-10 cells; layer-marginal kernel densities on the right of each panel. Both methods concentrate top-10 heads in the upper layers in both models, but the top-10 sets overlap only marginally (2/10 for Qwen3-8B, 3/10 for Gemma-3-12B). On Gemma-3-12B, LOCOS surfaces s…
Original abstract

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution Scoring (LOCOS), a write-aware detector that scores each head by the projection of its OV-circuit output onto the answer-token unembedding direction, contrasting needle and off-needle source positions in a single forward pass. Across three model families (Qwen3, Gemma-3, OLMo-3.1), mean-ablating the top LOCOS heads on the NoLiMa non-literal retrieval benchmark collapses ROUGE-L at lower head counts than prior attention-based detections; on Qwen3-8B, ablating 50 heads drives ROUGE-L from 0.401 to 0.000 while the strongest baseline still retains 0.292. The selected heads are retrieval-specific: parametric recall and arithmetic reasoning stay at baseline under the same ablation. On Qwen3-8B, the same ablation also drops MuSiQue from 0.55 to 0.08 and BABI-Long from 0.62 to 0.20, while a random-heads control stays within 0.05 of baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces Logit-Contribution Scoring (LOCOS), a write-aware detector that scores attention heads by projecting their OV-circuit output (at needle vs. off-needle positions) onto the answer-token unembedding direction in a single forward pass. It claims this identifies heads performing non-literal synthesis rather than literal copying. The supporting evidence is ablation: mean-ablating top LOCOS heads on NoLiMa collapses ROUGE-L faster than prior attention-based methods (e.g., on Qwen3-8B, ablating 50 heads drops ROUGE-L from 0.401 to 0.000 while the strongest baseline retains 0.292), with similar drops on MuSiQue and BABILong but no effect on parametric recall or arithmetic reasoning across the Qwen3, Gemma-3, and OLMo-3.1 families.

Significance. If the central claim holds, LOCOS provides a mechanistic tool for isolating heads that contribute to non-literal retrieval in long-context settings, with ablation results demonstrating task-specific necessity. This could enable more precise interpretability analyses and interventions compared to read-focused detectors, particularly given the reproducible ablation protocol and cross-model consistency.

major comments (1)
  1. [Method section (LOCOS definition)] The projection of OV-circuit output onto the answer-token unembedding direction measures marginal logit contribution but does not isolate non-literal synthesis, as any head whose output correlates with needle presence (via attention patterns, residual mixing, or downstream computations) receives a high score regardless of whether it performs the synthesis step. The needle/off-needle contrast in a single forward pass controls for position but leaves internal forward-pass correlations unaddressed, so necessity shown by ablation does not entail that the selected heads implement the claimed mechanism.
minor comments (1)
  1. [Experiments section] The manuscript would benefit from explicit reporting of data splits, statistical significance tests on ablation deltas, and full hyperparameter details for the mean-ablation procedure to strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying a key interpretive distinction. We respond to the single major comment below.

Point-by-point responses
  1. Referee: the projection of OV-circuit output onto the answer-token unembedding direction measures marginal logit contribution but does not isolate non-literal synthesis, as any head whose output correlates with needle presence (via attention patterns, residual mixing, or downstream computations) receives a high score regardless of whether it performs the synthesis step. The needle/off-needle contrast in a single forward pass controls for position but leaves internal forward-pass correlations unaddressed, so necessity shown by ablation does not entail that the selected heads implement the claimed mechanism.

    Authors: We agree that LOCOS computes a marginal logit contribution of each head's OV output to the answer token and that the needle/off-needle contrast primarily removes positional confounds rather than all possible internal forward-pass correlations. Consequently, the ablation results demonstrate necessity of the selected heads for non-literal retrieval performance but do not establish that those heads perform the synthesis computation itself. We will revise the manuscript to clarify this scope: LOCOS is presented as a write-aware detector that ranks heads by their retrieval-specific contribution to the answer logit, with empirical support from stronger ablation effects on non-literal benchmarks than literal-copy baselines. We will add explicit language in the method and discussion sections acknowledging that the method does not isolate the internal mechanism and that further targeted interventions would be required to confirm synthesis.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: LOCOS is a direct projection from model components, validated externally by ablation.

Full rationale

The paper defines LOCOS explicitly as the projection of each attention head's OV-circuit output onto the answer-token unembedding direction, with a needle vs. off-needle contrast computed in a single forward pass. This uses only the model's existing weights and activations with no parameter fitting to the NoLiMa benchmark or any target metric. Ablation results (e.g., ROUGE-L collapse on Qwen3-8B) function as an independent external test of necessity rather than entering the score definition. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the derivation; the central claim rests on the mechanistic definition plus post-hoc empirical validation. No equations reduce the claimed detection to a fitted input or self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard transformer components (attention heads, OV circuits, unembedding) with no new entities postulated; the central claim is supported by ablation experiments rather than additional fitted parameters.

axioms (1)
  • [standard math] The output of an attention head's OV circuit contributes additively to the residual stream and thereby to next-token logits via the unembedding matrix.
    Invoked when defining the projection scoring in the abstract.
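The additivity assumption can be written out as a short derivation. The block below is a sketch in notation partly borrowed from the Figure 2 caption ($\alpha_{t,j}$, $W_O v_j$, $\phi$, $u$); the symbols $x_t^{(0)}$, $a_t^{l,h}$, and $m_t^{l}$ for the embedding, head outputs, and MLP outputs are introduced here for illustration. It also assumes a direct path to the logits that ignores the final normalization layer, which is a simplification and may not match the paper's exact treatment.

```latex
\begin{align}
  % Residual stream at position t: embedding plus all head and MLP outputs.
  x_t^{\text{final}} &= x_t^{(0)} + \sum_{l,h} a_t^{l,h} + \sum_{l} m_t^{l}, \\
  % Each head's output is an attention-weighted sum of OV-projected values.
  a_t^{l,h} &= \sum_{j \le t} \alpha_{t,j}^{l,h}\, W_O^{l,h} v_j^{l,h}, \\
  % Projecting onto the answer-token unembedding direction u_y is linear,
  % so the head's logit contribution splits into per-source-position terms.
  u_y^{\top} a_t^{l,h} &= \sum_{j \le t}
    \underbrace{u_y^{\top}\!\left(\alpha_{t,j}^{l,h}\, W_O^{l,h} v_j^{l,h}\right)}_{\phi_{t,j}^{l,h}} .
\end{align}
```

Because each step is linear, the per-position terms $\phi_{t,j}^{l,h}$ can be grouped by whether position $j$ lies inside or outside the needle span, which is what makes a needle versus off-needle contrast well defined for each head.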


