How Language Models Process Negation
Pith reviewed 2026-05-08 18:25 UTC · model grok-4.3
The pith
Language models process negation by constructing representations of negative phrases more than by suppressing positives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Models implement two mechanisms: suppressive attention heads that attend to the negated phrase and suppress associated concepts, and a constructive mechanism that directly encodes the negated phrase as a vector promoting alternatives. The constructive mechanism is more prominent, and ablating late-layer attention modules that promote shortcuts markedly improves accuracy on negation-related tasks.
What carries the argument
The constructive mechanism that builds a representation of the entire negative phrase as a vector promoting alternatives, operating alongside suppressive attention heads.
If this is right
- Ablating late-layer attention modules that promote shortcuts greatly improves accuracy on questions involving negation.
- Models implement both a suppressive route and a stronger constructive route for negation.
- The constructive route encodes the full negative phrase directly rather than only damping positives.
- Both mechanisms coexist inside the same models.
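The ablation claim in the first bullet can be made concrete with a toy numerical sketch (illustrative numbers only, not the paper's models or data): treat the final logits as a sum of per-head contributions, and zero out a hypothetical shortcut head that copies the negated word.

```python
import numpy as np

# Toy sketch: final logits as a sum of per-head output contributions.
# "Ablating" a head means dropping its term, mirroring zero-ablation
# of late-layer attention modules.
def logits(head_outputs, ablated=()):
    """Sum per-head contributions to the logits, skipping ablated heads."""
    return sum(h for i, h in enumerate(head_outputs) if i not in ablated)

vocab = ["gas", "liquid", "solid"]
heads = [
    np.array([0.2, 0.9, 0.4]),  # hypothetical constructive head: promotes alternatives
    np.array([1.5, 0.1, 0.0]),  # hypothetical shortcut head: copies the negated word "gas"
]

full = logits(heads)
ablated = logits(heads, ablated={1})
print(vocab[int(np.argmax(full))])     # shortcut dominates the full model: "gas"
print(vocab[int(np.argmax(ablated))])  # ablation restores the correct answer: "liquid"
```

With these assumed numbers, the shortcut head wins the argmax in the full model, and removing it lets the constructive contribution determine the answer, which is the shape of the result the paper reports.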
Where Pith is reading between the lines
- Training methods could be adjusted to favor the constructive route and reduce reliance on shortcut attention.
- Similar dual-mechanism patterns may appear for other logical operators such as quantifiers or conditionals.
- Interpretability tools that separate competing internal routes could generalize to debugging other logical failures.
Load-bearing premise
Observational and causal interpretability techniques, such as attention ablation and activation analysis, accurately isolate the negation mechanisms without interference from other components.
What would settle it
No gain in accuracy on negation questions after ablating the identified late-layer attention modules, or activation patterns that fail to match either the constructive vector or the suppressive attention behavior.
Figures
Original abstract
We study how Large Language Models (LLMs) process negation mechanistically. First, we establish that even though open-weight models often provide wrong answers to questions involving negation, they do possess internal components that process negation correctly. Their poor accuracy is due to late-layer attention behavior that promotes simple shortcuts; ablating those attention modules greatly improves accuracy on negation-related questions. Second, we uncover how models process negation. We consider two hypotheses: models could use attention heads that attend to the phrase being negated and suppress related concepts, or they could directly construct a representation of the entire negative phrase (e.g., representing "not gas" as a vector that promotes liquids and solids). We apply a range of observational and causal interpretability techniques on Mistral-7B and Llama-3.1-8B to show that models implement both mechanisms, with the "constructive" mechanism being more prominent. Combined, our work deepens the understanding of LLMs' internals, highlighting construction-dominant computations and the coexistence of competing mechanisms within LLMs.
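The abstract's "not gas" example can be sketched numerically (hypothetical one-hot embeddings, purely illustrative): under the constructive hypothesis, the negation vector points toward the alternatives of the negated concept, so reading it out promotes "liquid" and "solid" while demoting "gas".

```python
import numpy as np

# Hypothetical one-hot unembedding vectors for three concepts (illustration only).
unembed = {
    "gas":    np.array([1.0, 0.0, 0.0]),
    "liquid": np.array([0.0, 1.0, 0.0]),
    "solid":  np.array([0.0, 0.0, 1.0]),
}

# Constructive hypothesis: "not gas" is encoded as a single vector that promotes
# the alternatives to the negated concept rather than merely damping "gas".
not_gas = (unembed["liquid"] + unembed["solid"]) / 2 - unembed["gas"]

# Logit-lens-style readout of the negation vector against each concept.
scores = {w: float(v @ not_gas) for w, v in unembed.items()}
print(scores)  # "gas" scores lowest; "liquid" and "solid" are promoted
```

This is only a geometric caricature of the claim; in the paper the corresponding evidence comes from interventions on real model activations.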
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs such as Mistral-7B and Llama-3.1-8B possess internal components that correctly process negation, but that late-layer attention promotes shortcut behaviors which degrade accuracy on negation-related questions; ablating those modules improves performance. It tests two hypothesized mechanisms—suppressive attention heads that attend to negated phrases versus direct construction of negative-phrase representations (e.g., 'not gas' promoting liquids and solids)—and applies observational and causal interpretability techniques to conclude that both are implemented, with the constructive mechanism the more prominent.
Significance. If the causal interventions cleanly isolate the two mechanisms without residual cross-talk from distributed attention, the work provides concrete mechanistic evidence for coexistence of competing computations in LLMs and a dominance ordering favoring construction over suppression. This strengthens the case for construction-dominant circuits in logical reasoning and supplies falsifiable predictions about ablation effects that could guide future circuit-level analyses.
major comments (2)
- [Abstract / causal-intervention results] Abstract and the causal-intervention section: the claim that ablating late-layer attention modules 'greatly improves accuracy' on negation questions is load-bearing for the shortcut hypothesis, yet the abstract supplies no quantitative effect sizes, baseline comparisons, or controls showing that the same ablation leaves non-negation performance unchanged. Without these, it is impossible to rule out that the improvement is an artifact of general capacity reduction rather than targeted removal of negation shortcuts.
- [Interpretability experiments] The section applying observational and causal techniques to distinguish suppressive vs. constructive mechanisms: because transformer attention is distributed, ablating or patching individual heads can produce effects consistent with both hypotheses simultaneously. The manuscript does not report head-level orthogonality tests, circuit decomposition, or controls that would demonstrate the interventions isolate one mechanism from the other; this directly undermines the ability to assert that the constructive mechanism is 'more prominent.'
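One concrete form the requested orthogonality test could take (hypothetical effect vectors, not the paper's measurements): compare the logit-space effect directions of the two candidate mechanisms. A cosine near zero would support separability; a large overlap would undercut single-head attribution.

```python
import numpy as np

# Hypothetical per-intervention effect vectors over the logits for
# ["gas", "liquid", "solid"] (made-up numbers for illustration).
def cosine(u, v):
    """Cosine similarity between two effect directions."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

suppressive_effect  = np.array([-2.0, 0.1, 0.0])  # mostly damps "gas"
constructive_effect = np.array([0.0, 1.2, 1.1])   # mostly boosts alternatives

overlap = cosine(suppressive_effect, constructive_effect)
print(round(overlap, 3))  # near zero: the two directions barely overlap
```

A real version of this check would use measured ablation/patching deltas rather than assumed vectors, but the decision criterion is the same.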
minor comments (2)
- [Hypotheses section] Clarify the precise definition of 'constructive' vs. 'suppressive' representations with an example vector or activation pattern so readers can replicate the classification criteria.
- [Experimental setup] The datasets and exact negation-question templates used for accuracy measurements are not listed; include them (or a pointer to the release) to allow reproduction of the reported accuracy gains.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our work. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract / causal-intervention results] Abstract and the causal-intervention section: the claim that ablating late-layer attention modules 'greatly improves accuracy' on negation questions is load-bearing for the shortcut hypothesis, yet the abstract supplies no quantitative effect sizes, baseline comparisons, or controls showing that the same ablation leaves non-negation performance unchanged. Without these, it is impossible to rule out that the improvement is an artifact of general capacity reduction rather than targeted removal of negation shortcuts.
Authors: We agree that the abstract would be strengthened by including quantitative details. The full manuscript reports ablation results with specific accuracy improvements on negation tasks alongside controls confirming stable performance on non-negation benchmarks. We will revise the abstract to incorporate these effect sizes, baseline comparisons, and non-negation controls to directly address concerns about general capacity reduction. revision: yes
-
Referee: [Interpretability experiments] The section applying observational and causal techniques to distinguish suppressive vs. constructive mechanisms: because transformer attention is distributed, ablating or patching individual heads can produce effects consistent with both hypotheses simultaneously. The manuscript does not report head-level orthogonality tests, circuit decomposition, or controls that would demonstrate the interventions isolate one mechanism from the other; this directly undermines the ability to assert that the constructive mechanism is 'more prominent.'
Authors: We acknowledge the distributed nature of attention and the resulting challenge in cleanly isolating mechanisms. Our interventions combine head-specific ablation (targeting suppressive attention patterns to negated phrases) with representation-level patching (targeting constructive negative phrase vectors), yielding convergent evidence from observational and causal methods that favors the constructive mechanism. We will add a dedicated discussion of potential cross-talk, any available orthogonality metrics, and a more qualified statement on prominence to reflect the limitations of isolation in distributed systems. revision: partial
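The patching procedure the rebuttal invokes can be illustrated with a minimal linear toy (assumed weights, not the paper's setup): run a clean and a corrupted input, copy the clean hidden state into the corrupted run, and check whether the clean output is restored.

```python
import numpy as np

# Toy two-layer linear "model" with assumed weights (illustration only).
W1 = np.array([[1.0, 0.5], [0.0, 1.0]])             # input -> hidden
W2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.3, 0.3]])  # hidden -> 3 logits

def run(x, patch_hidden=None):
    h = W1 @ x
    if patch_hidden is not None:
        h = patch_hidden  # causal intervention: overwrite the hidden state
    return W2 @ h, h

clean_logits, clean_h = run(np.array([1.0, 1.0]))  # e.g., the negated prompt
corr_logits, _ = run(np.array([1.0, 0.0]))         # corrupted prompt
patched_logits, _ = run(np.array([1.0, 0.0]), patch_hidden=clean_h)

# If patching restores the clean behavior, the patched site carries the
# relevant information causally.
print(np.allclose(patched_logits, clean_logits))
```

In a real transformer the patch targets a specific layer and token position via hooks, and "restoration" is measured as a fraction of the clean-versus-corrupted logit difference recovered.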
Circularity Check
No circularity: empirical interpretability findings are self-contained
Full rationale
The paper's central claims rely on applying observational and causal interpretability methods (attention ablation, activation analysis) to Mistral-7B and Llama-3.1-8B to identify negation-processing mechanisms. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains are present that would reduce any result to its own inputs by construction. The coexistence and prominence of constructive vs. suppressive mechanisms are reported as outcomes of direct interventions on model internals, not as logical equivalences or ansatzes smuggled via prior self-work. This is a standard empirical analysis with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Activation patching and attention ablation can causally isolate the contribution of specific heads or layers to a behavioral outcome.
- domain assumption Observational techniques (e.g., attention visualization, representation similarity) combined with causal interventions can distinguish between suppression and constructive mechanisms.
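The first assumption is the fragile one: in a nonlinear network, per-component ablation effects need not compose additively, so single-component ablations can misattribute causal responsibility. A two-component toy (hypothetical, not drawn from the paper) makes this concrete.

```python
import numpy as np

# Two components whose contributions interact through a nonlinearity.
def model(a_on=True, b_on=True):
    a = 1.0 if a_on else 0.0
    b = 1.0 if b_on else 0.0
    return np.tanh(a + b)  # saturation couples the components' effects

full = model()
effect_a = full - model(a_on=False)                # measured effect of ablating A alone
effect_b = full - model(b_on=False)                # measured effect of ablating B alone
joint = full - model(a_on=False, b_on=False)       # effect of ablating both

# Nonzero gap: individual ablation effects do not sum to the joint effect.
print(round((effect_a + effect_b) - joint, 3))
```

The gap here is an artifact of the assumed tanh saturation, but it shows why the ledger lists causal isolability as an axiom rather than a given.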