Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

Guruprakash J; Krithika L.B

arXiv: 2606.24331 · v1 · pith:HVWY4CU4 · submitted 2026-06-23 · 💻 cs.CL · cs.ET


Pith reviewed 2026-06-26 00:05 UTC · model grok-4.3


keywords: transformer models · language models · survey · critical assessment · deployment · energy cost · alignment methods · domain applications


The pith

Transformer models show distinct trade-offs in energy use, parameter count and domain fit across architectures

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper surveys the main families of transformer-based language models and organizes them into a taxonomy that includes encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. It extends the review to post-2023 developments such as instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling and retrieval augmentation. The paper then surveys deployments across healthcare, finance, legal, education and other verticals, linking each to the capabilities that make a given transformer suitable. Its central contribution is a critical assessment that compares architectures on four deployment axes, quantifies the trade-off between parameter count and energy cost, and discusses how alignment methods, data provenance and benchmark saturation change what it means to call a model state of the art. Practitioners gain a structured way to navigate rapid model releases when choosing tools for specific uses.

Core claim

Transformer-based language models are organised into a working taxonomy covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. Post-2023 developments that changed practice include instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the flagship families from major providers. Deployments in healthcare, finance, legal, education, customer service, creative writing and scientific work are surveyed and linked to the specific capabilities that make each transformer the appropriate tool. The critical assessment compares architectures on four deployment axes, quantifies the trade-off between parameter count and energy cost, and examines how alignment methods, data provenance and benchmark saturation change what counts as state of the art.

What carries the argument

A taxonomy of transformer families, together with four axes for comparing architectures in deployment and a quantification of the parameter-count versus energy-cost trade-off

If this is right

Different transformer variants match different domains through their specific capabilities such as long-context handling or preference optimisation.
Higher parameter counts correspond to higher energy costs, directly affecting which models are viable for deployment.
Alignment methods including reinforcement learning from human feedback and direct preference optimisation shift the criteria used to judge state-of-the-art performance.
Data provenance and benchmark saturation must be factored into any claim that a model is state of the art.
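The parameter-energy link in the second point depends on which parameter count is meant. As a rough sketch, assuming the common approximation of about 2 FLOPs per active parameter per generated token (attention-score cost ignored), a mixture-of-experts model that routes each token through only k of its experts spends compute in proportion to its active, not total, parameters. The function names and the model layouts below are hypothetical illustrations, not figures from the paper.

```python
from typing import Tuple


def inference_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token.

    Assumes ~2 FLOPs (one multiply, one add) per active parameter and
    ignores attention-score FLOPs, which grow with context length.
    """
    return 2.0 * active_params


def moe_param_counts(shared: float, per_expert: float,
                     n_experts: int, top_k: int) -> Tuple[float, float]:
    """Return (total, active) parameters for a hypothetical MoE layout.

    `shared` covers parameters every token uses (embeddings, attention, etc.);
    each token additionally passes through `top_k` of the `n_experts` experts.
    """
    if not 1 <= top_k <= n_experts:
        raise ValueError("top_k must be between 1 and n_experts")
    total = shared + n_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


# Hypothetical layouts: a dense model vs. an MoE model with more total parameters.
dense_total = 7e9
moe_total, moe_active = moe_param_counts(shared=1e9, per_expert=1e9, n_experts=8, top_k=2)

print(f"dense: {dense_total:.1e} params, {inference_flops_per_token(dense_total):.1e} FLOPs/token")
print(f"MoE:   {moe_total:.1e} params, {inference_flops_per_token(moe_active):.1e} FLOPs/token")
```

Under these assumptions the MoE layout has more total parameters (9e9 vs 7e9) but fewer FLOPs per token (6e9 vs 1.4e10), so a comparison keyed on total parameter count alone would misstate its per-token compute. Memory footprint, by contrast, still scales with total parameters, since every expert must be resident.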

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The four-axis comparison framework could be applied by organisations to create internal model-selection checklists.
The quantified energy trade-off may prompt hardware vendors to prioritise efficiency metrics in future accelerator designs.
The research questions listed in the final section point to open problems around long-term stability of aligned models in vertical applications.

Load-bearing premise

The survey's selection of literature and models is representative and sufficient to support fair comparisons on the four deployment axes and the energy-parameter quantification.

What would settle it

A new energy-consumption study across the surveyed models that finds no consistent relationship between parameter count and energy cost.
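One way such a study could test for a "consistent relationship" is a rank correlation across models: Spearman's rho near +1 means larger models reliably consume more energy, while a value near 0 would support the falsifying result. The sketch below computes rho from scratch as the Pearson correlation of average ranks, which handles tied values. The function names are hypothetical, and the inputs would be per-model parameter counts and measured energy figures, which the abstract does not list. With only a handful of models rho is noisy, so a real study would also report a confidence interval or a permutation test.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("a constant input has no defined rank correlation")
    return cov / (vx * vy) ** 0.5


# Synthetic placeholder values only: (parameter count, energy) for four models.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
```

Because it works on ranks, the statistic is insensitive to the units chosen for energy (kWh, joules, kg CO2e) and to whether parameters are counted in millions or billions.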

Original abstract

Transformer-based language models have become the default substrate for natural language processing, and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental announcements. This review works at two levels. At the level of mechanism, we organise the main transformer families into a working taxonomy covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. We then extend the discussion to post-2023 developments that changed the picture in practice: instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the current flagship model families from OpenAI, Anthropic, Google, Meta, Mistral and DeepSeek. At the level of use, we survey deployments across healthcare, finance, legal, education, customer service, creative writing and scientific work, and link each to the specific capabilities that make a transformer the appropriate tool. The contribution of this paper is a critical assessment built on this survey. We compare architectures on four axes that matter to deployment decisions and quantify the trade-off between parameter count and energy cost. We also discuss how alignment methods, data provenance and benchmark saturation change what it means to call a model "state of the art". The final section lists the research questions that we think deserve more attention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward survey that organizes existing transformer work and domain applications but adds no new mechanisms, data, or verified comparisons.


The main thing to know is that this paper is a literature review. It lays out a taxonomy of transformer families (encoder-only through generator-discriminator), covers post-2023 items like instruction tuning, RLHF, DPO, MoE, and RAG, then maps those to deployments in healthcare, finance, legal, and a few other areas. The authors say their contribution is the critical assessment that follows: comparisons on four deployment axes plus a quantification of parameter count versus energy cost.

What it does is collect and group published material in one place. That can save a practitioner some time when they need a quick map of which model families are used where. The taxonomy section looks like standard organization rather than invention.

The soft spot is the central claim about the four-axis comparisons and the energy-parameter trade-off. The abstract gives no search protocol, inclusion rules, or model list, so it is not clear whether the selected papers and flagship families support balanced or representative claims. Without that, the quantifications rest on whatever literature the authors happened to pull. The discussion of alignment, data provenance, and benchmark saturation is sensible but does not resolve open questions or add new evidence.

This paper is for someone who wants an overview of current transformer use across verticals rather than new technical results. A reading group focused on applications might find it useful for discussion; a methods group would not. It does not contain original findings that I would cite.

It is coherent enough on its own terms to go to peer review so referees can check whether the survey selection actually supports the claimed comparisons.

Referee Report

1 major / 1 minor

Summary. The manuscript surveys transformer architectures (encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, generator-discriminator), post-2023 developments (instruction tuning, RLHF, DPO, MoE, RAG, flagship families), and domain deployments (healthcare, finance, legal, education, customer service, creative writing, scientific work). It claims a critical assessment that compares architectures on four deployment-relevant axes, quantifies the parameter-count versus energy-cost trade-off, and discusses how alignment methods, data provenance, and benchmark saturation redefine 'state of the art'.

Significance. If the underlying literature sample is representative, the structured taxonomy and explicit linkage of architectural choices to deployment axes plus the energy-parameter quantification would provide practitioners with a practical decision framework amid rapid model releases. The discussion of evolving SOTA criteria is a timely addition to the survey literature.

major comments (1)

[Abstract] Abstract (contribution paragraph): the central claims rest on architecture comparisons across four deployment axes and a quantification of the parameter-energy trade-off, yet no search protocol, inclusion criteria, or exhaustive model list is supplied; without these the representativeness of the selected post-2023 literature and flagship families cannot be verified, directly undermining the reliability of the reported comparisons and quantification.

minor comments (1)

[Abstract] The four deployment axes are referenced but never enumerated in the abstract; listing them explicitly would improve readability of the contribution statement.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and for highlighting the need for greater methodological transparency in our survey. We address the single major comment below and commit to revisions that strengthen the paper without altering its core contributions.


Referee: [Abstract] Abstract (contribution paragraph): the central claims rest on architecture comparisons across four deployment axes and a quantification of the parameter-energy trade-off, yet no search protocol, inclusion criteria, or exhaustive model list is supplied; without these the representativeness of the selected post-2023 literature and flagship families cannot be verified, directly undermining the reliability of the reported comparisons and quantification.

Authors: We agree that the absence of an explicit search protocol and inclusion criteria limits the ability to assess representativeness. In the revised manuscript we will add a new subsection (Section 2.1, Literature Selection and Scope) that details: (1) search strategy (keywords such as 'transformer architecture 2023+', 'RLHF', 'MoE scaling', 'domain-specific LLM deployment' across arXiv, ACL, NeurIPS, and Google Scholar); (2) inclusion criteria (peer-reviewed or high-impact arXiv preprints from 2023 onward that report empirical results on the four deployment axes or energy metrics, plus all flagship families with public parameter counts); (3) exclusion criteria (purely theoretical works without deployment discussion, non-English papers, and incremental fine-tuning studies without architectural novelty); and (4) an appendix table enumerating every model and paper used for the architecture taxonomy, axis comparisons, and parameter-energy quantification. The energy trade-off numbers are drawn directly from the cited primary sources (e.g., reported training or inference FLOPs converted to energy under stated hardware-efficiency assumptions, and energy converted to emissions via standard carbon-intensity factors); the revision will make this sourcing explicit so readers can replicate or extend the quantification. revision: yes
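The conversion described in this response has an intermediate step: FLOPs become energy only through hardware assumptions, and energy becomes emissions only through a carbon-intensity factor. A minimal sketch of that chain follows, assuming the widely used approximation of about 6 FLOPs per parameter per training token for dense transformers (Kaplan et al., 2020); for mixture-of-experts models the parameter count should be the active one. Every hardware and grid figure is a caller-supplied assumption, and the names are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6


@dataclass(frozen=True)
class HardwareAssumptions:
    """Caller-supplied assumptions; none of these values come from the paper."""
    peak_flops_per_s: float  # accelerator peak throughput (FLOP/s)
    utilization: float       # achieved fraction of peak, in (0, 1]
    power_watts: float       # average power per accelerator while busy (W)
    pue: float               # data-centre power usage effectiveness (>= 1.0)


def training_flops(n_params: float, n_tokens: float) -> float:
    """~6 FLOPs per parameter per token (forward + backward), dense approximation."""
    return 6.0 * n_params * n_tokens


def energy_kwh(flops: float, hw: HardwareAssumptions) -> float:
    """Convert a FLOP count to facility energy under the given assumptions."""
    if not 0.0 < hw.utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if hw.pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    # Total accelerator-seconds, summed over however many devices are used.
    accelerator_seconds = flops / (hw.peak_flops_per_s * hw.utilization)
    joules = accelerator_seconds * hw.power_watts * hw.pue
    return joules / JOULES_PER_KWH


def emissions_kg_co2e(kwh: float, grid_kg_co2e_per_kwh: float) -> float:
    """Scale energy by a grid carbon-intensity factor (kg CO2e per kWh)."""
    return kwh * grid_kg_co2e_per_kwh
```

A full estimate chains the three calls, `emissions_kg_co2e(energy_kwh(training_flops(n, d), hw), grid)`, and keeping them separate shows which assumption (utilization, power, PUE or grid intensity) drives a reported figure. The sketch ignores overheads not captured by PUE, such as host CPUs, networking and failed runs.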

Circularity Check

0 steps flagged

No circularity: survey paper with no derivations or self-referential predictions


This is a literature survey that organizes existing transformer families and deployments from external sources, then offers a critical assessment based on those sources. The abstract and contribution statement describe taxonomy construction, deployment surveys, and comparisons on deployment axes plus a parameter-energy trade-off quantification, all drawn from cited literature rather than any internal equations, fitted parameters, or self-citations that reduce the outputs to the paper's own inputs by construction. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because this is a review paper, its central claims rest on the completeness of the literature selection and the fairness of the four-axis comparison framework rather than on new parameters, axioms, or entities.

pith-pipeline@v0.9.1-grok · 5777 in / 1030 out tokens · 32965 ms · 2026-06-26T00:05:33.905786+00:00 · methodology


Reference graph

Works this paper leans on

74 extracted references · 23 canonical work pages · 18 internal anchors


[1] [1]

and Kaiser, Lukasz and Polosukhin, Illia , title =

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia , title =. Advances in Neural Information Processing Systems , year =

[2] [2]

A Survey of Large Language Models

Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and others , title =. arXiv preprint arXiv:2303.18223 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Large Language Models: A Survey

Minaee, Shervin and Mikolov, Tomas and Nikzad, Narjes and Chenaghlu, Meysam and Socher, Richard and Amatriain, Xavier and Gao, Jianfeng , title =. arXiv preprint arXiv:2402.06196 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others , title =

Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others , title =. Advances in Neural Information Processing Systems , volume =

[5] [5]

and Finn, Chelsea , title =

Rafailov, Rafael and Sharma, Archit and Mitchell, Eric and Ermon, Stefano and Manning, Christopher D. and Finn, Chelsea , title =. Advances in Neural Information Processing Systems , year =

[6] Fedus, William and Zoph, Barret and Shazeer, Noam. Journal of Machine Learning Research.

[7] Jiang, Albert Q. and Sablayrolles, Alexandre and Roux, Antoine and Mensch, Arthur and Savary, Blanche and Bamford, Chris and Chaplot, Devendra Singh and de las Casas, Diego and Hanna, Emma Bou and Bressand, Florian and others. arXiv preprint arXiv:2401.04088.

[8] Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and Küttler, Heinrich and others. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.

[9] Hochreiter, Sepp and Schmidhuber, Jürgen. Long Short-Term Memory.

[10] Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. Proceedings of NAACL-HLT.

[11] Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya.

[12] Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya.

[13] Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others. Advances in Neural Information Processing Systems.

[14] Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J. Journal of Machine Learning Research.

[15] Kaplan, Jared and McCandlish, Sam and Henighan, Tom and Brown, Tom B. and Chess, Benjamin and Child, Rewon and Gray, Scott and Radford, Alec and Wu, Jeffrey and Amodei, Dario. Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.

[16] Hoffmann, Jordan and Borgeaud, Sebastian and Mensch, Arthur and Buchatskaya, Elena and Cai, Trevor and Rutherford, Eliza and de Las Casas, Diego and Hendricks, Lisa Anne and Welbl, Johannes and Clark, Aidan and others. Advances in Neural Information Processing Systems.

[17] Liu, Yinhan and Ott, Myle and Goyal, Naman and Du, Jingfei and Joshi, Mandar and Chen, Danqi and Levy, Omer and Lewis, Mike and Zettlemoyer, Luke and Stoyanov, Veselin. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.

[18] He, Pengcheng and Liu, Xiaodong and Gao, Jianfeng and Chen, Weizhu. International Conference on Learning Representations.

[19] Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint arXiv:2307.09288.

[20] Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and Al-Dahle, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and others. The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783.

[21] Jiang, Albert Q. and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and de las Casas, Diego and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and others. arXiv preprint arXiv:2310.06825.

[22] DeepSeek-AI. DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437.

[23] Lewis, Mike and Liu, Yinhan and Goyal, Naman and Ghazvininejad, Marjan and Mohamed, Abdelrahman and Levy, Omer and Stoyanov, Veselin and Zettlemoyer, Luke. Proceedings of ACL.

[24] Dai, Zihang and Yang, Zhilin and Yang, Yiming and Carbonell, Jaime and Le, Quoc V. and Salakhutdinov, Ruslan. Proceedings of ACL.

[25] Beltagy, Iz and Peters, Matthew E. and Cohan, Arman. Longformer: The Long-Document Transformer. arXiv preprint arXiv:2004.05150.

[26] Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and Ahmed, Amr. Advances in Neural Information Processing Systems.

[27] Choromanski, Krzysztof and Likhosherstov, Valerii and Dohan, David and Song, Xingyou and Gane, Andreea and Sarlos, Tamas and Hawkins, Peter and Davis, Jared and Mohiuddin, Afroz and Kaiser, Lukasz and others. Rethinking Attention with Performers. arXiv preprint arXiv:2009.14794.

[28] Ding, Jiayu and Ma, Shuming and Dong, Li and Zhang, Xingxing and Huang, Shaohan and Wang, Wenhui and Zheng, Nanning and Wei, Furu. arXiv preprint arXiv:2307.02486.

[29] Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and Ré, Christopher. Advances in Neural Information Processing Systems.

[30] Yang, Zhilin and Dai, Zihang and Yang, Yiming and Carbonell, Jaime and Salakhutdinov, Ruslan and Le, Quoc V. Advances in Neural Information Processing Systems.

[31] Clark, Kevin and Luong, Minh-Thang and Le, Quoc V. and Manning, Christopher D. International Conference on Learning Representations.

[32] Shazeer, Noam and Mirhoseini, Azalia and Maziarz, Krzysztof and Davis, Andy and Le, Quoc V. and Hinton, Geoffrey and Dean, Jeff. International Conference on Learning Representations.

[33] Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.

[34] Gao, Leo and Schulman, John and Hilton, Jacob. International Conference on Machine Learning.

[35] Ethayarajh, Kawin and Xu, Winnie and Muennighoff, Niklas and Jurafsky, Dan and Kiela, Douwe. KTO: Model Alignment as Prospect Theoretic Optimization. arXiv preprint arXiv:2402.01306.

[36] Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and others. Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073.

[37] Conover, Mike and Hayes, Matt and Mathur, Ankit and Xie, Jianwei and Wan, Jun and Shah, Sam and Ghodsi, Ali and Wendell, Patrick and Zaharia, Matei and Xin, Reynold.

[38] Taori, Rohan and Gulrajani, Ishaan and Zhang, Tianyi and Dubois, Yann and Li, Xuechen and Guestrin, Carlos and Liang, Percy and Hashimoto, Tatsunori B.

[39] Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu. International Conference on Learning Representations.

[40] Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke. Advances in Neural Information Processing Systems.

[41] Rasmy, Laila and Xiang, Yang and Xie, Ziqian and Tao, Cui and Zhi, Degui. npj Digital Medicine.

[42] Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo. Bioinformatics.

[43] Alsentzer, Emily and Murphy, John and Boag, William and Weng, Wei-Hung and Jindi, Di and Naumann, Tristan and McDermott, Matthew. Proceedings of the Clinical Natural Language Processing Workshop.

[44] Hou, Benjamin and Kaissis, Georgios and Summers, Ronald M. and Kainz, Bernhard. arXiv preprint arXiv:2107.02104.

[45] Zakka, Cyril and Shad, Rohan and Chaurasia, Akash and Dalal, Alex R. and Kim, Jennifer L. and Moor, Michael and Fong, Robyn and Phillips, Curran and Alexander, Kevin and Ashley, Euan and others. NEJM AI.

[46] Araci, Dogu. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv preprint arXiv:1908.10063.

[47] Yang, Linyi and Li, Jiazheng and Dong, Ruihai and Zhang, Yue and Smyth, Barry. Proceedings of the AAAI Conference on Artificial Intelligence.

[48] Chalkidis, Ilias and Fergadiotis, Manos and Malakasiotis, Prodromos and Aletras, Nikolaos and Androutsopoulos, Ion. Findings of EMNLP.

[49] Shaheen, Zein and Wohlgenannt, Gerhard and Filtz, Erwin. arXiv preprint arXiv:2010.12871.

[50] Dahl, Matthew and Magesh, Varun and Suzgun, Mirac and Ho, Daniel E. Journal of Legal Analysis.

[51] Ormerod, Christopher M. and Malhotra, Akanksha and Jafari, Amir. arXiv preprint arXiv:2102.13136.

[52] Kulshreshtha, Devang and Shayan, Muhammad and Belfer, Robert and Reddy, Siva and Serban, Iulian Vlad and Kochmar, Ekaterina. arXiv preprint arXiv:2206.04187.

[53] Schick, Timo and Dwivedi-Yu, Jane and Dessì, Roberto and others. Advances in Neural Information Processing Systems.

[54] Marco, Guillermo and Gonzalo, Julio and Rello, Luz. SSRN Electronic Journal.

[55] Generating extractive summaries of scientific paradigms. Journal of Artificial Intelligence Research.

[56] Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde de Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and others. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374.

[57] Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario. Proceedings of the ACM Workshop on Artificial Intelligence and Security.

[58] Strubell, Emma and Ganesh, Ananya and McCallum, Andrew. Energy and Policy Considerations for Deep Learning in NLP. arXiv preprint arXiv:1906.02243.

[59] Patterson, David and Gonzalez, Joseph and Le, Quoc and Liang, Chen and Munguia, Lluis-Miquel and Rothchild, Daniel and So, David and Texier, Maud and Dean, Jeff. Carbon Emissions and Large Neural Network Training. arXiv preprint arXiv:2104.10350.

[60] Dettmers, Tim and Lewis, Mike and Belkada, Younes and Zettlemoyer, Luke. Advances in Neural Information Processing Systems.

[61] Leviathan, Yaniv and Kalman, Matan and Matias, Yossi. International Conference on Machine Learning.

[62] Perrigo, Billy.

[63] Ji, Ziwei and Lee, Nayeon and Frieske, Rita and Yu, Tiezheng and Su, Dan and Xu, Yan and Ishii, Etsuko and Bang, Ye Jin and Madotto, Andrea and Fung, Pascale. ACM Computing Surveys.

[64] Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob. Advances in Neural Information Processing Systems.

[65] Sharma, Mrinank and Tong, Meg and Korbak, Tomasz and Duvenaud, David and Askell, Amanda and Bowman, Samuel R. and Cheng, Newton and Durmus, Esin and Hatfield-Dodds, Zac and Johnston, Scott R. and others. International Conference on Learning Representations.

[66] Regulation (EU) 2024/1689 of the European Parliament and of the Council. 2024.

[67] Bender, Emily M. and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency.

[68] Blodgett, Su Lin and Barocas, Solon and Daumé III, Hal and others. Language (Technology) is Power: A Critical Survey of "Bias" in NLP. Proceedings of ACL.

[69] Oren, Yonatan and Meister, Nicole and Chatterji, Niladri and Ladhak, Faisal and Hashimoto, Tatsunori B. International Conference on Learning Representations.

[70] Liu, Nelson F. and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy. Transactions of the Association for Computational Linguistics.

[71] Liang, Percy and Bommasani, Rishi and Lee, Tony and Tsipras, Dimitris and Soylu, Dilara and Yasunaga, Michihiro and Zhang, Yian and Narayanan, Deepak and Wu, Yuhuai and Kumar, Ananya and others. Transactions on Machine Learning Research.

[72] Srivastava, Aarohi and Rastogi, Abhinav and Rao, Abhishek and Shoeb, Abu Awal Md and Abid, Abubakar and Fisch, Adam and Brown, Adam R. and Santoro, Adam and Gupta, Aditya and Garriga-Alonso, Adrià and others. Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models.

[73] Olah, Chris and Cammarata, Nick and Schubert, Ludwig and Goh, Gabriel and Petrov, Michael and Carter, Shan.

[74] Elhage, Nelson and Hume, Tristan and Olsson, Catherine and Schiefer, Nicholas and Henighan, Tom and Kravec, Shauna and Hatfield-Dodds, Zac and Lasenby, Robert and Drain, Dawn and Chen, Carol and others. Transformer Circuits Thread.