Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment
Pith reviewed 2026-06-26 00:05 UTC · model grok-4.3
The pith
Transformer models show distinct trade-offs in energy use, parameters and domain fit across architectures
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Transformer-based language models are organised into a working taxonomy covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. Post-2023 developments that changed practice include instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the flagship families from major providers. Deployments in healthcare, finance, legal, education, customer service, creative writing and scientific work are surveyed and linked to the specific capabilities that make each transformer the appropriate tool. The critical assessment compares architectur
What carries the argument
A taxonomy of transformer families together with four deployment-comparison axes that support quantification of the parameter-energy trade-off
If this is right
- Different transformer variants match different domains through their specific capabilities such as long-context handling or preference optimisation.
- Higher parameter counts correspond to higher energy costs, directly affecting which models are viable for deployment.
- Alignment methods including reinforcement learning from human feedback and direct preference optimisation shift the criteria used to judge state-of-the-art performance.
- Data provenance and benchmark saturation must be factored into any claim that a model is state of the art.
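The parameter-energy bullet above can be made concrete with a rough sketch. It assumes the common rule of thumb that a transformer forward pass costs about 2 FLOPs per active parameter per token, and that a mixture-of-experts model only pays for the parameters it activates per token. The function name, the hardware efficiency figure and the model sizes are illustrative assumptions, not values taken from the paper.

```python
def inference_joules_per_token(active_params: float,
                               flops_per_joule: float = 1e11) -> float:
    """Rough energy per generated token, in joules.

    Assumes ~2 FLOPs per active parameter per token (forward pass only)
    and a hypothetical effective hardware efficiency in FLOPs per joule.
    """
    if active_params <= 0 or flops_per_joule <= 0:
        raise ValueError("inputs must be positive")
    flops_per_token = 2.0 * active_params
    return flops_per_token / flops_per_joule


# Hypothetical models: two dense models and one mixture-of-experts model
# whose per-token cost depends on active, not total, parameters.
dense_7b = inference_joules_per_token(7e9)
dense_70b = inference_joules_per_token(70e9)
moe_13b_active = inference_joules_per_token(13e9)
```

Under these assumptions energy per token scales linearly with active parameters, which is why a sparse model can sit well below a dense model of similar total size. Real measurements also depend on batch size, memory traffic and utilisation, none of which this sketch models.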
Where Pith is reading between the lines
- The four-axis comparison framework could be applied by organisations to create internal model-selection checklists.
- The quantified energy trade-off may prompt hardware vendors to prioritise efficiency metrics in future accelerator designs.
- The research questions listed in the final section point to open problems around long-term stability of aligned models in vertical applications.
Load-bearing premise
The survey's selection of literature and models is representative and sufficient to support fair comparisons on the four deployment axes and the energy-parameter quantification.
What would settle it
A new energy-consumption study across the surveyed models that finds no consistent relationship between parameter count and energy cost.
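One way such a study could test for a "consistent relationship" is a rank correlation between parameter count and measured energy. The sketch below implements Spearman's coefficient with the standard library only. The data points are made up for illustration and do not come from the paper, and the choice of Spearman's coefficient is an assumption about what "consistent" would mean in practice.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up measurements for illustration only (not from the paper).
params_billion = [0.3, 1.5, 7.0, 13.0, 70.0]
energy_kwh = [0.8, 2.1, 9.5, 8.7, 60.0]
rho = spearman(params_billion, energy_kwh)
```

A coefficient near zero across a representative model sample would count against the claimed trade-off, while a value near one would support it. With only a handful of models, any such coefficient carries wide uncertainty.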
Original abstract
Transformer-based language models have become the default substrate for natural language processing and the pace of new releases has made it hard for practitioners to separate durable ideas from the noise of incremental announcements. This review works at two levels. At the level of mechanism, we organise the main transformer families into a working taxonomy, covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. We then extend the discussion to post-2023 developments that changed the picture in practice: instruction tuning, reinforcement learning from human feedback, direct preference optimisation, mixture-of-experts scaling, retrieval augmentation and the current flagship model families from OpenAI, Anthropic, Google, Meta, Mistral and DeepSeek. At the level of use, we survey deployments across healthcare, finance, legal, education, customer service, creative writing and scientific work. Based on this we link each to the specific capabilities that make a transformer the appropriate tool. The contribution of this paper is a critical assessment that is based on the survey. We compare architectures on four axes that matter to deployment decisions, we quantify the trade-off between parameter count and energy cost. We also discuss how alignment methods, data provenance and benchmark saturation change what it means to call a model "state of the art". The final section lists the research questions that we think deserve more attention.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys transformer architectures (encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, generator-discriminator), post-2023 developments (instruction tuning, RLHF, DPO, MoE, RAG, flagship families), and domain deployments (healthcare, finance, legal, education, customer service, creative writing, scientific work). It claims a critical assessment that compares architectures on four deployment-relevant axes, quantifies the parameter-count versus energy-cost trade-off, and discusses how alignment methods, data provenance, and benchmark saturation redefine 'state of the art'.
Significance. If the underlying literature sample is representative, the structured taxonomy and explicit linkage of architectural choices to deployment axes plus the energy-parameter quantification would provide practitioners with a practical decision framework amid rapid model releases. The discussion of evolving SOTA criteria is a timely addition to the survey literature.
major comments (1)
- [Abstract] Abstract (contribution paragraph): the central claims rest on architecture comparisons across four deployment axes and a quantification of the parameter-energy trade-off, yet no search protocol, inclusion criteria, or exhaustive model list is supplied; without these the representativeness of the selected post-2023 literature and flagship families cannot be verified, directly undermining the reliability of the reported comparisons and quantification.
minor comments (1)
- [Abstract] The four deployment axes are referenced but never enumerated in the abstract; listing them explicitly would improve readability of the contribution statement.
Simulated Author's Rebuttal
We thank the referee for their detailed review and for highlighting the need for greater methodological transparency in our survey. We address the single major comment below and commit to revisions that strengthen the paper without altering its core contributions.
Point-by-point responses
-
Referee: [Abstract] Abstract (contribution paragraph): the central claims rest on architecture comparisons across four deployment axes and a quantification of the parameter-energy trade-off, yet no search protocol, inclusion criteria, or exhaustive model list is supplied; without these the representativeness of the selected post-2023 literature and flagship families cannot be verified, directly undermining the reliability of the reported comparisons and quantification.
Authors: We agree that the absence of an explicit search protocol and inclusion criteria limits the ability to assess representativeness. In the revised manuscript we will add a new subsection (Section 2.1, Literature Selection and Scope) that details: (1) search strategy (keywords such as 'transformer architecture 2023+', 'RLHF', 'MoE scaling', 'domain-specific LLM deployment' across arXiv, ACL, NeurIPS, and Google Scholar); (2) inclusion criteria (peer-reviewed or high-impact arXiv preprints from 2023 onward that report empirical results on the four deployment axes or energy metrics, plus all flagship families with public parameter counts); (3) exclusion criteria (purely theoretical works without deployment discussion, non-English papers, and incremental fine-tuning studies without architectural novelty); and (4) an appendix table enumerating every model and paper used for the architecture taxonomy, axis comparisons, and parameter-energy quantification. The energy trade-off numbers are drawn directly from the cited primary sources (e.g., reported training or inference FLOPs converted via standard carbon-intensity factors); the revision will make this sourcing explicit so readers can replicate or extend the quantification.
revision: yes
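A conversion of the kind the rebuttal describes, from reported FLOPs to energy to emissions, might look like the following sketch. It is not the authors' procedure, which the abstract does not spell out. The function name, the hardware efficiency, the power usage effectiveness (PUE) and the grid carbon intensity are all placeholder assumptions that a reader would replace with values from the primary sources.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def flops_to_emissions(total_flops: float,
                       flops_per_joule: float,
                       pue: float,
                       kg_co2e_per_kwh: float):
    """Convert a FLOP count into (facility kWh, kg CO2e).

    flops_per_joule: assumed effective hardware efficiency.
    pue: assumed data-centre power usage effectiveness (>= 1).
    kg_co2e_per_kwh: assumed grid carbon intensity.
    """
    if min(total_flops, flops_per_joule, kg_co2e_per_kwh) < 0 or pue < 1:
        raise ValueError("inputs out of range")
    if flops_per_joule == 0:
        raise ValueError("flops_per_joule must be positive")
    it_energy_kwh = total_flops / flops_per_joule / JOULES_PER_KWH
    facility_kwh = it_energy_kwh * pue  # add cooling and other overhead
    return facility_kwh, facility_kwh * kg_co2e_per_kwh


# Placeholder inputs chosen for round numbers, not taken from any paper.
kwh, kg = flops_to_emissions(total_flops=3.6e21,
                             flops_per_joule=1e11,
                             pue=1.2,
                             kg_co2e_per_kwh=0.4)
```

Because every factor multiplies through, the result is only as reliable as the least certain input. Publishing the assumed efficiency, PUE and carbon intensity alongside each figure is what would let readers replicate the quantification.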
Circularity Check
No circularity: survey paper with no derivations or self-referential predictions
Full rationale
This is a literature survey that organizes existing transformer families and deployments from external sources, then offers a critical assessment based on those sources. The abstract and contribution statement describe taxonomy construction, deployment surveys, and comparisons on deployment axes plus a parameter-energy trade-off quantification, all drawn from cited literature rather than any internal equations, fitted parameters, or self-citations that reduce the outputs to the paper's own inputs by construction. No load-bearing steps match the enumerated circularity patterns.