{"work":{"id":"6ea3375b-837c-4640-a175-be7525aa3c6d","openalex_id":null,"doi":null,"arxiv_id":"2206.07682","raw_key":null,"title":"Emergent Abilities of Large Language Models","authors":null,"authors_text":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud","year":2022,"venue":"cs.CL","abstract":"Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.","external_url":"https://arxiv.org/abs/2206.07682","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T13:03:26.804054+00:00","pith_arxiv_id":"2206.07682","created_at":"2026-05-08T18:44:01.706654+00:00","updated_at":"2026-06-29T13:03:26.804054+00:00","title_quality_ok":true,"display_title":"Emergent Abilities of Large Language Models","render_title":"Emergent Abilities of Large Language Models"},"hub":{"state":{"work_id":"6ea3375b-837c-4640-a175-be7525aa3c6d","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":121,"external_cited_by_count":null,"distinct_field_count":18,"first_pith_cited_at":"2022-08-05T17:39:22+00:00","last_pith_cited_at":"2026-06-24T21:26:43+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:28:50.652118+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":33},{"context_role":"baseline","n":2}],"polarity_counts":[{"context_polarity":"background","n":30},{"context_polarity":"support","n":3},{"context_polarity":"baseline","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Emergent Abilities of Large Language Models","claims":[{"claim_text":"Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language model","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"best-performing model GPT-4 achieves an average accuracy rate of 94% over four awareness datasets. Other LLMs exhibit decent but not substantial awareness. 11 TRUST LLM 3 Background 3.1 Large Language Models (LLMs) A language model (LM) aims to predict the probability distribution over a sequence of tokens. Scaling the model size and data size, large language models (LLMs) have shown \"emergent abilities\" [ 87, 88, 89] in solving a series of complex tasks that cannot be dealt with by regular-size","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"[1] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng- Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022. [2] Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. Wordcraft: story writing with large language models. In 27th International Conference on Intelligent User Interfaces, pages 841-852, 2022. [3] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barr","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"over different tokens, and then draw samples from it with different sampling techniques, e.g. greedy sampling [25], nucleus sampling [26], and beam search [27] etc. A large language model (LLM) is an LM with a large size (in the magnitude of tens of millions to billions of model parameters) and size of training data [4]. Researchers have shown that LLMs show \"emergent abilities\" [28, 29, 30] that are not seen in regular-sized LMs. The transformer model [31] is the key architecture behind the rec","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"have advanced the field significantly, these systems continue to face fundamental challenges: data sparsity leading to poor generalization and limited ability to capture user semantics beyond interaction patterns [15,16]. The rise of Large Language Models (LLMs) offers promising opportunities to enhance recommendation systems through their sophisticated semantic under- standing capabilities [1,20,36,39]. This has led to methods ranging from zero-shot prompting [1,7,8,20,36,39] and feature augmen","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258, 2021. [2] Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35 (2):1-72, 2026. [3] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zho","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"9 (2025), pp. 783-786.doi:10.1016/j.tics.2025.07.004. url:https://doi.org/10.1016/j.tics.2025.07.004. [63] Miles Turpin et al. \"Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting\". In:arXiv(2023). NeurIPS 2023.doi: 10.48550/arXiv.2305.04388. arXiv:2305.04388 [cs.CL].url:https://arxiv. org/abs/2305.04388. [64] Jason Wei et al. \"Emergent abilities of large language models\". In:arXiv preprint arXiv:2206.07682(2022). [65] Daniel M. Wolpert, R. Chr","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Emergent Abilities of Large Language Models because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (31 contexts).","role_counts":[{"n":31,"context_role":"background"},{"n":2,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-22T14:23:47.056010+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"cb0687c7-02a4-474c-a3bd-ba5a6e0d00f9","orcid":null,"display_name":"Jason Wei"},{"id":"7627c811-12ca-4f5a-948a-1b948aa8f19a","orcid":null,"display_name":"Yi Tay"},{"id":"e10a60aa-4833-4871-b90d-03d4ba0e0a5b","orcid":null,"display_name":"Rishi Bommasani"},{"id":"56e1d708-c3a1-403a-8a9f-9a6e58a7ddc0","orcid":null,"display_name":"Colin Raffel"},{"id":"5eed455e-8225-4eb8-8ab3-3b1ced378944","orcid":null,"display_name":"Barret Zoph"},{"id":"85fb8136-11ae-4eb2-9893-62a53fa26fb7","orcid":null,"display_name":"Sebastian Borgeaud"}]},"error":null,"updated_at":"2026-05-22T14:23:47.655238+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T13:11:06.679738+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":17},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":13},{"title":"On the Opportunities and Risks of Foundation Models","work_id":"a18039e9-928d-47c9-a836-32656a71bf71","shared_citers":13},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":12},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":11},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":9},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":8},{"title":"GLU Variants Improve Transformer","work_id":"17d0763c-1016-41ab-a478-478e890765eb","shared_citers":8},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":8},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":8},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":7},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Scaling Instruction-Finetuned Language Models","work_id":"8405abb1-7558-4fdf-af24-f4c52fa77a06","shared_citers":7},{"title":"Sparks of Artificial General Intelligence: Early experiments with GPT-4","work_id":"a23cfe92-7f7c-424b-98d4-b386a83002fb","shared_citers":7},{"title":"Training language models to follow instructions with human feedback","work_id":"52aff42f-4fa9-4fcf-bdb3-1459b9bebf65","shared_citers":7},{"title":"Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models","work_id":"bb63abb3-0d50-4362-b97c-b5e725b03b39","shared_citers":6},{"title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","work_id":"337ba690-f35d-4154-9450-8edf4bc9f488","shared_citers":6},{"title":"Holistic Evaluation of Language Models","work_id":"cc02a01e-7218-47dc-8e66-3333e7e4adec","shared_citers":6},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":6},{"title":"OPT: Open Pre-trained Transformer Language Models","work_id":"d7ff3b21-1fff-4cf4-952a-4714e3ef2307","shared_citers":6},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":6}],"time_series":[{"n":1,"year":2022},{"n":6,"year":2023},{"n":3,"year":2024},{"n":3,"year":2025},{"n":39,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T13:10:51.053259+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T13:10:54.286775+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Emergent Abilities of Large Language Models","claims":[{"claim_text":"Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language model","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"best-performing model GPT-4 achieves an average accuracy rate of 94% over four awareness datasets. Other LLMs exhibit decent but not substantial awareness. 11 TRUST LLM 3 Background 3.1 Large Language Models (LLMs) A language model (LM) aims to predict the probability distribution over a sequence of tokens. Scaling the model size and data size, large language models (LLMs) have shown \"emergent abilities\" [ 87, 88, 89] in solving a series of complex tasks that cannot be dealt with by regular-size","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"[1] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng- Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022. [2] Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. Wordcraft: story writing with large language models. In 27th International Conference on Intelligent User Interfaces, pages 841-852, 2022. [3] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barr","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"over different tokens, and then draw samples from it with different sampling techniques, e.g. greedy sampling [25], nucleus sampling [26], and beam search [27] etc. A large language model (LLM) is an LM with a large size (in the magnitude of tens of millions to billions of model parameters) and size of training data [4]. Researchers have shown that LLMs show \"emergent abilities\" [28, 29, 30] that are not seen in regular-sized LMs. The transformer model [31] is the key architecture behind the rec","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"have advanced the field significantly, these systems continue to face fundamental challenges: data sparsity leading to poor generalization and limited ability to capture user semantics beyond interaction patterns [15,16]. The rise of Large Language Models (LLMs) offers promising opportunities to enhance recommendation systems through their sophisticated semantic under- standing capabilities [1,20,36,39]. This has led to methods ranging from zero-shot prompting [1,7,8,20,36,39] and feature augmen","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258, 2021. [2] Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation.ACM Transactions on Software Engineering and Methodology, 35 (2):1-72, 2026. [3] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zho","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"9 (2025), pp. 783-786.doi:10.1016/j.tics.2025.07.004. url:https://doi.org/10.1016/j.tics.2025.07.004. [63] Miles Turpin et al. \"Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting\". In:arXiv(2023). NeurIPS 2023.doi: 10.48550/arXiv.2305.04388. arXiv:2305.04388 [cs.CL].url:https://arxiv. org/abs/2305.04388. [64] Jason Wei et al. \"Emergent abilities of large language models\". In:arXiv preprint arXiv:2206.07682(2022). [65] Daniel M. Wolpert, R. Chr","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Emergent Abilities of Large Language Models because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (31 contexts).","role_counts":[{"n":31,"context_role":"background"},{"n":2,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-22T14:23:47.658214+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Emergent Abilities of Large Language Models","claims":[{"claim_text":"Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language model","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Emergent Abilities of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T13:10:51.058329+00:00"}},"summary":{"title":"Emergent Abilities of Large Language Models","claims":[{"claim_text":"Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language model","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Emergent Abilities of Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":17},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":13},{"title":"On the Opportunities and Risks of Foundation Models","work_id":"a18039e9-928d-47c9-a836-32656a71bf71","shared_citers":13},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":12},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":11},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":9},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":8},{"title":"GLU Variants Improve Transformer","work_id":"17d0763c-1016-41ab-a478-478e890765eb","shared_citers":8},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":8},{"title":"PaLM: Scaling Language Modeling with Pathways","work_id":"a94f3ef7-2c49-4445-93fe-6ec16aafd966","shared_citers":8},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":7},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":7},{"title":"Scaling Instruction-Finetuned Language Models","work_id":"8405abb1-7558-4fdf-af24-f4c52fa77a06","shared_citers":7},{"title":"Sparks of Artificial General Intelligence: Early experiments with GPT-4","work_id":"a23cfe92-7f7c-424b-98d4-b386a83002fb","shared_citers":7},{"title":"Training language models to follow instructions with human feedback","work_id":"52aff42f-4fa9-4fcf-bdb3-1459b9bebf65","shared_citers":7},{"title":"Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models","work_id":"bb63abb3-0d50-4362-b97c-b5e725b03b39","shared_citers":6},{"title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","work_id":"337ba690-f35d-4154-9450-8edf4bc9f488","shared_citers":6},{"title":"Holistic Evaluation of Language Models","work_id":"cc02a01e-7218-47dc-8e66-3333e7e4adec","shared_citers":6},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":6},{"title":"OPT: Open Pre-trained Transformer Language Models","work_id":"d7ff3b21-1fff-4cf4-952a-4714e3ef2307","shared_citers":6},{"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","shared_citers":6}],"time_series":[{"n":1,"year":2022},{"n":6,"year":2023},{"n":3,"year":2024},{"n":3,"year":2025},{"n":39,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"5eed455e-8225-4eb8-8ab3-3b1ced378944","orcid":null,"display_name":"Barret Zoph","source":"manual","import_confidence":0.72},{"id":"56e1d708-c3a1-403a-8a9f-9a6e58a7ddc0","orcid":null,"display_name":"Colin Raffel","source":"manual","import_confidence":0.72},{"id":"cb0687c7-02a4-474c-a3bd-ba5a6e0d00f9","orcid":null,"display_name":"Jason Wei","source":"manual","import_confidence":0.72},{"id":"e10a60aa-4833-4871-b90d-03d4ba0e0a5b","orcid":null,"display_name":"Rishi Bommasani","source":"manual","import_confidence":0.72},{"id":"85fb8136-11ae-4eb2-9893-62a53fa26fb7","orcid":null,"display_name":"Sebastian Borgeaud","source":"manual","import_confidence":0.72},{"id":"7627c811-12ca-4f5a-948a-1b948aa8f19a","orcid":null,"display_name":"Yi Tay","source":"manual","import_confidence":0.72}]}}