{"work":{"id":"12f5a236-ef7a-4d13-b4de-b51465a6f977","openalex_id":null,"doi":null,"arxiv_id":null,"raw_key":"raw:6eb9d26f971f38b99cc4543f","title":"Advances in neural information processing systems , volume=","authors":null,"authors_text":"Language models are few-shot learners , author=","year":null,"venue":null,"abstract":null,"external_url":null,"cited_by_count":null,"metadata_source":"raw_reference","metadata_fetched_at":"2026-05-27T00:58:21.357686+00:00","pith_arxiv_id":null,"created_at":"2026-05-12T06:24:25.819553+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":false,"display_title":"Advances in neural information processing systems , volume=","render_title":"Advances in neural information processing systems , volume="},"hub":{"state":{"work_id":"12f5a236-ef7a-4d13-b4de-b51465a6f977","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":130,"external_cited_by_count":null,"distinct_field_count":16,"first_pith_cited_at":"2023-01-12T18:56:49+00:00","last_pith_cited_at":"2026-05-22T05:25:00+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-06T12:00:41.853299+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":11},{"context_role":"dataset","n":1},{"context_role":"method","n":1},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":9},{"context_polarity":"unclear","n":2},{"context_polarity":"support","n":1},{"context_polarity":"use_dataset","n":1},{"context_polarity":"use_method","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Advances in neural information processing systems , volume=","claims":[{"claim_text":"phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities.Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"activated experts Tl, the bottom- k least activated experts Bl, and the remaining normal experts. We establish a bijection σ:T l → B l pairing each high-frequency expert with a low-frequency counterpart. During the programming phase, each bottom-k expert's weights are overwritten with those of its paired top-kexpert: Wrep l,σ(i) ←W l,i,∀i∈ T l. (7) During inference, given router logitsz l = [z l,1, . . . ,zl,E ], the modified logits are: ˜zl,i =    zl,i/2 ifi∈ T l, 0 ifi∈ B l, zl,i otherwise.","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"If |Xt| ≤R almost surely, then for any η∈(0,1/R), with probability at least1−δ, TX t=1 Xt ≤η TX t=1 Et−1 \u0002 X2 t \u0003 + log(δ−1) η .(43) The following result is a consequence of Lemma 6. Lemma 7.Let (Xt)t≤T be a sequence of random variables adapted to a filtration (Ft)t≤T . If 0≤X t ≤Ralmost surely, then with probability at least1−δ, TX t=1 Xt ≤(1 +ε) TX t=1 Et−1[Xt] + R ε log(δ−1),(44) and also with probability at least1−δ, TX t=1 Et−1[Xt]≤(1 +ε) TX t=1 Xt + (1 +ε) 2R ε log(δ−1).(45) Proof of Lemma","claim_type":"method","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"Table 1: Comparison with existing Financial Benchmarks. \"Summarize\" refers to the context is summarized firm events related to price. \"Human bias\" refers to the context contains human opinions like tweets, analyst report etc. \"Large Sample\" refers to the number of unique firms in sample is large. opinions on LLM decision-making, we create two perturbations of each report: (1) removal of the first sentence that contains the analyst rating; and (2) retention of the first sentence but replacement o","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":",Λ(u) := logEe u⊤x1 (wherever finite). The identification of the ergodic limit depends only on the marginal law ofx1; rate statements depend additionally on the dependence structure. The following elementary lemma, used repeatedly to convertL2 errors into directional (cosine) errors, is proved in Appendix A.1. Lemma 3.1(Geometric stability).Let u̸= 0 and ∥e∥ ≤η∥u∥ with η∈[0,1) . Then u+e̸= 0 and cos(u+e, u)≥ p 1−η 2. 3.2 Ergodic alignment of the attention output We now identify the long-context ","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"tributions of a pair of paths becomes large, further enlarging it may harm reasoning coherence. There- fore, the contrastive weight is dynamically adjusted based on the divergence between paths. Specifi- cally, the contrastive weight between paths (b, b′) is defined as βb,b′ =β max ·max 0,1− J SD(ˆpb∥ˆpb′ ) δlog 2 ! (6) where J SD(·∥·) denotes the Jensen-Shannon di- vergence, normalized to [0,1] by log 2, βmax denotes the initial value of the pairwise path- contrastive weight. Here, ˆpb and ˆpb′","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Advances in neural information processing systems , volume= because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (10 contexts).","role_counts":[{"n":10,"context_role":"background"},{"n":1,"context_role":"method"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-22T20:23:56.466534+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"4b8187d1-cbb8-40dc-b8fb-985b6b161e73","orcid":null,"display_name":"Language models are few-shot learners"},{"id":"fe4d5dbf-e369-4296-8f93-544d5ed81b09","orcid":null,"display_name":"author="}]},"error":null,"updated_at":"2026-05-22T20:23:56.460836+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-15T05:57:44.683230+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Advances in neural information processing systems , volume=","work_id":"a1fd09f1-b62b-4aca-a5ef-dd2b50ad08b5","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":11},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":10},{"title":"OpenAI blog , volume=","work_id":"31dc92c3-2fc3-432b-8d63-b7ee13f53a9c","shared_citers":8},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":8},{"title":"Advances in neural information processing systems , volume=","work_id":"1265447d-0324-4d07-abba-34fa29d172da","shared_citers":7},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":7},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":7},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":6},{"title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","work_id":"a1f2574b-a899-4713-be60-c87ba332656c","shared_citers":6},{"title":"Advances in neural information processing systems , volume=","work_id":"f5fceb4c-0b93-4786-b48f-a567818bb731","shared_citers":5},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":5},{"title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","work_id":"337ba690-f35d-4154-9450-8edf4bc9f488","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":5},{"title":"Emergent Abilities of Large Language Models","work_id":"6ea3375b-837c-4640-a175-be7525aa3c6d","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":5},{"title":"Language models are few-shot learners","work_id":"b5af3a68-2622-4421-b39b-b1d2fbde2d8d","shared_citers":5},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":5},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":5},{"title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","work_id":"41fe12c4-e538-4890-a244-480650ed3078","shared_citers":5},{"title":"Advances in neural information processing systems , volume=","work_id":"c25e8154-fab2-455c-8a26-56e40aed5d2b","shared_citers":4},{"title":"Advances in Neural Information Processing Systems , volume=","work_id":"bd577a47-49ef-4f8a-81ca-86cd88b71479","shared_citers":4},{"title":"Advances in Neural Information Processing Systems , volume=","work_id":"a675d477-b40a-4944-b8a6-2733c94587e8","shared_citers":4}],"time_series":[{"n":6,"year":2023},{"n":1,"year":2024},{"n":1,"year":2025},{"n":29,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-15T05:57:38.098764+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-15T05:57:42.856259+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Advances in neural information processing systems , volume=","claims":[{"claim_text":"phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities.Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"activated experts Tl, the bottom- k least activated experts Bl, and the remaining normal experts. We establish a bijection σ:T l → B l pairing each high-frequency expert with a low-frequency counterpart. During the programming phase, each bottom-k expert's weights are overwritten with those of its paired top-kexpert: Wrep l,σ(i) ←W l,i,∀i∈ T l. (7) During inference, given router logitsz l = [z l,1, . . . ,zl,E ], the modified logits are: ˜zl,i =    zl,i/2 ifi∈ T l, 0 ifi∈ B l, zl,i otherwise.","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"If |Xt| ≤R almost surely, then for any η∈(0,1/R), with probability at least1−δ, TX t=1 Xt ≤η TX t=1 Et−1 \u0002 X2 t \u0003 + log(δ−1) η .(43) The following result is a consequence of Lemma 6. Lemma 7.Let (Xt)t≤T be a sequence of random variables adapted to a filtration (Ft)t≤T . If 0≤X t ≤Ralmost surely, then with probability at least1−δ, TX t=1 Xt ≤(1 +ε) TX t=1 Et−1[Xt] + R ε log(δ−1),(44) and also with probability at least1−δ, TX t=1 Et−1[Xt]≤(1 +ε) TX t=1 Xt + (1 +ε) 2R ε log(δ−1).(45) Proof of Lemma","claim_type":"method","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"Table 1: Comparison with existing Financial Benchmarks. \"Summarize\" refers to the context is summarized firm events related to price. \"Human bias\" refers to the context contains human opinions like tweets, analyst report etc. \"Large Sample\" refers to the number of unique firms in sample is large. opinions on LLM decision-making, we create two perturbations of each report: (1) removal of the first sentence that contains the analyst rating; and (2) retention of the first sentence but replacement o","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":",Λ(u) := logEe u⊤x1 (wherever finite). The identification of the ergodic limit depends only on the marginal law ofx1; rate statements depend additionally on the dependence structure. The following elementary lemma, used repeatedly to convertL2 errors into directional (cosine) errors, is proved in Appendix A.1. Lemma 3.1(Geometric stability).Let u̸= 0 and ∥e∥ ≤η∥u∥ with η∈[0,1) . Then u+e̸= 0 and cos(u+e, u)≥ p 1−η 2. 3.2 Ergodic alignment of the attention output We now identify the long-context ","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"tributions of a pair of paths becomes large, further enlarging it may harm reasoning coherence. There- fore, the contrastive weight is dynamically adjusted based on the divergence between paths. Specifi- cally, the contrastive weight between paths (b, b′) is defined as βb,b′ =β max ·max 0,1− J SD(ˆpb∥ˆpb′ ) δlog 2 ! (6) where J SD(·∥·) denotes the Jensen-Shannon di- vergence, normalized to [0,1] by log 2, βmax denotes the initial value of the pairwise path- contrastive weight. Here, ˆpb and ˆpb′","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Advances in neural information processing systems , volume= because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (10 contexts).","role_counts":[{"n":10,"context_role":"background"},{"n":1,"context_role":"method"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-22T20:23:56.470352+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Advances in neural information processing systems , volume=","claims":[{"claim_text":"phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities.Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Bernoulli random variables with α(q) success probability. According to the Hoeffding's inequal- ity Hoeffding (1963) we have P(|m−α(q)N| ≥t)≤2 exp \u0012 −2t2 N \u0013 ,(8) wheret≥0. It implies P(|m−α(q)N| ≤t)≥1−2 exp \u0012 −2t2 N \u0013 ⇔P(−t≤m−α(q)N≤t)≥1−2 exp \u0012 −2t2 N \u0013 ⇒P(−t≤m−α(q)N)≥1−2 exp \u0012 −2t2 N \u0013 . By settingt= q N 2 log 2 ϵ from anyϵ >0, one may check, P m≥α(q)N− r N 2 log 1 ϵ ! ≥1−ϵ.(9) In order to ensurem≥N/2, we need to have α(q)N− r N 2 log 1 ϵ ≥ 1 2 N ⇒α(q)≥ 1 2 + r 1 2N log 1 ϵ (10) B Discussion T","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"performance would not be significantly degraded if projBx were replaced with a nonzero constant value more representative of the training distribution. To correct for this, in our causality experiments we ablate directions by replacing them with theirmeanvalues computed across a dataset, instead of zeroing them out. Specifically, to ablate a directionu, we use the formula: x′ =x+P u(x−x)(21) whereP u is the projection matrix foruand xis the mean representation. E. Static interpretability analysi","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"an All-Reduce operation so that they can use the identical gradient to update the model parameters. The All-Reduce operation accumulates distributed gradients (say Xi at i worker) from all workers (say P workers) using a reduction operation (typically sum or mean in training), which can be formally represented X=AllReduce(X 1, X2, ..., XP ) = PX i=1 Xi.(5) The gradients have the same dimensionality as the model weights, which means additional memory is required to store them for communication an","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Advances in neural information processing systems , volume= because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (3 contexts).","role_counts":[{"n":3,"context_role":"background"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-15T05:57:42.862697+00:00"}},"summary":{"title":"Advances in neural information processing systems , volume=","claims":[{"claim_text":"phenomenon might further improve the performance of the co-scientist as a tool for scientific discovery. Improved multimodal reasoning and tool-use capabilities.Some of the most interesting data in scientific publications is not written in text but may be encoded visually in figures and charts. However, even state-of-the-art frontier models may not comprehensively utilize such data with optimal reasoning [89] and the AI co-scientist system is unlikely to be an exception. Stronger benchmarks and ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Bernoulli random variables with α(q) success probability. According to the Hoeffding's inequal- ity Hoeffding (1963) we have P(|m−α(q)N| ≥t)≤2 exp \u0012 −2t2 N \u0013 ,(8) wheret≥0. It implies P(|m−α(q)N| ≤t)≥1−2 exp \u0012 −2t2 N \u0013 ⇔P(−t≤m−α(q)N≤t)≥1−2 exp \u0012 −2t2 N \u0013 ⇒P(−t≤m−α(q)N)≥1−2 exp \u0012 −2t2 N \u0013 . By settingt= q N 2 log 2 ϵ from anyϵ >0, one may check, P m≥α(q)N− r N 2 log 1 ϵ ! ≥1−ϵ.(9) In order to ensurem≥N/2, we need to have α(q)N− r N 2 log 1 ϵ ≥ 1 2 N ⇒α(q)≥ 1 2 + r 1 2N log 1 ϵ (10) B Discussion T","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"performance would not be significantly degraded if projBx were replaced with a nonzero constant value more representative of the training distribution. To correct for this, in our causality experiments we ablate directions by replacing them with theirmeanvalues computed across a dataset, instead of zeroing them out. Specifically, to ablate a directionu, we use the formula: x′ =x+P u(x−x)(21) whereP u is the projection matrix foruand xis the mean representation. E. Static interpretability analysi","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"an All-Reduce operation so that they can use the identical gradient to update the model parameters. The All-Reduce operation accumulates distributed gradients (say Xi at i worker) from all workers (say P workers) using a reduction operation (typically sum or mean in training), which can be formally represented X=AllReduce(X 1, X2, ..., XP ) = PX i=1 Xi.(5) The gradients have the same dimensionality as the model weights, which means additional memory is required to store them for communication an","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Advances in neural information processing systems , volume= because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (3 contexts).","role_counts":[{"n":3,"context_role":"background"},{"n":1,"context_role":"other"}]},"graph":{"co_cited":[{"title":"Advances in neural information processing systems , volume=","work_id":"a1fd09f1-b62b-4aca-a5ef-dd2b50ad08b5","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":11},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":10},{"title":"OpenAI blog , volume=","work_id":"31dc92c3-2fc3-432b-8d63-b7ee13f53a9c","shared_citers":8},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":8},{"title":"Advances in neural information processing systems , volume=","work_id":"1265447d-0324-4d07-abba-34fa29d172da","shared_citers":7},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":7},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":7},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":6},{"title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","work_id":"a1f2574b-a899-4713-be60-c87ba332656c","shared_citers":6},{"title":"Advances in neural information processing systems , volume=","work_id":"f5fceb4c-0b93-4786-b48f-a567818bb731","shared_citers":5},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":5},{"title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","work_id":"337ba690-f35d-4154-9450-8edf4bc9f488","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":5},{"title":"Emergent Abilities of Large Language Models","work_id":"6ea3375b-837c-4640-a175-be7525aa3c6d","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":5},{"title":"Language models are few-shot learners","work_id":"b5af3a68-2622-4421-b39b-b1d2fbde2d8d","shared_citers":5},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":5},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":5},{"title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","work_id":"41fe12c4-e538-4890-a244-480650ed3078","shared_citers":5},{"title":"Advances in neural information processing systems , volume=","work_id":"c25e8154-fab2-455c-8a26-56e40aed5d2b","shared_citers":4},{"title":"Advances in Neural Information Processing Systems , volume=","work_id":"bd577a47-49ef-4f8a-81ca-86cd88b71479","shared_citers":4},{"title":"Advances in Neural Information Processing Systems , volume=","work_id":"a675d477-b40a-4944-b8a6-2733c94587e8","shared_citers":4}],"time_series":[{"n":6,"year":2023},{"n":1,"year":2024},{"n":1,"year":2025},{"n":29,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"fe4d5dbf-e369-4296-8f93-544d5ed81b09","orcid":null,"display_name":"author=","source":"manual","import_confidence":0.72},{"id":"4b8187d1-cbb8-40dc-b8fb-985b6b161e73","orcid":null,"display_name":"Language models are few-shot learners","source":"manual","import_confidence":0.72}]}}