hub Canonical reference

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi · 2022 · cs.CL · arXiv 2202.12837

Canonical reference. 100% of citing Pith papers cite this work as background.

31 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 31 citing papers arXiv PDF

abstract

Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs. However, there has been little understanding of how the model learns and which aspects of the demonstrations contribute to end task performance. In this paper, we show that ground truth demonstrations are in fact not required -- randomly replacing labels in the demonstrations barely hurts performance on a range of classification and multi-choce tasks, consistently over 12 different models including GPT-3. Instead, we find that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of (1) the label space, (2) the distribution of the input text, and (3) the overall format of the sequence. Together, our analysis provides a new way of understanding how and why in-context learning works, while opening up new questions about how much can be learned from large language models through inference alone.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 11

citation-polarity summary

background 11

representative citing papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

cs.CL · 2022-01-28 · accept · novelty 9.0

Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.

Soft Head Selection for Injecting ICL-Derived Task Embeddings

cs.CL · 2025-07-28 · conditional · novelty 7.0

SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B parameters.

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

cs.CL · 2022-11-22 · unverdicted · novelty 7.0

PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.

In-context Learning and Induction Heads

cs.LG · 2022-09-24 · unverdicted · novelty 7.0

Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning in transformers.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

OEP poisons self-evolving LLM agents by constructing clean edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.

Measuring Representation Robustness in Large Language Models for Geometry

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capacity models.

Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection

cs.LG · 2025-12-15 · unverdicted · novelty 6.0

FinFRE-RAG combines importance-guided feature reduction with label-aware retrieval-augmented generation to boost LLM performance on tabular fraud detection across four public datasets while providing human-readable rationales.

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation

cs.CR · 2025-11-17 · conditional · novelty 6.0

SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.

Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

cs.CL · 2025-09-29 · unverdicted · novelty 6.0

A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.

Otter: A Multi-Modal Model with In-Context Instruction Tuning

cs.CV · 2023-05-05 · unverdicted · novelty 6.0

Otter is a multi-modal model instruction-tuned on the MIMIC-IT dataset of over 3 million in-context instruction-response pairs to improve convergence and generalization on tasks with multiple images and videos.

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

cs.AI · 2023-02-03 · conditional · novelty 6.0

DEPS combines LLM-based interactive planning with a trainable goal selector to create a zero-shot multi-task agent that completes 70+ Minecraft tasks and nearly doubles prior performance.

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

cs.CL · 2022-10-17 · accept · novelty 6.0

Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.

Automatic Chain of Thought Prompting in Large Language Models

cs.CL · 2022-10-07 · conditional · novelty 6.0

Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

The Structure and Dynamics of the Online MAHA-sphere

cs.SI · 2026-05-19 · unverdicted · novelty 5.0

Reddit analysis finds MAHA users show strong cross-theme belief bundling and network coherence unlike anti-MAHA users, with pandemic-era shifts from anti-fluoride/mask to anti-vaccine to broader anti-science engagement.

Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

cs.AI · 2026-05-10 · unverdicted · novelty 5.0

Absurd World automatically converts real-world problems into absurd yet logically coherent scenarios to test whether LLMs can reason without depending on familiar patterns.

Can LLMs Take Retrieved Information with a Grain of Salt?

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

LLMs exhibit systematic failures in obeying expressed certainty in retrieved contexts, but a combination of prior reminders, certainty recalibration, and context simplification reduces obedience errors by 25%.

When Context Sticks: Studying Interference in In-Context Learning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.

SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition

cs.IR · 2026-04-22 · unverdicted · novelty 5.0

SAKE is an agentic framework for GMNER that uses uncertainty-based self-awareness and reinforcement learning to balance internal knowledge exploitation with adaptive external exploration.

citing papers explorer

Showing 31 of 31 citing papers.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models cs.CL · 2022-01-28 · accept · none · ref 42 · internal anchor
Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks cs.CV · 2026-04-15 · unverdicted · none · ref 31 · internal anchor
Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.
Soft Head Selection for Injecting ICL-Derived Task Embeddings cs.CL · 2025-07-28 · conditional · none · ref 12 · internal anchor
SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B parameters.
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks cs.CL · 2022-11-22 · unverdicted · none · ref 17 · internal anchor
PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.
In-context Learning and Induction Heads cs.LG · 2022-09-24 · unverdicted · none · ref 15 · internal anchor
Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning in transformers.
Flamingo: a Visual Language Model for Few-Shot Learning cs.CV · 2022-04-29 · unverdicted · none · ref 77 · internal anchor
Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences cs.CR · 2026-05-18 · unverdicted · none · ref 21 · internal anchor
OEP poisons self-evolving LLM agents by constructing clean edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space cs.CL · 2026-05-12 · unverdicted · none · ref 31 · internal anchor
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces cs.AI · 2026-05-09 · unverdicted · none · ref 89 · internal anchor
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA cs.AI · 2026-05-05 · unverdicted · none · ref 54 · internal anchor
Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is unstructured text-to-event mapping, addressed by a neuro-symbolic system with PIS that reaches 100% accuracy on benchmarks when representations are correct.
Measuring Representation Robustness in Large Language Models for Geometry cs.CL · 2026-04-03 · unverdicted · none · ref 19 · internal anchor
LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capacity models.
Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection cs.LG · 2025-12-15 · unverdicted · none · ref 31 · internal anchor
FinFRE-RAG combines importance-guided feature reduction with label-aware retrieval-augmented generation to boost LLM performance on tabular fraud detection across four public datasets while providing human-readable rationales.
SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation cs.CR · 2025-11-17 · conditional · none · ref 23 · internal anchor
SnapAudit decomposes DP-ICL into a deterministic snapshot stage and a stochastic noise stage, using bootstrap simulation to achieve 80-200x faster auditing and exposing privacy bound violations in existing Gaussian and embedding mechanisms.
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis cs.CL · 2025-09-29 · unverdicted · none · ref 27 · internal anchor
A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.
Otter: A Multi-Modal Model with In-Context Instruction Tuning cs.CV · 2023-05-05 · unverdicted · none · ref 62 · internal anchor
Otter is a multi-modal model instruction-tuned on the MIMIC-IT dataset of over 3 million in-context instruction-response pairs to improve convergence and generalization on tasks with multiple images and videos.
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents cs.AI · 2023-02-03 · conditional · none · ref 30 · internal anchor
DEPS combines LLM-based interactive planning with a trainable goal selector to create a zero-shot multi-task agent that completes 70+ Minecraft tasks and nearly doubles prior performance.
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them cs.CL · 2022-10-17 · accept · none · ref 17 · internal anchor
Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.
Automatic Chain of Thought Prompting in Large Language Models cs.CL · 2022-10-07 · conditional · none · ref 23 · internal anchor
Auto-CoT automatically builds chain-of-thought demonstrations by sampling diverse questions and letting the LLM generate reasoning chains, matching manual CoT performance on ten reasoning tasks with GPT-3.
Emergent Abilities of Large Language Models cs.CL · 2022-06-15 · unverdicted · none · ref 59 · internal anchor
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
The Structure and Dynamics of the Online MAHA-sphere cs.SI · 2026-05-19 · unverdicted · none · ref 40 · internal anchor
Reddit analysis finds MAHA users show strong cross-theme belief bundling and network coherence unlike anti-MAHA users, with pandemic-era shifts from anti-fluoride/mask to anti-vaccine to broader anti-science engagement.
Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities cs.AI · 2026-05-10 · unverdicted · none · ref 11 · internal anchor
Absurd World automatically converts real-world problems into absurd yet logically coherent scenarios to test whether LLMs can reason without depending on familiar patterns.
Can LLMs Take Retrieved Information with a Grain of Salt? cs.CL · 2026-05-07 · unverdicted · none · ref 8 · internal anchor
LLMs exhibit systematic failures in obeying expressed certainty in retrieved contexts, but a combination of prior reminders, certainty recalibration, and context simplification reduces obedience errors by 25%.
When Context Sticks: Studying Interference in In-Context Learning cs.LG · 2026-04-25 · unverdicted · none · ref 17 · internal anchor
In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.
SAKE: Self-aware Knowledge Exploitation-Exploration for Grounded Multimodal Named Entity Recognition cs.IR · 2026-04-22 · unverdicted · none · ref 26 · internal anchor
SAKE is an agentic framework for GMNER that uses uncertainty-based self-awareness and reinforcement learning to balance internal knowledge exploitation with adaptive external exploration.
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL cs.CL · 2025-11-02 · unverdicted · none · ref 21 · internal anchor
MARS-SQL trains a multi-agent RL system with ReAct-style interaction and generative validation to produce SQL queries, reaching 77.84% execution accuracy on BIRD dev and 89.75% on Spider test.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 25 · internal anchor
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis cs.SE · 2026-03-18 · unverdicted · none · ref 118 · 2 links · internal anchor
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 123 · internal anchor
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
SLM Finetuning for Natural Language to Domain Specific Code Generation in Production cs.LG · 2026-04-10 · unverdicted · none · ref 20 · internal anchor
Fine-tuned small language models outperform larger models in natural language to domain-specific code generation with improved performance, latency, and the ability to adapt to customer-specific scenarios without losing general capabilities.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models cs.AI · 2025-01-16 · unverdicted · none · ref 98 · internal anchor
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages cs.CL · 2026-05-16 · unverdicted · none · ref 143 · internal anchor
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer