A test-time zeroth-order optimization of prompt embeddings using a bounded self-supervised proxy from demonstration log-probabilities improves ICL accuracy and correlates with gains across tasks.
What Makes Good In-Context Examples for
12 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Years: 2026 (12)
Roles: background (2)
Polarities: background (2)
Representative citing papers
Legal2LogicICL improves accuracy and generalization when mapping legal cases to logical formulas by retrieving balanced diverse exemplars at semantic and structural levels, backed by the new Legal2Proleg dataset.
LC-ICL improves few-shot NER and RE by using label-guided contrastive demonstrations that pair positive samples with error-annotated negative samples.
MLP activations, summarized either as massive activations or by their first four moments, correlate only weakly (max |Spearman| = 0.33) with in-context example quality across Llama-3.2-3B, Qwen2.5-3B, and multiple classification and generative tasks, so activation-based active learning should not be used for ICL.
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
Many-shot CoT-ICL improves when demonstrations are ordered for smooth conceptual progression, with CDS delivering up to 5.42 percentage-point gains on math tasks using 64 examples.
METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.
SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.
GRaSp optimizes in-context examples for LLMs via synthetic generation, clustering, dimensionality reduction, and genetic algorithms with diversity-adaptive mutation, reaching 45.84% micro-F1 on financial NER with real data and outperforming zero-shot and random few-shot baselines.
LLMs disperse meaning-preserving prompts internally instead of clustering them, which produces an excessively high upper bound on output log-probability differences via Taylor expansion and Cauchy-Schwarz.
Presents a four-module LLM framework for text-to-SQL on the ALeRCE astro database, evaluated on 110 NL/SQL pairs across 13 models with perfect-match metrics.