Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
arXiv preprint arXiv:2005.00547 (2020)
7 Pith papers cite this work.
representative citing papers
HoldUp uses LLM-guided clustering to provide holistic dataset context for semantic operators, yielding up to 33% higher classification accuracy and 30% higher scoring accuracy than row-by-row LLM processing across 15 datasets.
PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.
AIPsy-Affect supplies 480 keyword-free clinical vignettes and matched neutral controls for mechanistic interpretability studies of emotion in language models.
SUMMIR is a multimetric ranking model that orders LLM-generated sports insights by importance while incorporating hallucination detection to improve factual reliability across cricket, soccer, basketball, and baseball articles.
T-FIX operationalizes expert alignment for LLM explanations as an automatic, generalizable evaluation using domain-specific criteria across seven tasks in three domains.
LLMs detect social signals in clinical transcripts across model families, with an agreement-weighted ensemble using group-level agreement patterns improving accuracy and stability over individual models.
citing papers explorer
- Meta-Harness: End-to-End Optimization of Model Harnesses