Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
Large language models as optimizers
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8representative citing papers
OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated constraint formulation.
Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.
ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage points across equity sectors.
Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.
citing papers explorer
-
Meta-Harness: End-to-End Optimization of Model Harnesses
Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.