An algorithm generates a portfolio of LLM-produced optimization models with guarantees that high-quality candidates are included if either the generator or evaluator aligns with human preferences.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.
citing papers explorer
- Generating Robust Portfolios of Optimization Models using Large Language Models
  An algorithm generates a portfolio of LLM-produced optimization models with guarantees that high-quality candidates are included if either the generator or evaluator aligns with human preferences.
- Trust Region On-Policy Distillation
  TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.