Two Pith papers cite this work. Polarity classification of these citations is still being indexed.
Citing papers by year: 2026 (2)
Verdicts: 2 unverdicted
Citing papers explorer
- Agentic Systems as Boosting Weak Reasoning Models
  Verifier-backed committee search boosts a weak reasoning model from 67% to 76.4% on SWE-bench Verified, matching stronger models by using local soundness signals to select among proposals.
- Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap
  Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost, while some ensembling methods degrade calibration.