pith. sign in

Huxley-g\"odel machine: Human-level coding agent development by an approximation of the optimal self-improving machine

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.AI 2 cs.SE 1

years

2026 2 2025 1

verdicts

UNVERDICTED 3

roles

background 1

polarities

background 1

representative citing papers

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

Pioneer Agent: Continual Improvement of Small Language Models in Production

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

citing papers explorer

Showing 3 of 3 citing papers.

  • Harnessing Agentic Evolution cs.AI · 2026-05-13 · unverdicted · none · ref 28

    AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

  • Pioneer Agent: Continual Improvement of Small Language Models in Production cs.AI · 2026-04-10 · unverdicted · none · ref 92

    Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

  • Toward Training Superintelligent Software Agents through Self-Play SWE-RL cs.SE · 2025-12-21 · unverdicted · none · ref 45

    Self-play RL on bug injection and repair in sandboxed repositories yields +10.4 and +7.8 point gains on SWE-bench Verified and Pro while outperforming human-data baselines.