A Lean 4 machine-verified proof establishes that depth-p QAOA on the ring of disagrees attains approximation ratio (2p+1)/(2p+2) exactly.
Title resolution pending
12 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
LeanMarathon uses four contract-scoped agents on an evolving blueprint coordinated by a two-stage orchestrator to formalize seven theorems from Erdős problems in Lean, proving 258 lemmas with no sorry across three runs.
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.
LLMs display clear performance stratification on formal language tasks aligned with Chomsky hierarchy complexity levels, limited by severe efficiency barriers rather than absolute capability.
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
Agent-directed tree search improves LLM performance on Lean formal verification tasks, with context-based orchestration solving more intermediate specs at lower token cost than baseline agents.
A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 prior unsolved tasks.
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
citing papers explorer
-
pAI/MSc: ML Theory Research with Humans on the Loop
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.