LLMs versus the Halting Problem: Characterizing Program Termination Reasoning

Dafna Shahaf; Jordi Armengol-Estape; Julien Vanegue; Oren Sultan; Pascal Kesseli; Peter O'Hearn; Yossi Adi

arxiv: 2601.18987 · v5 · pith:QDLN3BUAnew · submitted 2026-01-26 · 💻 cs.CL · cs.AI · cs.PL

keywords: termination, LLMs, problem, verification, program, programs, tools, halting

Determining whether a program terminates is a central problem in computer science. Turing's Halting Problem established termination as undecidable, showing that no algorithm can determine termination for all programs and inputs. Verification tools therefore approximate termination and sometimes fail to prove or disprove it; they rely on problem-specific architectures and are usually tied to particular programming languages. Recent advances in LLMs raise a natural question: to what extent can they reason about program termination? We evaluate frontier LLMs on a diverse set of C programs from the International Competition on Software Verification (SV-COMP) 2025. Our results show that GPT-5 and Claude Sonnet 4.5 achieve scores comparable to top-ranked verification tools (with test-time scaling). However, while models often correctly infer whether programs terminate, they frequently fail to construct a witness as a formal proof, revealing a gap between semantic recognition and symbolic proof generation. Performance further degrades as code length increases. To analyze this gap, we introduce a divergence precondition formulation that characterizes non-termination conditions as logical constraints. We hope these findings motivate future research on real-world termination benchmarks, neuro-symbolic approaches that combine LLMs with symbolic verification methods, and, more broadly, LLM reasoning on other undecidable problems.
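As a rough illustration of what "a non-termination condition expressed as a logical constraint" can look like, consider the toy loop `while x > 0: x += y` over unbounded integers. The sketch below is not the paper's formulation; the loop, the function names, and the bounded-simulation check are all assumptions made for this example. It states a candidate precondition under which the loop never exits and compares it with a fuel-limited run on a small grid of inputs.

```python
def divergence_precondition(x: int, y: int) -> bool:
    """Hypothetical constraint: `while x > 0: x += y` never exits iff this holds."""
    return x > 0 and y >= 0


def halts_within(x: int, y: int, fuel: int) -> bool:
    """Run `while x > 0: x += y` for at most `fuel` iterations.

    Returns True if the loop exits within the budget, False if the
    budget runs out first (which only suggests, not proves, divergence).
    """
    steps = 0
    while x > 0:
        if steps == fuel:
            return False  # budget exhausted; no exit observed
        x += y
        steps += 1
    return True


def agree_on_grid(bound: int = 5, fuel: int = 100) -> bool:
    """Check that the precondition matches bounded simulation on a small grid."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            # The precondition should hold exactly when no exit is observed.
            if divergence_precondition(x, y) == halts_within(x, y, fuel):
                return False
    return True


if __name__ == "__main__":
    print(agree_on_grid())
```

A bounded run like this can only refute a candidate precondition on the sampled inputs. Establishing it for all inputs requires a symbolic argument, which is the kind of gap between recognizing an answer and proving it that the abstract describes.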

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Teaching LLMs Program Semantics via Symbolic Execution Traces
cs.SE 2026-05 unverdicted novelty 6.0

Training Qwen3-8B on symbolic execution traces from Soteria improves violation detection in C programs by over 17 points, transfers across five property types, and shows superadditive gains with chain-of-thought.

Why Code, Why Now: An Information-Theoretic Perspective on the Limits of Machine Learning
cs.LG 2026-02 unverdicted novelty 6.0

Task information structure determines ML scaling success, with code's dense verifiable signals enabling predictable progress while sparse-feedback tasks like typical RL do not.

Natural Language based Specification and Verification
cs.SE 2026-05 unverdicted novelty 5.0

LLMs can generate natural language specs and perform compositional verification to help prevent vulnerable code from being produced by AI models.