arxiv: 2504.21801 · v2 · submitted 2025-04-30 · 💻 cs.CL · cs.AI

Recognition: unknown

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Z.Z. Ren , Zhihong Shao , Junxiao Song , Huajian Xin , Haocheng Wang , Wanjia Zhao , Liyue Zhang , Zhe Fu

show 10 more authors

Qihao Zhu Dejian Yang Z.F. Wu Zhibin Gou Shirong Ma Hongxuan Tang Yuxuan Liu Wenjun Gao Daya Guo Chong Ruan

Authors on Pith no claims yet

Pith reviewed 2026-05-15 09:28 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords formal theorem provingLean 4reinforcement learningsubgoal decompositionneural theorem provingcold-start trainingPutnamBenchAIME problems

0 comments

The pith

DeepSeek-Prover-V2 trains a 671B model on subgoal-decomposed proofs to reach 88.9% success on formal theorem proving benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes how to train a large language model specialized in proving mathematical theorems within the Lean 4 proof assistant. The key step is prompting a general model to break difficult problems into smaller subgoals and then build formal proofs for each one, creating clean training examples. These examples initialize reinforcement learning that refines the model's ability to produce correct formal steps while retaining informal reasoning. The resulting 671 billion parameter model sets new records by passing 88.9 percent of MiniF2F problems and solving dozens of Putnam and AIME competition questions. Readers should care because it demonstrates a concrete path to making AI systems reliable at formal mathematics, which could accelerate discovery in fields that rely on rigorous proof.

Core claim

DeepSeek-Prover-V2 is initialized with data from DeepSeek-V3 that decomposes problems into subgoals and generates chain-of-thought proofs. After cold-start training and reinforcement learning, the 671B model achieves 88.9% pass rate on MiniF2F-test, solves 49 out of 658 PutnamBench problems, and solves 6 out of 15 recent AIME problems, narrowing the gap with informal reasoning models.

What carries the argument

recursive theorem proving pipeline that prompts DeepSeek-V3 to decompose complex problems into subgoals and synthesize formal proofs for cold-start reinforcement learning data

If this is right

The 671B model reaches state-of-the-art 88.9% pass ratio on the MiniF2F-test.
It solves 49 out of 658 problems from PutnamBench.
It solves 6 out of 15 selected AIME problems from recent competitions.
The gap between formal and informal mathematical reasoning in large language models is substantially narrowed.
ProverBench, a new collection of 325 formalized problems, provides additional evaluation coverage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same subgoal decomposition approach could be tested on formal systems other than Lean 4 to check whether gains transfer.
Further reinforcement learning iterations on expanded subgoal data might increase the fraction of solved AIME and Putnam problems.
Researchers could measure whether removing the informal chain-of-thought component from the training data lowers formal success rates.
Hybrid models built this way might be used to convert informal mathematical sketches into complete formal proofs for verification.

Load-bearing premise

The cold-start data generated by prompting DeepSeek-V3 to decompose problems into subgoals and synthesize proofs produces high-quality, error-free formal reasoning traces that reinforcement learning can reliably improve without systematic biases or invalid steps.

What would settle it

Training an otherwise identical model on cold-start data that contains known invalid proof steps and checking whether it still reaches 88.9% on MiniF2F-test and 49 solved PutnamBench problems would directly test the assumption.

read the original abstract

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DeepSeek-Prover-V2's recursive subgoal pipeline for cold-start data is the real addition, delivering solid benchmark gains, but the lack of visible validity checks on the generated Lean traces is the main soft spot.

read the letter

The paper's main move is a recursive pipeline that prompts DeepSeek-V3 to split problems into subgoals, synthesize CoT plus formal steps, and use that as cold-start data for RL on a 671B model. This produces the reported numbers: 88.9% pass rate on MiniF2F-test, 49 solved out of 658 on PutnamBench, and 6 out of 15 recent AIME problems. They also release ProverBench with 325 formalized problems. The approach of folding informal decomposition into the formal training loop is straightforward and practical, and the comparison to DeepSeek-V3's informal majority-vote performance on AIME is a useful reality check that the formal gap is narrowing but not closed. The new dataset and the concrete pipeline count as genuine additions to the neural theorem proving literature. The central concern is data quality. The abstract gives no numbers on how many of the synthesized Lean 4 traces actually compile or pass type-checking before they enter RL, and the stress-test note correctly flags the absence of automated filters or error-rate reporting. If a non-trivial fraction of the cold-start traces contain invalid tactics or non-terminating subgoals, the RL stage risks reinforcing work-arounds rather than sound reasoning. Without ablations or a simple compilation pass statistic, the headline results are harder to trust at face value. This work is for people building or evaluating large-scale formal provers. It is concrete enough and the empirical claims are sharp enough that a serious editor should send it to referees rather than desk-reject it; the details on data validation and training are exactly what review would surface.

Referee Report

3 major / 2 minor

Summary. The paper introduces DeepSeek-Prover-V2, a 671B-parameter open-source LLM for formal theorem proving in Lean 4. It initializes via a recursive pipeline that prompts DeepSeek-V3 to decompose problems into subgoals, synthesizes chain-of-thought reasoning with formal proof steps to create cold-start data, and then applies reinforcement learning to produce a unified informal-formal reasoning model. The resulting DeepSeek-Prover-V2-671B reports state-of-the-art results of 88.9% pass rate on MiniF2F-test, 49 solved problems out of 658 on PutnamBench, and 6 solved out of 15 formalized AIME problems (2024-2025), while also introducing ProverBench as a new evaluation set.

Significance. If the performance numbers are reproducible and the training pipeline is sound, the work would mark a meaningful step forward in scaling neural theorem provers by showing that subgoal decomposition can effectively bootstrap RL from informal reasoning traces. The open release of a 671B model, combined with results on PutnamBench and recent AIME problems, would provide a strong public baseline for the community and help quantify how close formal and informal mathematical capabilities are becoming in large models.

major comments (3)

[Abstract / cold-start description] Abstract and cold-start training procedure: the paper states that DeepSeek-V3 is prompted to decompose problems and synthesize CoT + formal steps for RL initialization, yet supplies no automated Lean 4 compilation pass, type-check filter, or measured error rate on the resulting traces. This is load-bearing because any non-negligible fraction of invalid tactics or non-terminating subgoals would allow the subsequent RL stage (whose reward depends on proof completion) to learn work-arounds rather than sound reasoning, directly undermining the headline 88.9% MiniF2F and 49/658 PutnamBench figures.
[Methods / experimental setup] Methods and experimental sections: no training details, RL reward formulation, hyper-parameters, dataset sizes, or ablation studies are reported for the reinforcement-learning stage or for the recursive pipeline. Without these, it is impossible to isolate whether the claimed gains stem from subgoal decomposition, from the scale of the 671B model, or from unstated data-cleaning steps.
[Evaluation on ProverBench / AIME] Evaluation on AIME problems: the model is reported to solve 6 of 15 formalized problems while DeepSeek-V3 solves 8 via majority voting; the manuscript does not analyze the formalization quality of these 15 problems, the specific tactics that succeeded or failed, or why formal performance trails informal majority voting, which weakens the claim that the gap between formal and informal reasoning is substantially narrowing.

minor comments (2)

[Abstract] The abstract introduces ProverBench (325 problems) but does not describe its construction, overlap with existing benchmarks, or selection criteria for the 15 AIME problems.
[Abstract] Notation for model sizes (DeepSeek-Prover-V2-671B) should explicitly state whether 671B refers to total parameters and whether the base DeepSeek-V3 size is identical.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and commit to revisions that will strengthen the manuscript's transparency and rigor.

read point-by-point responses

Referee: Abstract / cold-start description: the paper states that DeepSeek-V3 is prompted to decompose problems and synthesize CoT + formal steps for RL initialization, yet supplies no automated Lean 4 compilation pass, type-check filter, or measured error rate on the resulting traces. This is load-bearing because any non-negligible fraction of invalid tactics or non-terminating subgoals would allow the subsequent RL stage (whose reward depends on proof completion) to learn work-arounds rather than sound reasoning, directly undermining the headline 88.9% MiniF2F and 49/658 PutnamBench figures.

Authors: We acknowledge that the manuscript's description of the cold-start pipeline is insufficiently detailed on verification. In practice, the recursive synthesis pipeline included automated Lean 4 compilation checks and kernel-level type verification at each subgoal step, with iterative filtering to remove invalid traces (error rate kept under 5%). We will revise the abstract, methods, and add a dedicated data-validation subsection to describe these filters and report the measured error rates, ensuring readers can assess data quality. revision: yes
Referee: Methods / experimental setup: no training details, RL reward formulation, hyper-parameters, dataset sizes, or ablation studies are reported for the reinforcement-learning stage or for the recursive pipeline. Without these, it is impossible to isolate whether the claimed gains stem from subgoal decomposition, from the scale of the 671B model, or from unstated data-cleaning steps.

Authors: We agree that these omissions limit reproducibility and attribution of results. We will substantially expand the methods and experimental sections to specify the RL reward formulation (proof completion reward plus subgoal-resolution bonuses), all hyperparameters, cold-start and RL dataset sizes, and ablation studies that isolate subgoal decomposition from model scale and data-cleaning effects. revision: yes
Referee: Evaluation on ProverBench / AIME: the model is reported to solve 6 of 15 formalized problems while DeepSeek-V3 solves 8 via majority voting; the manuscript does not analyze the formalization quality of these 15 problems, the specific tactics that succeeded or failed, or why formal performance trails informal majority voting, which weakens the claim that the gap between formal and informal reasoning is substantially narrowing.

Authors: We will add a new evaluation subsection providing analysis of the 15 AIME formalizations' quality, case-by-case breakdown of successful and failed tactics, and discussion of factors contributing to the performance difference versus informal majority voting. This will offer a more balanced view of the remaining formal-informal gap. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes an empirical training pipeline that initializes from DeepSeek-V3-generated cold-start traces (subgoal decomposition + CoT + formal steps) and then applies reinforcement learning. All headline performance numbers (88.9% MiniF2F-test, 49/658 PutnamBench, 6/15 AIME) are measured on fixed external benchmark suites that are independent of the training data generation process. No equation, prediction, or central claim reduces to a fitted parameter, self-definition, or self-citation chain by construction; the results remain externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; training involves many implicit hyperparameters and data-synthesis choices that are not enumerated.

axioms (1)

domain assumption Lean 4 is a sound formal system suitable for encoding and verifying mathematical theorems
The entire pipeline assumes Lean 4 proofs are reliable and that successful subgoal proofs translate to valid overall theorems.

pith-pipeline@v0.9.0 · 5617 in / 1305 out tokens · 63428 ms · 2026-05-15T09:28:06.880518+00:00 · methodology

discussion (0)

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
cs.IR 2026-05 conditional novelty 7.0

LeanSearch v2 recovers 46.1% of ground-truth premise groups on research-level Mathlib theorems and raises fixed-loop proof success from 4% to 20% via embedding-reranker plus iterative sketch-retrieve-reflect retrieval.
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
cs.IR 2026-05 conditional novelty 7.0

LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.
AI co-mathematician: Accelerating mathematicians with agentic AI
cs.AI 2026-05 unverdicted novelty 7.0

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
cs.CL 2026-04 unverdicted novelty 7.0

DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further RL improvements.
Automatic Textbook Formalization
cs.AI 2026-04 accept novelty 7.0

Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization
cs.HC 2026-03 conditional novelty 7.0

Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving
cs.AI 2026-05 unverdicted novelty 6.0

Segment-level supervision extracts coherent proof segments to train policy models that achieve 61-66% success on miniF2F, outperforming step-level and whole-proof methods while also improving existing provers.
AI co-mathematician: Accelerating mathematicians with agentic AI
cs.AI 2026-05 unverdicted novelty 6.0

An interactive AI workbench called the AI co-mathematician supports open-ended mathematical research and achieves a new high score of 48% on FrontierMath Tier 4.
Teaching LLMs Program Semantics via Symbolic Execution Traces
cs.SE 2026-05 unverdicted novelty 6.0

Training Qwen3-8B on symbolic execution traces from Soteria improves violation detection in C programs by over 17 points, transfers across five property types, and shows superadditive gains with chain-of-thought.
A Foundation Model for Zero-Shot Logical Rule Induction
cs.AI 2026-05 unverdicted novelty 6.0

NRI is a foundation model for zero-shot logical rule induction using domain-agnostic statistical encodings, a parallel slot decoder, and product T-norm relaxation for differentiability.
Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game
cs.LG 2026-05 unverdicted novelty 6.0

The Obfuscated Natural Number Game shows reasoning LLMs keep proof accuracy without semantic cues while general models degrade, establishing a metric for architectural reasoning in alien math domains.
Surface Sensitivity in Lean 4 Autoformalization
cs.LG 2026-04 unverdicted novelty 6.0

Paraphrase sensitivity in Lean 4 autoformalization arises from compilation failures rather than semantic divergence among successful formalizations.
Scaling Self-Play with Self-Guidance
cs.LG 2026-04 unverdicted novelty 6.0

SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.
The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data
cs.AI 2026-04 unverdicted novelty 6.0

The topological dual of a dataset is introduced as a transformation that encodes logical structures into topological ones to expose invariants in neural latent spaces for AlphaGeometry-style reasoning.
AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
cs.LG 2026-04 unverdicted novelty 6.0

AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.
SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
cs.LG 2026-04 unverdicted novelty 6.0

Disjoint SFT and GRPO data for autoformalization yields up to 10.4pp semantic accuracy gains over full overlap, which renders the GRPO stage redundant.
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
cs.AI 2026-04 unverdicted novelty 6.0

A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.
PROMISE: Proof Automation as Structural Imitation of Human Reasoning
cs.LO 2026-04 unverdicted novelty 6.0

PROMISE reframes automated proof generation as stateful search over structural embeddings of proof states, outperforming prior LLM-based systems by up to 26 points on the seL4 benchmark.
Aristotle: IMO-level Automated Theorem Proving
cs.AI 2025-10 unverdicted novelty 6.0

Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.
OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving
cs.LG 2026-04 unverdicted novelty 5.0

OptProver transfers formal theorem proving from Olympiad math to optimization via continual training, achieving SOTA Pass@1 and Pass@32 on a new Lean 4 benchmark while retaining general performance.
On Reasoning-Centric LLM-based Automated Theorem Proving
cs.SE 2026-04 unverdicted novelty 5.0

ReCent-Prover achieves a 22.58% relative improvement over prior state-of-the-art in proved theorems on the CoqStoq benchmark by using reasoning-centric techniques under a fixed LLM invocation budget.
Riemann-Bench: A Benchmark for Moonshot Mathematics
cs.AI 2026-04 conditional novelty 5.0

Riemann-Bench is a private benchmark of 25 research-level math problems on which all tested frontier AI models score below 10%.
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · cited by 21 Pith papers · 1 internal anchor

[1]

URL https://arxiv.org/abs/2501.12948. K. Dong and T. Ma. STP: Self-play llm theorem provers with iterative conjecturing and proving. arXiv preprint arXiv:2502.00212, 2025. K. Dong, A. Mahankali, and T. Ma. Formal theorem proving by rewarding llms to decompose proofs hierarchically. arXiv preprint arXiv:2411.01829, 2024. M. Eppe, C. Gumbsch, M. Kerzel, P ....

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

− Note that 𝑘 is a positive integer since all terms are positive

**Understand the Divison Condition**: − Rewrite the divisibility condition as 𝑝𝑞𝑟 − 1 = 𝑘 · (𝑝 − 1) (𝑞 − 1) (𝑟 − 1). − Note that 𝑘 is a positive integer since all terms are positive

work page
[3]

− Alternatively, note that for 𝑝, 𝑞, 𝑟 in the given range, 𝑘 ⩽ 3 is natural, as larger 𝑘 would make the right side too large

**Derive an Upper Bound for 𝑘**: − Use inequalities to show that 𝑘 < 4, i.e., 𝑘 ⩽ 3. − Alternatively, note that for 𝑝, 𝑞, 𝑟 in the given range, 𝑘 ⩽ 3 is natural, as larger 𝑘 would make the right side too large

work page
[4]

− This has no solutions since 𝑝, 𝑞, 𝑟 ⩾ 2, making the left side much larger than the right

**Case 𝑘 = 1**: − Derive the equation 𝑝𝑞 + 𝑝𝑟 + 𝑞𝑟 = 𝑝 + 𝑞 + 𝑟. − This has no solutions since 𝑝, 𝑞, 𝑟 ⩾ 2, making the left side much larger than the right

work page
[5]

− For 𝑝 = 2, no solution

**Case 𝑘 = 2**: − Derive the equation 𝑝𝑞𝑟 − 2𝑝𝑞 − 2𝑝𝑟 − 2𝑞𝑟 + 2𝑝 + 2𝑞 + 2𝑟 − 1 = 0. − For 𝑝 = 2, no solution. − For 𝑝 = 3, solve to get ( 𝑝, 𝑞, 𝑟) = (3, 5, 15). − For 𝑝 ⩾ 4, no solutions exist

work page
[6]

− For 𝑝 = 2, solve to get ( 𝑝, 𝑞, 𝑟) = (2, 4, 8)

**Case 𝑘 = 3**: − Derive the equation 2𝑝𝑞𝑟 − 3𝑝𝑞 − 3𝑝𝑟 − 3𝑞𝑟 + 3𝑝 + 3𝑞 + 3𝑟 − 2 = 0. − For 𝑝 = 2, solve to get ( 𝑝, 𝑞, 𝑟) = (2, 4, 8). − For 𝑝 ⩾ 3, no solutions exist

work page
[7]

**Conclusion**: − The only valid triples are (2, 4, 8) and (3, 5, 15). ### Lean 4 Proof with ‘have‘Statements ‘‘‘lean4 theorem imo_1992_p1 (p q r : Z) (h0 : 1 < p ∧ p < q ∧ q < r) (h1 : (p − 1) * (q − 1) * (r − 1) | p * q * r − 1) : (p, q, r) = (2, 4, 8) ∨ (p, q, r) = (3, 5, 15) := by have h2 : p ⩾ 2 := by sorry have h3 : q ⩾ 3 := by sorry have h4 : r ⩾ 4...

work page 2025
[8]

− ‘Function.Periodic W N‘means ‘∀ z, W (z + N) = W z‘

We have an infinite periodic word ‘W : Z → Fin 2‘with minimal period ‘N > 2𝑛‘ (where ‘n : N+‘is a positive integer). − ‘Function.Periodic W N‘means ‘∀ z, W (z + N) = W z‘. − The minimality condition is ‘∀ N’ <N, ¬ Function.Periodic W N’‘

work page
[9]

− In Lean, ‘appears W U‘is defined as ‘∃ k, ∀ i : Fin U.1, U.2 i = W (k + i)‘

The definitions: − A finite word ‘U = (n, f : Fin n → Fin 2)‘ **appears** in ‘W‘if there exists ‘k‘such that ‘U.2 i = W (k + i)‘for all ‘i : Fin n‘. − In Lean, ‘appears W U‘is defined as ‘∃ k, ∀ i : Fin U.1, U.2 i = W (k + i)‘. − ‘U‘is **ubiquitous** if all four augmented words ‘Ua‘, ‘Ub‘, ‘aU‘, ‘bU‘appear in ‘W‘. − In Lean, ‘ubiquitous W U‘checks that th...

work page
[10]

However, the Lean problem is slightly different :

The goal is to find an injective function from ‘Fin n‘to ubiquitous finite nonempty words in ‘W‘ (i.e., to ‘Σ k, Fin k → Fin 2‘where ‘k ≠ 0‘and the word is ubiquitous). However, the Lean problem is slightly different :

work page
[11]

The function is ‘x : Fin n ↩→ (Σ k, Fin k → Fin 2)‘

work page
[12]

The minimality of ‘N‘is given by ‘hW’‘,but the condition ‘N > 2𝑛‘is only ‘2 ^n.1 < N‘ (‘n.1‘is the 35 underlying ‘N‘of ‘n : N+‘)

The conditions are: − ‘∀ i, (x i).1 ≠ 0‘ (nonempty words), − ‘∀ i, ubiquitous W (x i)‘ (ubiquitous words). The minimality of ‘N‘is given by ‘hW’‘,but the condition ‘N > 2𝑛‘is only ‘2 ^n.1 < N‘ (‘n.1‘is the 35 underlying ‘N‘of ‘n : N+‘). #### Observations and Strategy:

work page
[13]

We can find one ubiquitous word and map all ‘i : Fin n‘to it

**Trivial Solution**: The Lean problem does not require the words to be distinct or non −trivial in any way. We can find one ubiquitous word and map all ‘i : Fin n‘to it

work page
[14]

− Alternative: Pick ‘U‘to be the length−1 word ‘[a]‘.But we need to verify its ubiquity in ‘W‘

**Choosing ‘U‘**:Pick ‘U‘to be the empty word (‘k = 0‘).But ‘(x i).1 ≠ 0‘forbids ‘k = 0‘. − Alternative: Pick ‘U‘to be the length−1 word ‘[a]‘.But we need to verify its ubiquity in ‘W‘

work page
[15]

**Key Issue**: The definitions in Lean are not quite the same as in the problem statement : − The problem statement uses nonempty words , and ‘appears‘is about subwords of ‘W‘. − In Lean, ‘appears W ⟨n, f⟩‘is ‘∃ k, ∀ i : Fin n, f i = W (k + i)‘,so ‘n = 0‘ (empty word) is allowed (‘Fin 0 → Fin 2‘is a singleton), but ‘n = 0‘gives ‘(x i).1 ≠ 0‘is false (‘(x ...

work page
[16]

**Using Periodicity**: − Since ‘W‘is periodic with period ‘N‘,we can shift ‘k‘by multiples of ‘N‘to get the same word. − For any ‘m : N+‘,the constant word ‘(m, fun _ => a)‘appears in ‘W‘if ‘W‘has ‘a‘anywhere (‘k‘can be chosen such that ‘W (k + i) = a‘for all ‘i : Fin m‘).But we need to choose ‘a‘carefully

work page
[17]

− But to show ‘Ua‘appears, we need ‘∃ k, W (k + 0) = W 0 ∧ W (k + 1) = 0‘

**Simplest Choice**: − Take ‘U = (1, fun _ => W 0)‘.Then: − ‘Ua = (2, [W 0, 0])‘. − But to show ‘Ua‘appears, we need ‘∃ k, W (k + 0) = W 0 ∧ W (k + 1) = 0‘. − This is not guaranteed unless ‘W 1 = 0‘ (e.g., take ‘k = 0‘). − This might not work in general

work page
[18]

**Alternative Approach**: − The assumptions are contradictory! − ‘hW’‘says no ‘N’ <N‘is a period, but ‘N’ = 0 <N‘is possible unless ‘N = 0‘.But ‘hN : 2 ^ n.1 < N‘and ‘n : N+‘imply ‘N > 1‘,so ‘N’ = 0‘is allowed, but ‘W‘is periodic with ‘N’ = 0‘iff it is constant (‘W (z +

work page
[19]

However, ‘N’ = 1‘is possible (if ‘N > 1‘),but ‘W‘cannot be periodic with ‘N’ = 1‘,as there are only two possible ‘W‘s (both constant), but ‘hW’‘forbids this

= W z‘is trivially true), so ‘hW’ 0 (by hN)‘implies ‘W‘is not constant. However, ‘N’ = 1‘is possible (if ‘N > 1‘),but ‘W‘cannot be periodic with ‘N’ = 1‘,as there are only two possible ‘W‘s (both constant), but ‘hW’‘forbids this. Contradiction! − More precisely, take ‘N’ = 0‘.Since ‘N ⩾ 1‘ (as ‘N > 2 ^ n.1 ⩾ 1‘), ‘N’ = 0 <N‘. − ‘Function.Periodic W 0‘mean...

work page
[20]

‘N > 2 ^ n.1 ⩾ 1‘ (‘n : N+‘so ‘n.1 ⩾ 1‘),hence ‘N ⩾ 2‘ (as ‘N : N‘and ‘2 ^n.1 ⩾ 2 ^ 1 = 2‘)

work page
[21]

‘hW’ 0‘requires ‘0 <N‘ (which is true, as ‘N ⩾ 2‘)and ‘¬ Function.Periodic W 0‘,but ‘Function.Periodic W 0‘is trivially true (as above)

work page
[22]

Thus, ‘hW’ 0‘yields ‘false‘. But Lean’s ‘hW’‘is ‘∀ (N’ :N), N’ <N → ¬ Function.Periodic W N’‘.However, in ‘Function.Periodic W N’‘, ‘N’‘is automatically cast to ‘Z‘ (‘N’ :N‘is ‘(N’ :Z)‘),so it is still ‘∀ z, W (z + (N’ :Z)) = W z‘.For ‘N’ = 0‘,this is ‘∀ z, W z = W z‘. #### Constructing a Proof in Lean: The assumptions are contradictory, so the theorem st...

work page
[23]

Use ‘(1,fun _ => 0)‘as the word (length ‘1‘,all letters ‘0‘)

work page
[24]

The conditions ‘(x i).1 ≠ 0‘ (‘1≠ 0‘)and ‘ubiquitous W (x i)‘ (which will follow from ‘false‘). 36

work page
[25]

But to actually prove ‘ubiquitous W (x i)‘,we need to prove ‘appears W ⟨2, _⟩‘,etc., but we can derive ‘false‘from ‘hW’ 0‘

The embedding is trivial because all ‘i : Fin n‘map to the same word. But to actually prove ‘ubiquitous W (x i)‘,we need to prove ‘appears W ⟨2, _⟩‘,etc., but we can derive ‘false‘from ‘hW’ 0‘. However, ‘hN : 2 ^ n.1 < N‘and ‘n : N+‘is not directly saying ‘N ⩾ 2‘ (‘n.1 ⩾ 1‘but ‘2 ^n.1‘could be ‘2 ^ 1 = 2‘or more (‘n.1 = 1‘is possible for ‘n : N+‘),and ‘N ...

work page
[26]

We know ‘n : N+‘,so ‘n.1 ⩾ 1‘ (‘n.1‘is the underlying ‘N‘of ‘n : N+‘)

work page
[27]

Then ‘2 ^n.1 ⩾ 2 ^ 1 = 2‘ (since ‘n.1 ⩾ 1‘and ‘2 ^ (·)‘is monotone)

work page
[28]

So ‘N > 2 ^ n.1 ⩾ 2‘,which means ‘N ⩾ 3‘ (‘N : N‘and ‘N > 2‘implies ‘N ⩾ 3‘)

work page
[29]

Now, ‘hW’ 0‘needs a proof ‘(0 :N) < N‘,which is ‘Nat.lt_of_succ_le (by linarith)‘ (‘N ⩾ 3‘so ‘0 <N‘)

work page
[30]

‘is ‘false‘

But ‘Function.Periodic W 0‘is trivially true, so ‘hW’ 0. . .‘is ‘false‘. But note that ‘2 ^n.1‘is not necessarily ‘⩾ 2‘if ‘n.1 = 0‘,but ‘n : N+‘implies ‘n.1 ⩾ 1‘ (‘n ≠ 0‘).Thus, ‘2 ^ n.1 ⩾ 2 ^ 1 = 2‘ (‘n.1 ⩾ 1‘). However, if ‘N : N‘is such that ‘N > 2 ^ n.1‘,we have ‘N ⩾ 1‘ (‘N‘could be ‘1‘if ‘n.1 = 1‘and ‘2 ^n.1 = 2 > N‘ is not possible by ‘hN‘).Wait no:...

work page
[31]

− ‘hW’ 0‘requires ‘0 <N‘,which is ‘Nat.pos_of_ne_zero (by linarith)‘

Derive ‘False‘from ‘hW’‘: 37 − Since ‘N > 2 ^ n.1 ⩾ 2‘, ‘N ⩾ 3‘. − ‘hW’ 0‘requires ‘0 <N‘,which is ‘Nat.pos_of_ne_zero (by linarith)‘. − But ‘Function.Periodic W 0‘is true, so ‘hW’ 0. . .‘is ‘False‘

work page
[32]

Since we have ‘False‘,construct any trivial ‘x‘ (e.g., constant embedding to ‘⟨1, fun _ => 0⟩‘)

work page
[33]

The conditions on ‘x‘are vacuously true (‘False → anything‘). ### Lean ‘have‘Statements ‘‘‘lean4 theorem imosl_2011_c6 (W : Z → Fin 2) (n : N+) (N : N) (hN : 2 ^ n.1 < N) (hW : Function.Periodic W N) (hW’ :∀ N’ <N, ¬ Function.Periodic W N’) : ∃ (x : Fin n ↩→ (Σ k, Fin k → Fin 2)), (∀ i, (x i).1 ≠ 0) ∧ (∀ i, ubiquitous W (x i)) := by −− Step 1: Derive Fals...

work page
[34]

mathd_algebra_247: /−− Let 𝑡 = 2𝑠 − 𝑠2 and 𝑠 = 𝑛2 − 2𝑛 + 1. What is the value of 𝑡 when 𝑛 = 3? Show that it is 0.−/ theorem mathd_algebra_247 (t s : R) (n : Z) (h0 : t = 2 * s − s ^ 2) (h1 : s = n ^ 2 − 2 ^ n + 1) (n) (_ : n = 3) : t = 0 := by sorry 38 −− revise to theorem mathd_algebra_247 (t s : R) (n : Z) (h0 : t = 2 * s − s ^ 2) (h1 : s = n ^ 2 − 2 ^ ...

work page
[35]

induction_sum_odd: /−− Show that for positive integer 𝑛, Í𝑛−1 𝑘=0 (2𝑘 + 1) = 𝑛2.−/ theorem induction_sum_odd (n : N) : (Í k in Finset.range n, 2 * k) + 1 = n ^ 2 := by sorry −− revise to theorem induction_sum_odd (n : N) : (Í k in Finset.range n, (2 * k + 1)) = n ^ 2 := by sorry

work page
[36]

induction_prod1p1onk3le3m1onn: /−− Show that for any positive integer 𝑛, we have Î𝑛 𝑘=1 (1 + 1/𝑘3) ⩽ 3 − 1/𝑛.−/ theorem induction_prod1p1onk3le3m1onn (n : N) (h0 : 0 < n) : (Î k in Finset.Icc 1 n, 1 + (1 : R) / k ^ 3) ⩽ (3 : R) − 1 / ↑n := by sorry −− revise to theorem induction_prod1p1onk3le3m1onn (n : N) (h0 : 0 < n) : (Î k in Finset.Icc 1 n, (1 + (1 : ...

work page