RL unknotter, hard unknots and unknotting number

Anne Dranowski; Daniel Tubbenhauer; Yura Kabkov

arXiv: 2603.07955 · v3 · submitted 2026-03-09 · 🧮 math.GT · cs.LG · stat.ML

Pith reviewed 2026-05-15 14:25 UTC · model grok-4.3

classification 🧮 math.GT · cs.LG · stat.ML

keywords reinforcement learning · Reidemeister moves · unknotting number · knot diagrams · diagram inflation · composite knots · prime knots

The pith

A reinforcement learning pipeline for Reidemeister moves recovers the unknotting-number upper bound of three on the composite knot 4_1#9_10 via diagram inflation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a reinforcement learning agent that learns to propose Reidemeister moves, together with a value heuristic, in order to simplify knot diagrams. The system is applied first to known hard unknot presentations and then, with controlled diagram inflation, to the composite knot 4_1#9_10. In the latter case the agent reaches a diagram that three crossing changes suffice to unknot, which recovers the recently established upper bound of three for the unknotting number. A workbook-driven loop lets the same pipeline iteratively tighten unknotting-number upper bounds across the table of prime knots.

Core claim

By training an RL policy and value function on sequences of Reidemeister moves, the authors produce an automated simplifier that, when diagram inflation is allowed, finds a diagram of 4_1#9_10 that at most three crossing changes turn into the unknot, thereby recovering the upper bound of three on its unknotting number; a self-improving, workbook-driven extension of the same pipeline systematically improves unknotting-number upper bounds for prime knots.
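For readers who want the upper-bound logic spelled out, the block below gives the standard definition of the unknotting number and the inequality that a single witness diagram provides. The symbols $D$, $C$, $\mathrm{cr}(D)$ and $D_C$ are notation introduced here for illustration and are not taken from the paper.

```latex
% u(K): the minimum, over all diagrams D of K, of the number of crossing
% changes needed to turn D into a diagram of the unknot.
% cr(D) is the set of crossings of D; D_C is D with the crossings in C switched.
\[
  u(K) \;=\; \min_{D \text{ a diagram of } K}\;
  \min\bigl\{\, |C| \;:\; C \subseteq \mathrm{cr}(D),\
  D_C \text{ is a diagram of the unknot} \,\bigr\}
\]
% Hence one diagram D of K = 4_1 \# 9_{10}, together with a set C of three
% crossings whose switching yields the unknot, certifies
\[
  u(4_1 \# 9_{10}) \;\le\; |C| \;=\; 3 .
\]
```

Because the minimum runs over all diagrams of the knot, the witness need not be a minimal-crossing diagram, which is why a search that is allowed to inflate diagrams can find bounds that the standard diagram does not show.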

What carries the argument

Reinforcement-learning policy for proposing Reidemeister moves together with a learned value heuristic, augmented by controlled diagram inflation to escape local minima.
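The interplay of move proposals, value heuristic and inflation can be illustrated with a minimal sketch. This is not the authors' implementation: `simplify`, `proposals`, `value`, `inflate` and the integer toy states are all hypothetical stand-ins, with `proposals` playing the role of the learned policy and `value` the learned heuristic. The search descends greedily on `value` and falls back to an inflation step only when no unseen proposal improves the current state.

```python
from typing import Callable, Hashable, List, Optional

State = Hashable  # stand-in for a knot diagram (hypothetical encoding)


def simplify(
    start: State,
    proposals: Callable[[State], List[State]],  # stand-in for the policy
    value: Callable[[State], float],            # stand-in for the value heuristic
    inflate: Callable[[State], State],          # stand-in for diagram inflation
    is_goal: Callable[[State], bool],
    max_steps: int = 100,
    max_inflations: int = 5,
) -> Optional[List[State]]:
    """Greedy descent on `value`; inflate when no unseen proposal improves."""
    state, path, seen, inflations = start, [start], {start}, 0
    for _ in range(max_steps):
        if is_goal(state):
            return path
        better = [s for s in proposals(state)
                  if s not in seen and value(s) < value(state)]
        if better:
            state = min(better, key=value)   # take the best-valued proposal
        elif inflations < max_inflations:
            state = inflate(state)           # local minimum: make it worse first
            inflations += 1
        else:
            return None                      # stuck with no inflation budget left
        seen.add(state)
        path.append(state)
    return path if is_goal(state) else None


# Toy state space: an integer plays the role of a crossing count.
# A "simplifying move" (n -> n - 2) exists only when n is not a multiple of 3.
def toy_proposals(n: int) -> List[int]:
    return [n - 2] if n >= 2 and n % 3 != 0 else []


def toy_inflate(n: int) -> int:
    return n + 1  # inflation adds one "crossing"


path = simplify(7, toy_proposals, float, toy_inflate, lambda n: n == 0)
print(path)  # [7, 5, 3, 4, 2, 0]
```

In the toy run the search gets stuck at 3, inflates to 4, and then reaches 0. A real pipeline would replace the integers with diagram encodings and the toy functions with trained networks; the budget parameters `max_steps` and `max_inflations` are likewise assumptions of this sketch.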

If this is right

The pipeline applies unchanged to arbitrary knots and links.
It recovers known hard unknot diagrams without manual intervention.
A self-improving workbook loop systematically lowers unknotting-number upper bounds on the census of prime knots.
The same move-proposal and value machinery can be retrained on other local simplification problems in knot theory.
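The bookkeeping behind a workbook loop like the one listed above can be sketched as follows. The summary gives no detail on the workbook format, so everything here is an assumption: the workbook is modelled as a plain dictionary from knot name to best known upper bound, `search` is a hypothetical callable standing in for one run of the pipeline, and the knot names and numbers are invented placeholders rather than results from the paper.

```python
from typing import Callable, Dict, Optional, Tuple


def improve_bounds(
    workbook: Dict[str, int],
    search: Callable[[str, int], Optional[int]],
    rounds: int = 3,
) -> Dict[str, int]:
    """Return a copy of `workbook` in which each bound is lowered
    whenever `search` reports a strictly smaller upper bound."""
    best = dict(workbook)  # leave the caller's workbook untouched
    for round_index in range(rounds):
        for knot in list(best):
            found = search(knot, round_index)
            if found is not None and found < best[knot]:
                best[knot] = found  # record the improved upper bound
    return best


# Placeholder data: (knot, round) -> bound found in that run, if any.
fake_runs: Dict[Tuple[str, int], int] = {
    ("K_a", 0): 5,  # worse than the current bound, ignored
    ("K_a", 1): 3,  # improvement
    ("K_b", 2): 4,  # worse than the current bound, ignored
}


def fake_search(knot: str, round_index: int) -> Optional[int]:
    return fake_runs.get((knot, round_index))


initial = {"K_a": 4, "K_b": 2}
updated = improve_bounds(initial, fake_search)
print(updated)  # {'K_a': 3, 'K_b': 2}
```

The sketch covers only the monotone update of recorded bounds. Whatever makes the paper's loop self-improving, for example retraining on newly found sequences, is not modelled here.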

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the policy generalizes beyond the training set, automated simplification could become a routine first step before human or symbolic computation of knot invariants.
Diagram inflation combined with learned heuristics may offer a practical route to upper bounds for other crossing-number-like quantities.
The workbook loop suggests a template for iterative improvement of any search-based bound in low-dimensional topology.

Load-bearing premise

The trained policy and value function can reliably find short sequences of Reidemeister moves that simplify arbitrary diagrams without becoming trapped or requiring unbounded inflation.

What would settle it

A run in which the agent is given the standard diagram of 4_1#9_10, allowed unlimited inflation, and still fails to produce any diagram that at most three crossing changes turn into the unknot.

Original abstract

We develop a reinforcement learning pipeline for simplifying knot diagrams. A trained agent learns move proposals and a value heuristic for navigating Reidemeister moves. The pipeline applies to arbitrary knots and links; we test it on ``very hard'' unknot diagrams and, using diagram inflation, on $4_1\#9_{10}$ where we recover the recently established and surprising upper bound of three for the unknotting number. In addition, we explain a self-improving workbook-driven extension of the pipeline that systematically improves unknotting number upper bounds on the list of prime knots.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The RL pipeline for Reidemeister navigation recovers the known unknotting bound of three on the inflated composite but supplies no success rates or explicit sequences to confirm the agent actually works.

The paper's main contribution is a reinforcement learning setup that trains an agent to propose Reidemeister moves while a value network guides the search toward simpler diagrams. The authors run it on hard unknot examples and, after diagram inflation on 4_1 # 9_10, recover the upper bound of three for the unknotting number. A self-improving workbook extension is sketched for tightening bounds across the prime knot table.

The RL-plus-heuristic combination for move navigation is new in this context, and the inflation trick gives a concrete test against a recently proved bound. The method is presented as applying to arbitrary knots and links, which is a practical strength if it scales.

The central claim rests on the agent finding valid sequences that reduce the inflated diagram with at most three crossing changes. The abstract states the bound is recovered but gives no training curves, success fractions, final diagram sizes, or the actual move list. Without those, it is impossible to check whether the policy avoids local minima or whether the result is reproducible by others. The math itself shows no circularity: the bound is treated as an external target rather than something the model is tuned to match.

This work is aimed at knot theorists who already use computational methods for diagram simplification and unknotting numbers. A reader who wants to see RL applied to low-dimensional topology will find the setup and the workbook idea useful to examine. I would send it to peer review so referees can inspect the implementation and demand the missing verification data.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a reinforcement learning pipeline that trains an agent to propose Reidemeister moves together with a value heuristic for simplifying knot diagrams. It applies the method to hard unknot diagrams and, via diagram inflation on the connected sum 4_1 # 9_10, recovers the recently established upper bound of three for the unknotting number; it further outlines a self-improving workbook extension intended to tighten unknotting-number bounds on the list of prime knots.

Significance. If the trained policy is shown to produce verifiable sequences of Reidemeister moves and crossing changes that achieve the claimed bound without becoming trapped in local minima, the work supplies a new computational route to upper bounds on unknotting numbers, a quantity whose exact values remain unknown for many knots. The self-improving workbook component could, in principle, be applied systematically to knot tables.

major comments (2)

[Abstract] The central claim that diagram inflation on 4_1 # 9_10 recovers the upper bound of three is presented without any reported success rate, training curves, final diagram size after inflation, or explicit sequence of Reidemeister moves plus crossing changes; this information is required to verify that the RL policy actually reaches the claimed simplification rather than becoming trapped.
[Abstract and pipeline description] The manuscript states that the pipeline applies to arbitrary knots and links, yet supplies no quantitative evidence (e.g., success fraction on a test set of hard unknots or robustness across random seeds) that the learned policy and value heuristic reliably navigate the Reidemeister move graph without excessive inflation or local-minimum trapping; this assumption is load-bearing for all reported results.
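The kind of quantitative evidence requested in these comments could be reported along the following lines. This is a minimal sketch with invented placeholder outcomes, not data from the paper; `success_summary` and the seed-to-outcome layout are hypothetical names chosen for illustration.

```python
from statistics import mean
from typing import Dict, List


def success_summary(outcomes: Dict[int, List[bool]]) -> Dict[str, float]:
    """Summarise per-seed success fractions on a fixed test set.

    `outcomes[seed][i]` is True when the run with that random seed
    simplified test diagram i (layout assumed for this sketch).
    """
    if not outcomes or any(len(runs) == 0 for runs in outcomes.values()):
        raise ValueError("every seed needs at least one recorded run")
    per_seed = [sum(runs) / len(runs) for runs in outcomes.values()]
    return {
        "mean": mean(per_seed),   # average success fraction over seeds
        "worst": min(per_seed),   # least successful seed
        "best": max(per_seed),    # most successful seed
    }


# Placeholder outcomes for three seeds on four test diagrams.
placeholder = {
    0: [True, True, False, True],
    1: [True, False, False, True],
    2: [True, True, True, True],
}
summary = success_summary(placeholder)
print(summary)  # {'mean': 0.75, 'worst': 0.5, 'best': 1.0}
```

Reporting the worst seed alongside the mean speaks directly to the concern about local-minimum trapping, since a single seed that fails on most diagrams would be hidden by an average.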

minor comments (1)

Notation for the connected sum 4_1 # 9_10 and the inflation procedure should be defined explicitly in the main text rather than left to the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and valuable comments on our manuscript. We have revised the paper to address the concerns about verification details and quantitative evidence. Our responses to the major comments are as follows.

Referee: [Abstract] The central claim that diagram inflation on 4_1 # 9_10 recovers the upper bound of three is presented without any reported success rate, training curves, final diagram size after inflation, or explicit sequence of Reidemeister moves plus crossing changes; this information is required to verify that the RL policy actually reaches the claimed simplification rather than becoming trapped.

Authors: We agree that the abstract does not include these verification details. In the revised manuscript, we expand the abstract to report the success rate from our experiments on the 4_1 # 9_10 diagram, include references to the training curves and final diagram sizes shown in the results section, and provide an explicit sequence of Reidemeister moves and crossing changes that achieves the upper bound of three on the unknotting number. This revision ensures the central claim is fully verifiable.

revision: yes

Referee: [Abstract and pipeline description] The manuscript states that the pipeline applies to arbitrary knots and links, yet supplies no quantitative evidence (e.g., success fraction on a test set of hard unknots or robustness across random seeds) that the learned policy and value heuristic reliably navigate the Reidemeister move graph without excessive inflation or local-minimum trapping; this assumption is load-bearing for all reported results.

Authors: The referee correctly notes the lack of quantitative evidence for general applicability. While the manuscript focuses on specific challenging cases, we acknowledge the need for broader validation. In the revised version, we have added quantitative evidence including success fractions on a test set of hard unknots and performance metrics across multiple random seeds, demonstrating reliable navigation without trapping or excessive inflation. This supports the pipeline's applicability to arbitrary knots and links.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in RL unknotting pipeline

The paper trains an RL agent and value heuristic independently on Reidemeister moves, then applies the resulting policy to specific inflated diagrams such as 4_1#9_10 to recover a previously established upper bound of three on the unknotting number. No load-bearing step equates the output sequence or bound to the training inputs by construction, nor does the central claim rest on a self-citation chain, fitted parameter renamed as prediction, or ansatz smuggled via prior work. The computational procedure is external to the mathematical result being recovered and remains falsifiable by independent verification of the move sequence.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the standard assumption that Reidemeister moves generate all equivalences of diagrams and on the usual RL training assumptions that a policy and value function can be learned from simulated episodes; no new mathematical axioms are introduced.

free parameters (1)

RL hyperparameters (learning rate, discount factor, network architecture)
Standard RL training parameters that must be chosen or tuned to make the agent learn effective move proposals.

axioms (1)

[standard math] Reidemeister moves generate the equivalence relation on knot diagrams
Invoked implicitly when the pipeline claims to simplify arbitrary knots via sequences of moves.

pith-pipeline@v0.9.0 · 5396 in / 1290 out tokens · 45025 ms · 2026-05-15T14:25:05.005053+00:00
