Statistical Properties of the King Wen Sequence: An Anti-Habituation Structure That Does Not Improve Neural Network Training

Augustin Chan

arxiv: 2604.09234 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.AI · cs.NE

Pith reviewed 2026-05-10 17:58 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NE

keywords: King Wen sequence, I Ching, statistical properties, curriculum ordering, neural network training, gradient descent, learning rate modulation, anti-habituation

The pith

The King Wen sequence's distinctive statistical properties degrade neural network training performance rather than improving it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper first characterizes the King Wen sequence of 64 hexagrams through Monte Carlo tests against random permutations and identifies four properties that stand out: greater transition distances, negative lag-1 autocorrelation, balanced groups of four, and asymmetric within-pair versus between-pair distances. These traits appear to echo ideas from curriculum learning and curiosity-driven exploration, suggesting the sequence might serve as a fixed anti-habituation ordering that could stabilize or accelerate neural network training. The author then runs three experiments across two hardware platforms: modulating learning rates with the sequence, reordering training data by the sequence, and checking sensitivity across many random seeds. In every case the results are negative or neutral at best. The explanation offered is that the same high variance that produces the sequence's statistical distinctiveness also destabilizes gradient updates.

Core claim

The King Wen sequence exhibits four statistically significant properties relative to 100,000 random baselines: higher-than-random transition distance, negative lag-1 autocorrelation, yang-balanced groups of four, and asymmetric within-pair versus between-pair distances. When these properties are applied to neural network training, the results are negative: scaling learning rates according to the sequence degrades performance at every tested amplitude, and ordering minibatches in King Wen order gives the worst non-sequential ordering on one platform and a result within noise of random on the other. A 30-seed sweep shows that only the King Wen-induced drop exceeds ordinary seed-to-seed variation. The author attributes the outcome to the sequence's high variance, which destabilizes gradient-based optimization.
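A minimal sketch of the kind of permutation test described above, assuming hexagrams are encoded as 6-bit integers and that "transition distance" means the Hamming distance between consecutive hexagrams (this page does not spell out the metric). The two orderings are placeholders, not the King Wen sequence, and all function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of differing lines between two hexagrams encoded as 6-bit ints."""
    return bin(a ^ b).count("1")


def mean_transition_distance(order):
    """Average Hamming distance between consecutive entries of an ordering."""
    steps = [hamming(a, b) for a, b in zip(order, order[1:])]
    return sum(steps) / len(steps)


def percentile_vs_random(order, n_perm=10_000, seed=0):
    """Percentage of uniformly random permutations whose statistic is <= the observed one."""
    rng = random.Random(seed)
    observed = mean_transition_distance(order)
    shuffled = list(order)
    at_or_below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reshuffling in place still yields a uniform permutation
        if mean_transition_distance(shuffled) <= observed:
            at_or_below += 1
    return 100.0 * at_or_below / n_perm


# Placeholder orderings (assumptions for illustration, NOT the King Wen sequence).
binary_order = list(range(64))                  # plain binary counting order
gray_order = [i ^ (i >> 1) for i in range(64)]  # reflected Gray code, one line changes per step

if __name__ == "__main__":
    for name, order in [("binary", binary_order), ("gray", gray_order)]:
        print(name, mean_transition_distance(order), percentile_vs_random(order, n_perm=2_000))
```

Substituting the actual sequence for a placeholder and raising `n_perm` to 100,000 would mirror the scale of the baseline reported here; a high percentile would then correspond to the "higher-than-random transition distance" property.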

What carries the argument

The King Wen sequence itself, treated as a fixed combinatorial ordering of 64 binary states whose transition variance is measured by permutation tests and then injected into learning-rate schedules and data curricula.
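One way such an injection into a learning-rate schedule could look is sketched below. The page does not give the paper's modulation formula, so this assumes a simple multiplicative form, `base_lr * (1 + amplitude * signal)`, with the per-step signal centered and rescaled to [-1, 1]. The signal used here comes from a placeholder ordering, not the King Wen sequence, and `modulated_lr_schedule` is a hypothetical name.

```python
def hamming(a, b):
    """Number of differing bits between two 6-bit states."""
    return bin(a ^ b).count("1")


def modulated_lr_schedule(signal, base_lr, amplitude, n_steps):
    """Cycle through `signal`, centered and rescaled to [-1, 1], and scale base_lr by it.

    Assumed form (not taken from the paper): lr_t = base_lr * (1 + amplitude * s_t).
    With 0 <= amplitude < 1 every learning rate stays positive.
    """
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    scale = max(abs(c) for c in centered) or 1.0  # constant signal: no modulation
    unit = [c / scale for c in centered]
    return [base_lr * (1.0 + amplitude * unit[t % len(unit)]) for t in range(n_steps)]


# Placeholder signal: step-to-step Hamming distances of binary counting order.
placeholder_order = list(range(64))
placeholder_signal = [hamming(a, b) for a, b in zip(placeholder_order, placeholder_order[1:])]

schedule = modulated_lr_schedule(placeholder_signal, base_lr=1e-3, amplitude=0.5, n_steps=126)
```

Because the signal is centered, the schedule averages to `base_lr` over a full cycle, so any change in training outcome under this form would come from the step-to-step variation rather than from a shifted mean learning rate.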

If this is right

King Wen learning-rate modulation reduces performance at every amplitude tested on both hardware platforms.
King Wen curriculum ordering ranks as the worst non-sequential schedule on one platform and within noise of random on the other.
Only the King Wen degradation exceeds the range of performance variation seen across 30 random seeds.
The sequence's elevated variance, the feature that distinguishes it statistically, produces instability in gradient-based optimization.
Anti-habituation realized by a static combinatorial ordering does not produce the same dynamics as effective curriculum or exploration strategies.
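The seed-sweep comparison in the list above can be sketched as a simple band check. The criterion used here (mean plus or minus `k` sample standard deviations) is an assumption, since this page does not state how the paper defines "exceeds seed variation", and the numbers are synthetic placeholders rather than results from the paper.

```python
import statistics


def seed_band(seed_scores, k=2.0):
    """Interval of mean +/- k sample standard deviations across seed runs."""
    mean = statistics.mean(seed_scores)
    spread = k * statistics.stdev(seed_scores)
    return mean - spread, mean + spread


def exceeds_seed_variation(seed_scores, condition_score, k=2.0):
    """True if a condition's score falls outside the band spanned by seed noise."""
    low, high = seed_band(seed_scores, k)
    return condition_score < low or condition_score > high


# Synthetic validation losses for a baseline run under different seeds (made up).
baseline_losses = [1.00, 1.02, 0.98, 1.01, 0.99]

print(exceeds_seed_variation(baseline_losses, 1.05))  # well outside the band
print(exceeds_seed_variation(baseline_losses, 1.02))  # inside the band
```

A real sweep would use one score per seed for each ordering (30 seeds in the paper) and apply the same check to every condition, so that a difference is only called an effect when it is larger than what reseeding alone produces.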

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Any future attempt to import ancient sequences into training pipelines would need explicit variance-reduction steps to avoid the instability observed here.
The gap between descriptive statistical signatures and functional optimization behavior suggests that direct transfer of combinatorial patterns is unlikely to succeed without domain-specific tuning.
Testing low-variance transformations that preserve the other three properties could isolate which feature, if any, carries training value.
The result points to a broader need to measure not only statistical distinctiveness but also gradient-stability metrics when proposing new ordering heuristics.

Load-bearing premise

That the four statistically distinctive properties are sufficiently similar to curriculum-learning and curiosity principles to predict better training outcomes when the sequence is used in optimization.

What would settle it

A controlled experiment in which King Wen ordering or modulation produces higher final accuracy or faster convergence than both random permutations and sequential ordering across repeated independent runs with matched compute.

Original abstract

The King Wen sequence of the I-Ching (c. 1000 BC) orders 64 hexagrams -- states of a six-dimensional binary space -- in a pattern that has puzzled scholars for three millennia. We present a rigorous statistical characterization of this ordering using Monte Carlo permutation analysis against 100,000 random baselines. We find that the sequence has four statistically significant properties: higher-than-random transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile). These properties superficially resemble principles from curriculum learning and curiosity-driven exploration, motivating the hypothesis that they might benefit neural network training. We test this hypothesis through three experiments: learning rate schedule modulation, curriculum ordering, and seed sensitivity analysis, conducted across two hardware platforms (NVIDIA RTX 2060 with PyTorch and Apple Silicon with MLX). The results are uniformly negative. King Wen LR modulation degrades performance at all tested amplitudes. As curriculum ordering, King Wen is the worst non-sequential ordering on one platform and within noise on the other. A 30-seed sweep confirms that only King Wen's degradation exceeds natural seed variance. We explain why: the sequence's high variance -- the very property that makes it statistically distinctive -- destabilizes gradient-based optimization. Anti-habituation in a fixed combinatorial sequence is not the same as effective training dynamics.
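A hedged sketch of how a lag-1 autocorrelation test of this kind could be set up. It assumes the autocorrelation is taken over the series of consecutive Hamming distances and that the p-value is a lower-tail Monte Carlo estimate against uniform random permutations; neither detail is confirmed by the abstract. The ordering is a placeholder, not the King Wen sequence, and the function names are hypothetical.

```python
import random


def transition_distances(order):
    """Hamming distances between consecutive 6-bit states."""
    return [bin(a ^ b).count("1") for a, b in zip(order, order[1:])]


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation using the overall mean and total sum of squares."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom


def lower_tail_p_value(order, n_perm=10_000, seed=0):
    """Monte Carlo p-value: share of random permutations at least as negative."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(transition_distances(order))
    shuffled = list(order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lag1_autocorrelation(transition_distances(shuffled)) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the estimate above zero


# Placeholder ordering (assumption for illustration, NOT the King Wen sequence).
placeholder_order = list(range(64))
observed_r1 = lag1_autocorrelation(transition_distances(placeholder_order))
```

A negative value means a large change tends to be followed by a small one and vice versa, which is the alternation pattern the abstract describes as anti-habituation.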

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper statistically characterizes the King Wen sequence of the I-Ching via Monte Carlo permutation analysis against 100,000 random baselines, identifying four significant properties: higher transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile). These motivate tests of whether the sequence benefits neural network training through learning-rate modulation, curriculum ordering, and 30-seed sensitivity experiments on two platforms (PyTorch on RTX 2060, MLX on Apple Silicon). All three experiments yield negative results: King Wen modulation degrades performance at all amplitudes, curriculum ordering performs worst or within noise, and only the King Wen degradation exceeds seed variance. The author attributes this to the sequence's high variance destabilizing gradient-based optimization.

Significance. If the empirical results hold, the work demonstrates that anti-habituation properties in a fixed combinatorial ordering do not translate into improved training dynamics for gradient descent, distinguishing static sequence statistics from the requirements of stable optimization. Strengths include the large-scale permutation baselines, consistent negative outcomes across platforms, and the 30-seed sweep that places the degradation outside natural variance. The negative finding is directly measured rather than derived from fitted parameters.

minor comments (2)

[Experiments] Section on neural network experiments: provide explicit details on the model architectures, loss functions, datasets, batch sizes, and optimizer settings used in the LR modulation, curriculum, and seed-sweep trials to support full reproducibility.
[Statistical Analysis] Statistical properties section: the four properties are tested simultaneously; a brief note on whether any multiple-comparison adjustment was applied to the reported p-values (e.g., p=0.037) would strengthen the presentation.
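To illustrate what a multiple-comparison adjustment of the kind raised in the second comment would involve, here is a minimal Holm step-down sketch. Only p=0.037 and p=0.002 are reported as p-values; the other two inputs are assumed one-sided values derived from the reported percentiles (1 - 0.982 and 1 - 0.992), which is an assumption of this sketch and not something the paper states.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotone adjusted values
        adjusted[idx] = running_max
    return adjusted


# Order: transition distance, lag-1 autocorrelation, groups of four, pair asymmetry.
# 0.018 and 0.008 are ASSUMED conversions from the 98.2nd and 99.2nd percentiles.
assumed_p = [0.018, 0.037, 0.002, 0.008]
adjusted = holm_adjust(assumed_p)
bonferroni = [min(1.0, len(assumed_p) * p) for p in assumed_p]
```

Under these assumed inputs the Holm-adjusted values are 0.036, 0.037, 0.008, and 0.024, while a plain Bonferroni factor of 4 would move 0.037 to 0.148. This is arithmetic on assumed inputs, not a result reported by the paper.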

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate and positive summary of our work, which correctly captures the statistical characterization, the negative experimental outcomes, and the distinction between static anti-habituation properties and gradient-based optimization requirements. We appreciate the noted strengths of the large-scale baselines, cross-platform consistency, and seed-sensitivity analysis. We agree with the minor-revision recommendation and will incorporate any editorial suggestions in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central claims rest on direct empirical measurements: Monte Carlo permutation tests against 100,000 random baselines for the four statistical properties, and controlled training runs on standard optimizers across two platforms with seed sweeps. No equations, fitted parameters, or derivations are present that reduce to the inputs by construction. The interpretive link to curriculum learning is offered only as motivation and is not required for the validity of the negative experimental outcomes. No self-citations are load-bearing, and the work is self-contained against external random references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard statistical assumptions for permutation testing and empirical measurement of training dynamics; no free parameters are fitted to produce the reported properties or training results, and no new entities are postulated.

axioms (1)

domain assumption: Uniform random permutations form an appropriate null distribution for assessing sequence distinctiveness.
Invoked in the Monte Carlo analysis against 100,000 random baselines to establish statistical significance of the four properties.
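Under this null, the expected mean transition distance can be computed exactly rather than sampled, which gives a sanity check for any Monte Carlo baseline. The sketch below assumes 6-bit states and Hamming distance as the transition metric (an assumption, as above). It relies on the fact that each consecutive pair in a uniformly random permutation is a uniformly random ordered pair of distinct states, so by linearity of expectation the expected mean equals the average distance over all such pairs.

```python
from fractions import Fraction


def expected_null_transition_distance(n_bits=6):
    """Exact expected mean Hamming step under a uniformly random ordering of all states."""
    states = range(2 ** n_bits)
    total = sum(bin(a ^ b).count("1") for a in states for b in states if a != b)
    ordered_pairs = (2 ** n_bits) * (2 ** n_bits - 1)
    return Fraction(total, ordered_pairs)


null_mean = expected_null_transition_distance()
print(null_mean, float(null_mean))  # 64/21, about 3.05 changed lines per step
```

A sampled baseline whose average differs noticeably from 64/21 would suggest a problem in the shuffling or distance code rather than a property of the sequence under test.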

pith-pipeline@v0.9.0 · 5572 in / 1444 out tokens · 55917 ms · 2026-05-10T17:58:18.235725+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (D=3 forcing)
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: four statistically significant properties: higher-than-random transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile)

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J-cost uniqueness)
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: King Wen LR modulation degrades performance at all tested amplitudes... high variance... destabilizes gradient-based optimization

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Anirudh Agarwal et al. How LR decay wastes your best data in curriculum-based pretraining, 2025.
[2] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 41–48. ACM, 2009.
[3] C. L. Philip Chen, Tong Zhang, Long Chen, and Sik Chung Tam. I-ching divination evolutionary algorithm and its convergence analysis. IEEE Transactions on Cybernetics, 47(1):1–12, January 2016.
[4] Stephen Choi, William Gazeley, Siu Ho Wong, and Tingting Li. Conversational factor information retrieval model (ConFIRM), 2023.
[5] Alex Graves, Marc G. Bellemare, Jacob Menick, Rémi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70, pages 1311–1320. PMLR, 2017.
[6] Laurent Itti and Pierre Baldi. Bayesian surprise attracts human attention. Vision Research, 49(10):1295–1306, 2009.
[7] Zhijie Jia et al. Beyond random sampling: Curriculum learning for LM pretraining, 2025.
[8] Gottfried Wilhelm Freiherr von Leibniz. Explication de l’arithmétique binaire, qui se sert des seuls caractères 0 et 1, avec des remarques sur son utilité, et sur ce qu’elle donne le sens des anciennes figures chinoises de fohy. Mémoires de l’Académie Royale des Sciences, pages 85–89, 1703. English translation: “Explanation of binary arithmetic, which use... (listed on this page as: Explanation of binary arithmetic, which uses only the characters 0 and 1, with remarks on its usefulness and on what it gives us the meaning of the ancient Chinese figures of Fuxi)
[9] Frank Nielsen. An elementary introduction to information geometry. Entropy, 22(10):1100, September 2020.
[10] Jürgen Schmidhuber. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2):173–187, 2006.
[11] Xin Wang, Yudong Chen, and Wenwu Zhu. A survey on curriculum learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):4555–4576, 2022.
[12] Tong Zhang, Chunyu Lei, Zongyan Zhang, Xian-Bing Meng, and C. L. Philip Chen. As-nas: Adaptive scalable neural architecture search with reinforced evolutionary algorithm for deep learning. IEEE Transactions on Evolutionary Computation, 25(5):840–854, February 2021.

Footnote 1 (from the paper): https://github.com/augchan42/king-wen-agi-framework
