Statistical Properties of the King Wen Sequence: An Anti-Habituation Structure That Does Not Improve Neural Network Training
Pith reviewed 2026-05-10 17:58 UTC · model grok-4.3
The pith
The King Wen sequence's distinctive statistical properties degrade neural network training performance rather than improving it.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The King Wen sequence exhibits four statistically significant properties relative to 100,000 random baselines: higher-than-random transition distance, negative lag-1 autocorrelation, yang-balanced groups of four, and asymmetric within-pair versus between-pair distances. When the sequence is applied to neural network training, the results are negative: scaling learning rates according to the sequence degrades performance at every tested amplitude, and ordering minibatches in King Wen order yields the worst non-sequential ordering on one platform and a result within noise on the other. A 30-seed sweep shows that only the King Wen-induced drop exceeds ordinary seed-to-seed variation. The authors attribute the outcome to the sequence's high variance, which destabilizes gradient-based optimization.
What carries the argument
The King Wen sequence itself, treated as a fixed combinatorial ordering of 64 binary states whose transition variance is measured by permutation tests and then injected into learning-rate schedules and data curricula.
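The measurement described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, "transition distance" is assumed to mean the Hamming distance between consecutive 6-bit states, the 6-bit reflected Gray code stands in for the King Wen ordering (which is not reproduced here), and 2,000 shuffles are used where the paper reports 100,000 baselines.

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 6-bit states."""
    return bin(a ^ b).count("1")

def mean_transition_distance(order):
    """Average Hamming distance between consecutive states in an ordering."""
    steps = [hamming(a, b) for a, b in zip(order, order[1:])]
    return sum(steps) / len(steps)

def percentile_vs_random(order, n_baselines=2000, seed=0):
    """Return the observed statistic and the percentage of uniformly random
    permutations whose statistic is at or below it."""
    rng = random.Random(seed)
    observed = mean_transition_distance(order)
    pool = list(order)  # shuffled in place; the caller's list is untouched
    at_or_below = 0
    for _ in range(n_baselines):
        rng.shuffle(pool)
        if mean_transition_distance(pool) <= observed:
            at_or_below += 1
    return observed, 100.0 * at_or_below / n_baselines

# Placeholder ordering (NOT King Wen): every Gray-code step flips one bit,
# so this ordering sits at the low extreme of the null distribution.
gray_order = [i ^ (i >> 1) for i in range(64)]
observed, pct = percentile_vs_random(gray_order)
print(f"mean transition distance = {observed:.3f}, percentile = {pct:.1f}")
```

Swapping a real ordering in for `gray_order` leaves the rest unchanged. An ordering with unusually large jumps between neighbours would land near the top of the null distribution instead of the bottom.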
If this is right
- King Wen learning-rate modulation reduces performance at every amplitude tested on both hardware platforms.
- King Wen curriculum ordering ranks as the worst non-sequential schedule on one platform and within noise of random on the other.
- Only the King Wen degradation exceeds the range of performance variation seen across 30 random seeds.
- The sequence's elevated variance, the feature that distinguishes it statistically, produces instability in gradient-based optimization.
- Anti-habituation realized by a static combinatorial ordering does not produce the same dynamics as effective curriculum or exploration strategies.
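One way to picture learning-rate modulation by a sequence is sketched below. The summary does not give the paper's modulation formula, so the rule used here, base_lr * (1 + amplitude * z) with z a centred and peak-normalised signal, is an assumption, as are the function name `modulated_lr` and the example signal.

```python
def modulated_lr(base_lr, signal, amplitude):
    """Per-step learning rates under an assumed multiplicative modulation.

    The signal is centred on zero and scaled so its largest deviation is 1,
    which keeps every rate positive whenever 0 <= amplitude < 1.
    """
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    peak = max(abs(c) for c in centred) or 1.0  # guard against a flat signal
    return [base_lr * (1.0 + amplitude * c / peak) for c in centred]

# Hypothetical per-step signal, e.g. step-to-step distances of some ordering.
signal = [1, 4, 2, 6, 3]
for amplitude in (0.0, 0.25, 0.5):
    rates = modulated_lr(0.1, signal, amplitude)
    spread = max(rates) - min(rates)
    print(f"amplitude={amplitude:.2f} spread={spread:.4f}")
```

Under this assumed rule the mean rate stays at `base_lr` while the step-to-step spread grows linearly with the amplitude, which is the kind of fluctuation the variance explanation above points to.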
Where Pith is reading between the lines
- Any future attempt to import ancient sequences into training pipelines would need explicit variance-reduction steps to avoid the instability observed here.
- The gap between descriptive statistical signatures and functional optimization behavior suggests that direct transfer of combinatorial patterns is unlikely to succeed without domain-specific tuning.
- Testing low-variance transformations that preserve the other three properties could isolate which feature, if any, carries training value.
- The result points to a broader need to measure not only statistical distinctiveness but also gradient-stability metrics when proposing new ordering heuristics.
Load-bearing premise
That the four statistically distinctive properties are sufficiently similar to curriculum-learning and curiosity principles to predict better training outcomes when the sequence is used in optimization.
What would settle it
A controlled experiment in which King Wen ordering or modulation produces higher final accuracy or faster convergence than both random permutations and sequential ordering across repeated independent runs with matched compute.
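A decision rule of this kind could be checked with a paired, per-seed comparison such as the sketch below. The function names and the `min_win_rate` threshold are hypothetical, scores are assumed to be "higher is better", and each list is assumed to hold one final score per seed, with the same seeds used in every condition.

```python
from statistics import mean

def paired_margin(candidate, baseline):
    """Per-seed score difference (candidate minus baseline) and win count."""
    if len(candidate) != len(baseline):
        raise ValueError("need one score per seed for both conditions")
    diffs = [c - b for c, b in zip(candidate, baseline)]
    return {
        "mean_diff": mean(diffs),
        "wins": sum(d > 0 for d in diffs),
        "runs": len(diffs),
    }

def beats_both(candidate, random_perm, sequential, min_win_rate=0.9):
    """True only if the candidate is better on average AND wins on at least
    min_win_rate of the seeds against both baselines."""
    for baseline in (random_perm, sequential):
        summary = paired_margin(candidate, baseline)
        win_rate = summary["wins"] / summary["runs"]
        if summary["mean_diff"] <= 0 or win_rate < min_win_rate:
            return False
    return True
```

Requiring a win against both baselines mirrors the wording of the criterion: beating sequential ordering alone would not count.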
Original abstract
The King Wen sequence of the I-Ching (c. 1000 BC) orders 64 hexagrams -- states of a six-dimensional binary space -- in a pattern that has puzzled scholars for three millennia. We present a rigorous statistical characterization of this ordering using Monte Carlo permutation analysis against 100,000 random baselines. We find that the sequence has four statistically significant properties: higher-than-random transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile). These properties superficially resemble principles from curriculum learning and curiosity-driven exploration, motivating the hypothesis that they might benefit neural network training. We test this hypothesis through three experiments: learning rate schedule modulation, curriculum ordering, and seed sensitivity analysis, conducted across two hardware platforms (NVIDIA RTX 2060 with PyTorch and Apple Silicon with MLX). The results are uniformly negative. King Wen LR modulation degrades performance at all tested amplitudes. As curriculum ordering, King Wen is the worst non-sequential ordering on one platform and within noise on the other. A 30-seed sweep confirms that only King Wen's degradation exceeds natural seed variance. We explain why: the sequence's high variance -- the very property that makes it statistically distinctive -- destabilizes gradient-based optimization. Anti-habituation in a fixed combinatorial sequence is not the same as effective training dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper statistically characterizes the King Wen sequence of the I-Ching via Monte Carlo permutation analysis against 100,000 random baselines, identifying four significant properties: higher transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile). These motivate tests of whether the sequence benefits neural network training through learning-rate modulation, curriculum ordering, and 30-seed sensitivity experiments on two platforms (PyTorch on RTX 2060, MLX on Apple Silicon). All three experiments yield negative results, with King Wen modulation degrading performance at all amplitudes, curriculum ordering performing worst or within noise, and degradation exceeding seed variance; the authors attribute this to the sequence's high variance destabilizing gradient-based optimization.
Significance. If the empirical results hold, the work demonstrates that anti-habituation properties in a fixed combinatorial ordering do not translate into improved training dynamics for gradient descent, distinguishing static sequence statistics from the requirements of stable optimization. Strengths include the large-scale permutation baselines, consistent negative outcomes across platforms, and the 30-seed sweep that places the degradation outside natural variance. The negative finding is directly measured rather than derived from fitted parameters.
Minor comments (2)
- [Experiments] § on neural network experiments: provide explicit details on the model architectures, loss functions, datasets, batch sizes, and optimizer settings used in the LR modulation, curriculum, and seed-sweep trials to support full reproducibility.
- [Statistical Analysis] Statistical properties section: the four properties are tested simultaneously; a brief note on whether any multiple-comparison adjustment was applied to the reported p-values (e.g., p=0.037) would strengthen the presentation.
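The adjustment raised in the second comment could look like the Holm step-down procedure sketched here. This is illustrative only and says nothing about what the authors actually did. The two percentile results are converted to p-values as 1 minus the percentile (98.2nd to 0.018, 99.2nd to 0.008), which is an assumption, as is treating all four values as comparable tests.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values are forced to be non-decreasing in rank order.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted

# Order: transition distance (assumed 0.018), lag-1 autocorrelation (0.037),
# yang balance (0.002), pair asymmetry (assumed 0.008).
reported = [0.018, 0.037, 0.002, 0.008]
print([round(p, 3) for p in holm_adjust(reported)])
```

Under these assumptions the adjusted values come out at roughly 0.036, 0.037, 0.008 and 0.024. A plain Bonferroni correction, which multiplies every value by 4, would instead move the lag-1 autocorrelation result to about 0.148, so the choice of procedure matters for that property.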
Simulated Author's Rebuttal
We thank the referee for the accurate and positive summary of our work, which correctly captures the statistical characterization, the negative experimental outcomes, and the distinction between static anti-habituation properties and gradient-based optimization requirements. We appreciate the noted strengths of the large-scale baselines, cross-platform consistency, and seed-sensitivity analysis. We agree with the minor-revision recommendation and will incorporate any editorial suggestions in the revised manuscript.
Circularity Check
No significant circularity identified
Full rationale
The paper's central claims rest on direct empirical measurements: Monte Carlo permutation tests against 100,000 random baselines for the four statistical properties, and controlled training runs on standard optimizers across two platforms with seed sweeps. No equations, fitted parameters, or derivations are present that reduce to the inputs by construction. The interpretive link to curriculum learning is offered only as motivation and is not required for the validity of the negative experimental outcomes. No self-citations are load-bearing, and the work is self-contained against external random references.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: uniform random permutations form an appropriate null distribution for assessing sequence distinctiveness.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (D=3 forcing)
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "four statistically significant properties: higher-than-random transition distance (98.2nd percentile), negative lag-1 autocorrelation (p=0.037), yang-balanced groups of four (p=0.002), and asymmetric within-pair vs. between-pair distances (99.2nd percentile)"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (J-cost uniqueness)
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "King Wen LR modulation degrades performance at all tested amplitudes... high variance... destabilizes gradient-based optimization"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.