Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs

Erik B Dam; Jonathan Wensh{\o}j; Pedram Bakhtiarifard; Raghavendra Selvan; Tong Chen

arxiv: 2602.14896 · v2 · pith:VSWEP6CPnew · submitted 2026-02-16 · 💻 cs.LG

Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs

Pedram Bakhtiarifard , Tong Chen , Jonathan Wensh{\o}j , Erik B Dam , Raghavendra Selvan This is my paper

Pith reviewed 2026-05-21 12:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords neural network compressionKolmogorov complexitymodel parameterizationalgorithmic simplicityreusable motifsmosaic structure

0 comments

The pith

A constrained parameterization with reusable motifs reduces neural network algorithmic complexity during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper starts from the observation that trained neural network weights show more structure and thus lower algorithmic complexity than random initializations. It introduces Mosaic-of-Motifs, a parameterization that splits weights into fixed-size blocks and forces each block to come from a small dictionary of reusable motifs arranged according to a reuse pattern. This constraint is meant to steer gradient descent toward solutions that are simpler to describe. Multiple experiments indicate that the resulting models reach lower complexity estimates than standard training while matching task accuracy. The work therefore treats compressibility as something that can be encouraged throughout optimization rather than recovered afterward.

Core claim

By defining the Kolmogorov complexity K(w) of the weight vector w, the authors show that replacing the free parameterization with a block-wise selection from k reusable motifs produces optimization paths whose final weights exhibit measurably lower algorithmic complexity; empirical runs confirm that this reduction occurs without degrading performance relative to unconstrained baselines.

What carries the argument

The Mosaic-of-Motifs parameterization, which partitions weights into blocks of size s and restricts each block to one of k reusable motifs via a mosaic reuse pattern, thereby biasing search toward lower-Kolmogorov-complexity assignments.

Load-bearing premise

The constrained parameterization using blocks and reusable motifs actually reduces the true Kolmogorov complexity of the weight vector rather than merely changing a proxy or measurement that is tied to the parameterization itself.

What would settle it

After training identical architectures with and without MoMos, an estimate of Kolmogorov complexity (or a faithful proxy such as description length) for the MoMos weights is not lower than the unconstrained weights, or the MoMos model shows a clear drop in task accuracy.

read the original abstract

Large-scale deep learning models are well-suited for compression. Across a variety of tasks, methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in model parameters with only marginal performance drops. This raises the central question: *Why are deep neural networks suited for compression?* In this work, we take up the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and, hence, exhibit lower algorithmic complexity compared to the weights at (random) initialization. Furthermore, model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, making them simpler to implement than a trivial program. To this end, we formalize the Kolmogorov complexity of $\mathbf{w}$ by $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, specified by a reuse pattern (or mosaic). The resulting method, $\mathit{Mosaic\text{-}of\text{-}Motifs}$ (MoMos), provides a theoretically justified parameterization that biases optimization toward algorithmically simpler solutions. Empirical evidence from multiple experiments shows that MoMos consistently lowers the algorithmic complexity of neural networks during training while preserving the performance of unconstrained models. These results suggest that parameter compressibility is not only observed after training, but can be induced from the optimization domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Mosaic-of-Motifs (MoMos), a constrained parameterization that partitions neural network weights into blocks of size s drawn from a set of k reusable motifs according to a specified reuse pattern (mosaic). It hypothesizes that trained weights exhibit lower Kolmogorov complexity K(w) than random initialization due to structure, and that MoMos biases optimization toward such simpler solutions. The authors claim that this approach reduces algorithmic complexity during training while preserving performance relative to unconstrained models, suggesting that compressibility can be induced from the optimization domain rather than observed only post-training.

Significance. If the central claim can be established with an independent measure of complexity reduction, the work would provide a theoretically motivated explanation for the compressibility of trained neural networks grounded in algorithmic information theory and a practical parameterization to encourage simpler solutions during training. The approach connects ideas from Kolmogorov complexity to model compression methods like pruning and quantization.

major comments (2)

[Abstract and §3] Abstract and §3 (MoMos parameterization): The central claim that MoMos lowers the true Kolmogorov complexity K(w) of the resulting weight vectors is load-bearing for the hypothesis, yet the constrained form directly upper-bounds description length by the cost of specifying the k motifs plus the mosaic indices and pattern. This creates a risk that any reported reduction is tautological to the parameterization choice rather than reflecting intrinsic simplicity independent of the block-and-motif model.
[Experimental results section] Experimental results section: The abstract states that 'empirical evidence from multiple experiments shows that MoMos consistently lowers the algorithmic complexity,' but the provided text gives no details on the proxy used for K(w), the baselines compared, quantitative effect sizes, or how complexity is measured independently of the MoMos description length. This leaves the empirical support for the central claim without verifiable grounding.

minor comments (1)

[§3] Notation: Clarify the relationship between the unconstrained w and the constrained hat{w} when discussing the reuse pattern; the mosaic specification should be explicitly tied to how it affects the optimization trajectory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which highlight key areas for strengthening the presentation of our central claims. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (MoMos parameterization): The central claim that MoMos lowers the true Kolmogorov complexity K(w) of the resulting weight vectors is load-bearing for the hypothesis, yet the constrained form directly upper-bounds description length by the cost of specifying the k motifs plus the mosaic indices and pattern. This creates a risk that any reported reduction is tautological to the parameterization choice rather than reflecting intrinsic simplicity independent of the block-and-motif model.

Authors: We agree that the MoMos parameterization inherently supplies a compact description length via the k motifs, mosaic pattern, and assignment indices, and that this must be distinguished from an intrinsic reduction in K(w). Our hypothesis is that the optimization dynamics under this constraint discover reusable structure that yields good performance, and the resulting weights admit shorter descriptions than those found by unconstrained training. To address the tautology concern directly, we will revise §3 and the abstract to clarify this distinction and will add an independent proxy for complexity (compressed length of the serialized weight tensors under a general-purpose compressor such as zlib) that does not rely on the motif/mosaic encoding. This will allow readers to verify that MoMos-trained weights are more compressible in a representation-independent manner while preserving task performance. revision: yes
Referee: [Experimental results section] Experimental results section: The abstract states that 'empirical evidence from multiple experiments shows that MoMos consistently lowers the algorithmic complexity,' but the provided text gives no details on the proxy used for K(w), the baselines compared, quantitative effect sizes, or how complexity is measured independently of the MoMos description length. This leaves the empirical support for the central claim without verifiable grounding.

Authors: The referee is correct that the current experimental section does not supply these details. In the revised manuscript we will expand the section to describe the proxy (compressed size of the flattened weight tensors using a standard lossless compressor), the baselines (unconstrained dense networks and ablated motif-reuse variants), quantitative effect sizes (mean and standard deviation of complexity reduction across runs and architectures), and the independence of the measure (by applying the compressor directly to the numerical weight values rather than to the mosaic specification). We will also include corresponding tables and training-trajectory plots so that the empirical support is fully verifiable. revision: yes

Circularity Check

1 steps flagged

MoMos complexity reduction is tautological to the imposed mosaic parameterization

specific steps

self definitional [Abstract]
"We introduce a constrained parameterization ŵ that partitions parameters into blocks of size s and restricts each block to be selected from a set of k reusable motifs, specified by a reuse pattern (or mosaic). The resulting method, Mosaic-of-Motifs (MoMos), provides a theoretically justified parameterization that biases optimization toward algorithmically simpler solutions. Empirical evidence from multiple experiments shows that MoMos consistently lowers the algorithmic complexity of neural networks during training while preserving the performance of unconstrained models."

The parameterization is explicitly constructed around a small set of reusable motifs and a mosaic reuse pattern, which by definition supplies a compact program for the weights (specify the k motifs plus the indices and pattern). Any weight vector produced under MoMos therefore has lower description length under this exact model by construction. The claim that MoMos 'lowers the algorithmic complexity' is thus equivalent to the observation that the imposed constraint produces weights fitting the constraint, without an independent, parameterization-agnostic verification of reduced K(w).

full rationale

The paper hypothesizes that trained weights exhibit lower Kolmogorov complexity K(w) than random initialization and introduces MoMos as a constrained parameterization using reusable motifs and mosaics to bias toward simpler solutions. It then claims empirical evidence that MoMos lowers algorithmic complexity. Because the parameterization is defined precisely to admit a short description (motifs + reuse pattern), any measured reduction in complexity under a proxy tied to block/motif counts or mosaic description length reduces directly to the constraint itself rather than providing an independent demonstration that true K(w) is lower outside the model.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that trained weights possess lower algorithmic complexity, introduces two free parameters for the mosaic construction, and proposes a new parameterization without external falsifiable evidence for its effect on Kolmogorov complexity.

free parameters (2)

block size s
Size of each parameter block in the mosaic partitioning; chosen to enable reuse.
number of motifs k
Size of the reusable motif set from which blocks are drawn; controls the degree of constraint.

axioms (1)

domain assumption Trained neural network parameters exhibit more structure and therefore lower algorithmic complexity than weights at random initialization.
This hypothesis is stated explicitly as the starting point for the work and is required for the claim that compression methods exploit reduced complexity.

invented entities (1)

Mosaic-of-Motifs parameterization no independent evidence
purpose: Constrained form that partitions weights into blocks selected from a small set of reusable motifs to bias optimization toward simpler solutions.
Newly introduced constrained parameterization whose effect on true Kolmogorov complexity lacks independent verification outside the method itself.

pith-pipeline@v0.9.0 · 5843 in / 1409 out tokens · 83277 ms · 2026-05-21T12:05:33.348410+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis
cs.LG 2026-05 unverdicted novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.