Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models

Marshall Brett

arxiv: 2604.06767 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.CL


Pith reviewed 2026-05-10 17:41 UTC · model grok-4.3


keywords Voronoi tessellation · margin refinement · Fisher information · large language models · expressibility gap · geometric polishing · latent manifolds · token margins


The pith

Fisher-information margin refinement (Fisher MRP) reshapes the Voronoi tessellation of an LLM's representation manifold, widening token margins while collateral damage stays constant.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how large language models induce a Voronoi tessellation over their continuous representation space from discrete tokens. It validates an existing scaling law for the expressibility gap and then demonstrates that short post-training margin refinement procedures can reshape this tessellation. Specifically, Fisher information-based refinement improves the median decision margin by 28 percent at the strongest validated intervention (λ = 0.6) while keeping collateral damage fixed and benchmarks unchanged, unlike direct margin maximization, whose damage grows with intervention strength. This suggests a way to geometrically polish model outputs without retraining the entire network.

Core claim

The Voronoi tessellation in a converged language model can be refined through margin refinement procedures, where Fisher information distance maximization achieves a ceiling of approximately 16,300 correctable positions with constant damage of about 5,300 positions across a range of intervention strengths, preserving the linear scaling law of the expressibility gap and leaving downstream benchmarks invariant.

What carries the argument

Margin Refinement Procedures (MRP) using Fisher information distance maximization, which perform short post-hoc optimization to widen token-decision margins in the representation manifold's Voronoi cells.
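A minimal sketch of the Fisher distance named above, using the formula quoted further down this page: d_Fisher(w_i, w_j; h)² = (w̃_i − w̃_j)^T G(h) (w̃_i − w̃_j) with G(h) = W^T Σ_p W. Two choices here are assumptions of the sketch, not statements from the paper: Σ_p is taken to be the softmax covariance diag(p) − p pᵀ, and w̃ is taken to be the raw unembedding row. Under those assumptions the quadratic form equals the variance, under p, of the scores u_k = w_k · (w_i − w_j), so G never has to be formed explicitly. The toy matrix W and hidden state h are made up.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_distance_sq(W, h, i, j):
    """Squared Fisher distance between token rows i and j at hidden state h.

    ASSUMPTION: Sigma_p = diag(p) - p p^T (softmax covariance) and the
    tilde vectors are the raw rows of W; the paper may define both differently.
    """
    p = softmax([dot(w, h) for w in W])          # next-token distribution at h
    delta = [a - b for a, b in zip(W[i], W[j])]  # w_i - w_j
    u = [dot(w, delta) for w in W]               # u = W @ delta
    mean_u = sum(pk * uk for pk, uk in zip(p, u))
    # delta^T W^T (diag(p) - p p^T) W delta == Var_p(u)
    return sum(pk * (uk - mean_u) ** 2 for pk, uk in zip(p, u))

# Toy unembedding matrix (3 tokens, 2 dimensions) and hidden state.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
h = [0.5, -0.2]
d2 = fisher_distance_sq(W, h, 0, 1)
```

Writing the distance as a variance makes one property visible: it shrinks toward zero wherever the model is nearly certain (p close to one-hot), so the metric weights token pairs by how uncertain the model is at h.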

If this is right

Both direct and Fisher MRP reach the same practical ceiling of about 16,300 correctable positions per 256K evaluated.
The scaling law of the expressibility gap remains intact after geometric reorganization.
Gains from Fisher MRP concentrate in high-frequency structural tokens, which account for 84 percent of net corrections at λ = 0.6.
Direct margin maximization sees damage escalate with intervention strength, eventually overwhelming corrections.
Mid-layers exhibit geometric ambiguity with negative correlation to cross-entropy, while the final layer shows strong positive alignment.
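The "correctable positions" and "damage" figures above can be read as transition counts between the base and the refined model. A minimal sketch, assuming each evaluated position carries a boolean for top-1 correctness before and after refinement; the function name, the dictionary keys, and the toy data are illustrative and not taken from the paper.

```python
def flip_audit(correct_before, correct_after):
    """Count wrong-to-right corrections and right-to-wrong damage."""
    pairs = list(zip(correct_before, correct_after))
    corrections = sum(1 for before, after in pairs if not before and after)
    damage = sum(1 for before, after in pairs if before and not after)
    return {
        "corrections": corrections,      # wrong before, right after
        "damage": damage,                # right before, wrong after
        "net": corrections - damage,
    }

# Toy data: six positions, two corrected, one damaged.
before = [True, False, False, True, True, False]
after = [True, True, True, False, True, False]
audit = flip_audit(before, after)
```

Tracking both counts separately, rather than only the net change in accuracy, is what lets the two methods be told apart: they can share a correction ceiling while differing in damage.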

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might extend to other architectures if their manifolds exhibit similar Voronoi structures and expressibility gaps.
Targeting uniformity of token-level benefits could require combining Fisher MRP with frequency-aware weighting during refinement.
Preservation of the scaling law implies that such polishing does not alter the fundamental geometric properties but only compresses the gap locally.
Applications could include improving reliability for common tokens in production models without full retraining cycles.

Load-bearing premise

The invariance of downstream benchmarks and the fixed level of collateral damage observed will continue to hold when the method is applied to different models, larger evaluation sets, or varying intervention parameters.

What would settle it

Running Fisher MRP at λ = 0.6 on a different language model and observing either damage rising well beyond the roughly constant 5,300 positions or a drop in downstream benchmark scores would undercut the viability claim.

Original abstract

Language models operate on discrete tokens but compute in continuous vector spaces, inducing a Voronoi tessellation over the representation manifold. We study this tessellation empirically on Qwen3.5-4B-Base, making two contributions. First, using float32 margin recomputation to resolve bfloat16 quantization artifacts, we validate Mabrok's (2026) linear scaling law of the expressibility gap with $R^2$ = 0.9997 - the strongest confirmation to date - and identify a mid-layer geometric ambiguity regime where margin geometry is anti-correlated with cross-entropy (layers 24-28, $\rho$ = -0.29) before crystallizing into alignment at the final layer ($\rho$ = 0.836). Second, we show that the Voronoi tessellation of a converged model is reshapable through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. We compare direct margin maximization against Fisher information distance maximization across a dose-response sweep. Both methods find the same ceiling of ~16,300 correctable positions per 256K evaluated, but differ critically in collateral damage. Margin maximization damage escalates with intervention strength until corrections are overwhelmed. Fisher damage remains constant at ~5,300 positions across the validated range ($\lambda$ = 0.15-0.6), achieving +28% median margin improvement at $\lambda$ = 0.6 with invariant downstream benchmarks - a geometric reorganization that compresses the expressibility gap while preserving its scaling law. However, frequency and token-class audits reveal that gains concentrate in high-frequency structural tokens (84% of net corrections at $\lambda$ = 0.6), with content and entity-like contributions shrinking at higher $\lambda$. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fisher MRP lets you widen token margins in one LLM with steady collateral damage and a tight scaling-law check, but the gains skew toward high-frequency tokens and the setup is narrow.


The paper shows that the Voronoi tessellation around token embeddings can be reshaped after training by maximizing Fisher information distance instead of raw margins. On Qwen3.5-4B-Base this keeps collateral damage flat at roughly 5,300 positions across the validated range of intervention strengths (λ = 0.15-0.6) while delivering a 28 percent median margin lift at λ = 0.6 and leaving benchmarks unchanged. It also gives the cleanest confirmation yet of Mabrok's scaling law, with R² = 0.9997, plus a clear picture of mid-layer geometric ambiguity that resolves into strong alignment by the final layer.

Referee Report

3 major / 2 minor

Summary. The paper empirically investigates the Voronoi tessellation induced by token embeddings in the latent manifold of Qwen3.5-4B-Base. It validates Mabrok's (2026) linear scaling law for the expressibility gap using float32 margin recomputation, achieving R²=0.9997, identifies a mid-layer anti-correlation regime (layers 24-28, ρ=-0.29) transitioning to final-layer alignment (ρ=0.836), and introduces margin refinement procedures (MRP) that compare direct margin maximization against Fisher-information distance maximization. Both MRP variants identify a shared ceiling of ~16,300 correctable positions per 256K tokens, but Fisher MRP maintains constant collateral damage (~5,300 positions) across λ=0.15-0.6 while delivering +28% median margin gain and invariant downstream benchmarks, with 84% of net corrections at λ=0.6 concentrating on high-frequency structural tokens. The conclusion is that Fisher MRP is a viable geometric polishing tool whose limit is set by non-uniform token-level benefit rather than aggregate damage.

Significance. If the constant-damage regime and token-concentration pattern generalize, the work offers a strong empirical anchor for the expressibility-gap scaling law and a practical post-hoc method for reshaping LLM decision boundaries without retraining. The side-by-side comparison of two MRP formulations, the layer-wise geometric analysis, and the frequency audit provide concrete evidence that the tessellation can be selectively refined while preserving overall scaling behavior. These elements would be of interest to researchers studying geometric properties of representation spaces and model editing techniques.

major comments (3)

[Scaling law validation] Abstract and scaling-law validation paragraph: the R²=0.9997 is reported without error bars, confidence intervals, number of fitted points, or explicit exclusion criteria for the linear fit; this makes it impossible to assess whether the 'strongest confirmation to date' claim is robust to sampling variation or outlier removal.
[MRP experiments] MRP dose-response results: the central claim that Fisher MRP's ceiling is set by non-uniform token benefit (not aggregate damage) rests on a single checkpoint (Qwen3.5-4B-Base) and a single 256K-position sample; the constant-damage regime (~5,300 positions across λ=0.15-0.6) and 84% high-frequency concentration at λ=0.6 are not shown to be intrinsic to the tessellation via held-out sets, cross-model replication, or distribution-shift tests, leaving the generalization of the uniformity limit unverified.
[Geometric analysis] Layer-wise correlation and MRP sections: the mid-layer anti-correlation (ρ=-0.29) and final-layer alignment (ρ=0.836) are reported independently of the MRP outcomes; no analysis demonstrates that these geometric signatures predict or explain the observed constant-damage behavior, so their role in the overall geometric interpretation remains unlinked.
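The ρ values discussed in these comments are Spearman rank correlations, according to the excerpts listed later on this page. A minimal sketch of that statistic, assuming two per-position arrays (a hypothetical per-position MRP loss and a cross-entropy loss); ties receive average ranks, and the toy numbers are invented for illustration.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-position losses for five positions.
mrp_loss = [0.9, 0.2, 0.5, 0.1, 0.7]
ce_loss = [2.0, 0.4, 1.5, 0.2, 1.1]
rho = spearman(mrp_loss, ce_loss)
```

Because only ranks enter the statistic, a strong ρ says the two quantities order positions similarly; it does not by itself say one predicts the size of the other.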

minor comments (2)

The abstract states that downstream benchmarks remain invariant but provides neither the specific benchmark suite nor quantitative thresholds used to declare invariance.
The float32 margin recomputation procedure used to resolve bfloat16 artifacts is mentioned but not described in sufficient detail (e.g., which positions were recomputed or how many were affected) to allow reproduction.
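Since the recomputation procedure is not described here, the following is only an illustration of why bfloat16 logits can distort small margins, not a reconstruction of the paper's method. It simulates bfloat16 by keeping the top 16 bits of the float32 encoding (truncation is a simplification; real hardware usually rounds to nearest) and takes the margin as the top logit minus the runner-up, which is an assumption about the paper's definition. The logit values are invented.

```python
import struct

def to_float32(x):
    """Round a Python float to float32 precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def to_bfloat16(x):
    """Simulate bfloat16 by zeroing the low 16 bits of the float32 encoding.

    ASSUMPTION: truncation instead of round-to-nearest, for simplicity.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def margin(logits):
    """Top-1 logit minus runner-up logit."""
    top, runner_up = sorted(logits, reverse=True)[:2]
    return top - runner_up

logits = [12.3456, 12.3401, 3.0]
m32 = margin([to_float32(x) for x in logits])   # small but non-zero
m16 = margin([to_bfloat16(x) for x in logits])  # collapses to zero
```

Between 8 and 16, simulated bfloat16 values are spaced 0.0625 apart, so both leading logits land on 12.3125 and the margin vanishes. Many near-boundary positions collapsing onto the same few values is the kind of quantization artifact a float32 recomputation would remove.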

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, along with planned revisions to address the concerns raised.


Referee: [Scaling law validation] Abstract and scaling-law validation paragraph: the R²=0.9997 is reported without error bars, confidence intervals, number of fitted points, or explicit exclusion criteria for the linear fit; this makes it impossible to assess whether the 'strongest confirmation to date' claim is robust to sampling variation or outlier removal.

Authors: We agree that additional statistical details are necessary to support the robustness of the linear scaling law validation. We will revise the manuscript to report the number of fitted points used in the linear regression, include bootstrap-derived confidence intervals for the R² and the fit parameters, and add an analysis of robustness to outlier removal by recomputing the R² after excluding the most deviant points. These changes will allow for a better assessment of the claim's stability.
revision: yes
Referee: [MRP experiments] MRP dose-response results: the central claim that Fisher MRP's ceiling is set by non-uniform token benefit (not aggregate damage) rests on a single checkpoint (Qwen3.5-4B-Base) and a single 256K-position sample; the constant-damage regime (~5,300 positions across λ=0.15-0.6) and 84% high-frequency concentration at λ=0.6 are not shown to be intrinsic to the tessellation via held-out sets, cross-model replication, or distribution-shift tests, leaving the generalization of the uniformity limit unverified.

Authors: The current study focuses on a detailed analysis of a single model and token sample due to the computational intensity of the MRP optimization procedure. We will incorporate an additional held-out token sample from the same model to confirm the stability of the constant-damage regime and the high-frequency token concentration. However, performing cross-model replications and distribution-shift experiments would necessitate significant new computational resources and fall outside the scope of this work. We will explicitly discuss this limitation in the revised manuscript and highlight the need for future multi-model studies to establish the generality of these geometric properties.
revision: partial
Referee: [Geometric analysis] Layer-wise correlation and MRP sections: the mid-layer anti-correlation (ρ=-0.29) and final-layer alignment (ρ=0.836) are reported independently of the MRP outcomes; no analysis demonstrates that these geometric signatures predict or explain the observed constant-damage behavior, so their role in the overall geometric interpretation remains unlinked.

Authors: The layer-wise correlation analysis is intended to characterize the evolution of the Voronoi tessellation through the model's depth, serving as an independent geometric observation. The MRP is applied specifically to the final-layer embeddings. To address the lack of linkage, we will add a discussion section that explores potential relationships, such as how the final-layer alignment might contribute to the observed stability in Fisher MRP damage. While we cannot provide a direct predictive model without additional experiments, we will clarify that these elements together provide a more complete picture of the manifold's geometric properties.
revision: partial
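The bootstrap confidence interval promised in the first response could be sketched as follows. This is a generic percentile bootstrap over (x, y) pairs for the R² of a simple linear fit. The data are synthetic, and the function names, resample count, and seed are illustrative choices, not values from the paper.

```python
import random

def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line with intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def bootstrap_r2_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for R^2, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with no spread
        stats.append(r_squared(bx, by))
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[int((1 + level) / 2 * len(stats)) - 1]
    return lo, hi

# Synthetic near-linear data: y = 0.8 x + small Gaussian noise.
noise = random.Random(1)
xs = [i / 20 for i in range(1, 21)]
ys = [0.8 * x + noise.gauss(0.0, 0.005) for x in xs]
r2 = r_squared(xs, ys)
lo, hi = bootstrap_r2_ci(xs, ys)
```

With only a handful of fitted points the interval can be wide even when the point estimate is very close to 1, which is why the number of fitted points matters to the referee's question.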

Standing simulated objections (not resolved)

Cross-model replication and distribution-shift tests for the MRP results, as these require experiments on additional models and datasets not present in the current study.

Circularity Check

0 steps flagged

No circularity: empirical validation and optimization sweeps are independent of self-referential definitions or fitted inputs.


The paper validates an external scaling law (Mabrok 2026) via direct margin recomputation on Qwen3.5-4B-Base, reporting R²=0.9997 and layer-wise correlations as observational findings. MRP comparisons (margin vs. Fisher maximization) are presented as outcomes of post-hoc optimization sweeps over λ ranges, with ceilings, damage counts, and token audits derived from the resulting data rather than from any equation that redefines its inputs. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear; the central claim about token-benefit uniformity follows from frequency audits on the experimental outputs, not from constructional equivalence.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard geometric definitions of Voronoi tessellation and on empirical optimization; no new physical entities are postulated and the scaling law is imported from prior literature.

free parameters (1)

λ = 0.15-0.6: intervention-strength hyperparameter swept in the dose-response experiment.

axioms (1)

Domain assumption: the Voronoi tessellation induced by token embeddings accurately captures the model's token-decision boundaries in latent space. Invoked throughout the geometric analysis of the representation manifold.

pith-pipeline@v0.9.0 · 5652 in / 1538 out tokens · 123848 ms · 2026-05-10T17:41:42.242243+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J uniqueness) · relation: unclear

Voronoi margin m(h) = ℓ_{t*}(h) − ℓ_t(h); expressibility gap η(ε) = μ({h : m(h) < ε}) / vol(M) obeys linear scaling η(ε) = α·ε + O(ε²) (Theorem 2.3 / Mabrok 10.5); Fisher distance d_Fisher(w_i, w_j; h)² = (w̃_i − w̃_j)^T G(h) (w̃_i − w̃_j) with G(h) = W^T Σ_p W

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · relation: unclear

Fisher MRP keeps R→W damage constant (~5,300) across λ_MRP = 0.15–0.6 while widening margins; gains concentrate in high-frequency structural tokens (84% at λ = 0.6)
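The margin and expressibility-gap definitions in the first passage above can be made concrete with a small empirical stand-in. This sketch assumes t* is the top-scoring token and t the runner-up, replaces the measure ratio μ(·)/vol(M) with the fraction of sampled positions whose margin falls below ε, and fits the slope α by least squares through the origin. All three are simplifying assumptions, and the logits are synthetic.

```python
def voronoi_margin(logits):
    """m(h): top logit minus runner-up logit (assumed reading of t* and t)."""
    top, runner_up = sorted(logits, reverse=True)[:2]
    return top - runner_up

def expressibility_gap(margins, eps):
    """Empirical eta(eps): fraction of positions with margin below eps."""
    return sum(1 for m in margins if m < eps) / len(margins)

def fit_slope_through_origin(xs, ys):
    """Least-squares alpha for the model y = alpha * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Synthetic positions whose margins are spread evenly over [0, 1).
positions = [[k / 1000, 0.0, -1.0] for k in range(1000)]
margins = [voronoi_margin(logits) for logits in positions]

eps_grid = [0.02, 0.04, 0.06, 0.08, 0.10]
gaps = [expressibility_gap(margins, eps) for eps in eps_grid]
alpha = fit_slope_through_origin(eps_grid, gaps)
```

For evenly spread margins the fitted α comes out close to 1. On real model outputs α is the quantity the linear scaling law constrains, and the quality of that linear fit is what the reported R² measures.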

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] Methodological: We identify and resolve a bfloat16 margin quantization artifact that systematically depresses R² in the log-log scaling regression, and provide a float32 recomputation technique that recovers R² = 0.9997

[2] Analytical: We characterize the layer-wise evolution of Voronoi geometry, discovering that geometric regularization is redundant with cross-entropy at the output layer but anti-correlated at layers 24-28, a mid-layer geometric ambiguity regime

[3] Empirical (Margin Maximization): We demonstrate that direct margin maximization can reshape the Voronoi tessellation of a converged model in 200 gradient steps, achieving +53% median margin at λ_MRP = 0.3 with a 2.1:1 correction-to-damage ratio. However, damage escalates super-linearly with λ_MRP, making the method destructive above λ_MRP ≈ 0.3

[4] Empirical (Fisher Information Distance): We show that Fisher information distance maximization achieves comparable margin improvements (+28% at λ_MRP = 0.6) with fundamentally different damage characteristics: collateral damage is constant across the fully validated λ_MRP range, downstream benchmarks are invariant through λ_MRP = 0.6, and additional runs at λ_MR...

[5] Layers 4-20 (contextual integration): Voronoi geometry is meaningless. Margins near zero, MRP uniformly high, no correlation with final-layer CE

[6] Layers 24-28 (mid-layer geometric ambiguity): CE-MRP correlation is negative (Spearman ρ = -0.29, n = 256,577 positions, p < 10⁻⁵). When intermediate hidden states are projected through the final lm_head, positions that are ultimately predicted correctly often remain geometrically ambiguous at these depths. Late layers then sharpen and linearize these state...

[7] geometrically accessible

Layers 31-32 (prediction crystallization): CE-MRP correlation is strongly positive (Spearman ρ = 0.836, n = 256,577, p < 10⁻⁵). Voronoi geometry aligns with prediction quality. MRP is largely redundant with CE. All correlations in this paper are Spearman rank correlations computed over the full 256,577-position evaluation set unless otherwise noted. At this ...

[8] Structural: all characters are Unicode punctuation (category P*) or symbols (category S*), or the token is whitespace-only. Examples: ,,.,",@,=,–-. 2. Numeric: matches ^[0-9][0-9,./:%+-]*$. Examples: 123, 2023, 3.14, 50%.

[9] Function word: pure alphabetic token whose lowercased form appears in a fixed 101-word set of articles, prepositions, pronouns, auxiliaries, and conjunctions (e.g., the, of, is, they, which)

[10] Entity-like: starts with a capital letter followed by lowercase letters (^[A-Z][A-Za-z]+$) or is all-caps with 2+ letters (^[A-Z]{2,}$). Examples: Paris, John, USA, NLP

[11] Content word: remaining pure alphabetic tokens (nouns, verbs, adjectives, adverbs not in the function-word set)

[12] high-value tail preserved, head cleaned up

Fragment/other: everything else (mixed alphanumeric, subword fragments with special characters). The full classification code and word list are in scripts/analysis/token_class_flip_audit.py. This is a coarse taxonomy rather than a linguistic gold standard, but it distinguishes punctuation/formatting cleanup from content-bearing gains. Approximately 33 pos...

[13] Constant damage: The rotation follows the model’s natural geometry, so it doesn’t create distortion regardless of magnitude

[14] Increasing margin: Higher λ_MRP means further rotation along the same natural direction, not harder pushing along an unnatural one

[15] Preserved per-band accuracy: Band transitions are “clean”: positions move to higher bands because their geometry genuinely supports wider margins after runner-up rotation, not because they were forced there. 5.4 Revised Higher-λ Reading. The main sweep (Section 3.4) covers Fisher λ_MRP through 0.6. Two additional runs at λ_MRP = 1.0 and 2.0 extend the dose...

[16] Fisher’s geometric improvements continue beyond the primary sweep range

[17] Higher λ_MRP does not collapse benchmarks or the churn profile, but it does make the token-value distribution less uniform

[18] Gu and Z

The practical operating ceiling is determined less by aggregate damage and more by whether the added gains at high λ_MRP justify the growing concentration in structural/head-token space. Fisher λ_MRP is therefore not a free parameter. Geometry continued to improve through λ_MRP = 2.0, but the practical cost shifted from obvious benchmark damage to a subtler...
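The token-class rules in excerpts [8] to [12] can be sketched as a small classifier. This is not the paper's script (excerpt [12] names that file). The function-word set below holds only the five examples quoted in excerpt [9] rather than the full 101-word list, the precedence order is assumed to follow the order in which the classes are listed, and tokens are classified exactly as given with no stripping of leading whitespace markers.

```python
import re
import unicodedata

# ASSUMPTION: illustrative subset; the paper uses a fixed 101-word set.
FUNCTION_WORDS = {"the", "of", "is", "they", "which"}
NUMERIC_RE = re.compile(r"^[0-9][0-9,./:%+-]*$")
ENTITY_RE = re.compile(r"^[A-Z][A-Za-z]+$|^[A-Z]{2,}$")

def classify_token(token):
    """Assign one of six coarse classes, checked in listing order."""
    # Structural: whitespace-only, or every character is punctuation/symbol.
    if token.strip() == "" or all(
        unicodedata.category(ch)[0] in "PS" for ch in token
    ):
        return "structural"
    if NUMERIC_RE.match(token):
        return "numeric"
    if token.isalpha() and token.lower() in FUNCTION_WORDS:
        return "function"
    if ENTITY_RE.match(token):
        return "entity-like"
    if token.isalpha():
        return "content"
    return "fragment/other"

examples = [",", "3.14", "The", "Paris", "running", "abc123"]
labels = [classify_token(tok) for tok in examples]
```

Checking function words before the entity pattern matters: a capitalized "The" would otherwise match the entity regex, which is one reason the taxonomy is best read as coarse, as excerpt [12] itself notes.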
