pith. machine review for the scientific record. sign in

arxiv: 2603.23364 · v2 · submitted 2026-03-24 · 💻 cs.CR · cs.IT· math.IT

Recognition: 2 theorem links

· Lean Theorem

Canonical Byte-String Encoding for Finite-Ring Cryptosystems

Authors on Pith no claims yet

Pith reviewed 2026-05-15 00:31 UTC · model grok-4.3

classification 💻 cs.CR cs.ITmath.IT
keywords canonical byte encodingfinite-ring cryptosystemsbase-m codecrANS adaptationLean 4 formalizationexact decodingresidue listslength header
0
0 comments X

The pith

The base-m length codec provides a canonical mapping from byte strings shorter than 2^64 bytes to lists of residues modulo m with exact decoding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper focuses on the byte-to-residue encoding step required before algebraic operations in finite-ring cryptosystems. It defines the base-m length codec as a way to convert short byte strings into sequences of numbers between 0 and m-1. The codec adapts an existing rANS approach and includes formal proofs in Lean 4 verifying that encoding and decoding are inverses under the right conditions. Readers should care because a reliable canonical layer prevents ambiguity when feeding data into ring-based encryption. The accompanying Rust code and benchmarks show it works in practice.

Core claim

The base-m length codec is a canonical map from byte strings of length less than 2^64 to lists of residues modulo m. Decoding is exact for moduli satisfying the parameter bounds. The encoding includes a fixed-width header carrying the byte length, making it tolerant to appended valid suffix digits. Machine-checked proofs confirm fixed-width prefix inversion, payload-state bounds below 2^64, stream-level roundtrip correctness, and that every emitted symbol is a valid residue modulo m.

What carries the argument

The base-m length codec, an adaptation of rANS that prepends a fixed-width length header to ensure canonicity and exact invertibility.

If this is right

  • Decoding recovers the original byte string exactly whenever the modulus meets the specified bounds.
  • The fixed-width header allows correct decoding even if valid extra digits are appended after the encoded stream.
  • Every symbol produced is guaranteed to be a valid residue between 0 and m-1.
  • The formal Lean 4 proofs establish roundtrip correctness at the stream level.
  • Complexity analysis and practical considerations are provided for real-world deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This encoding layer could become a standard interface for feeding arbitrary data into finite-ring cryptographic primitives.
  • Similar length-prefixed adaptations might apply to other entropy coders beyond rANS for residue streams.
  • The machine-checked proofs suggest a path toward formally verified components in cryptographic libraries.
  • Hardware accelerators for residue arithmetic could directly implement this codec for efficiency gains.

Load-bearing premise

The modulus m must satisfy the paper's parameter bounds for exact decoding to hold, and the rANS adaptation must preserve the inversion properties beyond the formal model.

What would settle it

Finding any byte string shorter than 2^64 bytes and a valid modulus m where either the encoded residue list decodes to a different string or produces an invalid residue.

read the original abstract

Ring-mapping protocols need a canonical byte-to-residue layer before any algebraic encryption step can begin. This paper isolates that layer and presents the base-m length codec, a canonical map from byte strings of length less than 2^64 to lists of residues modulo m. The encoder builds on and adapts an rANS-based system proposed by Duda. Decoding is exact for all moduli satisfying the paper's parameter bounds. Because the encoding carries the byte length in its fixed-width header, decoding is also tolerant to appended valid suffix digits. The paper is accompanied by a Rust implementation of the described protocol, a Lean 4 formalization of the abstract codec with machine-checked proofs, and performance benchmarks. The Lean 4 formalization establishes fixed-width prefix inversion and payload-state bounds below 2^64, stream-level roundtrip correctness, and that every emitted symbol is a valid residue modulo m. We conclude with a complexity analysis and a discussion of practical considerations arising in real-world use of the codec.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the base-m length codec as a canonical map from byte strings of length less than 2^64 to lists of residues modulo m. The encoder adapts Duda's rANS system, includes a fixed-width header carrying the byte length, and supports exact decoding (tolerant to appended valid suffixes) when m satisfies the paper's parameter bounds. The work is accompanied by a Rust implementation, a Lean 4 formalization with machine-checked proofs of fixed-width prefix inversion, payload-state bounds below 2^64, stream-level roundtrip correctness, and residue validity, plus performance benchmarks and complexity analysis.

Significance. If the parameter bounds hold and the rANS adaptation preserves the proven properties, the codec supplies a reliable, canonical byte-to-residue layer for ring-mapping protocols in finite-ring cryptosystems. The Lean 4 machine-checked proofs for inversion, bounds, roundtrip, and residue validity constitute a clear strength, raising the assurance level of the abstract model above typical pen-and-paper arguments.

major comments (2)
  1. [Abstract] Abstract: the central claim states that 'decoding is exact for all moduli satisfying the paper's parameter bounds' yet neither the abstract nor the opening sections state the bounds explicitly. This renders the applicability statement conditional on an opaque precondition and should be rectified by stating the bounds (e.g., the inequalities on m relative to 2^64) in the abstract and §2.
  2. [§3] §3 (rANS adaptation): the concrete Rust implementation uses an adapted rANS encoder, but the manuscript provides no formal argument or additional Lean lemma establishing that the adaptation preserves the machine-checked properties (prefix inversion, state < 2^64, roundtrip) outside the abstract model. Because the central claim concerns the running codec, this equivalence is load-bearing and must be supplied or clearly justified.
minor comments (2)
  1. [§5] §5 (complexity): the big-O statements for encoding and decoding could be accompanied by explicit constants or measured operation counts from the benchmarks to make the analysis more concrete.
  2. Notation: the distinction between the abstract codec (Lean) and the concrete rANS adaptation (Rust) is sometimes blurred in the text; consistent use of 'abstract model' versus 'implementation' would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to improve clarity and justification where appropriate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim states that 'decoding is exact for all moduli satisfying the paper's parameter bounds' yet neither the abstract nor the opening sections state the bounds explicitly. This renders the applicability statement conditional on an opaque precondition and should be rectified by stating the bounds (e.g., the inequalities on m relative to 2^64) in the abstract and §2.

    Authors: We agree that the parameter bounds should be stated explicitly. The revised manuscript now includes the explicit bounds (m > 1 such that the rANS state remains strictly below 2^64 for all byte-string inputs of length less than 2^64) in the abstract and in §2, making the applicability statement self-contained. revision: yes

  2. Referee: [§3] §3 (rANS adaptation): the concrete Rust implementation uses an adapted rANS encoder, but the manuscript provides no formal argument or additional Lean lemma establishing that the adaptation preserves the machine-checked properties (prefix inversion, state < 2^64, roundtrip) outside the abstract model. Because the central claim concerns the running codec, this equivalence is load-bearing and must be supplied or clearly justified.

    Authors: The Lean 4 formalization establishes the properties for the abstract model. The Rust implementation is a direct algorithmic transcription of that model. We have added a new subsection in §3 that justifies the correspondence by showing the step-by-step alignment between the Lean definitions and the Rust code, together with the fact that the provided benchmarks and unit tests confirm the expected behavior. A machine-checked proof for the concrete Rust code would require a formal semantics for Rust, which lies outside the scope of this work; the direct mapping and empirical validation supply the necessary justification for the running codec. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on external rANS adaptation and independent Lean proofs

full rationale

The paper's derivation adapts Duda's external rANS system for the base-m length codec and supplies a separate Lean 4 formalization that machine-checks fixed-width prefix inversion, payload bounds below 2^64, stream roundtrip correctness, and residue validity. No self-citations appear in the load-bearing steps, no parameters are fitted then renamed as predictions, and no uniqueness theorems or ansatzes are imported from the authors' prior work. The parameter bounds are an explicit precondition for exact decoding rather than a self-definitional reduction. The central map is therefore supported by independent verification and external foundations, not by construction from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim depends on the adaptation of Duda's rANS system and on the existence of parameter bounds that guarantee exact decoding; no new entities are postulated and no free parameters are fitted to data.

axioms (1)
  • domain assumption Properties of the rANS encoding system as proposed by Duda
    The codec builds on and adapts this existing system for the byte-to-residue mapping.

pith-pipeline@v0.9.0 · 5474 in / 1421 out tokens · 47828 ms · 2026-05-15T00:31:05.232197+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.