Spectral-Aligned Pruning for Universal Error-Correcting Code Transformers

Dae-Young Yun; Hee-Youl Kwak; Sanghyeon Cho; Sang-Hyo Kim; Seong-Joon Park; Taewoo Park; Yongjune Kim

arxiv: 2602.01602 · v3 · submitted 2026-02-02 · 💻 cs.IT · math.IT

Sanghyeon Cho, Taewoo Park, Seong-Joon Park, Dae-Young Yun, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim

Pith reviewed 2026-05-16 08:51 UTC · model grok-4.3

keywords error-correcting codes · transformer decoders · structured pruning · spectral graph analysis · universal models · low-rank adaptation · bipartite graphs · channel coding

The pith

Spectral signatures from code bipartite graphs let one pruned transformer decoder handle many different error-correcting codes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Universal transformer-based decoders achieve good results across code families but carry high compute and memory costs that hinder deployment. The paper shows that the two largest adjacency eigenvalues of each code's bipartite graph form a compact signature that retrieves effective structured pruning masks from a shared library. These eigenvalues track degree scale, expansion, and distance bounds that matter for decoding. After the common backbone is pruned at the kernel level, small low-rank adapters restore per-code accuracy. The outcome is performance near that of separate per-code pruning, yet with large cuts in overall computation and storage.

Core claim

The central claim is that the two algebraically largest adjacency eigenvalues of the bipartite graph tied to an error-correcting code provide a lightweight two-dimensional signature sufficient to select structured pruning masks that can be reused across codes; low-rank adaptation then recovers decoding performance comparable to dedicated per-code pruning.

What carries the argument

The two algebraically largest adjacency eigenvalues of the code bipartite graph, serving as a compact spectral signature for retrieving shared pruning masks.
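A minimal sketch of how such a signature could be computed, assuming the code is given by a binary parity-check matrix and that the bipartite graph is its Tanner graph. The function name `spectral_signature` and the example matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def spectral_signature(H):
    """Return the two largest adjacency eigenvalues of the bipartite graph of H.

    H is an (m x n) binary parity-check matrix; rows are check nodes,
    columns are variable nodes (an assumed convention).
    """
    H = np.asarray(H, dtype=float)
    m, n = H.shape
    # Bipartite adjacency matrix: no check-check or variable-variable edges.
    A = np.block([[np.zeros((m, m)), H],
                  [H.T, np.zeros((n, n))]])
    eigvals = np.linalg.eigvalsh(A)  # A is symmetric; eigenvalues come back ascending
    return eigvals[-1], eigvals[-2]

# Example: one common parity-check matrix of the (7,4) Hamming code.
H_HAMMING = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])
lam1, lam2 = spectral_signature(H_HAMMING)
# Here H H^T = 2I + 2J, so the signature is (sqrt(8), sqrt(2)).
print(lam1, lam2)
```

Because the graph is bipartite, its spectrum is symmetric about zero, so the same two numbers can also be read off as the two largest singular values of `H`.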

If this is right

Decoding performance stays comparable to dedicated per-code pruning across tested code families.
Kernel-level structured pruning produces large reductions in computational cost and model memory.
Only small code-specific low-rank adapter parameters must be stored after the shared backbone is pruned.
The two-eigenvalue signature performs as well as higher-dimensional spectral features for mask selection.
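The storage argument in the list above can be illustrated with a small sketch. It assumes that structured pruning removes whole output rows of a weight matrix (a stand-in for the paper's kernel-level pruning, whose exact granularity is not described here) and uses the standard LoRA update with a zero-initialized up-projection. All sizes and names are hypothetical.

```python
import numpy as np

def prune_rows(W, keep):
    """Structured pruning stand-in: keep only the selected output rows of W."""
    return W[keep, :]

def lora_forward(x, W, A, B, scale=1.0):
    """y = x W^T + scale * (x A^T) B^T, with W frozen and (A, B) per-code."""
    return x @ W.T + scale * (x @ A.T) @ B.T

def lora_param_count(d_in, d_out, rank):
    """Number of adapter parameters stored per code for one layer."""
    return rank * (d_in + d_out)

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4               # hypothetical layer sizes
W = rng.standard_normal((d_out, d_in))      # shared backbone weight
keep = np.arange(0, d_out, 2)               # hypothetical mask: keep every other row
W_pruned = prune_rows(W, keep)              # shape (32, 64)

A = 0.01 * rng.standard_normal((rank, d_in))
B = np.zeros((W_pruned.shape[0], rank))     # zero init: adapter starts as a no-op
x = rng.standard_normal((8, d_in))
y = lora_forward(x, W_pruned, A, B)

print(W_pruned.size, lora_param_count(d_in, W_pruned.shape[0], rank))
```

In this toy setting the shared pruned layer holds 2048 weights while each additional code adds 384 adapter parameters, which is the kind of ratio the shared-backbone design relies on.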

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Graph-spectrum matching may predict pruning compatibility without running full decoding simulations for each candidate mask.
The same alignment idea could guide pruning in other neural models that operate on graph-structured data such as networks or molecules.
Testing the signature on codes of widely varying lengths and rates would show where the proxy breaks.

Load-bearing premise

The two largest adjacency eigenvalues reliably indicate which pruning mask will preserve decoding quality for a given code.
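One way to picture the retrieval step this premise supports is a nearest-neighbor lookup in the two-dimensional signature space. The sketch below assumes Euclidean distance and a library stored as (signature, mask identifier) pairs; the paper's actual distance measure and library layout are not given here, and every value shown is hypothetical.

```python
import math

def retrieve_mask(signature, library):
    """Return the mask id whose stored signature is closest to `signature`.

    library: list of ((lambda1, lambda2), mask_id) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    best = min(library, key=lambda entry: math.dist(signature, entry[0]))
    return best[1]

# Hypothetical library of previously computed pruning masks.
MASK_LIBRARY = [
    ((2.83, 1.41), "mask_a"),
    ((3.46, 2.10), "mask_b"),
    ((4.90, 2.75), "mask_c"),
]

print(retrieve_mask((2.90, 1.50), MASK_LIBRARY))
```

The premise is then the claim that the mask returned by such a lookup preserves decoding quality for the query code after adaptation.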

What would settle it

Finding two codes with nearly identical leading eigenvalues where applying the same pruning mask and adaptation produces clearly worse error rates on one code than on the other.
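A search for such counterexamples could be sketched as follows, assuming each code has a signature and a frame error rate measured under the same mask and adaptation procedure. The tolerances, code names, and numbers are hypothetical.

```python
import math
from itertools import combinations

def counterexample_pairs(records, sig_tol=0.05, fer_ratio=2.0):
    """Flag code pairs with near-identical signatures but diverging error rates.

    records: dict mapping code name -> ((lambda1, lambda2), fer), with fer > 0
    measured under the same pruning mask and adaptation.
    """
    flagged = []
    for (a, (sig_a, fer_a)), (b, (sig_b, fer_b)) in combinations(records.items(), 2):
        close = math.dist(sig_a, sig_b) <= sig_tol
        diverge = max(fer_a, fer_b) / min(fer_a, fer_b) >= fer_ratio
        if close and diverge:
            flagged.append((a, b))
    return flagged

# Hypothetical measurements.
RECORDS = {
    "code_x": ((3.00, 1.50), 1e-3),
    "code_y": ((3.01, 1.50), 5e-3),
    "code_z": ((4.00, 2.00), 1e-3),
}
print(counterexample_pairs(RECORDS))
```

Any pair this returns on real measurements would count as evidence against the two-eigenvalue proxy.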

Original abstract

Universal channel decoders based on transformers, such as the Foundation Error Correction Code Transformer (FECCT), achieve competitive decoding performance across diverse code families with a single shared backbone, optionally followed by code-specific finetuning. However, the high computational complexity and large parameter footprint of FECCT present substantial obstacles to practical deployment. To address these challenges, we investigate structured pruning for FECCT and propose Spectral-Aligned Pruning (SAP), a structure-aware framework that enables cross-code reuse of structured pruning masks by leveraging the spectrum of the corresponding bipartite graph. SAP is grounded in classical graph analysis of codes: the two algebraically largest adjacency eigenvalues provide compact spectral proxies for degree scale, expansion ratio, and minimum-distance lower bounds. These quantities are directly relevant to decoding performance: degree scale reflects how densely codeword bits and parity checks are connected; expansion ratio influences how information propagates across the bipartite graph; and minimum distance characterizes codeword separation. Based on this connection, SAP uses these two leading eigenvalues as a lightweight code signature for pruning-mask retrieval. Empirically, this two-dimensional signature yields stable library selection equivalent to higher-dimensional spectral signatures in our evaluation. After pruning, SAP performs per-code recovery via parameter-efficient low-rank adaptation (LoRA), enabling a shared pruned backbone while storing only small code-specific adapter parameters. Experiments across diverse codes show that SAP achieves decoding performance comparable to dedicated per-code pruning, while enabling substantial reductions in computational cost and model memory footprint through kernel-level structured pruning.
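As a reminder of the graph construction behind the signature, the following is a sketch in notation assumed here rather than taken from the paper: $H$ is an $m \times n$ binary parity-check matrix treated as a real matrix, $\sigma_i(H)$ are its singular values in decreasing order, and $d_v$, $d_c$ are the variable-node and check-node degrees.

```latex
% Adjacency matrix of the bipartite (Tanner) graph of the code
\[
A \;=\; \begin{pmatrix} 0 & H \\ H^{\top} & 0 \end{pmatrix}
\in \mathbb{R}^{(m+n)\times(m+n)} .
\]
% The spectrum of A consists of \pm\sigma_i(H) together with zeros, hence
\[
\lambda_1(A) = \sigma_1(H), \qquad \lambda_2(A) = \sigma_2(H) .
\]
% Degree scale: for a (d_v, d_c)-regular bipartite graph
\[
\lambda_1(A) = \sqrt{d_v\, d_c} .
\]
```

The last identity is the sense in which the largest eigenvalue tracks degree scale; the gap between $\lambda_1$ and $\lambda_2$ is the quantity classical expander arguments tie to expansion.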

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Spectral-Aligned Pruning (SAP) for the Foundation Error Correction Code Transformer (FECCT). It derives a two-dimensional code signature from the two algebraically largest adjacency eigenvalues of the code bipartite graph, uses this signature to retrieve kernel-level structured pruning masks from a precomputed library for cross-code reuse, and applies low-rank adaptation (LoRA) for per-code recovery. The central empirical claim is that SAP matches the decoding performance of dedicated per-code pruning while substantially reducing computational cost and model memory footprint.

Significance. If the two-eigenvalue proxy reliably transfers effective pruning masks, the method would provide a lightweight, graph-theoretic route to efficient universal decoders, reducing the deployment cost of transformer-based decoders across code families. The grounding in classical expander and distance properties of bipartite graphs is a conceptual strength that could generalize beyond the evaluated codes.

major comments (3)

[§3.1] The argument that the two largest adjacency eigenvalues suffice as proxies for pruning-mask retrieval rests on their correlation with degree, expansion, and minimum distance, yet supplies no quantitative bound or similarity metric showing that spectral proximity in this 2-D space implies comparable optimal pruning structure; other invariants (girth, trapping-set spectrum) are not ablated against.
[Table 3, §4.2] Reported BER/FER curves for SAP versus per-code pruning are presented as comparable, but the tables lack error bars, seed-averaged runs, or statistical tests; without these, it is impossible to determine whether observed gaps fall within experimental variability or undermine the cross-code claim.
[§4.3] The ablation demonstrating that the 2-D signature yields stable library selection equivalent to higher-dimensional spectra is useful, but does not test whether the selected masks actually preserve the same decoding graph properties (e.g., expansion after pruning) that the signature is meant to capture.

minor comments (2)

[§2.1] The notation for the adjacency matrix and its eigenvalues is introduced without an explicit reference to the standard bipartite-graph construction for linear codes; a short reminder equation would improve readability.
[Figure 2] Figure 2 caption does not state the precise pruning ratio or the code parameters used for the visualized masks, making direct comparison with the numerical tables difficult.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We provide point-by-point responses to the major comments and indicate the revisions we will implement.

Referee: [§3.1] The argument that the two largest adjacency eigenvalues suffice as proxies for pruning-mask retrieval rests on their correlation with degree, expansion, and minimum distance, yet supplies no quantitative bound or similarity metric showing that spectral proximity in this 2-D space implies comparable optimal pruning structure; other invariants (girth, trapping-set spectrum) are not ablated against.

Authors: We agree that a formal quantitative bound would provide stronger theoretical support. Our choice of the two leading eigenvalues is grounded in classical results: the largest eigenvalue relates directly to the average degree, and the spectral gap to expansion properties that bound minimum distance. Empirically, the 2D signature selects masks that match dedicated performance. To strengthen this, we will include an ablation study comparing the 2D signature against girth and trapping-set spectrum in the revised manuscript. revision: partial
Referee: Table 3 and §4.2: Reported BER/FER curves for SAP versus per-code pruning are presented as comparable, but the tables lack error bars, seed-averaged runs, or statistical tests; without these, it is impossible to determine whether observed gaps fall within experimental variability or undermine the cross-code claim.

Authors: We acknowledge the need for statistical validation. In the revised paper, we will report mean BER/FER with standard deviation over 5 random seeds and include statistical tests (e.g., t-tests) to assess whether the performance differences are statistically significant. This will put the comparability claim on a quantitative footing. revision: yes
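A seed-averaged comparison of this kind could be sketched as below, using Welch's t statistic as one possible test. The choice of Welch's variant, the function name `welch_t`, and the per-seed numbers are assumptions for illustration; they are not results from the paper.

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    mean_a, mean_b = mean(a), mean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2    # sample variances
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b                # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, dof

# Hypothetical log10(FER) values over 5 seeds at one SNR point.
sap_runs = [-3.02, -2.97, -3.05, -2.99, -3.01]
per_code_runs = [-3.04, -3.00, -3.06, -2.98, -3.03]

t_stat, dof = welch_t(sap_runs, per_code_runs)
print(f"SAP      mean={mean(sap_runs):.3f} std={stdev(sap_runs):.3f}")
print(f"per-code mean={mean(per_code_runs):.3f} std={stdev(per_code_runs):.3f}")
print(f"t={t_stat:.3f}, dof={dof:.1f}")
```

Turning the statistic into a p-value requires the Student t distribution, which a statistics library would normally supply.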
Referee: §4.3: The ablation demonstrating that the 2-D signature yields stable library selection equivalent to higher-dimensional spectra is useful, but does not test whether the selected masks actually preserve the same decoding graph properties (e.g., expansion after pruning) that the signature is meant to capture.

Authors: We thank the referee for this suggestion. While matching decoding performance implies preservation of relevant properties, we will add measurements of post-pruning expansion ratios and connectivity metrics for the selected masks in the revised §4.3 to explicitly demonstrate that the spectral proxy maintains the intended graph characteristics. revision: yes

Circularity Check

0 steps flagged

No circularity: spectral proxies drawn from classical graph theory independent of pruning outcomes

The derivation grounds the two-eigenvalue signature in standard bipartite-graph properties (degree scale, expansion ratio, minimum-distance bounds) drawn from classical coding theory, then uses the signature only for library retrieval of pruning masks followed by empirical validation and LoRA recovery. No equation or step redefines the pruning outcome in terms of the eigenvalues themselves, fits a parameter to a subset and renames it a prediction, or relies on a self-citation chain whose cited result is unverified outside the paper. The central performance claim therefore remains externally falsifiable via the reported experiments rather than tautological.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that leading eigenvalues capture decoding-relevant graph properties and on the empirical claim that the resulting masks plus LoRA suffice; both require full-text validation.

free parameters (1)

choice of exactly two leading eigenvalues
Selected as compact signature; the number two is presented as sufficient but could be tuned.

axioms (1)

domain assumption: The two largest adjacency eigenvalues provide compact proxies for degree scale, expansion ratio, and minimum-distance lower bounds relevant to decoding performance.
Invoked directly in the abstract as grounding for the pruning-mask retrieval.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

the two algebraically largest adjacency eigenvalues provide compact spectral proxies for degree scale, expansion ratio, and minimum-distance lower bounds

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.