Leveraging Code Automorphisms for Improved Syndrome-Based Neural Decoding

Ahmad Ismail; Charbel Abdel Nour; Elsa Dupraz; Raphaël Le Bidan

arXiv: 2605.03620 · v1 · submitted 2026-05-05 · cs.IT · cs.LG · math.IT

Pith reviewed 2026-05-07 13:42 UTC · model grok-4.3

classification: cs.IT · cs.LG · math.IT

keywords: syndrome-based neural decoding · code automorphisms · data augmentation · maximum-likelihood decoding · error-correcting codes · deep learning decoding · short block codes

The pith

Code automorphisms used as data augmentation let syndrome-based neural decoders approach maximum-likelihood performance even with small training sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that applying the symmetry transformations of a linear code to generate additional syndrome-error examples during both training and inference improves the accuracy of existing syndrome-based neural decoding networks. For the short high-rate codes examined, the resulting models come close to the error-rate performance of exact maximum-likelihood decoding while requiring only modest amounts of original data. The authors further note that earlier neural decoders in the literature likely fell short of their potential because training was stopped too early. If the approach holds, it shows that the correction power of these neural decoders has been systematically underestimated and that simple symmetry-based expansion of the training distribution is sufficient to close most of the remaining gap to optimal decoding.

Core claim

By generating new training and test examples from the automorphisms of the code, syndrome-based neural decoders can be trained to map received syndromes to error patterns in a manner that closely matches the decisions of maximum-likelihood decoding, even when the original training set is small; the same augmentation applied at inference time further improves the output.

What carries the argument

Code automorphisms: coordinate permutations that map every codeword to another codeword. They are used to create augmented syndrome-error pairs for training and to produce multiple candidate corrections at inference time.
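As a rough illustration of how one automorphism turns a labeled example into another valid one, the sketch below uses the cyclic (7, 4) Hamming code rather than any code from the paper. The matrix `H`, the helper names `syndrome`, `is_automorphism`, and `augment`, and the noise level are assumptions made for this example. The input layout (bit reliabilities plus hard-decision syndrome, with the error pattern as label) follows the usual SBND setup and may differ in detail from the authors' implementation.

```python
import itertools
import numpy as np

# Parity-check matrix of the cyclic (7, 4) Hamming code; its columns are the
# powers of a primitive element of GF(8). Illustrative choice, not from the paper.
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)
n = H.shape[1]

def syndrome(bits):
    """Syndrome H * bits over GF(2)."""
    return (H @ bits) % 2

def is_automorphism(perm):
    """Brute-force check that `perm` maps every codeword to a codeword."""
    for v in itertools.product((0, 1), repeat=n):
        v = np.array(v, dtype=np.uint8)
        if not syndrome(v).any() and syndrome(v[perm]).any():
            return False
    return True

def augment(reliab, hard, error, perm):
    """Permute one example: returns (reliabilities, syndrome, error label)."""
    return reliab[perm], syndrome(hard[perm]), error[perm]

# A cyclic shift is an automorphism of any cyclic code.
shift = np.roll(np.arange(n), 1)
assert is_automorphism(shift)

# Build one noisy example (BPSK over AWGN; the noise level is an assumption).
rng = np.random.default_rng(0)
codeword = np.array([1, 1, 0, 1, 0, 0, 0], dtype=np.uint8)
y = (1.0 - 2.0 * codeword) + 0.8 * rng.standard_normal(n)
hard = (y < 0).astype(np.uint8)   # hard decisions
error = hard ^ codeword           # error pattern, used as the label
reliab = np.abs(y)                # bit reliabilities

# The augmented example is still self-consistent: the syndrome of the permuted
# hard decisions equals the syndrome of the permuted error pattern.
reliab2, synd2, error2 = augment(reliab, hard, error, shift)
assert np.array_equal(synd2, syndrome(error2))
```

Because the shifted codeword is again a codeword, it contributes nothing to the syndrome, which is why the permuted error pattern remains a consistent label for the permuted input.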

If this is right

For the tested codes, properly trained augmented models achieve block-error rates close to those of exact maximum-likelihood decoding.
Training sets that are only a small fraction of the size previously thought necessary become sufficient once automorphism-based examples are included.
Many earlier published syndrome-based neural decoders can be brought closer to optimal performance simply by continuing training longer and adding automorphism augmentation.
The gap between neural and maximum-likelihood decoding for short high-rate codes is largely a training issue rather than an inherent limitation of the network architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same augmentation technique could be tested on codes that possess fewer automorphisms to determine how much symmetry is required for the benefit to appear.
Combining automorphism augmentation with other forms of synthetic data generation might further reduce the amount of real channel data needed for training.
If the method scales, it suggests that neural decoders could be made competitive with classical decoders for a wider range of code lengths without increasing model size.

Load-bearing premise

The new examples created by code automorphisms must still reflect the true statistical relationship between syndromes and errors rather than introducing misleading patterns that the network learns instead of the actual maximum-likelihood rule.

What would settle it

On one of the short high-rate codes studied, run the augmented neural decoder against a true maximum-likelihood decoder on a large test set and measure whether the block-error rate of the neural model stays within a small gap of the MLD rate; a persistent large gap would falsify the claim.
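A minimal sketch of such a comparison is given below, again on the toy (7, 4) Hamming code, where exact ML decoding by exhaustive search is cheap. The function `hard_syndrome_decode` is only a placeholder for the decoder under test (a trained SBND model would take its place), and the names, frame count, and noise level are assumptions of this example, not values from the paper.

```python
import itertools
import numpy as np

# Cyclic (7, 4) Hamming code, used here only as a small test bed.
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)
n = H.shape[1]

# Enumerate the codebook by brute force: all vectors with zero syndrome.
all_vecs = np.array(list(itertools.product((0, 1), repeat=n)), dtype=np.uint8)
codebook = all_vecs[((all_vecs @ H.T) % 2).sum(axis=1) == 0]
signs = 1.0 - 2.0 * codebook   # BPSK images of the codewords

def ml_decode(Y):
    """Exact ML decoding over AWGN: maximum correlation with a BPSK codeword."""
    return codebook[np.argmax(Y @ signs.T, axis=1)]

def hard_syndrome_decode(Y):
    """Placeholder for the decoder under test: hard decisions followed by
    single-error correction from the syndrome."""
    hard = (Y < 0).astype(np.uint8)
    synd = (hard @ H.T) % 2
    # Flip the position whose parity-check column equals the syndrome, if any.
    flip = (synd[:, None, :] == H.T[None, :, :]).all(axis=2)
    return hard ^ flip.astype(np.uint8)

def fer(decoded, sent):
    """Fraction of frames with at least one wrong bit."""
    return float(np.mean((decoded != sent).any(axis=1)))

rng = np.random.default_rng(1)
num_frames, sigma = 20000, 0.7   # assumed test-set size and noise level
sent = codebook[rng.integers(len(codebook), size=num_frames)]
Y = (1.0 - 2.0 * sent) + sigma * rng.standard_normal(sent.shape)

fer_ml = fer(ml_decode(Y), sent)
fer_test = fer(hard_syndrome_decode(Y), sent)
print(f"ML FER = {fer_ml:.4f}, test FER = {fer_test:.4f}, gap = {fer_test - fer_ml:.4f}")
```

Exhaustive search is only practical for very small codebooks; for the codes in the paper an efficient ML or near-ML reference decoder would be needed instead.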

Figures

Figures reproduced from arXiv: 2605.03620 by Ahmad Ismail, Charbel Abdel Nour, Elsa Dupraz, Raphaël Le Bidan.

**Figure 1.** Syndrome-based neural decoder architecture and supervised training.

**Figure 2.** Principle of code automorphism-based test-time augmentation.

**Figure 3.** FER for an ECCT trained to decode the (31, 21, 5) BCH code with either unique or automorphism-augmented fixed datasets (ML labels). [...] selecting the most confident prediction (logit with maximum absolute value) for each bit. Taking the empirical average $\ell = \frac{1}{P} \sum_{j=1}^{P} \ell_j$ of the predictions proved to be the best option. The resulting inference architecture is depicted in [...]

**Figure 6.** FER performance for an ECCT trained to decode the [...]

**Figure 7.** FER performance for ECCT decoding of the [...]
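The test-time augmentation of Figure 2, combined with the logit averaging quoted in the Figure 3 caption, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained SBND network, the code is the cyclic (7, 4) Hamming code, and the set of permutations is simply all cyclic shifts.

```python
import numpy as np

# Cyclic (7, 4) Hamming code, used only for illustration.
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)
n = H.shape[1]

def syndrome(bits):
    return (H @ bits) % 2

def toy_model(reliab, synd):
    """Hypothetical stand-in for a trained network: one logit per bit,
    positive where the bit is believed to be in error."""
    match = (H.T == synd).all(axis=1)   # position whose column equals the syndrome
    skew = np.linspace(-0.3, 0.3, n)    # position-dependent flaw of an imperfect model
    return np.where(match, 1.0, -1.0) / (reliab + 0.1) + skew

def tta_logits(model, reliab, hard, perms):
    """Run the model on P permuted copies of the input, undo each permutation
    on the output logits, and return their empirical average."""
    outputs = []
    for perm in perms:
        logits = model(reliab[perm], syndrome(hard[perm]))
        outputs.append(logits[np.argsort(perm)])   # map back to original positions
    return np.mean(outputs, axis=0)

# All cyclic shifts are automorphisms of a cyclic code.
perms = [np.roll(np.arange(n), k) for k in range(n)]

rng = np.random.default_rng(2)
y = 1.0 + 0.8 * rng.standard_normal(n)   # all-zero codeword sent with BPSK
reliab, hard = np.abs(y), (y < 0).astype(np.uint8)

avg = tta_logits(toy_model, reliab, hard, perms)
error_estimate = (avg > 0).astype(np.uint8)   # bits flagged as erroneous
```

The inverse permutation `np.argsort(perm)` matters: each prediction must be returned to the original coordinate order before averaging, otherwise logits of different bits would be mixed.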

Original abstract

Syndrome-based neural decoding (SBND) has emerged as a promising deep learning approach for soft-decision decoding of high-rate, short-length codes. However, this approach still has substantial room for improvement. In this paper, we show how to leverage code automorphisms to enhance the ability of existing SBND models to learn and generalize through data augmentation during training and inference. As a result, for the short high-rate codes considered, we obtain models that closely approach MLD performance using small datasets and proper training. Our findings also suggest that many prior results for SBND models in the literature underestimate their true correction capability due to undertraining. Code to reproduce all results is available at: https://github.com/lebidan/sbnd.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that automorphism augmentation on syndromes and soft inputs lets SBND models reach near-MLD frame error rates on short high-rate codes with small training sets, and the public code makes the empirical claims checkable.

This paper shows that code automorphisms can be used to augment both training and inference for syndrome-based neural decoders, producing models that get close to maximum-likelihood performance on short high-rate codes even with limited data. The authors also point out that earlier SBND work likely stopped training too soon and therefore reported weaker results than the method actually supports.

Referee Report

2 major / 1 minor

Summary. The paper introduces the use of code automorphisms to perform data augmentation for syndrome-based neural decoding (SBND) of short, high-rate linear codes. By applying automorphisms during training and inference, the authors demonstrate that neural decoders can achieve performance close to maximum-likelihood decoding (MLD) even with small training sets. They further suggest that previous SBND approaches in the literature suffered from undertraining, leading to suboptimal reported performance. The manuscript includes public code for reproducibility.

Significance. Should the empirical findings be confirmed, this work could have notable impact on the design of efficient soft-decision decoders for short codes, where training data is limited. The emphasis on proper training and the availability of reproduction code strengthen the contribution by addressing common issues in neural decoding research. However, the ultimate significance depends on whether the observed gains represent genuine approximation of MLD or are due to the augmentation introducing beneficial regularization effects.

major comments (2)

[Method section on automorphism augmentation] The central claim that automorphism-based augmentation leads to models approaching MLD performance relies on the assumption that the augmented (syndrome, soft-information) pairs preserve the likelihood ordering of error patterns for AWGN channels. The manuscript should include either a theoretical argument or targeted experiments (e.g., comparing likelihoods before and after transformation) to rule out that the network is learning spurious symmetries instead of the true mapping. Without this, the improvement could be explained by better coverage of the training distribution rather than convergence to optimal decoding.
[Experimental results and tables] The performance claims (e.g., FER curves approaching MLD) need to be supported by detailed reporting of data splits, training procedures, hyperparameter selection, and statistical significance across multiple runs. As the reader notes, absence of these details makes it difficult to exclude post-hoc tuning or selection bias in the reported gains.

minor comments (1)

[Abstract] The abstract mentions 'proper training' but does not define what constitutes proper training; this could be clarified to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the paper's clarity and rigor. We address each major comment below and will incorporate revisions accordingly.

Referee: The central claim that automorphism-based augmentation leads to models approaching MLD performance relies on the assumption that the augmented (syndrome, soft-information) pairs preserve the likelihood ordering of error patterns for AWGN channels. The manuscript should include either a theoretical argument or targeted experiments (e.g., comparing likelihoods before and after transformation) to rule out that the network is learning spurious symmetries instead of the true mapping. Without this, the improvement could be explained by better coverage of the training distribution rather than convergence to optimal decoding.

Authors: We agree that explicit justification is valuable. Code automorphisms of linear codes are coordinate permutations that preserve the code. For the memoryless AWGN channel, such permutations preserve the likelihood ordering of error patterns, because the likelihood factorizes over coordinates and is therefore unchanged when the received values and the candidate error pattern are permuted together. We will add a concise theoretical argument in the Methods section and include a targeted verification experiment showing that log-likelihoods are unchanged under the automorphisms. Revision: yes.

Referee: The performance claims (e.g., FER curves approaching MLD) need to be supported by detailed reporting of data splits, training procedures, hyperparameter selection, and statistical significance across multiple runs. As the reader notes, absence of these details makes it difficult to exclude post-hoc tuning or selection bias in the reported gains.

Authors: We will revise the experimental section to include complete details on data splits, training procedures (optimizer, schedules, early stopping), hyperparameter selection process, and statistical significance via averages and standard deviations over at least five independent runs. This addresses reproducibility and rules out selection bias. Revision: yes.
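The invariance argument promised in the first response can be written out in a few lines. The following is a sketch under stated assumptions rather than the authors' own derivation: the channel is memoryless, $\pi$ is a coordinate permutation with $(\pi y)_i = y_{\pi(i)}$, $\pi \in \mathrm{Aut}(\mathcal{C})$ means $\pi\mathcal{C} = \mathcal{C}$, and the ML codeword is assumed unique.

```latex
% Notation assumed for this sketch: y is the received vector, c a codeword,
% n the code length, and \hat{c}_{ML}(y) the maximum-likelihood codeword.
\begin{align}
p(\pi y \mid \pi c)
  &= \prod_{i=1}^{n} p\bigl(y_{\pi(i)} \mid c_{\pi(i)}\bigr)
   = \prod_{j=1}^{n} p(y_j \mid c_j)
   = p(y \mid c), \\
\hat{c}_{\mathrm{ML}}(\pi y)
  &= \arg\max_{c' \in \mathcal{C}} p(\pi y \mid c')
   = \pi\Bigl(\arg\max_{c \in \mathcal{C}} p(y \mid c)\Bigr)
   = \pi\,\hat{c}_{\mathrm{ML}}(y).
\end{align}
```

The second line uses the substitution $c' = \pi c$, which ranges over all of $\mathcal{C}$ because $\pi$ is an automorphism. Since the error pattern is the sum of the hard decisions and the ML codeword over GF(2), the ML error pattern for $\pi y$ is the permuted ML error pattern for $y$, so an ML-labeled example stays ML-labeled after augmentation.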

Circularity Check

0 steps flagged

No circularity: empirical augmentation results are independent and reproducible

The paper advances an empirical technique of using code automorphisms for data augmentation in syndrome-based neural decoding. Its claims rest on experimental outcomes (models approaching MLD performance on small datasets for short high-rate codes) rather than any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No equations, uniqueness theorems, or load-bearing self-citations are invoked that would reduce the reported performance gains to the authors' own inputs by construction. The provided GitHub code enables external verification of the training procedure and results, satisfying the criteria for non-circular empirical support.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard machine-learning assumptions about data augmentation preserving label correctness and on the existence of non-trivial automorphisms for the chosen codes. No new free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Automorphisms of the code can be used to generate valid augmented examples whose labels remain correct for the decoding task. This is implicit in the claim that augmentation improves generalization to MLD.

pith-pipeline@v0.9.0 · 5428 in / 1149 out tokens · 49898 ms · 2026-05-07T13:42:22.838238+00:00 · methodology

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

[1] C. Yue, V. Miloslavskaya, M. Shirvanimoghaddam, B. Vucetic, and Y. Li, “Efficient decoders for short block length codes in 6G URLLC,” IEEE Commun. Mag., vol. 61, no. 4, pp. 84–90, 2023.

[2] M. Fossorier and S. Lin, “Soft-decision decoding of linear block codes based on ordered statistics,” IEEE Trans. Inform. Theory, vol. 41, no. 5, Sep. 1995.

[3] Y. Shen et al., “Toward universal belief propagation decoding for short binary block codes,” IEEE J. Selec. Areas Commun., vol. 43, no. 5, Apr. 2025.

[4] C.-Y. Lin, Y.-C. Huang, S.-L. Shieh, and P.-N. Chen, “Toward universal decoding of binary linear block codes via enhanced polar transformations,” IEEE Trans. Commun., vol. 73, no. 11, 2025.

[5] Q. Wang, Y. Wang, X. Zheng, and X. Ma, “Ordered reliability bits guessing codeword decoding of short codes,” IEEE Wireless Commun. Lett., vol. 14, no. 9, 2025.

[6] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in 2016 54th Annual Allerton Conf. on Commun., Control, and Computing, Monticello, IL, USA, 2016.

[7] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in 2017 51st Annual Conf. on Inform. Sci. and Sys. (CISS), Baltimore, MD, USA, 2017.

[8] A. Bennatan, Y. Choukroun, and P. Kisilev, “Deep learning for decoding of linear codes: A syndrome-based approach,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Vail, CO, USA, June 2018.

[9] Y. Yuan et al., “On the design and performance of machine learning based error correcting decoders,” in 2025 14th Int. ITG Conf. on Sys., Commun. and Coding (SCC), Karlsruhe, Germany, Mar. 2025.

[10] Y. Choukroun and L. Wolf, “Error correction code transformer,” in Proc. NeurIPS, New Orleans, LA, USA, 2022.

[11] G. De Boni Rovella and M. Benammar, “Improved syndrome-based neural decoder for linear block codes,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Kuala Lumpur, Malaysia, Dec. 2023.

[12] Y. Choukroun and L. Wolf, “A foundation model for error correction codes,” in Proc. 12th Int. Conf. Learning Repr. (ICLR), Vienna, Austria, May 2024.

[13] C. W. K. Lau, X. Shi, Z. Zheng, H. Cao, and N. Guo, “Interplay between belief propagation and transformer: Differential-attention message passing transformer,” in 2025 IEEE Int. Symp. Inform. Theory (ISIT), Ann Arbor, MI, USA, June 2025.

[14] S.-e. Cohen, Y. Choukroun, and E. Nachmani, “Hybrid mamba-transformer decoder for error-correcting codes,” Preprint, arXiv:2505.17834, 2025.

[15] A. Ismail, R. Le Bidan, E. Dupraz, and C. Abdel Nour, “Doing more with less: Towards more data-efficient syndrome-based neural decoders,” in Proc. IEEE Int. Conf. Mach. Learn. Commun. Netw. (ICMLCN), Barcelona, Spain, May 2025.

[16] M. Geiselhart, A. Elkelesh, M. Ebada, S. Cammerer, and S. ten Brink, “Automorphism ensemble decoding of Reed–Muller codes,” IEEE Trans. Commun., vol. 69, no. 10, pp. 6424–6438, 2021.

[17] J. Snyders and Y. Be’ery, “Maximum likelihood soft decoding of binary block codes and decoders for the Golay codes,” IEEE Trans. Inform. Theory, vol. 35, no. 5, Sep. 1989.

[18] S.-J. Park, H.-Y. Kwak, S.-H. Kim, Y. Kim, and J.-S. No, “CrossMPT: Cross-attention message-passing transformer for error correcting codes,” in Proc. Int. Conf. Learning Repr. (ICLR), Singapore, Apr. 2025.

[19] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. North-Holland Pub., 1977.

[20] M. Geiselhart, A. Elkelesh, M. Ebada, S. Cammerer, and S. ten Brink, “On the automorphism group of polar codes,” in Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Melbourne, Australia, July 2021.

[21] M. Helmling et al., “Database of channel codes and ML simulation results,” 2025. [Online]. Available: http://rptu.de/en/channel-codes
