Sequential Neural Probabilistic Amplitude Shaping: Learning the Channel's Language

Amirhossein Ghazisaeidi; Lutz Lampe; Mohammad Taha Askari

arXiv: 2605.28143 · v1 · pith:MXPHRYUUnew · submitted 2026-05-27 · cs.LG · cs.IT · eess.SP · math.IT

Pith reviewed 2026-06-29 14:17 UTC · model grok-4.3

Keywords: probabilistic amplitude shaping, neural networks, arithmetic distribution matching, rate loss, achievable information rate, autoregressive encoder, optical communications, sequential shaping

The pith

A sequential autoregressive neural encoder enables probabilistic amplitude shaping that outperforms prior methods while fully accounting for implementation losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural approach to probabilistic amplitude shaping that operates sequentially rather than in blocks. It uses an autoregressive model to generate amplitude sequences that match the channel's preferred distribution and works directly with arithmetic distribution matching. This design is presented as the first to reduce rate loss and raise achievable information rates after every implementation cost is included. A reader would care because amplitude shaping determines how efficiently power is used in high-speed links, so lower rate loss means more bits per transmission without extra hardware.

Core claim

We present the first neural probabilistic amplitude shaping that outperforms existing methods while accounting for all implementation losses, using a block-less, easily implementable sequential autoregressive encoder compatible with arithmetic distribution matching, yielding reduced rate loss and higher achievable information rates.

What carries the argument

The sequential autoregressive neural encoder that produces amplitude sequences on the fly and remains compatible with arithmetic distribution matching.
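
The interplay between a sequential model and arithmetic-style distribution matching can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the amplitude alphabet, the `toy_conditional` function (a hand-written first-order stand-in for the neural encoder, which would condition on a longer history), the use of exact rational arithmetic instead of finite precision, and the stopping rule. It is not the paper's implementation. The idea shown is only that input bits select a sub-interval of [0, 1), and each emitted amplitude narrows a second interval according to the model's conditional probabilities until the bits are uniquely determined.

```python
from fractions import Fraction
from math import floor

AMPLITUDES = (1, 3, 5, 7)  # hypothetical amplitude alphabet


def toy_conditional(prev):
    """Hypothetical stand-in for the neural encoder: P(a_t | a_{t-1})."""
    # Favor small amplitudes, more strongly after a large one.
    weights = (4, 3, 2, 1) if prev is None or prev <= 3 else (6, 3, 2, 1)
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def refine(lo, hi, probs, index):
    """Shrink [lo, hi) to the sub-interval assigned to symbol `index`."""
    width = hi - lo
    start = lo + width * sum(probs[:index], Fraction(0))
    return start, start + width * probs[index]


def shape(bits):
    """Map a non-empty bit list to amplitudes, one symbol at a time."""
    k = len(bits)
    value = int("".join(str(b) for b in bits), 2)
    bit_lo, bit_hi = Fraction(value, 2**k), Fraction(value + 1, 2**k)
    target = (bit_lo + bit_hi) / 2  # point strictly inside the bit interval
    lo, hi, prev, out = Fraction(0), Fraction(1), None, []
    # Emit symbols until the symbol interval lies inside the bit interval.
    while not (bit_lo <= lo and hi <= bit_hi):
        probs = toy_conditional(prev)
        for index in range(len(AMPLITUDES)):
            new_lo, new_hi = refine(lo, hi, probs, index)
            if new_lo <= target < new_hi:
                break
        lo, hi, prev = new_lo, new_hi, AMPLITUDES[index]
        out.append(prev)
    return out


def unshape(amplitudes, k):
    """Recover the k input bits by replaying the same conditional model."""
    lo, hi, prev = Fraction(0), Fraction(1), None
    for a in amplitudes:
        lo, hi = refine(lo, hi, toy_conditional(prev), AMPLITUDES.index(a))
        prev = a
    return [int(c) for c in format(floor(lo * 2**k), f"0{k}b")]
```

Calling `unshape(shape(bits), len(bits))` returns the original bits in this sketch, and the output length varies with the input, which is one way a shaper can avoid a fixed block structure.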

If this is right

Rate loss decreases because the encoder avoids block boundaries and fixed-length constraints.
Achievable information rates increase once all implementation losses are subtracted from the mutual information.
The encoder integrates with existing arithmetic distribution matching without requiring new hardware blocks.
Sequential generation allows the shaping to adapt symbol-by-symbol rather than waiting for an entire block.
The method applies to any modulation format where amplitude probabilities can be learned from channel statistics.
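
To make the link between block length and rate loss concrete, the sketch below computes the rate loss of a constant-composition distribution matcher, used here only as a familiar block-based reference point; it is not one of the baselines named in the figure caption (ESS and sequence selection). Rate loss is taken as the entropy of the amplitude distribution minus the number of input bits per output symbol, which is assumed to match the paper's usage. The composition ratios (4:3:2:1) are hypothetical.

```python
from math import factorial, log2


def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)


def ccdm_rate_loss(counts):
    """Rate loss (bits/symbol) of a constant-composition matcher.

    `counts[i]` is how often amplitude i appears in every length-n block.
    """
    n = sum(counts)
    # Number of distinct sequences with this composition (multinomial).
    num_sequences = factorial(n)
    for c in counts:
        num_sequences //= factorial(c)
    k = num_sequences.bit_length() - 1  # floor(log2(num_sequences))
    return entropy_bits([c / n for c in counts]) - k / n


def rate_loss_vs_blocklength(ratios=(4, 3, 2, 1), scales=(1, 4, 16, 64)):
    """Rate loss for growing block lengths at a fixed composition ratio."""
    return {
        m * sum(ratios): ccdm_rate_loss([m * r for r in ratios])
        for m in scales
    }
```

In this toy setting `rate_loss_vs_blocklength()` returns a loss that shrinks as the block length grows from 10 to 640 symbols, which is the finite-length penalty that a block-less design is claimed to reduce.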

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same autoregressive structure could be retrained on-line when channel statistics drift, enabling continuous adaptation.
Because the encoder is sequential, it may combine naturally with forward error correction that also processes symbols in order.
Extending the model to jointly shape both amplitude and phase could further raise rates in channels where phase noise is the dominant impairment.
The reduced rate loss might allow shorter codewords in the outer error-correcting code while keeping the same overall performance.

Load-bearing premise

That the neural encoder can be made block-less and fully compatible with arithmetic distribution matching while truly capturing every implementation loss without introducing hidden overheads.

What would settle it

A side-by-side hardware implementation that measures end-to-end achievable information rate for the neural method versus a conventional block-based PAS system on the same channel, with every coding and quantization loss counted.

Figures

Figures reproduced from arXiv: 2605.28143 by Amirhossein Ghazisaeidi, Lutz Lampe, Mohammad Taha Askari.

**Figure 2.** AIR versus launch power. 1.93 bits/1D to match the entropy of the optimal MB marginal. Neural models are sampled through ADM with 2048 input bits. The ESS blocklength and sequence-selection parameters are optimized by grid search to maximize AIR. For the considered link, the best ESS blocklength is 32, while sequence selection achieves its best performance with selection blocklength 64 and 16 candidates …
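
The caption refers to a Maxwell-Boltzmann (MB) marginal with an entropy of 1.93 bits/1D. As a rough sketch of how such a marginal can be obtained, the code below uses bisection to find the MB parameter whose entropy matches a target. The eight-level amplitude alphabet is an assumption, since the constellation is not stated in the text shown here, and the function names are illustrative.

```python
from math import exp, log2


def mb_distribution(amplitudes, lam):
    """Maxwell-Boltzmann probabilities, P(a) proportional to exp(-lam * a^2)."""
    weights = [exp(-lam * a * a) for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)


def solve_lambda(amplitudes, target_bits, tol=1e-12):
    """Find lam >= 0 whose MB entropy equals target_bits.

    Requires 0 < target_bits < log2(len(amplitudes)); the entropy falls
    monotonically from the uniform value as lam grows.
    """
    lo, hi = 0.0, 1.0
    while entropy_bits(mb_distribution(amplitudes, hi)) > target_bits:
        hi *= 2  # widen the bracket until the entropy drops below target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy_bits(mb_distribution(amplitudes, mid)) > target_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Assumed alphabet: the eight amplitude levels 1, 3, ..., 15.
ASSUMED_AMPLITUDES = tuple(range(1, 16, 2))
```

For example, `mb_distribution(ASSUMED_AMPLITUDES, solve_lambda(ASSUMED_AMPLITUDES, 1.93))` gives a distribution over the assumed alphabet whose entropy is 1.93 bits, with probabilities decreasing in amplitude.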

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper claims the first neural sequential autoregressive encoder for probabilistic amplitude shaping that is block-less, compatible with arithmetic distribution matching, and accounts for all implementation losses to cut rate loss.

The main takeaway is a neural method for amplitude shaping that generates amplitudes sequentially with an autoregressive model instead of fixed blocks. It positions itself as the first to do this while staying compatible with arithmetic distribution matching and folding in every implementation loss, which the authors say yields higher achievable rates.

What stands out as new is the shift to a streaming, block-free design that learns the channel directly. Prior shaping work often relied on block-based probabilistic amplitude shaping or distribution matching with separate handling of losses. Making the encoder autoregressive and neural lets it adapt on the fly, which could reduce the rate penalty that comes from blocking or incomplete loss tracking.

The paper does well in tying the approach to practical constraints in communications systems. Compatibility with existing arithmetic distribution matching is a concrete engineering choice that could let this plug into current pipelines without wholesale replacement. If the experiments back the claim of outperforming baselines after all losses, that would be a useful data point for people working on shaped constellations.

The soft spots are around the missing details on implementation cost and verification. Autoregressive sampling can introduce latency or require careful handling to stay real-time, and it is not obvious from the abstract whether the neural overhead is fully costed or if some trade-offs are understated. Without seeing the architecture, training loss, or side-by-side complexity numbers, it is hard to judge how "easily implementable" the scheme really is in hardware.

This is for researchers already working on learned modulation and shaping in information theory or signal processing. A reader focused on practical rate improvements in optical or wireless links would find the sequential angle worth checking.

I would send it to peer review. The claim is specific enough that referees can test the loss accounting and the sequential compatibility directly.

Referee Report

2 major / 0 minor

Summary. The manuscript claims to introduce the first neural probabilistic amplitude shaping (PAS) method that outperforms existing techniques while fully accounting for all implementation losses. It relies on a block-less sequential autoregressive neural encoder that is easily implementable and compatible with arithmetic distribution matching (ADM), yielding reduced rate loss and higher achievable information rates.

Significance. If the architectural and empirical claims hold after full validation, the work could advance practical learned shaping in communications by providing a sequential neural encoder that integrates with established ADM without block constraints, potentially improving finite-length performance and information rates.

major comments (2)

Abstract: the central claim of outperforming existing methods while accounting for every implementation loss is presented without any supporting derivations, equations, experimental results, error bars, complexity analysis, or overhead measurements, rendering the claim impossible to assess for soundness or reproducibility.
Abstract: the assertion that the sequential autoregressive encoder is 'easily implementable' and 'fully compatible with arithmetic distribution matching' without hidden overheads or performance trade-offs is a load-bearing assumption that requires concrete implementation details and measurements, none of which are provided.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and the opportunity to clarify the manuscript. Below we respond point-by-point to the major comments on the abstract. The abstract is a concise summary; all supporting material appears in the full paper.

Referee: [—] Abstract: the central claim of outperforming existing methods while accounting for every implementation loss is presented without any supporting derivations, equations, experimental results, error bars, complexity analysis, or overhead measurements, rendering the claim impossible to assess for soundness or reproducibility.

Authors: The abstract summarizes the paper's main contributions at a high level, as is standard. The supporting derivations, equations, experimental results (including error bars), complexity analysis, and overhead measurements are provided in the full manuscript, specifically in the sections on the sequential autoregressive encoder, the integration with arithmetic distribution matching, the rate-loss analysis, and the numerical evaluation. These elements enable assessment of soundness and reproducibility.

Revision: no

Referee: [—] Abstract: the assertion that the sequential autoregressive encoder is 'easily implementable' and 'fully compatible with arithmetic distribution matching' without hidden overheads or performance trade-offs is a load-bearing assumption that requires concrete implementation details and measurements, none of which are provided.

Authors: The manuscript contains the concrete implementation details, architectural description, and measurements demonstrating compatibility with arithmetic distribution matching and the absence of hidden overheads. These appear in the sections describing the block-less sequential encoder, its training procedure, and the end-to-end performance evaluation that accounts for all implementation aspects. We are prepared to expand any specific subsection if the referee identifies a particular gap.

Revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation chain not present in provided text

The provided manuscript text consists solely of the abstract, which states an empirical and architectural claim ('first neural probabilistic amplitude shaping that outperforms existing methods while accounting for all implementation losses, using a block-less, easily implementable sequential autoregressive encoder compatible with arithmetic distribution matching') without any equations, derivations, fitted parameters, or self-citations that could form a load-bearing chain. No self-definitional steps, fitted inputs called predictions, or uniqueness theorems are visible. The central result is presented as an empirical outcome rather than a reduction to prior inputs by construction. Per the hard rules, when the paper is self-contained against external benchmarks with no detectable circular steps, the score is 0 and steps remain empty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities can be identified from the abstract alone.
