End-to-end Learning for GMI Optimized Geometric Constellation Shape

Darko Zibar; Metodi P. Yankov; Rasmus T. Jones

arXiv: 1907.08535 · v1 · pith:WEMMAUCSnew · submitted 2019-07-19 · 💻 cs.IT · eess.SP · math.IT · stat.ML

Rasmus T. Jones, Metodi P. Yankov, Darko Zibar

Pith reviewed 2026-05-24 18:46 UTC · model grok-4.3

classification 💻 cs.IT · eess.SP · math.IT · stat.ML

keywords geometric constellation shaping · autoencoder · generalized mutual information · bit mapping · end-to-end learning · transceiver impairments · QAM

The pith

End-to-end autoencoder training jointly optimizes geometric constellation shapes and bit mappings to raise GMI by up to 0.2 bits per QAM symbol.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an autoencoder can be trained to discover both the positions of constellation points and the assignment of bits to those points. Training occurs through a channel model that includes transceiver impairments, with the objective of maximizing generalized mutual information. Reported gains reach 0.2 bits per QAM symbol across multiple rates. The resulting constellations remain compatible with ordinary binary forward-error-correction codes.

Core claim

By casting constellation design as the end-to-end training of a neural network that includes the transmitter, channel, and receiver, the method learns point locations and bit mappings that achieve higher GMI than conventional geometric shaping while remaining usable with standard binary FEC.

What carries the argument

The autoencoder that parameterizes both the constellation geometry and the bit-to-symbol mapping, trained end-to-end to maximize GMI through a differentiable impairment model.
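
For readers who want the objective written out, one standard formulation is sketched below. The notation is assumed here for illustration and is not taken from the paper: $m$ is the number of bits per symbol, $B_k$ the $k$-th bit of the label, $Y$ the channel output, and $q_\theta(B_k \mid Y)$ the bit posterior produced by a demapper with parameters $\theta$. The first line holds for uniform, independent bits and a demapper that uses the true bit posteriors; the second line is the usual bit-wise cross-entropy loss, measured in bits.

```latex
\mathrm{GMI} \;=\; \sum_{k=1}^{m} I(B_k; Y) \;=\; m - \sum_{k=1}^{m} H(B_k \mid Y)
\\[4pt]
\mathcal{L}(\theta) \;=\; \sum_{k=1}^{m} \mathbb{E}\!\left[ -\log_2 q_\theta(B_k \mid Y) \right]
\;\ge\; \sum_{k=1}^{m} H(B_k \mid Y)
\quad\Longrightarrow\quad
m - \mathcal{L}(\theta) \;\le\; \mathrm{GMI}
```

Under these assumptions, minimizing the bit-wise cross-entropy raises a lower bound on GMI, which is one way an autoencoder can be trained "to maximize GMI". Whether the paper uses exactly this loss cannot be confirmed from the abstract alone.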

If this is right

The same binary FEC can be retained while increasing achievable rate.
The approach applies across a range of data rates without redesigning the outer code.
Gains persist when transceiver impairments are included in the training loop.
No change to the receiver architecture beyond the demapper is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the learned shapes transfer to hardware, they could narrow the gap between practical links and theoretical limits without new coding schemes.
The same training loop could be re-targeted to other objectives, such as mutual information or a differentiable surrogate for bit-error rate.
Including nonlinear fiber effects in the training model might produce constellations suited to long-haul optical systems.

Load-bearing premise

The simulation model of transceiver impairments used during autoencoder training is sufficiently representative of real hardware that the learned constellations and mappings will deliver the reported GMI gain when deployed.

What would settle it

Deploy the learned constellation points and mappings on actual transceiver hardware, measure the realized GMI, and check whether the 0.2 bits per QAM symbol improvement over conventional BICM still appears.
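
A minimal sketch of how a realized GMI could be estimated from received samples is given below. It assumes an AWGN channel, uniformly chosen symbols, and an ideal soft demapper. The function names `make_16qam` and `estimate_gmi` are hypothetical, and the Gray and natural-binary 16-QAM labelings are stand-ins for illustration, not the learned constellations from the paper.

```python
import math
import random


def make_16qam(gray=True):
    """Unit-energy 16-QAM points with 4-bit integer labels (2 bits per axis)."""
    levels = [-3, -1, 1, 3]
    # Per-axis 2-bit codes: Gray (neighbours differ in one bit) or natural binary.
    codes = [0b00, 0b01, 0b11, 0b10] if gray else [0b00, 0b01, 0b10, 0b11]
    points, labels = [], []
    for code_i, level_i in zip(codes, levels):
        for code_q, level_q in zip(codes, levels):
            points.append(complex(level_i, level_q) / math.sqrt(10))
            labels.append((code_i << 2) | code_q)
    return points, labels


def estimate_gmi(points, labels, snr_db, n_symbols=4000, seed=0):
    """Monte Carlo GMI estimate (bits/symbol) over AWGN with unit symbol energy."""
    rng = random.Random(seed)
    m = len(points).bit_length() - 1       # bits per symbol (size is a power of 2)
    n0 = 10 ** (-snr_db / 10)              # complex noise variance, since Es = 1
    sigma = math.sqrt(n0 / 2)              # per-dimension standard deviation
    loss_bits = 0.0
    for _ in range(n_symbols):
        tx = rng.randrange(len(points))
        y = points[tx] + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        d2 = [abs(y - x) ** 2 for x in points]
        d2_min = min(d2)                   # subtract the minimum for numerical stability
        lik = [math.exp(-(d - d2_min) / n0) for d in d2]
        total = sum(lik)
        for k in range(m):
            bit = (labels[tx] >> k) & 1
            # Posterior probability of the transmitted value of bit k.
            match = sum(l for l, lab in zip(lik, labels) if ((lab >> k) & 1) == bit)
            loss_bits -= math.log2(match / total)
    return m - loss_bits / n_symbols


if __name__ == "__main__":
    for name, use_gray in (("Gray", True), ("natural", False)):
        pts, labs = make_16qam(gray=use_gray)
        print(name, round(estimate_gmi(pts, labs, snr_db=8.0), 3))
```

For a hardware check, the simulated samples `y` would be replaced by measured received symbols and `n0` by an estimated noise variance. The Gaussian likelihood then acts as an auxiliary channel, so the result is a lower bound on the GMI rather than an exact value.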

Figures

Figures reproduced from arXiv: 1907.08535 by Darko Zibar, Metodi P. Yankov, Rasmus T. Jones.

Original abstract

Autoencoder-based geometric shaping is proposed that includes optimizing bit mappings. Up to 0.2 bits/QAM symbol gain in GMI is achieved for a variety of data rates and in the presence of transceiver impairments. The gains can be harvested with standard binary FEC at no cost w.r.t. conventional BICM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an autoencoder-based end-to-end learning framework for jointly optimizing geometric constellation shapes and bit mappings to maximize generalized mutual information (GMI) under transceiver impairments. It reports empirical gains of up to 0.2 bits per QAM symbol across a range of data rates, which can be realized using standard binary FEC without additional overhead relative to conventional BICM.

Significance. If the reported GMI gains prove robust, the work provides a practical route to improved spectral efficiency in impaired channels by learning constellations and labelings that remain compatible with existing binary FEC. The joint optimization via autoencoders is a clear methodological strength when accompanied by reproducible training procedures and baseline comparisons.

major comments (2)

[Abstract and simulation section] The central claim of a 0.2 bit/QAM-symbol GMI improvement 'in the presence of transceiver impairments' rests on the training impairment model (nonlinearities, noise statistics, etc.) being representative of hardware. The manuscript supplies neither a validation of this model against measured hardware statistics nor a sensitivity analysis showing that the learned points remain GMI-optimal when the model is perturbed.
[Results] The abstract states empirical gains, but the provided description supplies no training details, validation procedure, baseline comparisons, or error bars. This prevents assessment of whether the 0.2 bit margin is robust or an artifact of the simulation setup.

minor comments (1)

Notation for GMI and the autoencoder loss should be defined explicitly on first use to aid readers unfamiliar with the intersection of machine learning and information theory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate the revisions we will make.

Referee: [Abstract and simulation section] The central claim of a 0.2 bit/QAM-symbol GMI improvement 'in the presence of transceiver impairments' rests on the training impairment model (nonlinearities, noise statistics, etc.) being representative of hardware. The manuscript supplies neither a validation of this model against measured hardware statistics nor a sensitivity analysis showing that the learned points remain GMI-optimal when the model is perturbed.

Authors: Our impairment model follows standard transceiver models from the literature (nonlinearities, phase noise, and AWGN). We agree that a sensitivity analysis would strengthen the claims and will add one in the revised version to show that the learned points remain near-optimal under moderate perturbations to model parameters. Direct validation against measured hardware statistics is outside the scope of this simulation study, as the work focuses on the end-to-end learning methodology rather than a specific hardware campaign. Revision: partial.

Referee: [Results] The abstract states empirical gains, but the provided description supplies no training details, validation procedure, baseline comparisons, or error bars. This prevents assessment of whether the 0.2 bit margin is robust or an artifact of the simulation setup.

Authors: We will expand the results section to explicitly detail the training procedure (optimizer, learning rate schedule, number of epochs), the validation split used, the exact baseline constellations and labelings compared, and error bars computed over multiple independent training runs with different random seeds to demonstrate that the reported GMI gains are robust. Revision: yes.
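
The error bars over independent training runs mentioned in the rebuttal could be computed along the following lines. This is a sketch under stated assumptions: the helper name `summarize_runs` is hypothetical, the interval uses a normal approximation (a Student-t quantile would be more appropriate for a handful of seeds), and the numbers in the usage example are placeholders, not results from the paper.

```python
import math
import statistics


def summarize_runs(gmi_per_seed, z=1.96):
    """Return (mean, half_width) of an approximate 95% interval over training runs.

    gmi_per_seed: one final GMI value (bits/symbol) per independent random seed.
    z: normal quantile; an assumption that is optimistic for very few runs.
    """
    n = len(gmi_per_seed)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(gmi_per_seed)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(gmi_per_seed) / math.sqrt(n)
    return mean, z * sem


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT taken from the paper).
    placeholder_runs = [5.10, 5.14, 5.08, 5.12, 5.11]
    mean, half_width = summarize_runs(placeholder_runs)
    print(f"GMI = {mean:.3f} +/- {half_width:.3f} bits/symbol")
```

A reported gain would then be credible when the interval for the learned constellation lies clearly above the interval for the baseline at the same operating point.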

Circularity Check

0 steps flagged

No significant circularity; empirical simulation gains are measured outcomes

The paper uses an autoencoder to learn constellation points and bit mappings that maximize GMI under a chosen impairment model, then reports measured GMI improvements versus conventional BICM in simulation. No load-bearing equations, fitted parameters renamed as predictions, or self-citation chains reduce the reported gains to inputs by construction. The central result is an empirical comparison of two separately simulated systems, which remains falsifiable against external hardware or alternative models and does not collapse into a definitional identity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the ledger is therefore minimal and provisional.

axioms (1)

Domain assumption: The channel and impairment model used for training is representative of the target deployment scenario.
Required for any learned constellation to transfer from simulation to hardware.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1] Conventional bit-interleaved coded modulation (BICM) is penalized with geometric shaping due to the non-Gray labeling. We show that the proposed autoencoder arrives at a Gray-like code, which does not exhibit this problem.

[2] The implementation penalty is higher for geometric shaping than rectangular QAM. We show that in the operating regions of interest and with the application of modulation-format independent digital signal processing (DSP) chain, the penalty is the same.

[3] End-to-end Learning for GMI Optimized Geometric Constellation Shape. Iterative demapping or non-binary FEC are required for geometric shaping schemes. We show that the proposed labelings do not have this requirement because they are GMI optimized. arXiv:1907.08535v1 [cs.IT] 19 Jul 2019. [Figure: constellation diagrams, axes In-phase and Quadrature, points labelled with 8-bit words] ... (internal anchor)

[4] ... frequency offset between transmitter laser and local oscillator of 50 MHz; 3) ADC sampling frequency of 80 GSa/s; 4) ADC resolution of 6 bits, modelled with a uniform quantization step. A family of three geometric shapes optimized as in Section 2 are evaluated. The shapes are optimized for transmission at 2, 5 and 10 spans. The optimal of the three ...

[5] Yankov, M. P., et al. "Constellation shaping for WDM systems using 256QAM/1024QAM with probabilistic optimization." Journal of Lightwave Technology 34.22 (2016): 5146-5156.

[6] Buchali, F., et al. "Rate adaptation and reach increase by probabilistically shaped 64-QAM: An experimental demonstration." Journal of Lightwave Technology 34.7 (2016): 1599-1609.

[7] Böcherer, G., Steiner, F., and Schulte, P. "Bandwidth efficient and rate-matched low-density parity-check coded modulation." IEEE Transactions on Communications 63.12 (2015): 4651-4665.

[8] Lin, C., et al. "Capacity achieving nonbinary LDPC coded non-uniform shaping modulation for adaptive optical communications." Optics Express 24.16 (2016): 18095-18104.

[9] Lotz, T. H., et al. "Coded PDM-OFDM transmission with shaped 256-iterative-polar-modulation achieving 11.15-b/s/Hz intrachannel spectral efficiency and 800-km reach." Journal of Lightwave Technology 31.4 (2013): 538-545.

[10] Schulte, P., and Böcherer, G. "Constant composition distribution matching." IEEE Transactions on Information Theory 62.1 (2016): 430-434.

[11] Yoshida, T., Karlsson, M., and Agrell, E. "Hierarchical distribution matching for probabilistically shaped coded modulation." Journal of Lightwave Technology (2019).

[12] Dar, R., et al. "Properties of nonlinear noise in long, dispersion-uncompensated fiber links." Opt. Exp. 21.22 (2013): 25685-25699.

[13] Hu, H., et al. "Ultrahigh-Spectral-Efficiency WDM/SDM Transmission Using PDM-1024-QAM Probabilistic Shaping With Adaptive Rate." Journal of Lightwave Technology 36.6 (2018): 1304-1308.

[14] Zhang, S., et al. "Design and performance evaluation of a GMI-optimized 32QAM." 2017 European Conference on Optical Communication (ECOC). IEEE, 2017.

[15] Chen, B., et al. "Increasing achievable information rates via geometric shaping." 2018 European Conference on Optical Communication (ECOC). IEEE, 2018.

[16] Jones, R. T., et al. "Deep learning of geometric constellation shaping including fiber nonlinearities." 2018 European Conference on Optical Communication (ECOC). IEEE, 2018.

[17] LTE 3GPP TS 36.212: "Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and channel coding", 2013.

[18] R. T. Jones, https://github.com/rassibassi/claude, 2019.
