Condition-Wise Sinkhorn Drifting for One-Shot Learned Channel Simulation

Rafael F. Schaefer; Rick Fritschek

arXiv: 2606.17893 · v1 · pith:RVZVVHX4 · submitted 2026-06-16 · 📡 eess.SP · cs.IT · math.IT



Pith reviewed 2026-06-26 23:04 UTC · model grok-4.3

classification 📡 eess.SP · cs.IT · math.IT

keywords condition-wise Sinkhorn drifting · one-shot channel simulation · learned communication systems · conditional transport · Sinkhorn divergence · diffusion models · symbol error rate · channel surrogate


The pith

Condition-wise Sinkhorn drifting supplies a one-shot generator that fixes the transmitted symbol and matches only the conditional output law p(y|x).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Learned communication systems may call stochastic channel models millions of times inside training loops, which makes repeated reverse sampling from diffusion models expensive. The paper therefore develops condition-wise Sinkhorn drifting, a generator trained on a conditional Sinkhorn objective that holds the input symbol fixed while transporting only the conditional output distributions. Training proceeds by computing finite-sample barycentric velocities and then performing detached particle regression. On AWGN, Rayleigh, SSPA, and TDL channels the method outperforms the other one-shot drifting variants on conditional diagnostics and symbolic-coding tests, although diffusion models still lead on the most demanding symbol-error-rate curves. The result is a practical one-shot surrogate for settings where repeated channel evaluations become the dominant cost.

Core claim

A conditional Sinkhorn objective defined over repeated outputs conditioned on the same transmitted symbol can be optimized by finite-sample barycentric velocities followed by detached particle regression, yielding a generator that produces samples from p(y|x) in a single forward pass while exactly preserving the input symbol.
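One way to write this objective, offered as a hedged reading rather than the paper's exact definition: assume a squared-Euclidean cost, the debiased Sinkhorn divergence S_ε of ref. [18], generated laws q_θ(·|x) given by G_θ(x, z) with z ~ N(0, I), and an expectation over transmitted symbols x. The symbols G_θ, η (drift step), λ (learning rate), and the stop-gradient sg[·] are illustrative notation, not taken from the paper.

```latex
\begin{aligned}
\mathrm{OT}_\varepsilon(\mu,\nu) &= \min_{\pi \in \Pi(\mu,\nu)} \int \lVert y - y' \rVert^2 \,\mathrm{d}\pi(y,y') + \varepsilon\,\mathrm{KL}(\pi \,\Vert\, \mu \otimes \nu), \\
S_\varepsilon(\mu,\nu) &= \mathrm{OT}_\varepsilon(\mu,\nu) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\mu,\mu) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\nu,\nu), \\
\mathcal{L}(\theta) &= \mathbb{E}_{x}\!\left[ S_\varepsilon\!\left( q_\theta(\cdot \mid x),\, p(\cdot \mid x) \right) \right], \\
v_x(y_i) &= \frac{\sum_j \pi^{x}_{ij}\, y^{\star}_j}{\sum_j \pi^{x}_{ij}} - y_i,
\qquad y_i = G_\theta(x, z_i), \quad y^{\star}_j \sim p(\cdot \mid x), \\
\theta &\leftarrow \theta - \lambda\, \nabla_\theta \frac{1}{n} \sum_{i=1}^{n}
\left\lVert G_\theta(x, z_i) - \mathrm{sg}\!\left[ y_i + \eta\, v_x(y_i) \right] \right\rVert^2 .
\end{aligned}
```

Here π^x is the entropic plan between generated and reference outputs of the same symbol x, so x is only ever an input to G_θ and is never transported; a joint variant would couple (x, y) pairs and could move mass between symbols. The velocity above uses the cross term OT_ε(q_θ, p) alone. Differentiating the full debiased S_ε with respect to the particles would instead give T_p(y_i) - T_q(y_i), where T_q is the self-transport barycenter of the generated cloud. Which form the paper uses is not stated in this summary.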

What carries the argument

Condition-wise Sinkhorn objective over repeated outputs at fixed transmitted symbol, optimized via finite-sample barycentric velocities and detached particle regression.
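A minimal pure-Python sketch of such a training loop, under assumptions that do not come from the paper: a hypothetical affine generator y = x + a + s·z (parameters a and s), an AWGN fiber p(y|x) = N(x, σ²I) as the target, log-domain Sinkhorn with uniform weights and squared cost, and the cross-term barycentric velocity. Transport is solved separately inside each anchor's fiber, the target y + η·v is treated as a constant (the "detached" step), and the generator is fit by MSE gradient steps with hand-written gradients. All numeric values (σ_true = 0.3, ε = 0.01, η = 1, learning rate 0.1) are arbitrary.

```python
import math
import random


def sqdist(p, q):
    """Squared Euclidean distance between two 2-D points (I/Q pairs)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def barycentric_velocity(gen, ref, eps=0.01, iters=50):
    """Cross-term barycentric velocity v_i = T(y_i) - y_i from a log-domain Sinkhorn plan.

    Uniform weights on both clouds; T(y_i) is the row-normalized plan applied to ref.
    """
    n, m = len(gen), len(ref)
    C = [[sqdist(p, q) for q in ref] for p in gen]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * logsumexp([log_b + (g[j] - C[i][j]) / eps for j in range(m)]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - C[i][j]) / eps for i in range(n)]) for j in range(m)]
    vel = []
    for i, p in enumerate(gen):
        # Row i of the plan, normalized: proportional to exp((g_j - C_ij) / eps).
        logits = [(g[j] - C[i][j]) / eps for j in range(m)]
        top = max(logits)
        w = [math.exp(l - top) for l in logits]
        tot = sum(w)
        tx = sum(wj * q[0] for wj, q in zip(w, ref)) / tot
        ty = sum(wj * q[1] for wj, q in zip(w, ref)) / tot
        vel.append((tx - p[0], ty - p[1]))
    return vel


def train(anchors, sigma_true=0.3, n=16, steps=30, eta=1.0, lr=0.1, seed=0):
    """Condition-wise drifting for a hypothetical generator y = x + a + s * z."""
    rng = random.Random(seed)
    a, s = [0.5, -0.5], 1.0  # deliberately wrong initial offset and spread
    for _ in range(steps):
        grad_a, grad_s, count = [0.0, 0.0], 0.0, 0
        for x in anchors:
            # Condition-wise: generated and reference outputs share the same anchor x,
            # and transport is solved inside this fiber only, so x itself is never moved.
            z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
            gen = [(x[0] + a[0] + s * zi[0], x[1] + a[1] + s * zi[1]) for zi in z]
            ref = [(x[0] + rng.gauss(0, sigma_true), x[1] + rng.gauss(0, sigma_true)) for _ in range(n)]
            vel = barycentric_velocity(gen, ref)
            for yi, vi, zi in zip(gen, vel, z):
                # Detached target: y_i + eta * v_i is a constant; no gradient flows through Sinkhorn.
                target = (yi[0] + eta * vi[0], yi[1] + eta * vi[1])
                r = (yi[0] - target[0], yi[1] - target[1])  # residual G(x, z) - target
                grad_a[0] += 2 * r[0]
                grad_a[1] += 2 * r[1]
                grad_s += 2 * (r[0] * zi[0] + r[1] * zi[1])
                count += 1
        # One MSE regression step on the generator parameters.
        a = [a[k] - lr * grad_a[k] / count for k in range(2)]
        s -= lr * grad_s / count
    return a, s
```

Calling `train([(1.0, 1.0), (-1.0, -1.0)])` drives the offset a toward zero and the spread s from 1.0 toward the true noise level. Because the plan is entropic and finite-sample, the barycentric targets are slightly shrunk toward the fiber mean, so s tends to settle somewhat below σ_true; this is one concrete form of the bias question raised in the editorial analysis.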

If this is right

Enables millions of differentiable channel evaluations inside training loops at substantially lower cost than diffusion-style sampling.
Exactly preserves the transmitted symbol while matching the conditional output distribution p(y|x).
Among one-shot drifting variants, condition-wise Sinkhorn yields the strongest results on conditional diagnostics and symbolic-coding checks.
Supplies a usable operating point whenever repeated channel calls make diffusion sampling prohibitive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The explicit separation of symbol preservation from conditional transport may transfer to other conditional generation problems that require strict input conditioning.
One-shot sampling could ease the computational budget of larger end-to-end learned transceiver designs that currently rely on diffusion.
Scaling tests on higher-dimensional or non-stationary channels would expose whether the barycentric-velocity training remains stable.
The same training recipe might be applied to other optimal-transport divergences beyond Sinkhorn.

Load-bearing premise

Finite-sample barycentric velocities and detached particle regression correctly optimize the conditional Sinkhorn objective and produce unbiased samples from p(y|x) without artifacts that degrade downstream symbol-error-rate performance.

What would settle it

If samples drawn from the trained condition-wise Sinkhorn generator produce measurably higher symbol-error rates than either true channel realizations or diffusion samples on the same modulation and coding scheme, the claim of practical equivalence fails.

Figures

Figures reproduced from arXiv: 2606.17893 by Rafael F. Schaefer, Rick Fritschek.

**Figure 1.** Fixed-condition SSPA output fibers. Each row fixes one transmitted-symbol anchor x_i and plots 512 repeated channel outputs in the first I/Q plane; the three anchors are selected from 256 candidate transmitted symbols. The analytic channel is shown in gray. The learned columns overlay generated samples from joint Sinkhorn and condition-wise Sinkhorn on the same analytic cloud. Joint transport can match aspe…
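The fixed-condition construction in Figure 1 can be mimicked with a small sketch. Assumptions, none taken from the paper: the Rapp AM/AM model stands in for the SSPA (saturation a_sat = 1, smoothness p = 2, no AM/PM conversion), each symbol is a single complex value rather than a multi-dimensional codeword, the 256 candidates are circular complex Gaussian draws, and the three anchors are picked at low, median, and high amplitude.

```python
import random


def rapp_sspa(x, a_sat=1.0, p=2.0):
    """Rapp AM/AM compression (assumed SSPA model): amplitude saturates at a_sat, phase kept."""
    r = abs(x)
    if r == 0.0:
        return 0j
    gain = 1.0 / (1.0 + (r / a_sat) ** (2 * p)) ** (1.0 / (2 * p))
    return x * gain


def fixed_condition_fibers(num_candidates=256, num_anchors=3, reps=512, noise_std=0.1, seed=0):
    """Return {anchor: [reps outputs]} for anchors chosen from a hypothetical candidate set."""
    rng = random.Random(seed)
    # Hypothetical candidates: circular complex Gaussian, unit average power.
    cands = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / 2 ** 0.5 for _ in range(num_candidates)]
    ranked = sorted(cands, key=abs)
    # Low, median and high amplitude anchors, so SSPA compression is visible in the fibers.
    picks = [round(k * (num_candidates - 1) / (num_anchors - 1)) for k in range(num_anchors)]
    fibers = {}
    for i in picks:
        x = ranked[i]
        # Same x, fresh noise each time: these repeats form the fiber p(y | x).
        fibers[x] = [
            rapp_sspa(x) + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for _ in range(reps)
        ]
    return fibers
```

Each dictionary entry is one row of such a figure: the key is the fixed anchor, and the value is the cloud a condition-wise surrogate would have to reproduce for that anchor alone.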

**Figure 2.** SER curves for learned channel surrogates. Symbolic autoencoders are trained through each surrogate at the nominal point and evaluated on the analytic channel over an Eb/N0 grid. Curves show seed means with standard-error bars; floor-clipped points are upper bounds. … lowest on Rayleigh, and WGAN is lowest on compact TDL. Condition-wise Sinkhorn reaches a competitive downstream …
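A sketch of how one SER point on such an Eb/N0 grid can be estimated by Monte Carlo on the analytic AWGN channel. A fixed unit-energy QPSK constellation stands in for the learned symbolic autoencoder (an assumption), detection is nearest-neighbor, and the conversion uses E_b = E_s/k with k bits per symbol and per-dimension noise variance N_0/2. When no errors are observed the estimate is clipped to a 1/N floor and flagged, which is one plausible reading of "floor-clipped points".

```python
import math
import random

# Stand-in constellation (assumption): unit-energy QPSK instead of a learned autoencoder.
QPSK = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def qpsk_ser_theory(ebn0_db):
    """Exact QPSK SER on AWGN: 2Q - Q^2 with Q = Q(sqrt(2 Eb/N0))."""
    q = q_func(math.sqrt(2 * 10 ** (ebn0_db / 10)))
    return 2 * q - q * q


def ser_monte_carlo(ebn0_db, num_symbols=20000, bits_per_symbol=2, seed=0):
    """Monte Carlo SER with nearest-neighbor detection; returns (ser, clipped_flag)."""
    rng = random.Random(seed)
    es = sum(abs(c) ** 2 for c in QPSK) / len(QPSK)      # average symbol energy (1 here)
    n0 = (es / bits_per_symbol) / 10 ** (ebn0_db / 10)  # Eb = Es / k, so N0 = Eb / (Eb/N0)
    std = math.sqrt(n0 / 2)                             # noise std per real dimension
    errors = 0
    for _ in range(num_symbols):
        k = rng.randrange(len(QPSK))
        y = QPSK[k] + complex(rng.gauss(0, std), rng.gauss(0, std))
        k_hat = min(range(len(QPSK)), key=lambda j: abs(y - QPSK[j]))
        errors += k_hat != k
    ser = errors / num_symbols
    floor = 1.0 / num_symbols
    # With zero observed errors, report the 1/N floor and flag it as clipped.
    return max(ser, floor), ser < floor
```

Comparing `ser_monte_carlo` with `qpsk_ser_theory` at a few grid points is a cheap sanity check of the evaluation harness before any surrogate-trained transmitter is plugged in.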

**Figure 3.** Figure 3 shows the resulting BER and block error rate (BLER) curves over 30 seeds. At the 4 dB training point, analytic-channel training gives BER 1.96 × 10⁻³ and BLER 8.18 × 10⁻², with 95% confidence intervals (CIs) of 5.21 × 10⁻⁴ and 2.30 × 10⁻². Training through the condition-wise Sinkhorn surrogate gives BER 3.30 × 10⁻³ and BLER 1.39 × 10⁻¹, with 95% CIs of 4.30 × 10⁻⁴ and 1.91 × 10⁻². The surrogate preserves the waterfa…
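The caption does not say how its 95% CIs were formed. A common choice, assumed here, is a Student-t interval over per-seed results, with the quoted CI values read as half-widths (also an assumption). The sketch hard-codes t_{0.975,29} ≈ 2.045 for the 30 seeds mentioned; pass another quantile for a different seed count.

```python
import math


def ci95(values, t_quantile=2.0452):
    """Mean and 95% Student-t half-width; default t_{0.975, 29} assumes 30 seeds."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    half_width = t_quantile * math.sqrt(var / n)
    return mean, half_width
```

Feeding 30 hypothetical per-seed BER values into `ci95` yields a (mean, half-width) pair in the same form as the numbers above; a normal-approximation interval would use 1.96 instead.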

Original abstract

Learned communication systems may evaluate stochastic channel surrogates millions of times inside differentiable training loops, making diffusion-style reverse sampling expensive. This paper proposes condition-wise Sinkhorn drifting, a one-shot channel surrogate that preserves the transmitted symbol and transports only the conditional output laws \(p(y\mid x)\). We formulate a conditional Sinkhorn objective over repeated outputs at the same transmitted symbol and train the generator with finite-sample barycentric velocities followed by detached particle regression. Experiments on additive white Gaussian noise (AWGN), Rayleigh fading, solid-state power amplifier (SSPA) nonlinearity, and a compact tapped-delay-line (TDL) channel compare direct drifting, joint Sinkhorn drifting, condition-wise Sinkhorn drifting, conditional denoising diffusion probabilistic modeling (DDPM), denoising diffusion implicit modeling (DDIM), and Wasserstein generative adversarial network (WGAN) references. Within the evaluated one-shot drifting-family variants, condition-wise Sinkhorn is strongest under conditional diagnostics and symbolic-coding checks, while diffusion remains strongest on the hardest downstream symbol-error-rate (SER) curves. The resulting operating point is a condition-preserving one-shot simulator for settings where repeated channel calls make diffusion-style sampling too costly.
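For reference, the two fading channels named in the abstract can be sampled as below. This is a sketch under stated assumptions: flat Rayleigh fading draws a fresh unit-power complex Gaussian gain on every call, and the "compact" TDL is modeled as a few independent Rayleigh taps with a hypothetical power profile, drawn once per block. The paper's actual TDL configuration (cf. 3GPP TR 38.901, ref. [19]) is not given in this summary.

```python
import math
import random


def complex_gauss(rng, power=1.0):
    """Circular complex Gaussian sample with E|h|^2 = power."""
    scale = math.sqrt(power / 2)
    return complex(rng.gauss(0, scale), rng.gauss(0, scale))


def rayleigh_awgn(x, noise_power, rng):
    """Flat Rayleigh fading: y = h x + w, with a fresh unit-power h on every call."""
    return complex_gauss(rng) * x + complex_gauss(rng, noise_power)


def tdl_awgn(block, tap_powers, noise_power, rng):
    """Hypothetical compact TDL: y[n] = sum_l h_l x[n - l] + w[n], taps drawn once per block."""
    taps = [complex_gauss(rng, p) for p in tap_powers]
    out = []
    for n in range(len(block)):
        acc = sum(h * block[n - l] for l, h in enumerate(taps) if n - l >= 0)
        out.append(acc + complex_gauss(rng, noise_power))
    return out
```

A call such as `tdl_awgn(block, [0.6, 0.3, 0.1], 0.01, random.Random(0))` uses an illustrative exponential-like profile; in a surrogate setting, the whole transmitted block plays the role of the fixed condition x.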

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Condition-wise Sinkhorn drifting gives a usable one-shot channel surrogate for training loops but the detached particle regression leaves open whether the conditional law stays unbiased.


Hi,

The main takeaway is that this paper restricts Sinkhorn drifting to act condition-wise so the generator keeps the transmitted symbol fixed and only transports the conditional output law p(y|x). That formulation is new relative to the joint drifting and diffusion baselines they compare against.

The work does a clean job of stating the practical bottleneck (millions of channel calls inside differentiable loops) and runs the same set of experiments on AWGN, Rayleigh, SSPA, and TDL channels. Condition-wise Sinkhorn beats the other one-shot drifting variants on the conditional diagnostics and symbolic-coding checks, while diffusion still wins on the hardest SER curves. That gives a usable operating point when speed matters more than peak accuracy.

The soft spot is the training recipe itself. Finite-sample barycentric velocities followed by detached particle regression is claimed to optimize the conditional Sinkhorn objective, yet detaching severs the gradient path that would normally keep the transported particles on the right marginal. The abstract and method description supply no convergence argument or bias bound, so it is not clear the fixed point remains a true minimizer rather than an approximation whose error grows with particle count or conditioning granularity. All the reported gains rest on the samples being faithful to p(y|x); any systematic deviation would directly affect the SER comparisons.

This is for people already working on end-to-end learned communication systems who need faster surrogates than diffusion sampling. A reader who cares about conditional fidelity in differentiable pipelines will find the formulation and the empirical trade-offs worth seeing.

It deserves a serious referee. The idea is practical and the comparisons are concrete, even though the training step needs tighter justification.

I'd send it for review and ask specifically for a derivation or diagnostic showing the detached regression does not introduce measurable bias in the conditional law.

Referee Report

2 major / 1 minor

Summary. The paper proposes condition-wise Sinkhorn drifting as a one-shot channel surrogate for learned communication systems. It preserves the transmitted symbol and transports only the conditional laws p(y|x) via a conditional Sinkhorn objective, trained using finite-sample barycentric velocities followed by detached particle regression. Experiments on AWGN, Rayleigh fading, SSPA nonlinearity, and TDL channels compare it to direct and joint drifting variants, conditional DDPM/DDIM, and WGAN. The paper claims that, among the one-shot drifting variants, condition-wise Sinkhorn is strongest on conditional diagnostics and symbolic-coding checks, while diffusion remains strongest on the hardest downstream SER curves. The operating point targets settings where repeated channel calls make diffusion sampling too costly.

Significance. If the central training procedure produces faithful samples from p(y|x), the work supplies a computationally lighter one-shot alternative to diffusion models for repeated evaluations inside differentiable training loops. The explicit multi-channel comparison and emphasis on condition preservation are strengths that could support practical adoption in communication-system design.

major comments (2)

[Abstract and training description] The claim that finite-sample barycentric velocities followed by detached particle regression optimize the conditional Sinkhorn objective lacks any derivation or convergence argument showing that detachment preserves the marginal constraint and yields unbiased draws from p(y|x). This is load-bearing for all reported conditional diagnostics, symbolic-coding checks, and the final operating-point conclusion.
[Experiments section] The reported superiority of condition-wise Sinkhorn over joint drifting and WGAN on conditional diagnostics rests on the samples being faithful to p(y|x); without evidence that bias does not grow with particle count or conditioning granularity, the cross-method ranking is not established.

minor comments (1)

[Abstract] The abstract refers to 'symbolic-coding checks' without defining the metric or procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The two major comments correctly identify that the manuscript presents the finite-sample barycentric-velocity + detached-regression procedure without a supporting derivation or bias analysis. We address both points below and will revise the manuscript to strengthen the justification and empirical support.


Referee: [Abstract and training description] The claim that finite-sample barycentric velocities followed by detached particle regression optimize the conditional Sinkhorn objective lacks any derivation or convergence argument showing that detachment preserves the marginal constraint and yields unbiased draws from p(y|x). This is load-bearing for all reported conditional diagnostics, symbolic-coding checks, and the final operating-point conclusion.

Authors: We agree that the current text does not supply a derivation. The procedure is motivated by the fact that barycentric projections yield a consistent estimator of the conditional OT map and that detaching the regression targets avoids differentiating through the Sinkhorn iterations. In expectation the marginal constraint on the generated particles is preserved because the targets are themselves obtained from a feasible conditional plan; however, we acknowledge that a rigorous convergence statement is missing. We will add a concise paragraph (with a short proof sketch) in the revised training section clarifying the approximation properties and the role of detachment, while explicitly noting that the method remains an empirical surrogate whose fidelity is assessed downstream. revision: yes
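The consistency argument in this response can be checked numerically in one dimension, where the exact OT map between two equal-size uniform empirical measures is the sorted (order-statistic) matching. The sketch below is an illustration, not the authors' proof sketch: it compares that matching with the entropic barycentric projection at a small and a large ε, running log-domain Sinkhorn with ε-scaling (a geometric ε schedule with warm-started potentials) so that small ε converges. Sample size, shift, schedule, and seed are arbitrary choices.

```python
import math
import random


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def barycentric_map_1d(xs, ys, eps, eps_start=2.0, stages=10, iters=100):
    """Entropic barycentric projection of xs onto ys, solved with eps-scaling."""
    n, m = len(xs), len(ys)
    C = [[(x - y) ** 2 for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    ratio = (eps / eps_start) ** (1.0 / (stages - 1))
    for k in range(stages):
        e = eps_start * ratio ** k  # geometric schedule down to eps, potentials warm-started
        for _ in range(iters):
            f = [-e * logsumexp([log_b + (g[j] - C[i][j]) / e for j in range(m)]) for i in range(n)]
            g = [-e * logsumexp([log_a + (f[i] - C[i][j]) / e for i in range(n)]) for j in range(m)]
    T = []
    for i in range(n):
        # Row-normalized plan: softmax over j of (g_j - C_ij) / e at the final eps.
        logits = [(g[j] - C[i][j]) / e for j in range(m)]
        top = max(logits)
        w = [math.exp(l - top) for l in logits]
        T.append(sum(wj * y for wj, y in zip(w, ys)) / sum(w))
    return T


def error_vs_sorted_map(eps, n=12, shift=0.5, seed=1):
    """Mean |T_eps(x_i) - T_exact(x_i)|, where T_exact matches order statistics."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(shift, 1) for _ in range(n)]
    ys_sorted = sorted(ys)
    exact = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda idx: xs[idx])):
        exact[i] = ys_sorted[rank]  # exact 1-D OT map for equal-size uniform samples
    approx = barycentric_map_1d(xs, ys, eps)
    return sum(abs(a - b) for a, b in zip(approx, exact)) / n
```

Shrinking ε moves the projection toward the sorted map, while a large ε pulls every particle toward the target mean; both ε and particle count therefore bear on the bias questions the referee raises.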
Referee: [Experiments section] The reported superiority of condition-wise Sinkhorn over joint drifting and WGAN on conditional diagnostics rests on the samples being faithful to p(y|x); without evidence that bias does not grow with particle count or conditioning granularity, the cross-method ranking is not established.

Authors: The concern is valid: the reported rankings rest on the assumption that any approximation bias remains small across the tested regimes. We will augment the experimental section with two new figures that (i) sweep particle count from 64 to 1024 while monitoring the same conditional diagnostics and (ii) vary the number of distinct conditioning symbols (granularity) on the AWGN and Rayleigh channels. These additions will either confirm stability of the ranking or qualify the operating regime in which condition-wise Sinkhorn remains preferable. revision: yes
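One cheap version of the proposed particle-count sweep is to measure the debiased Sinkhorn divergence S_ε (ref. [18]) between a generated fiber and a fresh reference fiber at increasing n. In this hypothetical sketch both fibers come from the same N(0, σ²I) law, so any nonzero value is finite-sample discrepancy; in practice `gen` would come from the trained surrogate at one anchor. Sizes are kept far below the 64 to 1024 range mentioned above because the code is pure Python, and ε = 0.5 is chosen for fast Sinkhorn convergence rather than fidelity.

```python
import math
import random


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def ot_eps(xs, ys, eps, iters):
    """Entropic OT value <a, f> + <b, g> at the last log-domain Sinkhorn iterate."""
    n, m = len(xs), len(ys)
    C = [[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in ys] for p in xs]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * logsumexp([log_b + (g[j] - C[i][j]) / eps for j in range(m)]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - C[i][j]) / eps for i in range(n)]) for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.5, iters=200):
    """Debiased S_eps = OT(x, y) - OT(x, x)/2 - OT(y, y)/2."""
    return ot_eps(xs, ys, eps, iters) - 0.5 * (ot_eps(xs, xs, eps, iters) + ot_eps(ys, ys, eps, iters))


def particle_sweep(sizes=(8, 16, 32), sigma=0.3, eps=0.5, seed=0):
    """Hypothetical diagnostic: S_eps between two independent fibers of one N(0, sigma^2 I) law."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        gen = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]  # stand-in surrogate
        ref = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]  # stand-in channel
        results.append((n, sinkhorn_divergence(gen, ref, eps)))
    return results
```

Because the value returned after each g-update equals the current dual objective, it is a lower bound on OT_ε that tightens with more iterations; a sweep result that stops shrinking as n grows would point to systematic surrogate bias rather than sampling noise.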

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents condition-wise Sinkhorn drifting as a novel one-shot surrogate formulated directly from a conditional Sinkhorn objective over repeated outputs at fixed symbols, trained via the stated finite-sample barycentric velocities plus detached particle regression procedure. No load-bearing step reduces a claimed prediction or uniqueness result to a self-citation, a fitted parameter renamed as output, or an ansatz imported from the authors' prior work. The empirical comparisons to DDPM, DDIM, WGAN and other drifting variants rest on external diagnostics (conditional metrics, SER curves) rather than internal redefinition of the target distribution. The derivation chain is therefore self-contained against the stated objective and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides no explicit free parameters or invented entities; the approach rests on standard optimal transport concepts plus the single domain assumption recorded under axioms.

axioms (1)

Domain assumption: the Sinkhorn algorithm can be conditioned on the input symbol so that it transports only p(y|x) while leaving the symbol unchanged.
This is the core modeling choice stated in the abstract.

pith-pipeline@v0.9.1-grok · 5736 in / 1265 out tokens · 36194 ms · 2026-06-26T23:04:43.997088+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 2 linked inside Pith

[1] M. Kim, R. Fritschek, and R. F. Schaefer, “Diffusion models for accurate channel distribution generation,” arXiv preprint arXiv:2309.10505, 2023

[2] M. Kim, R. Fritschek, and R. F. Schaefer, “Robust generation of channel distributions with diffusion models,” in ICC 2024 – IEEE International Conference on Communications, 2024, pp. 330–335

[3] T. Lee, J. Park, H. Kim, and J. G. Andrews, “Generating high dimensional user-specific wireless channels using diffusion models,” IEEE Transactions on Wireless Communications, vol. 25, pp. 2907–2921, 2026

[4] X. Gong, X. Liu, A. A. Lu, X. Gao, X. G. Xia, C.-X. Wang, and X. You, “Digital twin of channel: Diffusion model for sensing-assisted statistical channel state information generation,” IEEE Transactions on Wireless Communications, vol. 24, no. 5, pp. 3805–3821, 2025

[5] X. Zhou, L. Liang, J. Zhang, P. Jiang, Y. Li, and S. Jin, “Generative diffusion models for high dimensional channel estimation,” IEEE Transactions on Wireless Communications, vol. 24, no. 7, pp. 5840–5854, 2025

[6] B. Fesl, M. Baur, F. Strasser, M. Joham, and W. Utschick, “Diffusion-based generative prior for low-complexity MIMO channel estimation,” IEEE Wireless Communications Letters, vol. 13, no. 12, pp. 3493–3497, 2024

[7] Z. Chen, H. Shin, and A. Nallanathan, “Generative diffusion model-based variational inference for MIMO channel estimation,” IEEE Transactions on Communications, vol. 73, no. 10, pp. 9254–9269, 2025

[8] N. Zilberstein, A. Swami, and S. Segarra, “Joint channel estimation and data detection in massive MIMO systems based on diffusion models,” in ICASSP 2024 – IEEE International Conference on Acoustics, Speech and Signal Processing, 2024, pp. 13291–13295

[9] H. Fu, W. Si, and R. Liu, “Conditional denoising diffusion-based channel estimation for fast time-varying MIMO-OFDM systems,” Digital Signal Processing, vol. 164, p. 105283, 2025

[10] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 6840–6851

[11] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in International Conference on Learning Representations, 2021

[12] Y. Song, P. Dhariwal, M. Chen, and I. Sutskever, “Consistency models,” in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 2023, pp. 32211–32252. [Online]. Available: https://proceedings.mlr.press/v202/song23a.html

[13] T. Yin, M. Gharbi, R. Zhang, E. Shechtman, F. Durand, W. T. Freeman, and T. Park, “One-step diffusion with distribution matching distillation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2024, pp. 6613–6623

[14] M. Deng, H. Li, T. Li, Y. Du, and K. He, “Generative modeling via drifting,” arXiv preprint arXiv:2602.04770, 2026 (linked inside Pith)

[15] M. Sahraee-Ardakan, M. Delbracio, and P. Milanfar, “The geometry of noise: Why diffusion models don’t need noise conditioning,” arXiv preprint arXiv:2602.18428, 2026

[16] J. Han, P. Li, Q. Guo, R. Xu, S. Ermon, and E. J. Candès, “One-step generative modeling via Wasserstein gradient flows,” arXiv preprint arXiv:2605.11755, 2026 (linked inside Pith)

[17] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” in Advances in Neural Information Processing Systems, vol. 26, 2013, pp. 2292–2300

[18] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, and G. Peyré, “Interpolating between optimal transport and MMD using Sinkhorn divergences,” in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, vol. 89. PMLR, 2019, pp. 2681–2690. [Online]. ...

[19] 3GPP, “Study on Channel Model for Frequencies from 0.5 to 100 GHz,” 3GPP, Technical Report TR 38.901, 2022, version 17.1.0
