Soft Graph Transformer for MIMO Detection

Jiadong Hong; Lei Liu; Wenjie Wang; Xinyu Bian; Zhaoyang Zhang

arXiv: 2509.12694 · v5 · submitted 2025-09-16 · 💻 cs.LG · cs.IT · eess.SP · math.IT

Pith reviewed 2026-05-18 16:10 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · eess.SP · math.IT

keywords MIMO detection · graph transformer · soft-input soft-output · neural receiver · wireless communications · attention mechanism

The pith

Soft Graph Transformer models MIMO detection as two subgraphs linked by graph-aware cross-attention to reach near-maximum-likelihood performance while accepting soft priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the Soft Graph Transformer as a neural detector for multiple-input multiple-output wireless systems. It decomposes the underlying factor graph into a symbol subgraph and a constraint subgraph, then applies ordinary self-attention inside each subgraph while introducing a graph-aware cross-attention layer to exchange information between them. This design lets the model ingest soft prior probabilities from other receiver modules and emit soft outputs in return. The architecture is meant to avoid both the exponential cost of exact maximum-likelihood search and the limiting approximations used by classical message-passing algorithms. If the approach works as claimed, it supplies a practical, extensible receiver block that can be inserted into iterative detection and decoding chains.
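
The exponential cost of exact maximum-likelihood search mentioned above is easy to see in a brute-force sketch. The snippet below is illustrative only and is not taken from the paper: the QPSK constellation, the 4x4 system size, and the function name `ml_detect` are assumptions chosen for the example.

```python
import itertools
import numpy as np

# QPSK constellation with unit average energy (an assumption for this sketch).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def ml_detect(y, H, constellation=QPSK):
    """Exhaustive maximum-likelihood search: argmin over x of ||y - H x||^2."""
    n_tx = H.shape[1]
    best_x, best_cost = None, np.inf
    # The loop visits len(constellation) ** n_tx candidates,
    # which is where the exponential complexity comes from.
    for cand in itertools.product(constellation, repeat=n_tx):
        x = np.array(cand)
        cost = np.linalg.norm(y - H @ x) ** 2
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x

rng = np.random.default_rng(0)
n_rx, n_tx = 4, 4
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x_true = rng.choice(QPSK, size=n_tx)
y = H @ x_true  # noiseless observation, so the search should recover x_true
x_hat = ml_detect(y, H)
n_candidates = len(QPSK) ** n_tx  # 4 ** 4 candidates for this toy system
```

Doubling the number of transmit antennas squares the candidate count, which is why exhaustive search is normally restricted to small systems.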

Core claim

By representing MIMO detection as interactions between a symbol subgraph and a constraint subgraph, and by combining intra-subgraph self-attention with a dedicated graph-aware cross-attention mechanism for inter-subgraph message passing, the Soft Graph Transformer produces soft-input soft-output estimates that approach maximum-likelihood accuracy while remaining computationally tractable for finite-dimensional systems.

What carries the argument

Graph-aware cross-attention that performs structured message passing between the symbol subgraph and the constraint subgraph.

Load-bearing premise

MIMO detection can be usefully split into symbol and constraint subgraphs whose internal dependencies are captured by standard self-attention and whose cross-subgraph exchanges are captured by the proposed graph-aware cross-attention.

What would settle it

An experiment on a large MIMO system in which removing the graph-aware cross-attention layer causes detection error rates to rise well above those of maximum-likelihood or strong message-passing baselines while keeping all other components fixed.

Original abstract

We propose the Soft Graph Transformer (SGT), a soft-input-soft-output neural architecture designed for MIMO detection. While Maximum Likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible in large systems, and conventional message-passing algorithms rely on asymptotic assumptions that often fail in finite dimensions. Recent Transformer-based detectors show strong performance but typically overlook the MIMO factor graph structure and cannot exploit prior soft information. SGT addresses these limitations by combining self-attention, which encodes contextual dependencies within symbol and constraint subgraphs, with graph-aware cross-attention, which performs structured message passing across subgraphs. Its soft-input interface allows the integration of auxiliary priors, producing effective soft outputs while maintaining computational efficiency. Experiments demonstrate that SGT achieves near-ML performance and offers a flexible and interpretable framework for receiver systems that leverage soft priors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SGT structures a transformer around MIMO factor-graph subgraphs with soft inputs and cross-attention, but offers no derivation that the attention reproduces the needed messages and reports no concrete performance numbers.

The main thing to know is that this work splits the MIMO factor graph into symbol and constraint subgraphs, applies self-attention inside each, and adds a custom graph-aware cross-attention layer to move information between them while accepting soft priors at the input. The goal is a soft-in soft-out detector that stays efficient yet gets close to ML accuracy without relying on large-system approximations.
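
To make "accepting soft priors at the input" concrete, here is a minimal sketch of how bit-level log-likelihood ratios from another receiver module could be turned into per-symbol prior probabilities. It is not the paper's interface: the LLR sign convention, the QPSK bit labelling, and the function names are assumptions made for illustration.

```python
import numpy as np

def llr_to_bit_prob(llr):
    """P(bit = 0) under the assumed convention LLR = log(P(bit=0) / P(bit=1))."""
    return 1.0 / (1.0 + np.exp(-llr))

def qpsk_symbol_priors(llrs):
    """Turn two independent bit LLRs per symbol into a 4-way symbol prior.

    llrs: array-like of shape (n_symbols, 2).
    Output columns are ordered by bit pairs 00, 01, 10, 11.
    """
    p0 = llr_to_bit_prob(np.asarray(llrs, dtype=float))
    p1 = 1.0 - p0
    first = np.stack([p0[:, 0], p0[:, 0], p1[:, 0], p1[:, 0]], axis=1)
    second = np.stack([p0[:, 1], p1[:, 1], p0[:, 1], p1[:, 1]], axis=1)
    return first * second

# Symbol 0 has no prior knowledge; symbol 1 strongly favours the bit pair 01.
priors = qpsk_symbol_priors([[0.0, 0.0], [4.0, -4.0]])
```

A zero LLR yields a uniform prior, so a detector fed with all-zero LLRs behaves as a standalone detector with no side information.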

Referee Report

2 major / 2 minor

Summary. The paper proposes the Soft Graph Transformer (SGT), a soft-input soft-output neural architecture for MIMO detection. It models the problem as a bipartite factor graph consisting of symbol and constraint subgraphs, applies standard self-attention within each subgraph to encode contextual dependencies, and introduces a graph-aware cross-attention mechanism to perform structured message passing between subgraphs. The design supports integration of auxiliary soft priors and is claimed to achieve near-maximum-likelihood performance with computational efficiency suitable for large systems.

Significance. If the empirical claims are substantiated with rigorous comparisons, the approach could provide a flexible and interpretable neural receiver framework that explicitly incorporates MIMO factor-graph structure and soft information, potentially improving upon prior Transformer-based detectors that ignore this structure.

major comments (2)

[Architecture description (cross-attention mechanism)] The central claim that graph-aware cross-attention adequately captures the conditional dependencies induced by the channel matrix (without explicit enumeration of the joint posterior or dropping higher-order correlations) is load-bearing for the near-ML performance assertion, yet no derivation or analysis is provided showing that the attention scores reproduce factor-graph messages or their soft approximations under finite dimensions or ill-conditioned channels.
[Experiments section] The experimental validation of near-ML performance lacks reported quantitative results, specific baselines (e.g., ML, MMSE, other message-passing or Transformer detectors), error bars, dataset details, or channel conditions, which prevents verification of the performance claims and the assertion that the architecture handles finite-dimensional cases where asymptotic assumptions fail.

minor comments (2)

Clarify the exact formulation of the graph-aware cross-attention (e.g., how the bipartite structure is encoded in the attention mask or scores) with explicit equations to improve reproducibility.
The abstract states strong performance without any numbers or references to tables/figures; ensure the main text consistently ties claims to specific results.
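
One way the bipartite structure could be encoded, as the first minor comment asks, is an attention mask built from the factor-graph edges. The sketch below is a hypothetical single-head formulation written for this note, not the paper's graph-aware cross-attention; the function name, the token shapes, and the choice to derive edges from nonzero channel entries are all assumptions.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, adjacency):
    """Single-head cross-attention restricted to factor-graph edges.

    queries:   (n_q, d) tokens of the receiving subgraph
    keys:      (n_k, d) tokens of the sending subgraph
    values:    (n_k, d) messages carried by the sending subgraph
    adjacency: (n_q, n_k) boolean, True where an edge links query i to key j
               (every row needs at least one True entry)
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Non-edges get -inf so that the softmax assigns them zero weight.
    scores = np.where(adjacency, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
n_sym, n_con, d = 3, 4, 8
H = rng.standard_normal((n_con, n_sym))
H[0, 2] = 0.0  # remove two edges so the mask is not trivially full
H[3, 0] = 0.0
adjacency = (H != 0).T  # (n_sym, n_con): symbol i attends to constraint j
sym_tokens = rng.standard_normal((n_sym, d))
con_tokens = rng.standard_normal((n_con, d))
updated, weights = masked_cross_attention(sym_tokens, con_tokens, con_tokens, adjacency)
```

For a dense Rayleigh channel every entry of the channel matrix is nonzero, so a pure edge mask would be all True; in that case the graph information would have to enter some other way, for example through the token features or an additive score bias.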

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions planned for the next version.

Referee: [Architecture description (cross-attention mechanism)] The central claim that graph-aware cross-attention adequately captures the conditional dependencies induced by the channel matrix (without explicit enumeration of the joint posterior or dropping higher-order correlations) is load-bearing for the near-ML performance assertion, yet no derivation or analysis is provided showing that the attention scores reproduce factor-graph messages or their soft approximations under finite dimensions or ill-conditioned channels.

Authors: We agree that a formal derivation linking the cross-attention scores to exact factor-graph messages would strengthen the theoretical justification. The architecture is explicitly constructed around the bipartite factor graph: self-attention operates within symbol and constraint subgraphs to encode local context, while the graph-aware cross-attention routes information between the two partitions in a manner analogous to message passing. This design choice avoids explicit enumeration of the joint posterior by using learned attention to approximate soft dependencies induced by the channel matrix. Although the current manuscript does not contain a closed-form proof of equivalence under all channel conditions, empirical results indicate that the mechanism captures the necessary correlations sufficiently for near-ML performance. We will add a dedicated analysis subsection that (i) derives the attention update for a small MIMO example, (ii) discusses the approximation to soft messages, and (iii) examines behavior on ill-conditioned finite-dimensional channels.

revision: partial

Referee: [Experiments section] The experimental validation of near-ML performance lacks reported quantitative results, specific baselines (e.g., ML, MMSE, other message-passing or Transformer detectors), error bars, dataset details, or channel conditions, which prevents verification of the performance claims and the assertion that the architecture handles finite-dimensional cases where asymptotic assumptions fail.

Authors: We accept that the experimental section must be expanded to allow independent verification. In the revised manuscript we will report concrete quantitative results (BER/SER curves) together with the following: exact ML detection for small systems, MMSE, AMP, and representative Transformer-based detectors as baselines; error bars computed over multiple independent channel realizations; full dataset generation details (i.i.d. Rayleigh fading, antenna configurations from 4x4 to 64x64, SNR ranges, and training/validation splits); and additional experiments on finite-dimensional and ill-conditioned channels to demonstrate that performance does not rely on asymptotic assumptions.

revision: yes
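
The kind of evaluation the rebuttal promises can be sketched with a simple simulation: i.i.d. Rayleigh channels, QPSK symbols, and a linear MMSE baseline scored by bit error rate. This is a minimal illustration, not the authors' experimental setup; the SNR convention, the bit mapping, the 4x4 configuration, the trial count, and the function names are assumptions.

```python
import numpy as np

def simulate_qpsk_mimo(n_tx, n_rx, snr_db, n_trials, rng):
    """Draw i.i.d. Rayleigh channels, QPSK symbols, and noisy observations."""
    bits = rng.integers(0, 2, size=(n_trials, n_tx, 2))
    # Map bit 0 -> +1 and bit 1 -> -1 on each of the I and Q rails.
    x = ((1 - 2 * bits[..., 0]) + 1j * (1 - 2 * bits[..., 1])) / np.sqrt(2)
    H = (rng.standard_normal((n_trials, n_rx, n_tx))
         + 1j * rng.standard_normal((n_trials, n_rx, n_tx))) / np.sqrt(2)
    # Assumed SNR convention: total transmit power over per-antenna noise variance.
    noise_var = n_tx / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal((n_trials, n_rx))
                                      + 1j * rng.standard_normal((n_trials, n_rx)))
    y = np.einsum("trk,tk->tr", H, x) + noise
    return bits, H, y, noise_var

def mmse_detect_bits(H, y, noise_var):
    """Linear MMSE estimate followed by per-rail hard decisions."""
    n_tx = H.shape[-1]
    Hh = np.conj(np.swapaxes(H, -1, -2))
    gram = Hh @ H + noise_var * np.eye(n_tx)
    x_hat = np.linalg.solve(gram, Hh @ y[..., None])[..., 0]
    return np.stack([x_hat.real < 0, x_hat.imag < 0], axis=-1).astype(int)

rng = np.random.default_rng(0)
bers = []
for snr_db in (0, 10, 20):
    bits, H, y, noise_var = simulate_qpsk_mimo(4, 4, snr_db, 2000, rng)
    bits_hat = mmse_detect_bits(H, y, noise_var)
    bers.append(float(np.mean(bits_hat != bits)))
```

Repeating the loop over several seeds and reporting the spread of `bers` would give the error bars the referee asks for.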

Circularity Check

0 steps flagged

No circularity: learned architecture validated empirically

The paper proposes the Soft Graph Transformer as a trainable neural network that combines standard self-attention within symbol and constraint subgraphs with a custom graph-aware cross-attention mechanism for MIMO detection. All performance claims rest on experimental training and evaluation against ML baselines rather than any closed-form derivation or first-principles reduction. No equations are presented that equate a prediction to a fitted input by construction, and the provided abstract and context contain no load-bearing self-citations that would render the central architectural choices tautological. The model is explicitly data-driven with soft-input interfaces, rendering the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The design rests on the domain assumption that MIMO detection admits a clean bipartite factor-graph decomposition and that attention mechanisms can replace conventional message-passing updates. No new physical entities are postulated; the free parameters are the usual neural-network weights learned from data.

free parameters (1)

Neural network weights
All attention and feed-forward parameters are fitted during training on simulated or measured MIMO data.

axioms (1)

domain assumption: MIMO detection can be modeled as message passing on a factor graph with symbol and constraint subgraphs.
Invoked when the architecture splits self-attention within subgraphs and cross-attention between them.
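
For concreteness, the factor-graph assumption can be written out for the usual linear model. The following is a sketch under assumptions not stated in this review: additive white Gaussian noise, independent symbol priors, and the notation $N_t$, $N_r$, $\mathbf{h}_j^{\mathsf T}$ (the $j$-th row of $\mathbf{H}$) and $\sigma^2$, which is introduced here for illustration only.

```latex
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},
\qquad \mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r})

p(\mathbf{x} \mid \mathbf{y}, \mathbf{H}) \;\propto\;
\prod_{j=1}^{N_r} \exp\!\left( -\frac{\lvert y_j - \mathbf{h}_j^{\mathsf T}\mathbf{x} \rvert^2}{\sigma^2} \right)
\prod_{i=1}^{N_t} p(x_i)
```

Each of the $N_r$ observation factors connects to every symbol $x_i$ with a nonzero channel coefficient, which gives the bipartite graph. Reading the symbol variables as the symbol subgraph and the observation factors as the constraint subgraph is an interpretation offered here, not a definition quoted from the paper.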

pith-pipeline@v0.9.0 · 5680 in / 1302 out tokens · 36349 ms · 2026-05-18T16:10:23.545049+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: SGT integrates self-attention within symbol and constraint subgraphs with graph-aware cross-attention for structured message passing

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: Factor-graph-guided self- and cross-attention jointly realize contextual encoding and iterative message passing

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Yet, efficient symbol detection remains challenging

INTRODUCTION Multiple-Input Multiple-Output (MIMO) systems are fundamental to modern wireless communications, offering high spectral efficiency and robust links. Yet, efficient symbol detection remains challenging. Maximum Likelihood (ML) detection achieves optimal accuracy but is computationally prohibitive for large systems. Low-complexity message-pa...

work page
[2]

Soft Graph Transformer for MIMO Detection

SYSTEM MODEL We consider a standard Multiple-Input Multiple-Output (MIMO) communication system, assuming perfect channel state information (CSI) at the receiver. The transmitter first encodes the information bits using an error correction code and maps the encoded bits to complex constellation symbols. These symbols are arranged into a time-frequency re...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Model Overview. The proposed Soft Graph Transformer (SGT) is a modular neural architecture for soft-input soft-output (SISO) detection in structured linear noisy systems

METHODOLOGY 3.1. Model Overview The proposed Soft Graph Transformer (SGT) is a modular neural architecture for soft-input soft-output (SISO) detection in structured linear noisy systems. As illustrated in Fig. 1, SGT processes probabilistic bit-level inputs and raw received features, updates them through a graph-informed Transformer backbone, and outputs ...

work page
[4]

EXPERIMENT 4.1. Ablation Studies We assess the effectiveness of graph-aware tokenization and cross-attention in standalone MIMO detection under Perfect-CSI Rayleigh fading channels with Quadrature Phase-Shift Keying (QPSK) modulation. All models have a model dimension of 128 and an 8-layer architecture trained with the same learning rates, batch sizes,...

work page
[5]

CONCLUSION In this work, we proposed the Soft Graph Transformer (SGT), a soft-input–soft-output MIMO detector that unifies the contextual modeling ability of self-attention with the structured message passing of factor graphs. Unlike conventional detectors that either suffer from exponential complexity (ML) or fragility in finite dimensions (AMP, OAMP, MA...

work page
[6]

Message-passing algorithms for compressed sensing,

David L. Donoho, Arian Maleki, and Andrea Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, Nov. 2009

work page 2009
[7]

Orthogonal AMP,

Junjie Ma and Li Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017

work page 2017
[8]

Memory AMP,

Lei Liu, Shunqi Huang, and Brian M. Kurkoski, “Memory AMP,” IEEE Transactions on Information Theory, vol. 68, no. 12, pp. 8015–8039, 2022

work page 2022
[9]

A model-driven deep learning network for MIMO detection,

Hengtao He, Chao-Kai Wen, Shi Jin, and Geoffrey Ye Li, “A model-driven deep learning network for MIMO detection,” in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2018, pp. 584–588

work page 2018
[10]

Model-driven deep learning for MIMO detection,

Hengtao He, Chao-Kai Wen, Shi Jin, and Geoffrey Ye Li, “Model-driven deep learning for MIMO detection,” IEEE Transactions on Signal Processing, vol. 68, pp. 1702–1715, 2020

work page 2020
[11]

Deep MIMO detection,

Neev Samuel, Tzvi Diskin, and Ami Wiesel, “Deep MIMO detection,” in 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2017, pp. 1–5

work page 2017
[12]

Attention is all you need,

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017

work page 2017
[13]

RE-MIMO: Recurrent and permutation equivariant neural MIMO detection,

Kumar Pratik, Bhaskar D Rao, and Max Welling, “RE-MIMO: Recurrent and permutation equivariant neural MIMO detection,” IEEE Transactions on Signal Processing, vol. 69, pp. 459–473, 2020

work page 2020
[14]

Transformer learning-based efficient MIMO detection method,

Saleem Ahmed, Sooyoung Kim, et al., “Transformer learning-based efficient MIMO detection method,” Physical Communication, vol. 70, pp. 102637, 2025

work page 2025
[15]

Error correction code transformer,

Yoni Choukroun and Lior Wolf, “Error correction code transformer,” Advances in Neural Information Processing Systems, vol. 35, pp. 38695–38705, 2022

work page 2022
[16]

CrossMPT: Cross-attention message-passing transformer for error correcting codes,

Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim, and Jong-Seon No, “CrossMPT: Cross-attention message-passing transformer for error correcting codes,” in The Thirteenth International Conference on Learning Representations, 2025

work page 2025
