Learning-to-Defer with Expert-Conditional Advice

Axel Carlier; Lai Xing Ng; Leïna Montreuil; Wei Tsang Ooi; Yannis Montreuil

Review · 4 major objections · 2 minor · 44 references

Separated heads for routing and advice are inconsistent; a joint expert–advice surrogate recovers the Bayes-optimal deferral policy.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.


T0 review · grok-4.5

2026-07-14 21:22 UTC pith:NJA6KD2E

Load-bearing objection: We cannot audit the L2D-with-advice claims: the supplied full text is a different paper (CSI/GMTC, 2603.14325), so the consistency theorems are not checkable from what we have.

arxiv 2603.14324 v5 pith:NJA6KD2E submitted 2026-03-15 stat.ML cs.LG

Learning-to-Defer with Expert-Conditional Advice

Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

classification stat.ML cs.LG

keywords: learning to defer, expert advice, surrogate consistency, H-consistency, excess risk, Bayes-optimal policy, composite action space

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard Learning-to-Defer picks which expert should handle each input, but treats the information each expert sees as fixed. Many real systems also choose what extra context that expert gets—retrieved documents, tool outputs, or escalation notes—after the routing decision. The paper shows that a natural family of surrogates that train routing and advice with separate heads is inconsistent even in the smallest nontrivial case, so they need not recover the optimal joint policy. It then defines an augmented surrogate over the combined expert–advice action space, proves an H-consistency guarantee and an excess-risk transfer bound, and shows that the Bayes-optimal policy is recovered in the limit. Empirically the method beats ordinary Learning-to-Defer on tabular, language, and multi-modal tasks and adjusts how much advice it buys to the cost regime; a synthetic check reproduces the predicted failure of separated surrogates.

Core claim

A broad class of separated surrogates that learn routing and advice with distinct heads is inconsistent even in the smallest nontrivial setting. An augmented surrogate that treats the composite expert–advice pair as the action recovers Bayes optimality in the limit via an H-consistency guarantee and an excess-risk transfer bound.

What carries the argument

The augmented surrogate on the composite expert–advice action space: it replaces separate routing and advice heads with a joint surrogate whose H-consistency and excess-risk transfer bound force recovery of the Bayes-optimal joint policy.

Load-bearing premise

That practical advice (documents, tools, escalation context) can be treated as a finite composite action space, and that the H-consistency conditions hold for the hypothesis classes and costs used in the experiments. Both are needed for the consistency theory to carry over from the abstract setting.

What would settle it

On the paper’s synthetic benchmark, train a separated two-head surrogate in the smallest nontrivial expert–advice setting; if it still converges to the Bayes-optimal joint policy (rather than the predicted inconsistent limit), the inconsistency claim fails. On the real tasks, if the joint method does not improve over standard Learning-to-Defer or fails to adapt advice acquisition to cost, the practical claim fails.


If this is right

Routing and advice should be optimized jointly; separate heads are not a safe default even for simple instances.
Systems that retrieve documents, call tools, or escalate can treat those choices as part of the deferral action and still target Bayes optimality.
As the joint surrogate is driven to zero excess risk, the induced policy approaches the cost-optimal expert–advice map.
Advice acquisition can be made cost-aware: the method spends more on context only when the cost regime justifies it.
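
As a rough illustration of the first and last points, the sketch below compares a joint argmin over expert–advice pairs with a deliberately naive separated rule at a single input. Everything in it is hypothetical: the error table `err`, the advice price vector `advice_price`, the trade-off weight `lam`, and the marginal-averaging rule in `separated_policy` are assumptions made for this example. It is not the paper's separated-surrogate family or its counterexample, which cannot be checked from the abstract.

```python
import numpy as np

# Hypothetical expected error of expert j (rows) given advice a (columns)
# at one fixed input x. Column 0 = no advice, column 1 = a retrieved document.
err = np.array([[0.20, 0.25],
                [0.60, 0.05]])
# Hypothetical price of each advice option, scaled by a trade-off weight lam.
advice_price = np.array([0.0, 1.0])


def total_cost(lam):
    """Expected cost of every (expert, advice) pair: error plus advice price."""
    return err + lam * advice_price  # price broadcasts across experts


def joint_policy(cost):
    """Pointwise optimum over the composite action space: one argmin over pairs."""
    j, a = np.unravel_index(np.argmin(cost), cost.shape)
    return int(j), int(a)


def separated_policy(cost):
    """Naive separated rule (an assumption, not the paper's construction):
    choose the expert by its cost averaged over advice, and the advice by
    its cost averaged over experts."""
    j = int(np.argmin(cost.mean(axis=1)))
    a = int(np.argmin(cost.mean(axis=0)))
    return j, a


cheap = total_cost(0.05)   # advice is cheap
pricey = total_cost(0.5)   # advice is expensive

print(joint_policy(cheap), separated_policy(cheap))
print(joint_policy(pricey))
```

With cheap advice the joint rule picks expert 1 with the document, while the separated rule picks expert 0 with the document and pays a higher cost, because expert 1 only looks good once the advice is fixed. With expensive advice the joint rule falls back to expert 0 without advice.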

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same composite-action construction may apply to other two-stage decisions (who acts, then what context they receive) beyond classical Learning-to-Defer.
Inconsistency of separated heads suggests that multi-head architectures for sequential decisions need joint consistency proofs, not only separate calibration of each head.
If advice is continuous or combinatorial (large document sets), the finite composite space may need approximation theory the paper leaves open.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

We cannot audit the L2D-with-advice claims: the supplied full text is a different paper (CSI/GMTC, 2603.14325), so the consistency theorems are not checkable from what we have.


The one thing you need to know: the abstract is for Learning-to-Defer with expert-conditional advice (2603.14324), but the full manuscript we were given is Fundamental Limits of CSI Compression / GMTC (self-labeled 2603.14325). So I cannot verify the inconsistency proof, the H-consistency theorem, the excess-risk transfer bound, or the experiments. That is a document identity failure, not a soft spot inside a correct L2D argument.

From the abstract alone, the contribution is clear and non-trivial if true. Standard L2D assumes each expert’s information is fixed when you route. Here, after you pick an expert you also choose what advice they get (retrieval, tools, escalation). The paper claims a broad family of natural separated surrogates—routing head plus advice head—is inconsistent even in the smallest non-trivial setting, and that an augmented surrogate on the composite expert–advice action space is H-consistent with an excess-risk transfer bound that recovers the Bayes policy in the limit. Experiments are said to beat standard L2D and to adapt advice spend to cost, with a synthetic check of the separated-surrogate failure mode. That is a real problem formulation for hybrid decision systems and tool-use pipelines, not a cosmetic rename.

What I cannot do is confirm any of it. No definitions of the composite action space, no statement of the separated family, no theorem statements, no proofs, no tables. The CSI paper in the cache is a different, self-contained RD/transform-coding story; it does not substitute. Free parameters (advice costs, hypothesis class H) and the modeling reduction to a finite composite action space are exactly the places a referee would pressure—and we have no text to pressure.

Who it is for: people who train deferral / routing with post-routing information acquisition. Value is high if the negative result on separated heads is clean and the joint surrogate is practical. Right now I would not bring it to reading group or cite it until we have the correct PDF. A serious editor should still send the real paper to peer review on the strength of the abstract’s claims; the topic and the consistency angle deserve referee time. My recommendation: hold judgment, pull 2603.14324, then re-read. Do not treat the CSI manuscript as evidence for or against the L2D results.

Referee Report

4 major / 2 minor

Summary. The manuscript (as supplied under the title Learning-to-Defer with Expert-Conditional Advice) claims that standard Learning-to-Defer is incomplete when experts can also receive selectable advice (documents, tools, escalation context). It asserts that a broad family of natural separated surrogates with distinct routing and advice heads is inconsistent even in the smallest non-trivial setting, and that an augmented surrogate on the composite expert–advice action space admits an H-consistency guarantee and excess-risk transfer bound that recovers the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks are claimed to improve over standard L2D and to adapt advice acquisition to the cost regime, with a synthetic check of the separated-surrogate failure mode. The body text actually provided, however, is a different paper on Gaussian-mixture transform coding (GMTC) for CSI compression in FDD massive MIMO (self-identified as arXiv:2603.14325), with RD bounds, reverse-waterfilling, and COST2100 experiments.

Significance. If the L2D-with-advice claims were substantiated, they would be a useful extension of the surrogate-consistency literature to joint routing-and-information-acquisition decisions that arise in modern retrieval- and tool-augmented systems. The abstract’s structure (inconsistency of separated heads; H-consistency of a joint surrogate; excess-risk transfer) is the standard and valuable pattern in that literature. The supplied body, by contrast, is a coherent CSI-compression paper with a clean multi-modal reverse-waterfilling result and strong empirical RD gains, but that is not the paper under review. Because the two documents do not match, the significance of the claimed L2D contribution cannot be assessed from the materials provided.

major comments (4)

Document identity failure: the abstract and title describe Learning-to-Defer with Expert-Conditional Advice (stat.ML, 2603.14324), but the full manuscript text is “Fundamental Limits of CSI Compression in FDD Massive MIMO” (self-labeled arXiv:2603.14325). No definition of the composite expert–advice action space, no statement of the separated-surrogate family, no H-consistency theorem, no excess-risk transfer bound, and no L2D experiments appear. The central claims are therefore not checkable from the supplied manuscript.
Because the body never states the surrogate losses, hypothesis class H, or cost structure for advice, the abstract’s claim that “a broad family of natural separated surrogates is inconsistent even in the smallest non-trivial setting” cannot be verified or refuted. A referee cannot assess whether the inconsistency is load-bearing or an artifact of a particular reduction.
The claimed H-consistency guarantee and excess-risk transfer bound that “yield recovery of the Bayes-optimal policy in the limit” are not present in the provided text. Without the theorem statements, assumptions on H, and proof sketches, the recovery claim cannot be audited.
Experimental claims (tabular/language/multi-modal gains; synthetic confirmation of the separated-surrogate failure mode; cost-regime adaptation of advice acquisition) have no corresponding figures, tables, or protocols in the supplied manuscript, which instead reports COST2100 CSI NMSE and FLOPs comparisons. The empirical support for the L2D contribution is therefore absent.

minor comments (2)

The supplied CSI manuscript itself has presentation issues (e.g., incomplete citation “Proof. The formal proof is provided in [?], [36]” in Lemma 1; dense Fig. 1 encoding; heavy use of special characters that render poorly), but these are irrelevant to the L2D paper under review.
If the correct L2D manuscript is resubmitted, the authors should ensure that the abstract, arXiv id, and full text refer to the same work and that all theorem numbers and experiment sections are present.

Circularity Check

0 steps flagged

No circular derivation: RD sandwich and single global reverse-waterfilling are derived from classical Gaussian RD + KKT, not forced by fitted parameters or self-definition.


The supplied full manuscript is the CSI/GMTC paper (arXiv:2603.14325), not the L2D-with-advice abstract (2603.14324). On the text that can be audited: the load-bearing chain is (i) model CSI as a proper complex Gaussian mixture (Sec. II), (ii) classical single-Gaussian RD via KLT + reverse-waterfilling (Prop. 1, Cover/Thomas), (iii) conditional mixture RD as a convex allocation over components, optimized by KKT to a single shared water level (Thm. 1), (iv) genie-aided converse R*(D) ≥ R_cond(D) by free side information (Thm. 2), (v) label-aware achievability R_cond(D) + H(C)/(τN) by explicit lossless label + component-matched TC (Thm. 3). None of these steps defines the target in terms of itself, renames a fit as a prediction, or imports a uniqueness theorem that forbids alternatives. Lemma 2’s entropy decomposition is the standard chain rule (with a self-citation only for a detailed write-up); it is not used to force the RD claim. Synthetic experiments match the generative model by design (standard model-matched validation) and are paired with COST2100 and neural baselines. No circular step meets the quote-and-reduce bar.
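
Step (ii) refers to the textbook reverse water-filling allocation. A minimal sketch for independent real Gaussian components under a sum-MSE budget follows. The function name, the bisection search, and the example variances are assumptions for illustration, not taken from the CSI manuscript; for proper complex Gaussian components the factor 1/2 in the rate would be dropped.

```python
import numpy as np


def reverse_waterfill(variances, total_distortion, iters=100):
    """Reverse water-filling for independent real Gaussian components.

    Finds the water level theta with sum_i min(theta, var_i) = D, then
    returns (theta, per-component distortions, rate in bits).
    Assumes all variances are strictly positive.
    """
    var = np.asarray(variances, dtype=float)
    if np.any(var <= 0):
        raise ValueError("variances must be strictly positive")
    if not 0 < total_distortion <= var.sum():
        raise ValueError("need 0 < D <= sum of variances")

    # sum_i min(theta, var_i) is non-decreasing in theta, so bisect on theta.
    lo, hi = 0.0, var.max()
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, var).sum() < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)

    # Components below the water level are not coded (distortion = variance).
    d = np.minimum(theta, var)
    rate = 0.5 * np.log2(var / d).sum()
    return theta, d, rate


theta, d, rate = reverse_waterfill([4.0, 1.0, 0.25], total_distortion=1.5)
print(theta, d, rate)
```

In this example the water level settles between the two smallest variances, so the weakest component receives no rate and the other two share a common distortion level.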

Axiom & Free-Parameter Ledger

2 free parameters · 4 axioms · 2 invented entities

Abstract-only review of Learning-to-Defer with Expert-Conditional Advice. Free parameters and invented entities cannot be exhaustively listed without the full text. Load-bearing modeling choices visible from the abstract are listed as domain assumptions. The attached full manuscript is a different paper and was not used as a source of axioms for this report.

free parameters (2)

Advice acquisition costs / cost regime
Experiments claim adaptation of advice-acquisition behavior to the cost regime; the specific cost values and tradeoffs are free design choices that shape empirical conclusions.
Hypothesis class H for the composite surrogate
H-consistency depends on the function class; the abstract does not specify H, so the guarantee’s scope is a free modeling choice until the full paper is available.

axioms (4)

domain assumption: Standard Learning-to-Defer assumes expert information is fixed at decision time; many systems violate this by choosing advice after expert selection.
Problem setup stated in the abstract; load-bearing for why the new formulation is needed.
ad hoc to paper: A broad family of natural separated surrogates (distinct heads for routing and advice) is the relevant baseline class to prove inconsistent.
The negative result’s scope depends on how ‘broad family’ and ‘natural’ are defined; not checkable from abstract alone.
domain assumption: Bayes-optimal policy is defined over the joint expert–advice action space under expected cost minimization.
Standard decision-theoretic framing for L2D-style problems; used as the recovery target.
standard math: H-consistency of the augmented surrogate plus excess-risk transfer implies recovery of the Bayes-optimal policy in the limit.
Standard surrogate-consistency program in statistical learning theory; claimed but not verified here.
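
For orientation, the generic shape of an H-consistency bound in the surrogate-consistency literature is sketched below. The symbols are placeholders chosen here (target loss $\ell$, surrogate $\Phi$, transfer function $\Gamma$, minimizability gaps $\mathcal{M}$); the paper's actual losses, assumptions on $\mathcal{H}$, and form of $\Gamma$ are not available in the supplied text and are not asserted.

```latex
% Generic H-consistency bound (placeholder notation, not the paper's statement).
% For every hypothesis h in the class H:
\[
  \mathcal{E}_{\ell}(h) - \mathcal{E}^{*}_{\ell}(\mathcal{H}) + \mathcal{M}_{\ell}(\mathcal{H})
  \;\le\;
  \Gamma\!\Big(
    \mathcal{E}_{\Phi}(h) - \mathcal{E}^{*}_{\Phi}(\mathcal{H}) + \mathcal{M}_{\Phi}(\mathcal{H})
  \Big),
\]
% where \Gamma is non-decreasing with \Gamma(0) = 0, and the minimizability gap is
\[
  \mathcal{M}_{\ell}(\mathcal{H})
  = \mathcal{E}^{*}_{\ell}(\mathcal{H})
  - \mathbb{E}_{x}\Big[\inf_{h \in \mathcal{H}} \mathbb{E}_{y \mid x}\big[\ell(h, x, y)\big]\Big].
\]
```

When the gaps vanish, for instance when $\mathcal{H}$ is the class of all measurable functions, driving the surrogate excess risk to zero forces the target excess risk to zero, which is the sense in which such a bound yields recovery of the Bayes-optimal policy in the limit.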

invented entities (2)

Learning-to-Defer with advice (joint expert–advice decision problem) · no independent evidence
purpose: Name and formalize routing plus post-selection advice acquisition as one learning problem.
Problem framing introduced in the abstract; may overlap prior work on tool use or sequential deferral, which cannot be checked without full related work.
Augmented surrogate on composite expert–advice action space · no independent evidence
purpose: Provide a consistent training objective that avoids the failure mode of separated heads.
Central methodological object; existence of H-consistency is the paper’s main theoretical claim.

pith-pipeline@v1.1.0-grok45 · 25530 in / 3029 out tokens · 30762 ms · 2026-07-14T21:22:05.293106+00:00 · methodology


Original abstract

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert–advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.



Reference graph

Works this paper leans on

44 extracted references · 2 linked inside Pith

[1]

An overview of limited feedback in wireless communication systems,

D. J. Love, R. W. Heath, V. K. N. Lau, D. Gesbert, B. D. Rao, and M. Andrews, “An overview of limited feedback in wireless communication systems,” IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1341–1365, 2008

2008
[2]

MIMO broadcast channels with finite-rate feedback,

N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5045–5060, 2006

2006
[3]

Achievable rates of MIMO downlink beamforming with non-perfect CSI: A comparison between quantized and analog feedback,

G. Caire, N. Jindal, and M. Kobayashi, “Achievable rates of MIMO downlink beamforming with non-perfect CSI: A comparison between quantized and analog feedback,” in Proc. ASILOMAR Signals, Syst., Comput., 2006, pp. 354–358

2006
[4]

Space-time interference alignment and degree-of-freedom regions for the MISO broadcast channel with periodic CSI feedback,

N. Lee and R. W. Heath, “Space-time interference alignment and degree-of-freedom regions for the MISO broadcast channel with periodic CSI feedback,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 515–528, 2014

2014
[5]

Adaptive feedback scheme on K-cell MISO interfering broadcast channel with limited feedback,

N. Lee and W. Shin, “Adaptive feedback scheme on K-cell MISO interfering broadcast channel with limited feedback,” IEEE Trans. Wireless Commun., vol. 10, no. 2, pp. 401–406, 2011

2011
[6]

Joint user selection, power allocation, and precoding design with imperfect CSIT for multi-cell MU-MIMO downlink systems,

J. Choi, N. Lee, S.-N. Hong, and G. Caire, “Joint user selection, power allocation, and precoding design with imperfect CSIT for multi-cell MU-MIMO downlink systems,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 162–176, 2020

2020
[7]

The COST 2100 MIMO channel model,

L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel model,” IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, 2012

2012
[8]

Study on channel model for frequencies from 0.5 to 100 GHz (release 16),

3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz (release 16),” 3GPP, Tech. Rep. TR 38.901, 11 2020

2020
[9]

A novel millimeter-wave channel simulator and applications for 5G wireless communications,

S. Sun, G. R. MacCartney, and T. S. Rappaport, “A novel millimeter-wave channel simulator and applications for 5G wireless communications,” in Proc. IEEE Int. Conf. Commun. (ICC), 2017, pp. 1–7

2017
[10]

Joint spatial division and multiplexing for FDD massive MIMO,

A. Adhikary, J. Nam, J. Ahn, and G. Caire, “Joint spatial division and multiplexing for FDD massive MIMO,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6441–6463, 2013

2013
[11]

Downlink CSIT under compressed feedback: Joint versus separate source-channel coding,

Y. Song, T. Yang, M. Barzegar Khalilsarai, and G. Caire, “Downlink CSIT under compressed feedback: Joint versus separate source-channel coding,” IEEE Trans. Wireless Commun., vol. 24, no. 10, pp. 8429–8444, 2025

2025
[12]

Theoretical foundations of transform coding,

V. K. Goyal, “Theoretical foundations of transform coding,” IEEE Signal Process. Mag., vol. 18, no. 5, pp. 9–21, 2001

2001
[13]

T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley-Interscience, 2006

2006
[14]

Grassmannian beamforming for multiple-input multiple-output wireless systems,

D. J. Love, R. W. Heath, and T. Strohmer, “Grassmannian beamforming for multiple-input multiple-output wireless systems,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2735–2747, 2003

2003
[15]

Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays,

P.-H. Kuo, H. T. Kung, and P.-A. Ting, “Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays,” in Proc. IEEE Wireless Commun. and Net. Conf., 2012, pp. 492–497

2012
[16]

High-dimensional CSI acquisition in massive MIMO: Sparsity-inspired approaches,

J.-C. Shen, J. Zhang, K.-C. Chen, and K. B. Letaief, “High-dimensional CSI acquisition in massive MIMO: Sparsity-inspired approaches,” IEEE Syst. J., vol. 11, no. 1, pp. 32–40, 2017

2017
[17]

Massive MIMO channel subspace estimation from low-dimensional projections,

S. Haghighatshoar and G. Caire, “Massive MIMO channel subspace estimation from low-dimensional projections,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 350–365, 2018

2018
[18]

FDD massive MIMO via UL/DL channel covariance extrapolation and active channel sparsification,

M. Barzegar Khalilsarai, S. Haghighatshoar, X. Yi, and G. Caire, “FDD massive MIMO via UL/DL channel covariance extrapolation and active channel sparsification,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 121–135, 2019

2019
[19]

Structured channel covariance estimation from limited samples for large antenna arrays,

T. Yang, M. Barzegar Khalilsarai, S. Haghighatshoar, and G. Caire, “Structured channel covariance estimation from limited samples for large antenna arrays,” EURASIP J. Wireless Commun. Netw., vol. 2023, no. 1, p. 24, 2023

2023
[20]

Overview of deep learning-based CSI feedback in massive MIMO systems,

J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Overview of deep learning-based CSI feedback in massive MIMO systems,” IEEE Trans. Commun., vol. 70, no. 12, pp. 8017–8045, 2022

2022
[21]

Deep learning for massive MIMO CSI feedback,

C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, 2018

2018
[22]

Lightweight convolutional neural networks for CSI feedback in massive MIMO,

Z. Cao, W.-T. Shih, J. Guo, C.-K. Wen, and S. Jin, “Lightweight convolutional neural networks for CSI feedback in massive MIMO,” IEEE Commun. Lett., vol. 25, no. 8, pp. 2624–2628, 2021

2021
[23]

TransNet: Full attention network for CSI feedback in FDD massive MIMO system,

Y. Cui, A. Guo, and C. Song, “TransNet: Full attention network for CSI feedback in FDD massive MIMO system,” IEEE Wireless Commun. Lett., vol. 11, no. 5, pp. 903–907, 2022

2022
[24]

Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis,

J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, 2020

2020
[25]

Low-complexity CSI feedback for FDD massive MIMO systems via learning to optimize,

Y. Ma, H. He, S. Song, J. Zhang, and K. B. Letaief, “Low-complexity CSI feedback for FDD massive MIMO systems via learning to optimize,” IEEE Trans. Wireless Commun., vol. 24, no. 4, pp. 3483–3498, 2025

2025
[26]

Nonlinear transform coding,

J. Ballé, P. A. Chou, D. Minnen, S. Singh, N. Johnston, E. Agustsson, S. J. Hwang, and G. Toderici, “Nonlinear transform coding,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 2, pp. 339–353, 2021

2021
[27]

Multi-rate variable-length CSI compression for FDD massive MIMO,

B. Park, H. Do, and N. Lee, “Multi-rate variable-length CSI compression for FDD massive MIMO,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 7715–7719

2024
[28]

Transformer-based nonlinear transform coding for multi-rate CSI compression in MIMO-OFDM systems,

B. Park, H. Do, and N. Lee, “Transformer-based nonlinear transform coding for multi-rate CSI compression in MIMO-OFDM systems,” in Proc. IEEE Int. Conf. Commun. (ICC), 2025, pp. 2327–2333

2025
[29]

Multi-length CSI feedback with ordered finite scalar quantization,

K. Liotopoulos, N. A. Mitsiou, P. G. Sarigiannidis, and G. K. Karagiannidis, “Multi-length CSI feedback with ordered finite scalar quantization,” IEEE Commun. Lett., vol. 29, no. 8, pp. 1973–1977, 2025

2025
[30]

Lossy compression with Gaussian diffusion,

L. Theis, T. Salimans, M. D. Hoffman, and F. Mentzer, “Lossy compression with Gaussian diffusion,” 2022, arXiv:2206.08889

Pith/arXiv arXiv 2022
[31]

Lossy image compression with conditional diffusion models,

R. Yang and S. Mandt, “Lossy image compression with conditional diffusion models,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2024

2024
[32]

Generative diffusion model-based compression of MIMO CSI,

H. Kim, T. Lee, H. Kim, G. D. Veciana, M. A. Arfaoui, A. Koc, P. Pietraski, G. Zhang, and J. Kaewell, “Generative diffusion model-based compression of MIMO CSI,” 2025, arXiv:2503.03753

Pith/arXiv arXiv 2025
[33]

Residual diffusion models for variable-rate joint source channel coding of MIMO CSI,

S. K. Ankireddy, H. Kim, J. Cho, and H. Kim, “Residual diffusion models for variable-rate joint source channel coding of MIMO CSI,” 2025, arXiv:2505.21681

arXiv 2025
[34]

FDD massive MIMO channel training: Optimal rate-distortion bounds and the spectral efficiency of “one-shot” schemes,

M. B. Khalilsarai, Y. Song, T. Yang, and G. Caire, “FDD massive MIMO channel training: Optimal rate-distortion bounds and the spectral efficiency of ‘one-shot’ schemes,” IEEE Trans. Wireless Commun., vol. 22, no. 9, pp. 6018–6032, Sep. 2023

2023
[35]

Fundamental limits to exploiting side information for CSI feedback in wireless systems,

H. Kim, G. de Veciana, and H. Kim, “Fundamental limits to exploiting side information for CSI feedback in wireless systems,” IEEE J. Sel. Areas Commun., vol. 43, no. 7, pp. 2417–2430, 2025

2025
[36]

C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006

2006
[37]

Swin transformer: Hierarchical vision transformer using shifted windows,

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 10 012–10 022

2021
[38]

FDD massive MIMO without CSI feedback,

D. Han, J. Park, and N. Lee, “FDD massive MIMO without CSI feedback,” IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4518–4530, 2024

2024
[39]

FDD massive MIMO: How to optimally combine UL pilot and limited DL CSI feedback?

J. Kim, J. Choi, J. Park, A. Alkhateeb, and N. Lee, “FDD massive MIMO: How to optimally combine UL pilot and limited DL CSI feedback?” IEEE Trans. Wireless Commun., vol. 24, no. 2, pp. 926–939, 2025

2025
[40]

On the entropy of general mixture distributions,

N. Lee, “On the entropy of general mixture distributions,” 2026, arXiv:2602.15303

arXiv 2026
[41]

Rate Distortion Theory: A Mathematical Basis for Data Compression

T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ, USA: Prentice-Hall, 1971

1971
[42]

Maximum likelihood from incomplete data via the EM algorithm,

A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B (Methodol.), vol. 39, no. 1, pp. 1–38, 1977

1977
[43]

The rate-distortion function for source coding with side information at the decoder,

A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, 1976

1976
[44]

Changeable rate and novel quantization for CSI feedback based on deep learning,

X. Liang, H. Chang, H. Li, X. Gu, and L. Zhang, “Changeable rate and novel quantization for CSI feedback based on deep learning,” IEEE Trans. Wireless Commun., vol. 21, no. 12, pp. 10 100–10 114, 2022

2022

[1] [1]

An overview of limited feedback in wireless commu- nication systems,

D. J. Love, R. W. Heath, V . K. N. Lau, D. Gesbert, B. D. Rao, and M. Andrews, “An overview of limited feedback in wireless commu- nication systems,”IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1341–1365, 2008

2008

[2] [2]

MIMO broadcast channels with finite-rate feedback,

N. Jindal, “MIMO broadcast channels with finite-rate feedback,”IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5045–5060, 2006. 14

2006

[3] [3]

Achievable rates of MIMO downlink beamforming with non-perfect CSI: A comparison between quantized and analog feedback,

G. Caire, N. Jindal, and M. Kobayashi, “Achievable rates of MIMO downlink beamforming with non-perfect CSI: A comparison between quantized and analog feedback,” inProc. ASILOMAR Signals, Syst., Comput., 2006, pp. 354–358

2006

[4] [4]

Space-time interference alignment and degree- of-freedom regions for the MISO broadcast channel with periodic CSI feedback,

N. Lee and R. W. Heath, “Space-time interference alignment and degree- of-freedom regions for the MISO broadcast channel with periodic CSI feedback,”IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 515–528, 2014

2014

[5] [5]

Adaptive feedback scheme on K-cell MISO inter- fering broadcast channel with limited feedback,

N. Lee and W. Shin, “Adaptive feedback scheme on K-cell MISO inter- fering broadcast channel with limited feedback,”IEEE Trans. Wireless Commun., vol. 10, no. 2, pp. 401–406, 2011

2011

[6] [6]

Joint user selection, power allocation, and precoding design with imperfect CSIT for multi-cell MU- MIMO downlink systems,

J. Choi, N. Lee, S.-N. Hong, and G. Caire, “Joint user selection, power allocation, and precoding design with imperfect CSIT for multi-cell MU- MIMO downlink systems,”IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 162–176, 2020

2020

[7] [7]

The COST 2100 MIMO channel model,

L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel model,”IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, 2012

2012

[8] [8]

Study on channel model for frequencies from 0.5 to 100 GHz (release 16),

3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz (release 16),” 3GPP, Tech. Rep. TR 38.901, 11 2020

2020

[9] [9]

A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,

S. Sun, G. R. MacCartney, and T. S. Rappaport, “A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,” inProc. IEEE Int. Conf. Commun. (ICC), 2017, pp. 1–7

2017

[10] [10]

Joint spatial division and multiplexing for FDD massive MIMO,

A. Adhikary, J. Nam, J. Ahn, and G. Caire, “Joint spatial division and multiplexing for FDD massive MIMO,”IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6441–6463, 2013

2013

[11] [11]

Downlink CSIT under compressed feedback: Joint versus separate source-channel coding,

Y . Song, T. Yang, M. Barzegar Khalilsarai, and G. Caire, “Downlink CSIT under compressed feedback: Joint versus separate source-channel coding,”IEEE Trans. Wireless Commun., vol. 24, no. 10, pp. 8429–8444, 2025

2025

[12] [12]

Theoretical foundations of transform coding,

V . K. Goyal, “Theoretical foundations of transform coding,”IEEE Signal Process. Mag., vol. 18, no. 5, pp. 9–21, 2001

2001

[13] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley-Interscience, 2006.

[14] D. J. Love, R. W. Heath, and T. Strohmer, “Grassmannian beamforming for multiple-input multiple-output wireless systems,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2735–2747, 2003.

[15] P.-H. Kuo, H. T. Kung, and P.-A. Ting, “Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays,” in Proc. IEEE Wireless Commun. and Net. Conf., 2012, pp. 492–497.

[16] J.-C. Shen, J. Zhang, K.-C. Chen, and K. B. Letaief, “High-dimensional CSI acquisition in massive MIMO: Sparsity-inspired approaches,” IEEE Syst. J., vol. 11, no. 1, pp. 32–40, 2017.

[17] S. Haghighatshoar and G. Caire, “Massive MIMO channel subspace estimation from low-dimensional projections,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 350–365, 2018.

[18] M. Barzegar Khalilsarai, S. Haghighatshoar, X. Yi, and G. Caire, “FDD massive MIMO via UL/DL channel covariance extrapolation and active channel sparsification,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 121–135, 2019.

[19] T. Yang, M. Barzegar Khalilsarai, S. Haghighatshoar, and G. Caire, “Structured channel covariance estimation from limited samples for large antenna arrays,” EURASIP J. Wireless Commun. Netw., vol. 2023, no. 1, p. 24, 2023.

[20] J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Overview of deep learning-based CSI feedback in massive MIMO systems,” IEEE Trans. Commun., vol. 70, no. 12, pp. 8017–8045, 2022.

[21] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, 2018.

[22] Z. Cao, W.-T. Shih, J. Guo, C.-K. Wen, and S. Jin, “Lightweight convolutional neural networks for CSI feedback in massive MIMO,” IEEE Commun. Lett., vol. 25, no. 8, pp. 2624–2628, 2021.

[23] Y. Cui, A. Guo, and C. Song, “TransNet: Full attention network for CSI feedback in FDD massive MIMO system,” IEEE Wireless Commun. Lett., vol. 11, no. 5, pp. 903–907, 2022.

[24] J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, 2020.

[25] Y. Ma, H. He, S. Song, J. Zhang, and K. B. Letaief, “Low-complexity CSI feedback for FDD massive MIMO systems via learning to optimize,” IEEE Trans. Wireless Commun., vol. 24, no. 4, pp. 3483–3498, 2025.

[26] J. Ballé, P. A. Chou, D. Minnen, S. Singh, N. Johnston, E. Agustsson, S. J. Hwang, and G. Toderici, “Nonlinear transform coding,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 2, pp. 339–353, 2021.

[27] B. Park, H. Do, and N. Lee, “Multi-rate variable-length CSI compression for FDD massive MIMO,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 7715–7719.

[28] B. Park, H. Do, and N. Lee, “Transformer-based nonlinear transform coding for multi-rate CSI compression in MIMO-OFDM systems,” in Proc. IEEE Int. Conf. Commun. (ICC), 2025, pp. 2327–2333.

[29] K. Liotopoulos, N. A. Mitsiou, P. G. Sarigiannidis, and G. K. Karagiannidis, “Multi-length CSI feedback with ordered finite scalar quantization,” IEEE Commun. Lett., vol. 29, no. 8, pp. 1973–1977, 2025.

[30] L. Theis, T. Salimans, M. D. Hoffman, and F. Mentzer, “Lossy compression with Gaussian diffusion,” 2022, arXiv:2206.08889.

[31] R. Yang and S. Mandt, “Lossy image compression with conditional diffusion models,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 36, 2024.

[32] H. Kim, T. Lee, H. Kim, G. D. Veciana, M. A. Arfaoui, A. Koc, P. Pietraski, G. Zhang, and J. Kaewell, “Generative diffusion model-based compression of MIMO CSI,” 2025, arXiv:2503.03753.

[33] S. K. Ankireddy, H. Kim, J. Cho, and H. Kim, “Residual diffusion models for variable-rate joint source channel coding of MIMO CSI,” 2025, arXiv:2505.21681.

[34] M. B. Khalilsarai, Y. Song, T. Yang, and G. Caire, “FDD massive MIMO channel training: Optimal rate-distortion bounds and the spectral efficiency of ‘one-shot’ schemes,” IEEE Trans. Wireless Commun., vol. 22, no. 9, pp. 6018–6032, Sep. 2023.

[35] H. Kim, G. de Veciana, and H. Kim, “Fundamental limits to exploiting side information for CSI feedback in wireless systems,” IEEE J. Sel. Areas Commun., vol. 43, no. 7, pp. 2417–2430, 2025.

[36] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.

[37] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 10 012–10 022.

[38] D. Han, J. Park, and N. Lee, “FDD massive MIMO without CSI feedback,” IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4518–4530, 2024.

[39] J. Kim, J. Choi, J. Park, A. Alkhateeb, and N. Lee, “FDD massive MIMO: How to optimally combine UL pilot and limited DL CSI feedback?” IEEE Trans. Wireless Commun., vol. 24, no. 2, pp. 926–939, 2025.

[40] N. Lee, “On the entropy of general mixture distributions,” 2026, arXiv:2602.15303.

[41] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ, USA: Prentice-Hall, 1971.

[42] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B (Methodol.), vol. 39, no. 1, pp. 1–38, 1977.

[43] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, 1976.

[44] X. Liang, H. Chang, H. Li, X. Gu, and L. Zhang, “Changeable rate and novel quantization for CSI feedback based on deep learning,” IEEE Trans. Wireless Commun., vol. 21, no. 12, pp. 10 100–10 114, 2022.