arxiv: 2604.09521 · v1 · submitted 2026-04-10 · 💻 cs.IT · cs.AI· math.IT

Recognition: unknown

Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment

Anthony T. Nixon

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3

classification 💻 cs.IT cs.AImath.IT

keywords semantic communicationrate-distortion theoryPOMDP abstractionsbounded agentsphase transitionquotient modelsmulti-agent alignmentside-information coding

0 comments

The pith

Bounded agents induce distinct semantic alphabets from their capacity limits, making intent-preserving communication impossible below a critical rate set by the mismatch between those alphabets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agents with different computational capacities do not merely compress a shared semantic alphabet at different rates; each derives its own alphabet as the coarsest abstraction of the environment that its capacity permits. This leads to a structural phase transition in communication: below a critical rate determined by how much the agents' derived alphabets differ, no coding scheme can preserve the sender's intent at the receiver. The work supplies a general lower bound on this transition via common-history quotient comparison, identifies an exact one-way coding benchmark once the alphabets are fixed, and shows that classical side-information results apply directly to the derived alphabets with matching converses.

Core claim

When two agents interact with the same environment but possess unequal capacities, each induces a semantic alphabet given by its unique coarsest consistent abstraction of the POMDP; communication between such agents then exhibits a sharp phase transition below which intent-preserving transmission is structurally impossible, with the critical rate fixed by the mismatch between the two quotient alphabets.

What carries the argument

The quotient POMDP Q_{m,T}(M), defined as the unique coarsest abstraction of the underlying POMDP that remains consistent with an agent's capacity bound.

If this is right

Intent-preserving communication between capacity-mismatched agents is impossible below the rate fixed by their quotient mismatch.
In the one-way memoryless regime, classical Wyner-Ziv coding on the quotient alphabets achieves the minimum rate with an exact converse.
An asymptotic converse holds in the shrinking-distortion regime where the allowed distortion vanishes with the horizon.
Alignment traversal bounds permit compositional communication by routing through agents of intermediate capacities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Communication protocol design should begin by computing each agent's capacity-derived quotient rather than assuming a shared source alphabet.
The framework suggests that multi-agent systems may require explicit capacity-alignment stages or intermediate agents to cross large quotient gaps efficiently.
The same quotient construction could be used to derive minimal alphabets for learning or planning tasks under resource bounds.

Load-bearing premise

That the quotient POMDP supplies the unique coarsest abstraction and that comparing quotients on common history fully determines the structural mismatch that sets the phase-transition threshold.

What would settle it

A controlled experiment on a known POMDP in which two agents of explicitly different capacities attempt one-way communication at rates straddling the predicted critical rate and the fraction of intent-preserving messages drops sharply to zero below that rate.

Figures

Figures reproduced from arXiv: 2604.09521 by Anthony T. Nixon.

**Figure 2.** Figure 2: Left: Chain5 rate-distortion curves for mA = 16, mB ∈ {1, 2, 4, 8}. A sharp knee appears near Rcrit ≈ 1.4 bits/step (dashed); distortion decays rapidly above it. Right: Rmin vs. capacity gap (slope ≈ 1), confirming Theorem 35. The framework recovers human–AI alignment as a special case ( [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Shrinking-distortion regime illustration for the asymptotic one-way converse (Chain5, [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: Phase transition at |S| > 100. Top: mB = 1 creates a capacity gap; distortion floor persists below Rcrit = 2 bits/step. Bottom: mB = 2 closes the gap (|QA| = |QB|). Domains: Chain100 (|S|= 100), Chain150 (|S|= 150), Chain200 (|S|= 200) [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: RockSample(4,4): phase transition and WZ benchmark. [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Left: RichGridWorld (T = 2); mB = 1 gap creates a distortion floor. Right: BalancedRand8 (T = 4, refinement ratio 2.58); inflection near Rcrit ≈ 1.63 [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Wyner-Ziv benchmark curves under random vs. structured policies (Chain5, [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: IB-style encoder-only compression vs. the semantic WZ benchmark (Theorem 29). The [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗

**Figure 9.** Figure 9: Semantic rate-distortion curves for the T [PITH_FULL_IMAGE:figures/full_fig_p028_9.png] view at source ↗

**Figure 10.** Figure 10: Left: Critical rate Rcrit vs. quotient size gap across POMDP instances. Right: Semantic vs. classical coding (mA = 16, mB = 2). Shaded band shows IQR over 10 seeds. CHAIN5: A 5-state chain POMDP (|S|= 5, |A|= 2, |O|= 5, obs. noise 0.3) designed to produce large quotient gaps. The main experiments use T = 4 with mA = 16 and mB ∈ {1, 2, 4, 8}. For mB = 1: |QA| = 781, |QB| = 289, giving R log crit = log(781/… view at source ↗

**Figure 11.** Figure 11: Quotient estimation convergence for Chain5 ( [PITH_FULL_IMAGE:figures/full_fig_p030_11.png] view at source ↗

read the original abstract

When two agents of different computational capacities interact with the same environment, they need not compress a common semantic alphabet differently; they can induce different semantic alphabets altogether. We show that the quotient POMDP $Q_{m,T}(M)$ - the unique coarsest abstraction consistent with an agent's capacity - serves as a capacity-derived semantic space for any bounded agent, and that communication between heterogeneous agents exhibits a sharp structural phase transition. Below a critical rate $R_{\text{crit}}$ determined by the quotient mismatch, intent-preserving communication is structurally impossible. In the supported one-way memoryless regime, classical side-information coding then yields exponential decay above the induced benchmark. Classical coding theorems tell you the rate once the source alphabet is fixed; our contribution is to derive that alphabet from bounded interaction itself. Concretely, we prove: (1) a fixed-$\varepsilon$ structural phase-transition theorem whose lower bound is fully general on the common-history quotient comparison; (2) a one-way Wyner-Ziv benchmark identification on quotient alphabets, with exact converse, exact operational equality for memoryless quotient sources, and an ergodic long-run bridge via explicit mixing bounds; (3) an asymptotic one-way converse in the shrinking-distortion regime $\varepsilon = O(1/T)$, proved from the message stream and decoder side information; and (4) alignment traversal bounds enabling compositional communication through intermediate capacity levels. Experiments on eight POMDP environments (including RockSample(4,4)) illustrate the phase transition, a structured-policy benchmark shows the one-way rate can drop by up to $19\times$ relative to the counting bound, and a shrinking-distortion sweep matches the regime of the asymptotic converse.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives semantic alphabets from bounded POMDP capacities via quotients and claims a structural phase transition in communication feasibility, with supporting theorems and experiments.

read the letter

The main thing to know is that this work starts from agents with different capacities interacting in the same POMDP and derives their semantic spaces as quotient POMDPs, then shows that communication between them hits a hard threshold below which intent-preserving exchange becomes impossible regardless of coding scheme. Above that rate the usual side-information results apply with exact matches in the memoryless case. That framing moves the problem from choosing a shared alphabet to letting capacity dictate the alphabet itself, which is a clean shift from most semantic rate-distortion papers. The four stated theorems cover the general lower bound on common-history mismatch, the Wyner-Ziv benchmark with converse, the shrinking-distortion asymptotic, and compositional alignment bounds. The experiments on eight environments, including RockSample, show the transition and report up to 19x rate improvement over a counting baseline, which is concrete enough to be worth checking. The shrinking-distortion sweep also lines up with the asymptotic claim. Those parts are the paper's real contribution and give readers something specific to build on or test. The soft spot sits in the uniqueness step. The phase-transition lower bound requires that the quotient is the single coarsest capacity-consistent abstraction and that comparing two quotients on shared history fully captures the structural mismatch. If other coarsest abstractions exist for the same capacity or if agents can exploit private histories or finer non-quotient encodings to align below the claimed R_crit, the impossibility result does not follow as stated. The abstract presents the quotient as unique, but the strength of that claim depends on the proof details for how uniqueness is shown and whether alternatives are ruled out. The experiments illustrate the transition but do not appear to compare against other possible abstractions, so they do not close that gap. This paper is for researchers working on information-theoretic models of multi-agent systems or semantic communication who already know rate-distortion and POMDPs. A reader looking for new ways to ground semantic alphabets in capacity limits will find usable theorems and numbers here. It deserves a serious referee because the claims are stated precisely enough to be checked and the experiments give something to reproduce or extend. I would send it out, with the main review focus on verifying the uniqueness and common-history arguments.

Referee Report

2 major / 2 minor

Summary. The paper defines the quotient POMDP Q_{m,T}(M) as the unique coarsest abstraction of an environment consistent with an agent's bounded computational capacity, treats this quotient as a capacity-derived semantic alphabet, and proves that communication between agents with mismatched quotients exhibits a sharp phase transition: below a critical rate R_crit determined by the mismatch on common histories, intent-preserving communication is structurally impossible. It establishes four concrete results—an ε-fixed structural phase-transition theorem with a general lower bound, a one-way Wyner-Ziv benchmark with exact converse and operational equality in the memoryless case, an asymptotic converse for shrinking distortion ε = O(1/T), and alignment traversal bounds—supported by experiments on eight POMDP domains showing the transition and up to 19× rate reduction relative to a counting bound.

Significance. If the central claims hold, the work supplies a capacity-derived mechanism for inducing semantic alphabets rather than assuming them, together with explicit rate thresholds and exact converses that classical rate-distortion theory lacks once the alphabet is endogenous. The exact operational equality for memoryless quotient sources and the explicit mixing bounds for the ergodic bridge are notable strengths that would allow direct comparison with standard side-information coding results.

major comments (2)

[Definition of Q_{m,T}(M) and Theorem 1] The phase-transition lower bound in the first theorem relies on Q_{m,T}(M) being the unique coarsest capacity-consistent abstraction and on common-history quotient comparison fully capturing structural mismatch. The manuscript must supply an explicit uniqueness argument (e.g., in the section defining the quotient POMDP) showing that no other non-equivalent coarsest abstraction exists for the same capacity; without it the claimed structural impossibility below R_crit does not follow.
[Proof of the structural phase-transition theorem] The sufficiency of common-history comparison for the impossibility claim must be justified against alternatives such as private histories, finer partitions, or non-quotient encodings. If agents can achieve alignment outside the common-history quotient comparison, the lower bound on R_crit is not tight and the phase-transition statement requires qualification.

minor comments (2)

[Experiments] The experimental section should report the precise capacity parameters m and T used to construct each quotient POMDP and the exact procedure for estimating R_crit from the mismatch.
[Structured-policy benchmark] Clarify whether the 19× rate reduction is measured against the counting bound on the original POMDP or on the quotient alphabet; the comparison should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify points where the manuscript's claims on uniqueness and common-history sufficiency require stronger justification. We address each below and have revised the manuscript to incorporate explicit arguments and clarifications.

read point-by-point responses

Referee: [Definition of Q_{m,T}(M) and Theorem 1] The phase-transition lower bound in the first theorem relies on Q_{m,T}(M) being the unique coarsest capacity-consistent abstraction and on common-history quotient comparison fully capturing structural mismatch. The manuscript must supply an explicit uniqueness argument (e.g., in the section defining the quotient POMDP) showing that no other non-equivalent coarsest abstraction exists for the same capacity; without it the claimed structural impossibility below R_crit does not follow.

Authors: We agree that an explicit uniqueness argument strengthens the foundation of the phase-transition claim. The original manuscript asserts uniqueness from the definition of the quotient as the coarsest partition compatible with the capacity-limited policy class, but does not include a standalone proof. We have added a new lemma (Lemma 2 in the revised Section 3) proving uniqueness: any two coarsest abstractions consistent with the same capacity m and horizon T must induce identical equivalence relations on histories, by showing that the capacity constraint defines a unique maximal equivalence class under the partial order of partition refinement. This directly supports the structural impossibility below R_crit. revision: yes
Referee: [Proof of the structural phase-transition theorem] The sufficiency of common-history comparison for the impossibility claim must be justified against alternatives such as private histories, finer partitions, or non-quotient encodings. If agents can achieve alignment outside the common-history quotient comparison, the lower bound on R_crit is not tight and the phase-transition statement requires qualification.

Authors: We accept that the sufficiency argument needs expansion. Intent preservation is defined exclusively over common observable histories, so private histories cannot contribute to shared semantic alignment without violating the common-environment assumption. Finer partitions exceed the capacity bound by construction, and non-quotient encodings are excluded because they are not capacity-consistent abstractions. We have revised the proof of Theorem 1 to include a dedicated paragraph (now in the proof of the lower bound) showing that any code achieving intent preservation must map to the same quotient equivalence classes on common histories; bypassing this would require a finer or non-quotient representation, which is ruled out by the capacity constraint. For the ergodic case we have added an explicit mixing assumption to ensure the bound remains tight. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation applies classical coding to externally introduced quotient spaces

full rationale

The paper defines the quotient POMDP Q_{m,T}(M) as the unique coarsest capacity-consistent abstraction and then applies standard side-information coding (Wyner-Ziv) to obtain the phase-transition and rate bounds. No equation reduces R_crit or the impossibility claim to a fitted parameter, self-citation chain, or redefinition of the input alphabet. The uniqueness assertion and common-history comparison are taken as the starting semantic spaces rather than derived predictions; once those alphabets are fixed, the coding theorems follow from classical results without circular reduction. The provided abstract and context contain no self-definitional loop, fitted-input prediction, or load-bearing self-citation that collapses the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The framework rests on the existence and uniqueness of the coarsest capacity-consistent abstraction (quotient POMDP) and on the applicability of classical Wyner-Ziv coding once the alphabets are fixed by that abstraction.

axioms (2)

domain assumption The quotient POMDP Q_{m,T}(M) is the unique coarsest abstraction consistent with an agent's capacity.
Invoked in the definition of capacity-derived semantic space and the phase-transition lower bound.
domain assumption Common-history quotient comparison captures the structural mismatch that determines R_crit.
Used for the general lower bound on the phase transition.

invented entities (1)

Quotient POMDP Q_{m,T}(M) no independent evidence
purpose: Capacity-derived semantic alphabet for bounded agents
New abstraction introduced to replace fixed semantic alphabets; no independent falsifiable prediction outside the paper is stated in the abstract.

pith-pipeline@v0.9.0 · 5615 in / 1403 out tokens · 41071 ms · 2026-05-10T16:54:29.997157+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 2 canonical work pages · 1 internal anchor

[1]

C. E. Shannon. A mathematical theory of communication.Bell System Technical Journal, 27:379–423, 1948

1948
[2]

C. E. Shannon. Coding theorems for a discrete source with a fidelity criterion.IRE National Convention Record, 7:142–163, 1959

1959
[3]

J. L. Massey. Causality, feedback and directed information. InProc. Int. Symp. Information Theory and Its Applications (ISITA), pages 303–305, 1990

1990
[4]

A. T. Nixon. The Myhill–Nerode theorem for bounded interaction: Canonical abstractions via agent-bounded indistinguishability.arXiv:2603.21399 [cs.AI], 2026

work page arXiv 2026
[5]

Gündüz, Z

D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae. Beyond transmitting bits: Context, semantics, and task-oriented communications.IEEE JSAC, 41(1):5–41, 2023

2023
[6]

Kountouris and N

M. Kountouris and N. Pappas. Semantics-empowered communication for networked intelligent systems.IEEE Communications Magazine, 59(6):96–102, 2021

2021
[7]

P. A. Ortega and D. A. Braun. Thermodynamics as a theory of decision-making with information- processing costs.Proceedings of the Royal Society A, 469(2153):20120683, 2013

2013
[8]

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains.Artificial Intelligence, 101(1–2):99–134, 1998

1998
[9]

Russell.Human Compatible: Artificial Intelligence and the Problem of Control

S. Russell.Human Compatible: Artificial Intelligence and the Problem of Control. Viking, 2019

2019
[10]

Soares and B

N. Soares and B. Fallenstein. Aligning superintelligence with human interests: A technical research agenda.Machine Intelligence Research Institute, Technical Report 2014-8, 2014

2014
[11]

P. S. Castro, P. Panangaden, and D. Precup. Equivalence relations in fully and partially observable Markov decision processes. InIJCAI, pages 1653–1658, 2009

2009
[12]

El-Yaniv and Y

R. El-Yaniv and Y . Wiener. On the foundations of noise-free selective classification.JMLR, 11:1605–1641, 2010

2010
[13]

Language Models (Mostly) Know What They Know

S. Kadavath, T. Conerly, A. Askell, T. Henighan, D. Drain, E. Perez, N. Schiefer, Z. Hatfield- Dodds, D. DasSarma, E. Tran-Johnson, et al. Language models (mostly) know what they know. arXiv:2207.05221, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[14]

X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning. InICLR, 2023

2023
[15]

Ferns, P

N. Ferns, P. Panangaden, and D. Precup. Metrics for finite Markov decision processes. InUAI, 2004

2004
[16]

L. Li, T. J. Walsh, and M. L. Littman. Towards a unified theory of state abstraction for MDPs. InISAIM, 2006

2006
[17]

Amato, D

C. Amato, D. S. Bernstein, and S. Zilberstein. Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs.Autonomous Agents and Multi-Agent Systems, 21(3):293–320, 2010

2010
[18]

T. M. Cover and J. A. Thomas.Elements of Information Theory. Wiley, 2006

2006
[19]

I. Ong, A. Almahairi, V . Wu, W.-L. Chiang, T. Wu, J. E. Gonzalez, M. W. Kadous, and I. Stoica. RouteLLM: Learning to route LLMs from preference data. InICLR, 2025

2025
[20]

Hendrycks, C

D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. InICLR, 2021

2021
[21]

Y . Wang, X. Ma, G. Zhang, et al. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. InNeurIPS Datasets and Benchmarks, 2024

2024
[22]

Tishby, F

N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. InProc. 37th Allerton Conf. on Communication, Control, and Computing, 1999

1999
[23]

C. A. Sims. Implications of rational inattention.Journal of Monetary Economics, 50(3):665–690, 2003. 18

2003
[24]

Genewein, F

T. Genewein, F. Leibfried, K. Grau-Moya, and D. A. Braun. Bounded rationality, abstraction, and hierarchical decision-making: An information-theoretic optimality principle.Frontiers in Robotics and AI, 2:27, 2015

2015
[25]

D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes.Mathematics of Operations Research, 27(4):819–840, 2002

2002
[26]

Tatikonda and S

S. Tatikonda and S. Mitter. Control under communication constraints.IEEE Transactions on Automatic Control, 49(7):1056–1068, 2004

2004
[27]

A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder.IEEE Transactions on Information Theory, 22(1):1–10, 1976

1976
[28]

Berger.Rate Distortion Theory: A Mathematical Basis for Data Compression

T. Berger.Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall, 1971

1971
[29]

Csiszár and J

I. Csiszár and J. Körner.Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2nd edition, 2011

2011
[30]

Kostina and S

V . Kostina and S. Verdú. Fixed-length lossy compression in the finite blocklength regime.IEEE Transactions on Information Theory, 58(6):3309–3338, 2012

2012
[31]

H. H. Permuter, T. Weissman, and A. J. Goldsmith. Finite state channels with time-invariant deterministic feedback.IEEE Transactions on Information Theory, 55(2):644–662, 2009

2009
[32]

Zaidi, I

A. Zaidi, I. E. Aguerri, and S. Shamaï. On the information bottleneck problems: Models, connections, applications, and information theoretic views.Entropy, 22(2):151, 2020

2020
[33]

P. A. Stavrou and M. Kountouris. A rate distortion approach to goal-oriented communication. InProc. IEEE Int. Symp. Information Theory (ISIT), 2022

2022
[34]

Arthur and S

D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. InSODA, pages 1027–1035, 2007

2007
[35]

R. E. Blahut. Computation of channel capacity and rate-distortion functions.IEEE Transactions on Information Theory, 18(4):460–473, 1972

1972
[36]

S. Arimoto. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, 18(1):14–20, 1972

1972
[37]

S. C. Draper and G. W. Wornell. Side information aware coding strategies for sensor networks. IEEE JSAC, 22(6):966–976, 2004

2004
[38]

P. A. Stavrou and M. Kountouris. The role of fidelity in goal-oriented semantic communication: A rate-distortion approach.IEEE Trans. Commun., 71(7):3918–3931, 2023

2023
[39]

C. V . Goldman and S. Zilberstein. Decentralized control of cooperative systems: Categorization and complexity analysis.J. Artif. Intell. Res., 22:143–174, 2004

2004
[40]

H. Xie, Z. Qin, G. Y . Li, and B.-H. Juang. Deep learning enabled semantic communication systems.IEEE Trans. Signal Process., 69:2663–2675, 2021

2021
[41]

H. S. Witsenhausen. A counterexample in stochastic optimum control.SIAM J. Control, 6(1):131–147, 1968

1968
[42]

H. H. Permuter, P. Cuff, B. Van Roy, and T. Weissman. Capacity of the trapdoor channel with feedback.IEEE Trans. Inf. Theory, 54(7):3150–3165, 2008

2008
[43]

M. S. Derpich and J. Østergaard. Improved upper bounds to the causal quadratic rate-distortion function for Gaussian stationary sources.IEEE Trans. Inf. Theory, 58(5):3131–3152, 2012

2012
[44]

Nayyar, A

A. Nayyar, A. Mahajan, and D. Teneketzis. Decentralized stochastic control with partial history sharing: A common information approach.IEEE Transactions on Automatic Control, 58(7):1644–1658, 2013

2013
[45]

Slepian and J

D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources.IEEE Transac- tions on Information Theory, 19(4):471–480, 1973

1973
[46]

P. F. Christiano, J. Leike, T. Brown, M. Milani, S. Gilmer, and D. Amodei. Deep reinforcement learning from human preferences. InNeurIPS, 2017

2017
[47]

D. Abel, D. Arumugam, L. Lehnert, and M. Littman. State abstractions for lifelong reinforce- ment learning. InICML, 2018. 19

2018
[48]

Lazaridou, A

A. Lazaridou, A. Peysakhovich, and M. Baroni. Multi-agent cooperation and the emergence of (natural) language. InICLR, 2017

2017
[49]

Eccles, Y

T. Eccles, Y . Bachrach, G. Lever, A. Lazaridou, and T. Graepel. Biases for emergent communi- cation in multi-agent reinforcement learning. InNeurIPS, 2019

2019
[50]

G. A. Miller. Note on the bias of information estimates. In H. Quastler, editor,Information Theory in Psychology: Problems and Methods, pages 95–100, 1955

1955
[51]

Smith and R

T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. InProceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), pages 520–527, 2004. A Wyner-Ziv Corollaries Corollary 36(Benchmark Gap Closure).Under the one-way reduction (Theorem 25), the converse and pipelined achievability are governed by the same WZ benchmark...

2004
[52]

Bridge:Neural representation clusters at layer l of a transformer refine the Myhill–Nerode quotient Qm,T (M)for an effective capacitym(l)determined by the layer’s representational bandwidth
[53]

Smoothness:The quotient lattice for language-based agents is connected and locally smooth— adjacent capacity levels produce O(1) new quotient classes per step, making the chaining error well-controlled. These conjectures are empirically testable via layer-wise probing of language models at varying scales, and if confirmed, would makeR crit estimation poly...