Semantic Rate-Distortion Theory: Deductive Compression and Closure Fidelity
Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3
The pith
Knowledge bases with logical structure can be transmitted at rates strictly below classical entropy when fidelity requires only preserving the deductive closure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under closure fidelity the semantic rate-distortion function depends only on the irredundant core of the knowledge base. Redundant states contribute neither to the required rate nor to the achievable distortion. Consequently the zero-distortion semantic rate is strictly smaller than the classical entropy rate for any base containing redundancies, and a semantic source-channel separation theorem holds with an asymptotic leverage factor strictly greater than one.
What carries the argument
The irredundant core: a canonical generating set obtained from the knowledge base by a fixed-order deletion procedure, from which the entire deductive closure can be rederived.
If this is right
- The semantic rate-distortion function is completely determined by the irredundant core and ignores all redundant states.
- Semantic source-channel separation yields an asymptotic reduction in required channel uses by a factor greater than one.
- A strengthened Fano inequality holds that exploits the structure of the core.
- An overlap decomposition supplies necessary and sufficient conditions for closure-reliable transmission among heterogeneous agents.
- The same leverage effect appears in broadcast settings even when the channel is noiseless.
Where Pith is reading between the lines
- The same compression mechanism may apply to other structured sources such as ontologies or theorem-proving databases.
- The identified semantic bottleneck in broadcast channels could limit efficiency in distributed knowledge systems regardless of channel quality.
- Verification on larger or differently structured logics would test how widely the leverage factor exceeds one.
Load-bearing premise
A fixed-order deletion procedure always extracts a canonical irredundant core from which the full deductive closure can be rederived without loss.
What would settle it
A concrete knowledge base containing redundant states for which the zero-distortion semantic rate equals the classical entropy rate, or for which the fixed-order deletion procedure fails to recover the full deductive closure.
read the original abstract
Shannon's rate-distortion theory treats source symbols as unstructured labels. When the source is a knowledge base equipped with a logical proof system, a natural fidelity criterion is closure fidelity: a reconstruction is acceptable if it preserves the deductive closure of the original. This paper develops a rate-distortion theory under this criterion. Central to the theory is the irredundant core-a canonical generating set extracted by a fixed-order deletion procedure, from which the full deductive closure can be rederived. We prove that the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states. More generally, the full semantic rate-distortion function depends only on the core; redundant states are invisible to both rate and distortion. We derive a semantic source-channel separation theorem showing a semantic leverage phenomenon: under closure fidelity, the required source rate is reduced by an asymptotic leverage factor greater than one, allowing the same knowledge base to be communicated with proportionally fewer channel uses-not by violating Shannon capacity, but because redundant states become free. We also prove a strengthened Fano inequality that exploits core structure. For heterogeneous multi-agent communication, an overlap decomposition gives necessary and sufficient conditions for closure-reliable transmission and identifies a semantic bottleneck in broadcast settings that persists even over noiseless channels. All results are verified on Datalog instances with up to 24,000 base facts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a rate-distortion theory for knowledge-base sources under a closure-fidelity criterion, where a reconstruction is acceptable if it preserves the deductive closure of the original knowledge base. It introduces the irredundant core extracted via a fixed-order deletion procedure as the canonical generating set, proves that the zero-distortion semantic rate is strictly below the classical entropy rate when redundancies exist, shows that the full semantic rate-distortion function depends only on the core (rendering redundant states invisible), derives a semantic source-channel separation theorem with a leverage phenomenon, a strengthened Fano inequality, and overlap-decomposition conditions for multi-agent closure-reliable transmission, with all results verified on Datalog instances up to 24,000 facts.
Significance. If the core claims hold, the work supplies a new structured source model and fidelity criterion that allows deductive redundancies to be exploited for compression gains, yielding a source-channel separation result with asymptotic leverage greater than one and a semantic bottleneck in broadcast settings. The concrete verification on Datalog instances and the derivation of the separation theorem and strengthened Fano inequality are positive features that ground the theory in a computable setting.
major comments (2)
- [Abstract] Abstract: The central claims that 'the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states' and that 'the full semantic rate-distortion function depends only on the core' rest on the fixed-order deletion procedure yielding a canonical irredundant core for each deductive closure. In Datalog, however, the same closure can be generated by multiple distinct irredundant bases; because deletion is order-dependent, the procedure can output different cores for knowledge bases realizing the identical closure. Consequently the minimal rate achieving zero closure fidelity is the entropy of the closure random variable (satisfying H(closure) ≤ H(core)), not necessarily a quantity derived from the core distribution, so the asserted strict inequality and core-only dependence do not follow.
- [Abstract] The semantic source-channel separation theorem and the associated leverage claim presuppose that the rate-distortion function is fully determined by the core distribution. If the deletion procedure fails to produce a unique canonical core, the achievable rate region under closure fidelity would instead be governed by the closure entropy, altering the leverage factor and the conditions for reliable transmission in the multi-agent overlap decomposition.
minor comments (1)
- [Abstract] The abstract and main text would benefit from an explicit equation defining the semantic rate-distortion function R_s(D) to make the dependence on the core distribution immediately visible.
Simulated Author's Rebuttal
We thank the referee for the detailed and insightful review of our manuscript. The comments raise important points about the canonicity of the irredundant core and its implications for the rate-distortion function and separation theorem. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claims that 'the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states' and that 'the full semantic rate-distortion function depends only on the core' rest on the fixed-order deletion procedure yielding a canonical irredundant core for each deductive closure. In Datalog, however, the same closure can be generated by multiple distinct irredundant bases; because deletion is order-dependent, the procedure can output different cores for knowledge bases realizing the identical closure. Consequently the minimal rate achieving zero closure fidelity is the entropy of the closure random variable (satisfying H(closure) ≤ H(core)), not necessarily a quantity derived from the core distribution, so the asserted strict inequality and core-only dependence do not follow.
Authors: We appreciate the referee's precise identification of this subtlety. The fixed-order deletion procedure is defined to produce a deterministic core for any given knowledge-base realization, so that the core is a well-defined random variable induced by the source distribution over knowledge bases. Our proofs establish that closure fidelity renders redundant facts invisible, allowing the semantic rate-distortion function to be expressed in terms of the core distribution. The claimed strict inequality with classical entropy holds because H(core) < H(KB) whenever redundant states occur with positive probability. Nevertheless, we agree that the information-theoretically minimal rate for zero closure fidelity is H(closure(X)), which satisfies H(closure(X)) ≤ H(core(X)). We will revise the abstract and the statements of the main theorems to clarify that the semantic R(0) equals H(closure(X)) and that the full semantic rate-distortion function is governed by the distribution of the closure (which is generated from the core). This preserves the core-only dependence in the operational sense that the encoder may extract and transmit the core while the decoder reconstructs any valid generator of the same closure. revision: yes
-
Referee: [Abstract] The semantic source-channel separation theorem and the associated leverage claim presuppose that the rate-distortion function is fully determined by the core distribution. If the deletion procedure fails to produce a unique canonical core, the achievable rate region under closure fidelity would instead be governed by the closure entropy, altering the leverage factor and the conditions for reliable transmission in the multi-agent overlap decomposition.
Authors: The separation theorem is derived by showing that any code achieving a given closure-fidelity distortion can be reduced to an equivalent code whose rate is determined by the core. The leverage factor greater than one follows directly from the reduction in effective source entropy once redundancies are removed by the fixed-order procedure. We maintain that the procedure supplies the necessary structure for the source model. To address the referee's concern, we will add a clarifying lemma relating H(core) and H(closure) and revise the statement of the separation theorem to indicate that the achievable rate region is bounded by the closure entropy while the operational leverage is realized through the core. The overlap-decomposition conditions for multi-agent settings will be updated to reflect this relationship, ensuring the semantic bottleneck characterization remains valid. revision: partial
Circularity Check
No circularity: core extraction is independent of the rate function
full rationale
The semantic rate-distortion function is defined using the irredundant core obtained from a fixed-order deletion procedure that operates on the knowledge base prior to and independently of any rate or distortion calculation. Zero-distortion semantic rate is stated to equal a quantity derived from this externally extracted core distribution, not constructed from the rate expression itself. No equations equate a prediction to a fitted parameter by construction, and no load-bearing step reduces to a self-citation or ansatz smuggled from prior work by the same author. Verification on concrete Datalog instances supplies external grounding. The modeling choice that the deletion procedure produces a canonical representative is an assumption about the source model, not a definitional loop inside the derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Knowledge bases are equipped with a logical proof system that defines a deductive closure.
- ad hoc to paper A fixed-order deletion procedure extracts a canonical irredundant core from which the full deductive closure can be rederived.
invented entities (2)
-
irredundant core
no independent evidence
-
closure fidelity
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the irredundant core Atom(S_O)—a canonical irredundant generating set, extracted by a fixed-order deletion procedure, from which the full deductive closure can be re-derived
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
R_sem(0; d_Cn, P_O) = P_A H(π_A)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A mathematical theory of communication,
C. E. Shannon, “A mathematical theory of communication,”Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948
work page 1948
-
[2]
T. M. Cover and J. A. Thomas,Elements of Information Theory, 2nd ed. John Wiley & Sons, 2006
work page 2006
-
[3]
I. Csisz ´ar and J. K ¨orner,Information theory: coding theorems for discrete memoryless systems. Cambridge University Press, 2011
work page 2011
-
[4]
Recent contributions to the mathematical theory of communication,
W. Weaver, “Recent contributions to the mathematical theory of communication,”ETC: a review of general semantics, vol. 74, no. 1/2, pp. 136–157, 2017
work page 2017
-
[5]
An outline of a theory of semantic information,
R. Carnap and Y . Bar-Hillel, “An outline of a theory of semantic information,” Research Laboratory of Electronics, MIT, Tech. Rep. Technical Report 247, 1952
work page 1952
-
[6]
Outline of a theory of strongly semantic information,
L. Floridi, “Outline of a theory of strongly semantic information,”Minds and machines, vol. 14, no. 2, pp. 197–221, 2004
work page 2004
-
[7]
Semantic information, autonomous agency and non-equilibrium statistical physics,
A. Kolchinsky and D. H. Wolpert, “Semantic information, autonomous agency and non-equilibrium statistical physics,”Interface focus, vol. 8, no. 6, p. 20180041, 2018
work page 2018
-
[8]
A mathematical theory of semantic communication,
K. Niu and P. Zhang, “A mathematical theory of semantic communication,”Journal on Communications, vol. 45, no. 6, pp. 7–59, 2024
work page 2024
-
[9]
Modern semantic communication and 6g intellicise network theory and technology system,
P. Zhang, X. Xu, K. Niu, W. Xu, S. Han, M. Sun, C. Dong, N. Ma, and Z. Zhang, “Modern semantic communication and 6g intellicise network theory and technology system,”Journal of Beijing University of Posts and Telecommunications, 2025
work page 2025
-
[10]
A theory for semantic channel coding with many-to-one source,
S. Ma, C. Zhang, H. Qi, H. Li, Y . Bi, G. Shi, and N. Al-Dhahir, “A theory for semantic channel coding with many-to-one source,” IEEE Transactions on Cognitive Communications and Networking, 2025
work page 2025
-
[11]
Extended blahut-arimoto algorithm for semantic rate-distortion function,
Y . Han, Y . Liu, Y . Sun, K. Niu, N. Ma, S. Cui, and P. Zhang, “Extended blahut-arimoto algorithm for semantic rate-distortion function,” Entropy, vol. 27, no. 6, p. 651, 2025
work page 2025
-
[12]
Semantic arithmetic coding using synonymous mappings,
Z. Liang, J. Xu, K. Niu, and P. Zhang, “Semantic arithmetic coding using synonymous mappings,”Entropy, vol. 27, no. 4, p. 429, 2025
work page 2025
-
[13]
Bayesian inverse contextual reasoning for heterogeneous semantics-native communication,
H. Seo, Y . Kang, M. Bennis, and W. Choi, “Bayesian inverse contextual reasoning for heterogeneous semantics-native communication,” IEEE Transactions on Communications, vol. 72, no. 2, pp. 1092–1107, 2023
work page 2023
-
[14]
Logic-driven semantic communication for resilient multi-agent systems,
T. Alshammari and M. Bennis, “Logic-driven semantic communication for resilient multi-agent systems,”IEEE Open Journal of the Communications Society, vol. 7, pp. 620–644, 2026
work page 2026
-
[15]
Toward goal-oriented semantic communications: New metrics, framework, and open challenges,
A. Li, S. Wu, S. Meng, R. Lu, S. Sun, and Q. Zhang, “Toward goal-oriented semantic communications: New metrics, framework, and open challenges,”IEEE Wireless Communications, 2024
work page 2024
-
[16]
Toward effective and interpretable semantic communications,
Y . Wu, Y . Shi, S. Ma, C. Jiang, W. Zhang, and K. B. Letaief, “Toward effective and interpretable semantic communications,”IEEE Communications Magazine, 2024. 40
work page 2024
-
[17]
Deep learning enabled semantic communication systems,
H. Xie, Z. Qin, G. Y . Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,”IEEE transactions on signal processing, vol. 69, pp. 2663–2675, 2021
work page 2021
-
[18]
Semantic communications: Principles and challenges,
Z. Qin, X. Tao, J. Lu, W. Tong, and G. Y . Li, “Semantic communications: Principles and challenges,”arXiv preprint arXiv:2201.01389, 2021
-
[19]
Beyond transmitting bits: Context, semantics, and task-oriented communications,
D. G ¨und¨uz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,”IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, 2022
work page 2022
-
[20]
Semantic communications: Overview, open issues, and future research directions,
X. Luo, H.-H. Chen, and Q. Guo, “Semantic communications: Overview, open issues, and future research directions,”IEEE Wireless communications, vol. 29, no. 1, pp. 210–219, 2022
work page 2022
-
[21]
Immerman,Descriptive Complexity, ser
N. Immerman,Descriptive Complexity, ser. Graduate Texts in Computer Science. Springer, 1999
work page 1999
-
[22]
What you always wanted to know about Datalog (and never dared to ask),
S. Ceri, G. Gottlob, and L. Tanca, “What you always wanted to know about Datalog (and never dared to ask),”IEEE Transactions on Knowledge and Data Engineering, vol. 1, no. 1, pp. 146–166, 1989
work page 1989
-
[23]
S. Abiteboul, R. Hull, and V . Vianu,Foundations of Databases. Addison-Wesley, 1995
work page 1995
-
[24]
Complexity and expressive power of logic programming,
E. Dantsin, T. Eiter, G. Gottlob, and A. V oronkov, “Complexity and expressive power of logic programming,”ACM Computing Surveys (CSUR), vol. 33, no. 3, pp. 374–425, 2001
work page 2001
-
[25]
Tractable hypergraph properties for constraint satisfaction and conjunctive queries,
D. Marx, “Tractable hypergraph properties for constraint satisfaction and conjunctive queries,”Journal of the ACM (JACM), vol. 60, no. 6, pp. 1–51, 2013
work page 2013
-
[26]
Jaguar: A primal algorithm for conjunctive query evaluation in submodular-width time,
M. Abo Khamis and H. Chen, “Jaguar: A primal algorithm for conjunctive query evaluation in submodular-width time,”Proceedings of the ACM on Management of Data, vol. 3, no. 2, pp. 1–21, 2025
work page 2025
-
[27]
K. Mu, “Identifying roles of formulas in inconsistency under priest’s minimally inconsistent logic of paradox,”Artificial Intelligence, vol. 335, p. 104199, 2024
work page 2024
-
[28]
Objective information theory: A sextuple model and 9 kinds of metrics,
J. Xu, J. Tang, X. Ma, B. Xu, S. Yanli, and Q. Yongjie, “Objective information theory: A sextuple model and 9 kinds of metrics,” in 2014 Science and information conference. IEEE, 2014, pp. 793–802
work page 2014
-
[29]
Research and application of general information measures based on a unified model,
J.Xu, “Research and application of general information measures based on a unified model,”IEEE Transactions on Computers, 2024
work page 2024
-
[30]
General information metrics for improving ai model training efficiency,
J. Xu, C. Liu, X. Tan, X. Zhu, A. Wu, H. Wan, W. Kong, C. Li, H. Xu, K. Kuang, and F. Wu, “General information metrics for improving ai model training efficiency,”Artificial Intelligence Review, vol. 58, p. 289, 2025
work page 2025
-
[31]
Research on a general state formalization method from the perspective of logic,
S. Qiu and J. Xu, “Research on a general state formalization method from the perspective of logic,”Mathematics, vol. 13, no. 20, p. 3324, 2025
work page 2025
-
[32]
From semantic communication to semantic-aware networking: Model, architecture, and open problems,
G. Shi, Y . Xiao, Y . Li, and X. Xie, “From semantic communication to semantic-aware networking: Model, architecture, and open problems,”IEEE Communications Magazine, vol. 59, no. 8, pp. 44–50, 2021
work page 2021
-
[33]
Towards a theory of semantic communication,
J. Bao, P. Basu, M. Dean, C. Partridge, A. Swami, W. Leland, and J. A. Hendler, “Towards a theory of semantic communication,” in 2011 IEEE Network Science Workshop. IEEE, 2011, pp. 110–117
work page 2011
-
[34]
Exploring network structure, dynamics, and function using NetworkX,
A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” inProceedings of the 7th Python in Science Conference (SciPy 2008), 2008, pp. 11–15. [Online]. Available: https: //conference.scipy.org/proceedings/scipy2008/paper 2/full text.pdf
work page 2008
-
[35]
Logical depth and physical complexity,
C. H. Bennett, “Logical depth and physical complexity,”The Universal Turing Machine: A Half-Century Survey, pp. 227–257, 1988
work page 1988
-
[36]
The rate-distortion function for source coding with side information at the decoder,
A. D. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,”IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1–10, 1976
work page 1976
-
[37]
Relational queries computable in polynomial time,
N. Immerman, “Relational queries computable in polynomial time,” inProceedings of the fourteenth annual ACM symposium on Theory of computing, 1982, pp. 147–152
work page 1982
-
[38]
The complexity of relational query languages,
M. Y . Vardi, “The complexity of relational query languages,” inProceedings of the fourteenth annual ACM symposium on Theory of computing, 1982, pp. 137–146
work page 1982
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.