pith. sign in

arxiv: 2604.11204 · v1 · submitted 2026-04-13 · 💻 cs.IT · cs.MA· math.IT

Semantic Rate-Distortion Theory: Deductive Compression and Closure Fidelity

Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3

classification 💻 cs.IT cs.MAmath.IT
keywords semantic rate-distortionclosure fidelityirredundant coredeductive compressionknowledge basesource-channel separationDatalog
0
0 comments X p. Extension

The pith

Knowledge bases with logical structure can be transmitted at rates strictly below classical entropy when fidelity requires only preserving the deductive closure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a rate-distortion theory for sources that consist of knowledge bases equipped with a logical proof system. It replaces the usual exact-reproduction criterion with closure fidelity, under which a reconstruction is acceptable if it yields exactly the same set of logical consequences as the original base. The theory centers on an irredundant core extracted by a fixed-order deletion process; this core alone determines the minimal rate and the distortion function, rendering all redundant facts invisible to both quantities. As a direct result, the zero-distortion semantic rate lies strictly below the classical entropy rate whenever redundancies are present. The paper also derives a semantic source-channel separation theorem that quantifies an asymptotic leverage factor greater than one in the number of channel uses required.

Core claim

Under closure fidelity the semantic rate-distortion function depends only on the irredundant core of the knowledge base. Redundant states contribute neither to the required rate nor to the achievable distortion. Consequently the zero-distortion semantic rate is strictly smaller than the classical entropy rate for any base containing redundancies, and a semantic source-channel separation theorem holds with an asymptotic leverage factor strictly greater than one.

What carries the argument

The irredundant core: a canonical generating set obtained from the knowledge base by a fixed-order deletion procedure, from which the entire deductive closure can be rederived.

If this is right

  • The semantic rate-distortion function is completely determined by the irredundant core and ignores all redundant states.
  • Semantic source-channel separation yields an asymptotic reduction in required channel uses by a factor greater than one.
  • A strengthened Fano inequality holds that exploits the structure of the core.
  • An overlap decomposition supplies necessary and sufficient conditions for closure-reliable transmission among heterogeneous agents.
  • The same leverage effect appears in broadcast settings even when the channel is noiseless.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same compression mechanism may apply to other structured sources such as ontologies or theorem-proving databases.
  • The identified semantic bottleneck in broadcast channels could limit efficiency in distributed knowledge systems regardless of channel quality.
  • Verification on larger or differently structured logics would test how widely the leverage factor exceeds one.

Load-bearing premise

A fixed-order deletion procedure always extracts a canonical irredundant core from which the full deductive closure can be rederived without loss.

What would settle it

A concrete knowledge base containing redundant states for which the zero-distortion semantic rate equals the classical entropy rate, or for which the fixed-order deletion procedure fails to recover the full deductive closure.

read the original abstract

Shannon's rate-distortion theory treats source symbols as unstructured labels. When the source is a knowledge base equipped with a logical proof system, a natural fidelity criterion is closure fidelity: a reconstruction is acceptable if it preserves the deductive closure of the original. This paper develops a rate-distortion theory under this criterion. Central to the theory is the irredundant core-a canonical generating set extracted by a fixed-order deletion procedure, from which the full deductive closure can be rederived. We prove that the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states. More generally, the full semantic rate-distortion function depends only on the core; redundant states are invisible to both rate and distortion. We derive a semantic source-channel separation theorem showing a semantic leverage phenomenon: under closure fidelity, the required source rate is reduced by an asymptotic leverage factor greater than one, allowing the same knowledge base to be communicated with proportionally fewer channel uses-not by violating Shannon capacity, but because redundant states become free. We also prove a strengthened Fano inequality that exploits core structure. For heterogeneous multi-agent communication, an overlap decomposition gives necessary and sufficient conditions for closure-reliable transmission and identifies a semantic bottleneck in broadcast settings that persists even over noiseless channels. All results are verified on Datalog instances with up to 24,000 base facts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a rate-distortion theory for knowledge-base sources under a closure-fidelity criterion, where a reconstruction is acceptable if it preserves the deductive closure of the original knowledge base. It introduces the irredundant core extracted via a fixed-order deletion procedure as the canonical generating set, proves that the zero-distortion semantic rate is strictly below the classical entropy rate when redundancies exist, shows that the full semantic rate-distortion function depends only on the core (rendering redundant states invisible), derives a semantic source-channel separation theorem with a leverage phenomenon, a strengthened Fano inequality, and overlap-decomposition conditions for multi-agent closure-reliable transmission, with all results verified on Datalog instances up to 24,000 facts.

Significance. If the core claims hold, the work supplies a new structured source model and fidelity criterion that allows deductive redundancies to be exploited for compression gains, yielding a source-channel separation result with asymptotic leverage greater than one and a semantic bottleneck in broadcast settings. The concrete verification on Datalog instances and the derivation of the separation theorem and strengthened Fano inequality are positive features that ground the theory in a computable setting.

major comments (2)
  1. [Abstract] Abstract: The central claims that 'the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states' and that 'the full semantic rate-distortion function depends only on the core' rest on the fixed-order deletion procedure yielding a canonical irredundant core for each deductive closure. In Datalog, however, the same closure can be generated by multiple distinct irredundant bases; because deletion is order-dependent, the procedure can output different cores for knowledge bases realizing the identical closure. Consequently the minimal rate achieving zero closure fidelity is the entropy of the closure random variable (satisfying H(closure) ≤ H(core)), not necessarily a quantity derived from the core distribution, so the asserted strict inequality and core-only dependence do not follow.
  2. [Abstract] The semantic source-channel separation theorem and the associated leverage claim presuppose that the rate-distortion function is fully determined by the core distribution. If the deletion procedure fails to produce a unique canonical core, the achievable rate region under closure fidelity would instead be governed by the closure entropy, altering the leverage factor and the conditions for reliable transmission in the multi-agent overlap decomposition.
minor comments (1)
  1. [Abstract] The abstract and main text would benefit from an explicit equation defining the semantic rate-distortion function R_s(D) to make the dependence on the core distribution immediately visible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and insightful review of our manuscript. The comments raise important points about the canonicity of the irredundant core and its implications for the rate-distortion function and separation theorem. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims that 'the zero-distortion semantic rate equals a quantity that is strictly below the classical entropy rate whenever the knowledge base contains redundant states' and that 'the full semantic rate-distortion function depends only on the core' rest on the fixed-order deletion procedure yielding a canonical irredundant core for each deductive closure. In Datalog, however, the same closure can be generated by multiple distinct irredundant bases; because deletion is order-dependent, the procedure can output different cores for knowledge bases realizing the identical closure. Consequently the minimal rate achieving zero closure fidelity is the entropy of the closure random variable (satisfying H(closure) ≤ H(core)), not necessarily a quantity derived from the core distribution, so the asserted strict inequality and core-only dependence do not follow.

    Authors: We appreciate the referee's precise identification of this subtlety. The fixed-order deletion procedure is defined to produce a deterministic core for any given knowledge-base realization, so that the core is a well-defined random variable induced by the source distribution over knowledge bases. Our proofs establish that closure fidelity renders redundant facts invisible, allowing the semantic rate-distortion function to be expressed in terms of the core distribution. The claimed strict inequality with classical entropy holds because H(core) < H(KB) whenever redundant states occur with positive probability. Nevertheless, we agree that the information-theoretically minimal rate for zero closure fidelity is H(closure(X)), which satisfies H(closure(X)) ≤ H(core(X)). We will revise the abstract and the statements of the main theorems to clarify that the semantic R(0) equals H(closure(X)) and that the full semantic rate-distortion function is governed by the distribution of the closure (which is generated from the core). This preserves the core-only dependence in the operational sense that the encoder may extract and transmit the core while the decoder reconstructs any valid generator of the same closure. revision: yes

  2. Referee: [Abstract] The semantic source-channel separation theorem and the associated leverage claim presuppose that the rate-distortion function is fully determined by the core distribution. If the deletion procedure fails to produce a unique canonical core, the achievable rate region under closure fidelity would instead be governed by the closure entropy, altering the leverage factor and the conditions for reliable transmission in the multi-agent overlap decomposition.

    Authors: The separation theorem is derived by showing that any code achieving a given closure-fidelity distortion can be reduced to an equivalent code whose rate is determined by the core. The leverage factor greater than one follows directly from the reduction in effective source entropy once redundancies are removed by the fixed-order procedure. We maintain that the procedure supplies the necessary structure for the source model. To address the referee's concern, we will add a clarifying lemma relating H(core) and H(closure) and revise the statement of the separation theorem to indicate that the achievable rate region is bounded by the closure entropy while the operational leverage is realized through the core. The overlap-decomposition conditions for multi-agent settings will be updated to reflect this relationship, ensuring the semantic bottleneck characterization remains valid. revision: partial

Circularity Check

0 steps flagged

No circularity: core extraction is independent of the rate function

full rationale

The semantic rate-distortion function is defined using the irredundant core obtained from a fixed-order deletion procedure that operates on the knowledge base prior to and independently of any rate or distortion calculation. Zero-distortion semantic rate is stated to equal a quantity derived from this externally extracted core distribution, not constructed from the rate expression itself. No equations equate a prediction to a fitted parameter by construction, and no load-bearing step reduces to a self-citation or ansatz smuggled from prior work by the same author. Verification on concrete Datalog instances supplies external grounding. The modeling choice that the deletion procedure produces a canonical representative is an assumption about the source model, not a definitional loop inside the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The theory rests on domain assumptions about logical proof systems in knowledge bases and introduces new concepts (closure fidelity, irredundant core) extracted via a deletion procedure. No free parameters are mentioned. Results depend on the canonical extraction of the core and the appropriateness of closure fidelity.

axioms (2)
  • domain assumption Knowledge bases are equipped with a logical proof system that defines a deductive closure.
    Invoked to define closure fidelity and the irredundant core as a generating set.
  • ad hoc to paper A fixed-order deletion procedure extracts a canonical irredundant core from which the full deductive closure can be rederived.
    Central to making redundant states invisible to rate and distortion.
invented entities (2)
  • irredundant core no independent evidence
    purpose: Canonical minimal generating set for the deductive closure
    New entity extracted by deletion procedure to determine the semantic rate.
  • closure fidelity no independent evidence
    purpose: Fidelity criterion based on preservation of deductive closure
    New distortion measure replacing traditional symbol-level distortion.

pith-pipeline@v0.9.0 · 5541 in / 1662 out tokens · 58881 ms · 2026-05-10T16:40:31.400778+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,”Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948

  2. [2]

    T. M. Cover and J. A. Thomas,Elements of Information Theory, 2nd ed. John Wiley & Sons, 2006

  3. [3]

    Csisz ´ar and J

    I. Csisz ´ar and J. K ¨orner,Information theory: coding theorems for discrete memoryless systems. Cambridge University Press, 2011

  4. [4]

    Recent contributions to the mathematical theory of communication,

    W. Weaver, “Recent contributions to the mathematical theory of communication,”ETC: a review of general semantics, vol. 74, no. 1/2, pp. 136–157, 2017

  5. [5]

    An outline of a theory of semantic information,

    R. Carnap and Y . Bar-Hillel, “An outline of a theory of semantic information,” Research Laboratory of Electronics, MIT, Tech. Rep. Technical Report 247, 1952

  6. [6]

    Outline of a theory of strongly semantic information,

    L. Floridi, “Outline of a theory of strongly semantic information,”Minds and machines, vol. 14, no. 2, pp. 197–221, 2004

  7. [7]

    Semantic information, autonomous agency and non-equilibrium statistical physics,

    A. Kolchinsky and D. H. Wolpert, “Semantic information, autonomous agency and non-equilibrium statistical physics,”Interface focus, vol. 8, no. 6, p. 20180041, 2018

  8. [8]

    A mathematical theory of semantic communication,

    K. Niu and P. Zhang, “A mathematical theory of semantic communication,”Journal on Communications, vol. 45, no. 6, pp. 7–59, 2024

  9. [9]

    Modern semantic communication and 6g intellicise network theory and technology system,

    P. Zhang, X. Xu, K. Niu, W. Xu, S. Han, M. Sun, C. Dong, N. Ma, and Z. Zhang, “Modern semantic communication and 6g intellicise network theory and technology system,”Journal of Beijing University of Posts and Telecommunications, 2025

  10. [10]

    A theory for semantic channel coding with many-to-one source,

    S. Ma, C. Zhang, H. Qi, H. Li, Y . Bi, G. Shi, and N. Al-Dhahir, “A theory for semantic channel coding with many-to-one source,” IEEE Transactions on Cognitive Communications and Networking, 2025

  11. [11]

    Extended blahut-arimoto algorithm for semantic rate-distortion function,

    Y . Han, Y . Liu, Y . Sun, K. Niu, N. Ma, S. Cui, and P. Zhang, “Extended blahut-arimoto algorithm for semantic rate-distortion function,” Entropy, vol. 27, no. 6, p. 651, 2025

  12. [12]

    Semantic arithmetic coding using synonymous mappings,

    Z. Liang, J. Xu, K. Niu, and P. Zhang, “Semantic arithmetic coding using synonymous mappings,”Entropy, vol. 27, no. 4, p. 429, 2025

  13. [13]

    Bayesian inverse contextual reasoning for heterogeneous semantics-native communication,

    H. Seo, Y . Kang, M. Bennis, and W. Choi, “Bayesian inverse contextual reasoning for heterogeneous semantics-native communication,” IEEE Transactions on Communications, vol. 72, no. 2, pp. 1092–1107, 2023

  14. [14]

    Logic-driven semantic communication for resilient multi-agent systems,

    T. Alshammari and M. Bennis, “Logic-driven semantic communication for resilient multi-agent systems,”IEEE Open Journal of the Communications Society, vol. 7, pp. 620–644, 2026

  15. [15]

    Toward goal-oriented semantic communications: New metrics, framework, and open challenges,

    A. Li, S. Wu, S. Meng, R. Lu, S. Sun, and Q. Zhang, “Toward goal-oriented semantic communications: New metrics, framework, and open challenges,”IEEE Wireless Communications, 2024

  16. [16]

    Toward effective and interpretable semantic communications,

    Y . Wu, Y . Shi, S. Ma, C. Jiang, W. Zhang, and K. B. Letaief, “Toward effective and interpretable semantic communications,”IEEE Communications Magazine, 2024. 40

  17. [17]

    Deep learning enabled semantic communication systems,

    H. Xie, Z. Qin, G. Y . Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,”IEEE transactions on signal processing, vol. 69, pp. 2663–2675, 2021

  18. [18]

    Semantic communications: Principles and challenges,

    Z. Qin, X. Tao, J. Lu, W. Tong, and G. Y . Li, “Semantic communications: Principles and challenges,”arXiv preprint arXiv:2201.01389, 2021

  19. [19]

    Beyond transmitting bits: Context, semantics, and task-oriented communications,

    D. G ¨und¨uz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,”IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, 2022

  20. [20]

    Semantic communications: Overview, open issues, and future research directions,

    X. Luo, H.-H. Chen, and Q. Guo, “Semantic communications: Overview, open issues, and future research directions,”IEEE Wireless communications, vol. 29, no. 1, pp. 210–219, 2022

  21. [21]

    Immerman,Descriptive Complexity, ser

    N. Immerman,Descriptive Complexity, ser. Graduate Texts in Computer Science. Springer, 1999

  22. [22]

    What you always wanted to know about Datalog (and never dared to ask),

    S. Ceri, G. Gottlob, and L. Tanca, “What you always wanted to know about Datalog (and never dared to ask),”IEEE Transactions on Knowledge and Data Engineering, vol. 1, no. 1, pp. 146–166, 1989

  23. [23]

    Abiteboul, R

    S. Abiteboul, R. Hull, and V . Vianu,Foundations of Databases. Addison-Wesley, 1995

  24. [24]

    Complexity and expressive power of logic programming,

    E. Dantsin, T. Eiter, G. Gottlob, and A. V oronkov, “Complexity and expressive power of logic programming,”ACM Computing Surveys (CSUR), vol. 33, no. 3, pp. 374–425, 2001

  25. [25]

    Tractable hypergraph properties for constraint satisfaction and conjunctive queries,

    D. Marx, “Tractable hypergraph properties for constraint satisfaction and conjunctive queries,”Journal of the ACM (JACM), vol. 60, no. 6, pp. 1–51, 2013

  26. [26]

    Jaguar: A primal algorithm for conjunctive query evaluation in submodular-width time,

    M. Abo Khamis and H. Chen, “Jaguar: A primal algorithm for conjunctive query evaluation in submodular-width time,”Proceedings of the ACM on Management of Data, vol. 3, no. 2, pp. 1–21, 2025

  27. [27]

    Identifying roles of formulas in inconsistency under priest’s minimally inconsistent logic of paradox,

    K. Mu, “Identifying roles of formulas in inconsistency under priest’s minimally inconsistent logic of paradox,”Artificial Intelligence, vol. 335, p. 104199, 2024

  28. [28]

    Objective information theory: A sextuple model and 9 kinds of metrics,

    J. Xu, J. Tang, X. Ma, B. Xu, S. Yanli, and Q. Yongjie, “Objective information theory: A sextuple model and 9 kinds of metrics,” in 2014 Science and information conference. IEEE, 2014, pp. 793–802

  29. [29]

    Research and application of general information measures based on a unified model,

    J.Xu, “Research and application of general information measures based on a unified model,”IEEE Transactions on Computers, 2024

  30. [30]

    General information metrics for improving ai model training efficiency,

    J. Xu, C. Liu, X. Tan, X. Zhu, A. Wu, H. Wan, W. Kong, C. Li, H. Xu, K. Kuang, and F. Wu, “General information metrics for improving ai model training efficiency,”Artificial Intelligence Review, vol. 58, p. 289, 2025

  31. [31]

    Research on a general state formalization method from the perspective of logic,

    S. Qiu and J. Xu, “Research on a general state formalization method from the perspective of logic,”Mathematics, vol. 13, no. 20, p. 3324, 2025

  32. [32]

    From semantic communication to semantic-aware networking: Model, architecture, and open problems,

    G. Shi, Y . Xiao, Y . Li, and X. Xie, “From semantic communication to semantic-aware networking: Model, architecture, and open problems,”IEEE Communications Magazine, vol. 59, no. 8, pp. 44–50, 2021

  33. [33]

    Towards a theory of semantic communication,

    J. Bao, P. Basu, M. Dean, C. Partridge, A. Swami, W. Leland, and J. A. Hendler, “Towards a theory of semantic communication,” in 2011 IEEE Network Science Workshop. IEEE, 2011, pp. 110–117

  34. [34]

    Exploring network structure, dynamics, and function using NetworkX,

    A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” inProceedings of the 7th Python in Science Conference (SciPy 2008), 2008, pp. 11–15. [Online]. Available: https: //conference.scipy.org/proceedings/scipy2008/paper 2/full text.pdf

  35. [35]

    Logical depth and physical complexity,

    C. H. Bennett, “Logical depth and physical complexity,”The Universal Turing Machine: A Half-Century Survey, pp. 227–257, 1988

  36. [36]

    The rate-distortion function for source coding with side information at the decoder,

    A. D. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,”IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1–10, 1976

  37. [37]

    Relational queries computable in polynomial time,

    N. Immerman, “Relational queries computable in polynomial time,” inProceedings of the fourteenth annual ACM symposium on Theory of computing, 1982, pp. 147–152

  38. [38]

    The complexity of relational query languages,

    M. Y . Vardi, “The complexity of relational query languages,” inProceedings of the fourteenth annual ACM symposium on Theory of computing, 1982, pp. 137–146