Semantic Variational Bayes Based on Semantic Information G Theory for Solving Latent Variables

Chenguang Lu

arXiv: 2408.13122 · v2 · submitted 2024-08-12 · cs.LG · cs.AI · cs.IT · math.IT

Pith reviewed 2026-05-23 22:10 UTC · model grok-4.3

keywords: semantic variational Bayes · rate-fidelity function · latent variable inference · information efficiency · semantic information · variational methods · mixture models · reinforcement learning

The pith

Semantic Variational Bayes solves latent variable distributions by maximizing information efficiency G/R instead of minimizing free energy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Semantic Variational Bayes as a method to find probability distributions over latent variables. It derives the approach from the rate-fidelity function in semantic information theory, which extends the classic rate-distortion tradeoff to semantic mutual information G. SVB optimizes parameters using the maximum efficiency criterion G over R while applying iterative techniques to the channel. A reader would care because the author states this yields simpler computation than standard variational Bayes for identical tasks and directly incorporates constraints such as likelihood or distortion. Demonstrations cover mixture model convergence, data compression under error ranges, and control tasks that balance purposiveness with efficiency.

Core claim

SVB comes from the parameter solution of the rate-fidelity function R(G), where R is the minimum mutual information required for a given semantic mutual information G. The method uses the maximum information efficiency criterion G/R, which includes maximizing semantic information to optimize model parameters and minimizing mutual information to optimize the Shannon channel. Constraint functions include likelihood, truth, membership, similarity, and distortion. Variational and iterative techniques carry over from earlier rate-distortion work. For the same tasks, SVB is computationally simpler than VB.
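For orientation, the constrained minimization described above can be written next to its classical counterpart. This is a notational sketch only: I(X; Y) and I(X; Yθ) follow the symbols used in the Figure 5 caption, the R(D) line is the standard Shannon definition, the R(G) line simply restates "minimum mutual information for given semantic mutual information G", and the symbol η for the efficiency is an assumption introduced here, not the paper's notation.

```latex
% Classical rate-distortion function, with average distortion \bar{d}:
R(D) = \min_{P(y|x)\,:\,\bar{d} \le D} I(X;Y)

% Rate-fidelity function as stated in the abstract
% (minimum Shannon mutual information for a given semantic mutual information G):
R(G) = \min_{P(y|x)\,:\,I(X;Y_\theta) = G} I(X;Y)

% Information efficiency used as the optimization criterion
% (\eta is a label assumed for this sketch):
\eta = \frac{G}{R}
```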

What carries the argument

The rate-fidelity function R(G), which supplies the minimum mutual information R for a prescribed semantic mutual information G and directly yields the variational optimization procedure for SVB.

If this is right

Mixture models converge as the efficiency ratio G/R increases.
SVB supports data compression when a group of error ranges serves as the constraint.
The semantic information measure and SVB enable maximum entropy control and reinforcement learning under given range constraints.
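The efficiency ratio G/R that these consequences turn on can be computed directly for a small discrete example. The sketch below assumes that semantic mutual information takes the form I(X; Yθ) = Σ P(x, y) log[T(θ|x)/T(θ)], with T(θ) = Σ P(x) T(θ|x); this form is recalled from the author's earlier G theory papers and is not spelled out on this page, so treat it as an assumption. The function names, the source distribution, the channel, and both truth-function tables are hypothetical.

```python
import numpy as np

def shannon_mi(p_x, channel):
    """Shannon mutual information I(X;Y) in bits; channel[i, j] = P(y_j | x_i)."""
    joint = p_x[:, None] * channel
    p_y = joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((channel / p_y[None, :])[mask])))

def semantic_mi(p_x, channel, truth):
    """Assumed form of I(X;Y_theta) in bits; truth[i, j] = T(theta_j | x_i) in (0, 1]."""
    joint = p_x[:, None] * channel
    logical_prob = p_x @ truth  # T(theta_j) = sum_i P(x_i) T(theta_j | x_i)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2((truth / logical_prob[None, :])[mask])))

# Hypothetical three-value source and two-label channel.
p_x = np.array([0.5, 0.3, 0.2])
channel = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])

# Truth functions proportional to the channel columns (scaled so each column peaks at 1).
truth_matched = channel / channel.max(axis=0)
# Truth functions that do not match the channel.
truth_mismatched = np.array([[1.0, 0.3], [0.8, 0.6], [0.1, 1.0]])

R = shannon_mi(p_x, channel)
G_matched = semantic_mi(p_x, channel, truth_matched)
G_mismatched = semantic_mi(p_x, channel, truth_mismatched)

print(f"R = {R:.4f} bits")
print(f"G/R with matched truth functions:    {G_matched / R:.4f}")
print(f"G/R with mismatched truth functions: {G_mismatched / R:.4f}")
```

Under the assumed formula, G equals R when each truth function is proportional to the corresponding channel column, and G falls below R otherwise, which is one way to read "maximizing semantic information to optimize model parameters".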

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the claimed simplicity holds, SVB could be tested on larger probabilistic models to check whether the advantage scales beyond the reported examples.
The use of semantic constraints such as truth or similarity functions may connect SVB to other inference settings that already incorporate domain knowledge.
The paper notes further work is needed for neural networks, so an immediate extension would be to replace free-energy terms in existing deep variational autoencoders with the G/R objective.

Load-bearing premise

The rate-fidelity function R(G) from semantic information theory directly supplies the parameter solution method for SVB, so that variational and iterative techniques transfer without further justification.

What would settle it

A side-by-side count of arithmetic operations or iterations on a standard mixture-model task where SVB requires more computation than VB to reach the same accuracy or where the model fails to converge as G/R rises.
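One half of such a comparison can be sketched as an iteration counter around plain EM on a one-dimensional Gaussian mixture. This is a baseline harness only, not the paper's SVB or E3M procedure; the function name `em_gmm_1d`, the synthetic data, the starting values, and the stopping tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def em_gmm_1d(x, mu, sigma, weights, tol=1e-8, max_iter=10_000):
    """Plain EM for a 1-D Gaussian mixture.

    Returns the fitted parameters, the log-likelihood trace, and the
    number of E-steps performed before the improvement fell below tol.
    """
    mu, sigma, weights = (np.array(a, dtype=float) for a in (mu, sigma, weights))
    trace = []
    for n_iter in range(1, max_iter + 1):
        # E-step: weighted component densities and responsibilities.
        dens = weights * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1)
        trace.append(float(np.log(total).sum()))
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        resp = dens / total[:, None]
        # M-step: closed-form updates of weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return mu, sigma, weights, trace, n_iter

# Synthetic two-component data with a fixed seed so runs are repeatable.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])

mu, sigma, weights, trace, n_iter = em_gmm_1d(
    x, mu=[-1.0, 1.0], sigma=[1.0, 1.0], weights=[0.5, 0.5]
)
print(f"EM iterations to converge: {n_iter}")
print(f"fitted means: {np.sort(mu)}")
```

A fair test would run an SVB implementation on the same data, starting point, and stopping rule, and report both iteration counts and per-iteration cost.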

Figures

Figures reproduced from arXiv: 2408.13122 by Chenguang Lu.

**Figure 2.** The information rate-fidelity function R(G) for binary communication. Any R(G) function is a bowl-like curve, which may be asymmetric [12], with second derivative ≥ 0, and has a point where s = 1 and R = G. For a given R, there are two inverse functions, G−(R) and G+(R). The slope is s = dR/dG; when s = 1, R equals G. G/R indicates the optimized information efficiency.…

**Figure 4.** Comparing the EM and E3M algorithms on an example where convergence is difficult. The EM algorithm needs about 340 …

**Figure 5.** Finding P(y|x) conveying MMI for given constraint ranges. (a) The truth functions of four labels over ages; (b) the convergent Shannon channel P(y|x); (c) the changes of I(X; Yθ) and I(X; Y) during the iterative process. Figure 5b shows that the four transition probability functions cover almost the same four areas as the four truth functions; however, their maximum values differ. Figure 5…

Original abstract

The Variational Bayesian method (VB) is used to solve the probability distributions of latent variables with the minimum free energy criterion. This criterion is not easy to understand, and the computation is complex. For these reasons, this paper proposes the Semantic Variational Bayes' method (SVB). The Semantic Information Theory the author previously proposed extends the rate-distortion function R(D) to the rate-fidelity function R(G), where R is the minimum mutual information for given semantic mutual information G. SVB came from the parameter solution of R(G), where the variational and iterative methods originated from Shannon et al.'s research on the rate-distortion function. The constraint functions SVB uses include likelihood, truth, membership, similarity, and distortion functions. SVB uses the maximum information efficiency (G/R) criterion, including the maximum semantic information criterion for optimizing model parameters and the minimum mutual information criterion for optimizing the Shannon channel. For the same tasks, SVB is computationally simpler than VB. The computational experiments in the paper include 1) using a mixture model as an example to show that the mixture model converges as G/R increases; 2) demonstrating the application of SVB in data compression with a group of error ranges as the constraint; 3) illustrating how the semantic information measure and SVB can be used for maximum entropy control and reinforcement learning in control tasks with given range constraints, providing numerical evidence for balancing control's purposiveness and efficiency. Further research is needed to apply SVB to neural networks and deep learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SVB is a direct application of the author's prior G theory to variational methods, with illustrative examples but no evidence shown for the claimed simplicity gain.

The core of this paper is taking the rate-fidelity function R(G) from the author's earlier Semantic Information G Theory and using its parameter solution as the basis for a variational Bayes variant called SVB. The claim is that this makes the method simpler than standard VB for tasks with certain constraints. It does a decent job laying out how the G/R efficiency criterion works with different constraint functions like likelihood and distortion.

The three examples are straightforward: one tracks how a mixture model behaves as G/R increases, another applies it to compression with error range limits, and the third uses it for control tasks to balance information use and goal achievement. These give concrete illustrations of the approach.

The soft spots are around the lack of evidence for the simplicity claim. No direct comparisons to VB implementations appear, so we don't see whether the overhead of computing G actually reduces total work. The reliance on the prior theory means this is more of an extension than a standalone result, and the experiments stay at the level of demonstration without error bars or statistical tests.

This paper is for people who follow information-theoretic methods in machine learning and are open to semantic extensions of rate-distortion ideas. It won't appeal much to those focused on scalable deep learning approximations. I think it deserves peer review. The framework is internally consistent and the applications are relevant enough that referees could usefully check the derivations and suggest where more validation is needed.

Referee Report

3 major / 1 minor

Summary. The paper proposes Semantic Variational Bayes (SVB) as a computationally simpler alternative to standard Variational Bayes (VB) for inferring distributions over latent variables. SVB is obtained directly from the parameter solution of the rate-fidelity function R(G) in the author's prior Semantic Information G Theory, employing the maximum information efficiency (G/R) criterion together with constraint functions (likelihood, truth, membership, similarity, distortion). The manuscript illustrates the method on three tasks: convergence of a mixture model as G/R increases, data compression under error-range constraints, and maximum-entropy control / reinforcement learning under range constraints.

Significance. If the claimed reduction in computational complexity relative to free-energy VB can be substantiated and the R(G) extension is shown to be valid without hidden overhead, SVB would supply an alternative optimization criterion that incorporates semantic constraints explicitly. The numerical illustrations on mixture models, compression, and control provide concrete examples of the G/R trade-off, but the absence of any complexity metrics or baseline comparisons prevents a firm assessment of practical advantage.

major comments (3)

[Abstract] The assertion that 'For the same tasks, SVB is computationally simpler than VB' is load-bearing for the central contribution yet is unsupported by any runtime counts, iteration counts, arithmetic-operation tallies, or side-by-side comparison against a standard evidence-lower-bound VB implementation on identical models.
[Abstract, experiments 1–3] The three reported demonstrations (mixture-model convergence, error-range compression, max-entropy control) contain no error analysis, convergence-rate data, or quantitative validation that the parameter solutions obtained from R(G) are correct or cheaper than those obtained from the free-energy objective.
[Abstract] The claim that variational and iterative methods 'originated from Shannon et al.'s research on the rate-distortion function' and carry over without additional justification is presented without an explicit mapping showing how the overhead of defining and computing the semantic mutual information G is offset by the claimed simplicity.

minor comments (1)

[Abstract] The final sentence states that further research is needed for neural networks but does not identify the concrete obstacles (e.g., scaling of G computation) that currently prevent such application.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical support and clarification in the manuscript. We address each major comment below.

Referee: [Abstract] The assertion that 'For the same tasks, SVB is computationally simpler than VB' is load-bearing for the central contribution yet is unsupported by any runtime counts, iteration counts, arithmetic-operation tallies, or side-by-side comparison against a standard evidence-lower-bound VB implementation on identical models.

Authors: We acknowledge that the claim of computational simplicity lacks quantitative support such as runtime or operation counts in the current manuscript. The argument for simplicity rests on SVB being obtained directly from the parameter solution of R(G) under the maximum G/R criterion, thereby avoiding iterative minimization of the free-energy functional. However, without explicit benchmarks this remains unsubstantiated. We will revise the abstract to qualify or remove the assertion.

Revision: yes

Referee: [Abstract, experiments 1–3] The three reported demonstrations (mixture-model convergence, error-range compression, max-entropy control) contain no error analysis, convergence-rate data, or quantitative validation that the parameter solutions obtained from R(G) are correct or cheaper than those obtained from the free-energy objective.

Authors: The three examples serve to illustrate the application of the G/R criterion and the effect of semantic constraints rather than to provide rigorous quantitative benchmarks. We agree that they lack error analysis, convergence rates, and direct comparisons to standard VB. In revision we will add convergence metrics for the mixture-model case and clarify the illustrative nature of the other examples.

Revision: partial

Referee: [Abstract] The claim that variational and iterative methods 'originated from Shannon et al.'s research on the rate-distortion function' and carry over without additional justification is presented without an explicit mapping showing how the overhead of defining and computing the semantic mutual information G is offset by the claimed simplicity.

Authors: The reference is to the historical origin of the Blahut-Arimoto-style iterative updates used for rate-distortion functions, which SVB adapts for the rate-fidelity function R(G) with semantic constraints. The overhead of G is incurred through the supplied constraint functions, but we agree an explicit discussion of the resulting computational trade-off is missing. We will insert a short explanatory paragraph in the revised manuscript.

Revision: yes
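For readers unfamiliar with the Blahut-Arimoto-style updates mentioned in this exchange, the classical iteration for one point on a rate-distortion curve can be sketched as follows. This is the textbook R(D) algorithm, not the paper's R(G) procedure; the function name `blahut_arimoto`, the multiplier name `beta`, and the binary example with Hamming distortion are assumptions made for illustration.

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, tol=1e-10, max_iter=10_000):
    """Classical Blahut-Arimoto iteration for one point on R(D).

    p_x  : source distribution, shape (n_x,)
    dist : distortion matrix d(x, y), shape (n_x, n_y)
    beta : positive Lagrange multiplier trading rate against distortion
    Returns (rate in bits, average distortion).
    """
    n_y = dist.shape[1]
    q_y = np.full(n_y, 1.0 / n_y)      # initial output marginal
    kernel = np.exp(-beta * dist)
    for _ in range(max_iter):
        # Channel update: P(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        cond = q_y[None, :] * kernel
        cond /= cond.sum(axis=1, keepdims=True)
        # Marginal update: q(y) = sum_x P(x) P(y|x).
        new_q = p_x @ cond
        converged = np.max(np.abs(new_q - q_y)) < tol
        q_y = new_q
        if converged:
            break
    joint = p_x[:, None] * cond
    mask = joint > 0
    rate = float(np.sum(joint[mask] * np.log2((cond / q_y[None, :])[mask])))
    distortion = float(np.sum(joint * dist))
    return rate, distortion

# Uniform binary source with Hamming distortion.
beta = 2.0
rate, distortion = blahut_arimoto(
    np.array([0.5, 0.5]), np.array([[0.0, 1.0], [1.0, 0.0]]), beta
)
print(f"rate = {rate:.4f} bits, distortion = {distortion:.4f}")
```

For this symmetric example the result can be checked against the closed form R(D) = 1 − H(D) bits, where H is the binary entropy function. The trade-off discussion the referee asks for would compare the cost of each such update with the cost of evaluating the constraint functions that define G.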

Circularity Check

1 step flagged

SVB parameter solution and G/R criterion imported wholesale from author's prior Semantic Information G Theory via self-citation

specific steps

Self-citation, load-bearing [Abstract]
"The Semantic Information Theory the author previously proposed extends the rate-distortion function R(D) to the rate-fidelity function R(G), where R is the minimum mutual information for given semantic mutual information G. SVB came from the parameter solution of R(G), where the variational and iterative methods originated from Shannon et al.'s research on the rate-distortion function. ... SVB uses the maximum information efficiency (G/R) criterion, including the maximum semantic information criterion for optimizing model parameters and the minimum mutual information criterion for optimizing the Shannon channel."

The load-bearing step is the assertion that SVB is obtained directly from the parameter solution of R(G) in the author's prior Semantic Information G Theory. No new derivation of that solution or explicit mapping showing reduced arithmetic operations relative to standard VB free-energy optimization appears in the present paper; the method and the G/R optimality criterion are therefore equivalent to the inputs supplied by the self-citation.

full rationale

The paper states outright that SVB 'came from the parameter solution of R(G)' and that the extension of R(D) to R(G) originates in the author's previous work. The central claims (simpler computation than VB, use of G/R criterion, constraint functions) therefore rest on that self-cited framework rather than an independent derivation or explicit complexity reduction shown in this manuscript. This matches self-citation load-bearing with no external verification or new equations supplied here.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The method rests on the author's previously proposed Semantic Information G Theory as the source of the R(G) function and the G measure; no independent evidence or machine-checked support for that foundation is referenced.

free parameters (1)

G (semantic mutual information)
Central quantity defined in prior work; used as the fidelity measure whose maximization drives parameter updates.

axioms (1)

Domain assumption: the rate-fidelity function R(G) extends the classical rate-distortion function and supplies the variational solution method for latent-variable inference.
Invoked in the abstract as the origin of SVB without additional derivation.

invented entities (1)

Semantic mutual information G (no independent evidence)
Purpose: to quantify semantic fidelity between distributions as an extension beyond ordinary mutual information.
Introduced in the author's prior Semantic Information G Theory; no independent falsifiable handle is provided in the current abstract.

pith-pipeline@v0.9.0 · 5807 in / 1672 out tokens · 31655 ms · 2026-05-23T22:10:01.925553+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 4 internal anchors

[1] M. J. Beal, "Variational algorithms for approximate Bayesian inference," Ph.D. thesis, University College London, 2003.
[2] H. Attias, "Inferring parameters and structure of latent variable models by variational Bayes." [Online]. Available: https://arxiv.org/abs/1301.6676
[3] Wikipedia, "Variational Bayesian methods." [Online]. Available: https://en.wikipedia.org/wiki/Variational_Bayesian_methods
[4] R. Neal and G. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, M. I. Jordan, Ed. Cambridge: MIT Press, 1999, pp. 355–368.
[5] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes." [Online]. Available: https://arxiv.org/abs/1312.6114
[6] K. Friston, "The free-energy principle: a unified brain theory?" Nat. Rev. Neurosci., vol. 11, no. 2, pp. 127–138, Feb. 2010, doi: 10.1038/nrn2787
[7] M. S. Yellapragada and C. P. Konkimalla, "Variational Bayes: A report on approaches and applications." [Online]. Available: https://arxiv.org/abs/1905.10744
[8] K. Akuzawa, Y. Iwasawa, and Y. Matsuo, "Information-theoretic regularization for learning global features by sequential VAE," Mach. Learn., vol. 110, no. 8, pp. 2239–2266, 2021, doi: 10.1007/s10994-021-06032-4
[9] S. Ding, W. Du, L. Ding, J. Zhang, L. Guo, and B. An, "Robust multi-agent communication with graph information bottleneck optimization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 6, pp. 3096–3107, 2023, doi: 10.1109/TPAMI.2023.3337534
[10] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[11] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Nat. Conv. Rec., vol. 4, pp. 142–163, 1959.
[12] C. G. Lu, "A generalization of Shannon's information theory," International Journal of General Systems, vol. 28, no. 6, pp. 453–490, 1999.
[13] C. Lu, "Semantic information G theory and logical Bayesian inference for machine learning," Information, vol. 10, no. 8, p. 261, Aug. 2019, doi: 10.3390/info10080261
[14] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1971.
[15] T. Berger and J. D. Gibson, "Lossy source coding," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2693–2723, 1998.
[16] J. P. Zhou et al., Fundamentals of Information Theory. Beijing, China: People's Posts and Telecommunications Press, 1983.
[17] C. Lu, "Meanings of generalized entropy and generalized mutual information for coding" (in Chinese: 广义熵和广义互信息的编码意义), J. of China Institute of Communication (通信学报), vol. 5, no. 6, pp. 37–44, June 1994.
[18] C. Lu, A Generalized Information Theory (in Chinese: 广义信息论). Hefei, China: China Science and Technology University Press (中国科学技术大学出版), 1993. ISBN 7-312-00501-2.
[19] C. Lu, "The P–T probability framework for semantic communication, falsification, confirmation, and Bayesian reasoning," Philosophies, vol. 5, no. 4, p. 25, Oct. 2020, doi: 10.3390/philosophies5040025
[20] C. Lu, "Using the semantic information G measure to explain and extend rate-distortion functions and maximum entropy distributions," Entropy, vol. 23, no. 8, Aug. 2021, doi: 10.3390/e23081050
[21] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik, 1933; translated as Foundations of Probability. New York, NY, USA: Chelsea Publishing Company, 1950.
[22] R. von Mises, Probability, Statistics and Truth, 2nd ed. London, UK: George Allen and Unwin Ltd., 1957.
[23] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
[24] L. A. Zadeh, "Probability measures of fuzzy events," J. of Mathematical Analysis and Applications, vol. 23, pp. 421–427, 1968.
[25] D. Davidson, "Truth and meaning," Synthese, vol. 17, no. 3, pp. 304–323, 1967.
[26] K. Popper, Conjectures and Refutations, 1st ed. London and New York: Routledge, 2002.
[27] C. Lu, "Reviewing evolution of learning functions and semantic information measures for understanding deep learning," Entropy, vol. 25, no. 5, 2023, doi: 10.3390/e25050802
[28] A. V. D. Oord, Y. Li, and O. Vinyals, "Representation learning with contrastive predictive coding." [Online]. Available: https://arxiv.org/abs/1807.03748
[29] M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, and R. D. Hjelm, "MINE: Mutual information neural estimation," in Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 1–44, https://doi.org/10.48550/arXiv.1801.04062
[30] S. Kullback and R. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, pp. 79–86, 1951.
[31] S. E. Fienberg, "When did Bayesian inference become 'Bayesian'?" Bayesian Analysis, vol. 1, no. 1, pp. 1–40, 2006.
[32] Wikipedia, "Copula." [Online]. Available: https://en.wikipedia.org/wiki/Copula_(probability_theory)
[33] J. Ma and Z. Sun, "Mutual information is copula entropy," Tsinghua Sci. Technol., vol. 16, no. 1, pp. 51–54, 2011.
[34] P. Krupskii and H. Joe, "Approximate likelihood with proxy variables for parameter estimation in high-dimensional factor copula models," Statistical Papers, vol. 63, pp. 543–569, 2022.
[35] G. Oddie, "Truthlikeness," in The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), E. N. Zalta, Ed. [Online]. Available: https://plato.stanford.edu/archives/win2016/entries/truthlikeness/
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, USA: John Wiley & Sons, 2006.
[37] C. Lu, "Understanding and accelerating EM algorithm's convergence by fair competition principle and rate-verisimilitude function." [Online]. Available: https://arxiv.org/abs/2104.12592
[38] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[39] N. Ueda and R. Nakano, "Deterministic annealing EM algorithm," Neural Networks, vol. 11, no. 2, pp. 271–282, 1998.
