data_processing_inequality
plain-language theorem explainer
The data processing inequality asserts that Kullback-Leibler divergence between pushed-forward distributions is at most the original divergence. Researchers establishing the second law in information-theoretic thermodynamics cite this when showing free energy decreases under coarse-graining. The proof applies the fiberwise log-sum inequality to bound each coarse term, interchanges summation order, and concludes via linear arithmetic.
Claim. Let $p,q$ be strictly positive probability distributions on a finite set $Ω$ with $∑p=1=∑q$, and let $φ:Ω→Ω'$ be any function. Then the relative entropy satisfies $D(φ_#p∥φ_#q)≤D(p∥q)$, where $φ_#$ denotes the push-forward that aggregates mass over preimages.
background
The Free Energy Monotonicity module proves Recognition Free Energy is non-increasing under RS dynamics such as coarse-graining, supplying the Recognition Science version of the second law. Push-forward aggregates probability mass over fibers of $φ$, while KL divergence is the expected log-ratio of the two densities. The local setting quotes the module insight that monotonicity of free energy is equivalent to the data processing inequality for KL divergence, with references to Cover & Thomas and Recognition-Science-Full-Theory.txt.
proof idea
The tactic proof first derives non-negativity of both distributions from the strict positivity hypotheses. For each output state it cases on nonempty fibers and invokes log_sum_inequality_fiber to bound the coarse KL term by the summed fine-grained contributions; empty fibers contribute zero by direct sum evaluation. The double sum is commuted via sum_comm and collapsed with sum_ite_eq to recover the original KL expression, after which linarith closes the inequality.
why it matters
This theorem supplies the information-theoretic core for coarse_graining_decreases_free_energy, which states that reducing state resolution cannot increase Recognition Free Energy and is the statistical mechanics version of the second law. It underwrites the arrow of time via dF_R/dt≤0 in the module and appears in the status list as a scaffold toward h_theorem_recognition. The result aligns with the Recognition Composition Law and phi-ladder structure without invoking specific J-cost details.
Switch to Lean above to see the machine-checked source, dependencies, and usage graph.