
englishLetterEntropy


The declaration supplies the conventional numerical value 4.2 bits per letter for the single-letter entropy of English text. Information theorists working on source coding and Recognition Science researchers bounding J-cost for natural language data would cite it when applying Shannon limits to real-world alphabets. It is introduced as a fixed real constant without derivation or proof steps.

claim

The single-letter entropy of English text is defined to be $4.2$ bits per symbol.

background

The Information.Compression module derives fundamental limits on data compression from J-cost, the Recognition Science cost function that penalizes disorganized representations. Entropy is defined via Shannon's formula $H(X) = -\sum_x p(x) \log_2 p(x)$, and the source coding theorem asserts that average code length cannot fall below this value. The supplied definition gives the empirical single-letter entropy for English, while the module doc notes that the entropy rate accounting for correlations drops to roughly 1.0–1.5 bits per letter.
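
As a point of reference, here is a minimal Lean sketch of that formula. It is not taken from the Information.Compression module: it assumes a finite alphabet α with a probability mass function p : α → ℝ, and the names shannonEntropy and p are illustrative only. Evaluating such an expression on empirical English letter frequencies is what produces a single-letter figure near the 4.2 bits quoted above.

import Mathlib

-- Minimal sketch (not the module's definition): Shannon entropy in bits
-- for a probability mass function `p` on a finite alphabet `α`,
-- mirroring H(X) = -∑_x p(x) log₂ p(x).
noncomputable def shannonEntropy {α : Type*} [Fintype α] (p : α → ℝ) : ℝ :=
  - (∑ x, p x * Real.logb 2 (p x))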

proof idea

This is a direct definition that assigns the constant 4.2 to the single-letter entropy of English text.

why it matters in Recognition Science

It anchors concrete numerical discussion of compression limits inside the Recognition Science framework by supplying a standard value for English entropy. The module links this value to the principle that entropy equals the minimum J-cost for faithful representation and that compression is J-cost minimization. No downstream theorems are recorded, so the definition serves as a reference point rather than a lemma in a larger chain.

scope and limits

The value is stated as a fixed empirical constant: it is introduced without derivation or proof obligations, and no downstream theorems in the module depend on it.

formal statement (Lean)

noncomputable def englishLetterEntropy : ℝ := 4.2  -- bits
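
Since no downstream theorems are recorded, the following usage is purely hypothetical and not part of the module. It sketches how the constant could feed the source-coding lower bound of roughly $4.2 \cdot n$ bits for $n$ independent English letters; the constant is restated so the snippet is self-contained, and englishBitLowerBound is an invented name.

import Mathlib

-- The declaration as given in the module, restated for a self-contained sketch.
noncomputable def englishLetterEntropy : ℝ := 4.2  -- bits

-- Hypothetical helper (not in the module): the source coding theorem's
-- lower bound on the bits needed to encode `n` independent English letters.
noncomputable def englishBitLowerBound (n : ℕ) : ℝ :=
  (n : ℝ) * englishLetterEntropy

-- Under the single-letter model, 1000 letters need at least 4200 bits.
example : englishBitLowerBound 1000 = 4200 := by
  norm_num [englishBitLowerBound, englishLetterEntropy]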