Estimating Mutual Information

Alexander Kraskov; Harald Stoegbauer; Peter Grassberger

arXiv: cond-mat/0305641 · v1 · submitted 2003-05-28 · cond-mat.stat-mech · cond-mat.dis-nn

keywords: estimators, algorithms, bias, data, density, distributions, entropy, estimates

Abstract

We present two classes of improved estimators for mutual information $M(X,Y)$, from samples of random points distributed according to some joint probability density $\mu(x,y)$. In contrast to conventional estimators based on binnings, they are based on entropy estimates from $k$-nearest neighbour distances. This means that they are data efficient (with $k=1$ we resolve structures down to the smallest possible scales), adaptive (the resolution is higher where data are more numerous), and have minimal bias. Indeed, the bias of the underlying entropy estimates is mainly due to non-uniformity of the density at the smallest resolved scale, giving typically systematic errors which scale as functions of $k/N$ for $N$ points. Numerically, we find that both families become exact for independent distributions, i.e. the estimator $\hat M(X,Y)$ vanishes (up to statistical fluctuations) if $\mu(x,y) = \mu(x) \mu(y)$. This holds for all tested marginal distributions and for all dimensions of $x$ and $y$. In addition, we give estimators for redundancies between more than 2 random variables. We compare our algorithms in detail with existing algorithms. Finally, we demonstrate the usefulness of our estimators for assessing the actual independence of components obtained from independent component analysis (ICA), for improving ICA, and for estimating the reliability of blind source separation.
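
To illustrate the $k$-nearest-neighbour idea described in the abstract, here is a minimal pure-Python sketch of one estimator of this type, $\hat M(X,Y) = \psi(k) + \psi(N) - \langle \psi(n_x+1) + \psi(n_y+1) \rangle$, using the maximum norm in the joint space. The abstract does not state this formula; it is the commonly quoted form of the first estimator and should be checked against the paper body. The names `digamma_int` and `knn_mutual_information` are hypothetical, the inputs are assumed to be scalar samples without ties, and the brute-force $O(N^2)$ neighbour search is chosen for readability rather than speed.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma psi(n) for a positive integer n: -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_mutual_information(xs, ys, k=3):
    """k-nearest-neighbour estimate of M(X,Y) in nats for scalar samples.

    Assumption (not from the abstract): uses
    psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)> with the max-norm.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < N")

    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in the joint space.
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest neighbour

        # Count marginal neighbours strictly closer than eps.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)

    return digamma_int(k) + digamma_int(n) - total / n


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(300)]
    y_independent = [rng.gauss(0.0, 1.0) for _ in range(300)]
    y_dependent = [xi + 0.3 * rng.gauss(0.0, 1.0) for xi in x]
    print("independent:", knn_mutual_information(x, y_independent))
    print("dependent:  ", knn_mutual_information(x, y_dependent))
```

For independent samples the estimate should fluctuate around zero, in line with the abstract's claim, while for the dependent pair it should be clearly positive. For larger $N$ a spatial index such as a k-d tree would replace the brute-force loops.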

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information
cs.CV · 2026-05 · unverdicted · novelty 7.0

Channel importance splits into task relevance and local replaceability; local-axis metrics predict safe removal under pruning better than target-axis metrics across multiple CNNs and datasets.

Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection
cs.LG · 2026-05 · unverdicted · novelty 7.0

MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and sup...

Quantum Causal Discovery via Amplitude Estimation of Kullback-Leibler Divergence
quant-ph · 2026-04 · unverdicted · novelty 7.0

QKLA achieves quadratic query-complexity improvement for clipped KL estimation, yielding 2.7-7.4x fewer oracle queries than classical methods when embedded in the PC causal-discovery algorithm at moderate precision.

The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
cs.AI · 2026-04 · unverdicted · novelty 7.0

The Accountability Incompleteness Theorem demonstrates that human-AI collectives above the Accountability Horizon with feedback cycles cannot simultaneously meet attributability, foreseeability, non-vacuity, and compl...

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
cs.AI · 2026-05 · unverdicted · novelty 6.0

Spectral partitioning on pairwise mutual-information graphs from agent hidden states detects representational coalitions that behavioral measures miss in multi-agent AI.

Feature Selection via Mutual Information: New Theoretical Insights
cs.LG · 2019-07 · unverdicted · novelty 6.0

Conditional mutual information bounds ideal prediction errors for feature subsets and supplies a stopping condition for greedy selection algorithms.

Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars
astro-ph.EP · 2026-05 · unverdicted · novelty 5.0

ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, Si abundances from kinematics and [Fe/H] more accurately than [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galacti...

Data-Driven Reduction of Fault Location Errors in Onshore Wind Farm Collectors
eess.SY · 2025-11 · unverdicted · novelty 4.0

A Gated Residual Network correction model reduces fault location error by 76% in simulated onshore wind farm collector networks compared to state-of-the-art methods.

Information-Theoretic Measures in AI: A Practical Decision Guide
cs.AI · 2026-04 · unverdicted · novelty 3.0

A practical guide that organizes seven IT measures around three questions each (what it answers in AI, suitable estimators, and dangerous misuses), complete with flowchart, table, and worked examples.