Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data
Pith reviewed 2026-05-07 16:59 UTC · model grok-4.3
The pith
DySIB recovers the two-dimensional phase space of a pendulum from high-dimensional video data by maximizing predictive mutual information in latent space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DySIB recovers a two-dimensional representation that matches the dimensionality, topology, and geometry of the pendulum phase space, with the learned coordinates aligning smoothly with the canonical angle and angular velocity. Hyperparameters are set self-consistently by the data, and the entire procedure operates in latent space to demonstrate that predictive information alone can yield interpretable dynamical coordinates directly from high-dimensional observations.
What carries the argument
The Dynamical Symmetric Information Bottleneck (DySIB), an objective that maximizes predictive mutual information between past and future latent windows while penalizing representation complexity.
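The loss itself is not reproduced on this page. As a rough guide, a symmetric past-future information bottleneck of this kind is usually written as the following trade-off, where X_past and X_future are the observation windows, Z_past and Z_future their stochastic encodings, and β the trade-off parameter (a generic sketch, not the paper's exact notation):

```latex
% Generic form of a symmetric (past-future) information bottleneck objective.
\max_{q(z_p \mid x_p),\; q(z_f \mid x_f)}
  \; I(Z_{\text{past}};\, Z_{\text{future}})
  \;-\; \beta \bigl[\, I(X_{\text{past}};\, Z_{\text{past}}) + I(X_{\text{future}};\, Z_{\text{future}}) \,\bigr]
```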
If this is right
- The method recovers interpretable dynamical state variables from high-dimensional time-series data without any reconstruction loss or external labels.
- Hyperparameters of the encoder and bottleneck can be chosen self-consistently from the data itself.
- Success on a well-characterized experimental system shows that predictive information in latent space is enough to extract physically meaningful coordinates.
- The approach avoids direct reconstruction of observations, focusing computation entirely on the latent dynamical representation.
Where Pith is reading between the lines
- If the same objective works on systems whose phase space dimension is unknown in advance, it could serve as a general tool for discovering hidden state variables in experimental recordings.
- Applying the method to time series from chaotic or high-dimensional attractors would test whether it can recover non-trivial topology and geometry without prior knowledge.
- One could combine DySIB coordinates with downstream control algorithms to perform model-free stabilization or prediction directly from video.
- Extending the bottleneck to include multiple future horizons might improve robustness when the underlying dynamics contain multiple timescales.
Load-bearing premise
Maximizing predictive mutual information between past and future observation windows in latent space is sufficient to recover the true underlying dynamical state variables without additional supervision or reconstruction.
What would settle it
Running DySIB on the pendulum video dataset and checking whether the resulting two-dimensional latent coordinates vary smoothly with independently measured angle and angular velocity and reproduce the cylindrical topology of the phase space; a failure on either count would falsify the central claim.
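As an illustration of what such a check could look like in practice, the sketch below regresses hypothetical learned latents z onto smooth functions of independently measured angle θ and angular velocity ω and reports an R² per latent coordinate; the variable names and the specific diagnostic are assumptions for illustration, not the paper's evaluation code.

```python
# Illustrative alignment check (hypothetical arrays, not the paper's code):
# z     : (N, 2) learned latent coordinates
# theta : (N,)   independently measured pendulum angle (radians)
# omega : (N,)   independently measured angular velocity
import numpy as np

def alignment_r2(z, theta, omega):
    # Use (cos θ, sin θ) instead of θ so the angle's periodicity (the cylinder
    # topology of the phase space) does not introduce an artificial jump.
    X = np.column_stack([np.cos(theta), np.sin(theta), omega, np.ones_like(omega)])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)   # affine least-squares fit
    resid = z - X @ coef
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((z - z.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot                   # R^2 for each latent coordinate

# R^2 near 1 for both coordinates supports smooth alignment with the canonical
# variables; low R^2 or structured residuals would count against the claim.
```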
original abstract
Identifying the dynamical state variables of a system from high-dimensional observations is a central problem across physical sciences. The challenge is that the state variables are not directly observable and must be inferred from raw high-dimensional data without supervision. Here we introduce DySIB (Dynamical Symmetric Information Bottleneck) as a method to learn low-dimensional representations of time-series data by maximizing predictive mutual information between past and future observation windows while penalizing representation complexity. This objective operates entirely in latent space and avoids reconstruction of the observations. We apply DySIB to an experimental video dataset of a physical pendulum, where the underlying state space is known. The method, with hyperparameters of the learning architecture set self-consistently by the data, recovers a two-dimensional representation that matches the dimensionality, topology, and geometry of the pendulum phase space, with the learned coordinates aligning smoothly with the canonical angle and angular velocity. These results demonstrate, on a well-characterized experimental system, that predictive information in latent space can be used to recover interpretable dynamical coordinates directly from high-dimensional data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DySIB, a dynamical symmetric information bottleneck method for learning low-dimensional latent representations of time-series data. The approach maximizes predictive mutual information between past and future observation windows in latent space while applying a complexity penalty, operating without direct reconstruction of the high-dimensional observations. Applied to an experimental video dataset of a physical pendulum with known underlying state space, the method (with data-driven hyperparameter selection) is claimed to recover a two-dimensional representation whose dimensionality, topology, and geometry match the pendulum phase space, with the learned coordinates aligning smoothly to the canonical angle and angular velocity.
Significance. If the central claims hold under scrutiny, the work would represent a meaningful contribution to unsupervised extraction of interpretable dynamical coordinates from high-dimensional experimental data, with potential applications across physics and related fields. The avoidance of reconstruction and the use of self-consistent hyperparameter setting are positive features. However, the invariance of the predictive mutual information objective to diffeomorphisms of the latent variables means that specific smooth alignment with canonical coordinates is not automatically guaranteed by the information-bottleneck principle, which weakens the interpretability claim unless additional mechanisms are demonstrated.
major comments (2)
- [Abstract] Abstract and results description: the claim that the learned coordinates align smoothly with the canonical angle and angular velocity is load-bearing for the interpretability result, yet the objective (maximizing I(Z_past; Z_future) subject to a complexity penalty) is preserved under any invertible reparametrization of the latent variables. No mechanism (symmetry-breaking term, canonicalization step, or uniqueness argument) is identified in the provided description that would select this particular gauge over other equally optimal coordinate systems; the observed alignment could therefore stem from architectural biases, initialization, or post-hoc choices rather than the DySIB objective itself.
- [Results] Methods and results: the abstract reports successful recovery but provides no quantitative metrics (e.g., alignment error between learned and canonical coordinates, topological invariants such as winding numbers, or geometry measures such as curvature or metric distortion). Without these, or explicit external benchmarks independent of the learned representation, it is impossible to verify that the recovered 2D space matches the true phase space beyond qualitative visual inspection.
minor comments (2)
- [Abstract] The acronym DySIB is introduced without an immediate expansion in the abstract; this should be corrected for clarity.
- [Methods] Notation for the latent variables (Z_past, Z_future) and the precise form of the complexity penalty should be defined explicitly at first use in the methods section.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address each major comment below and have revised the manuscript to clarify the claims and strengthen the supporting evidence.
point-by-point responses
Referee: [Abstract] Abstract and results description: the claim that the learned coordinates align smoothly with the canonical angle and angular velocity is load-bearing for the interpretability result, yet the objective (maximizing I(Z_past; Z_future) subject to a complexity penalty) is preserved under any invertible reparametrization of the latent variables. No mechanism (symmetry-breaking term, canonicalization step, or uniqueness argument) is identified in the provided description that would select this particular gauge over other equally optimal coordinate systems; the observed alignment could therefore stem from architectural biases, initialization, or post-hoc choices rather than the DySIB objective itself.
Authors: We agree that the predictive mutual information objective is invariant under diffeomorphisms of the latent variables, and the original manuscript does not provide an explicit symmetry-breaking term, canonicalization procedure, or uniqueness theorem that would guarantee selection of the canonical gauge. The reported alignment is an empirical outcome of the training procedure. In the revised manuscript we have added a dedicated paragraph in the methods section acknowledging this invariance and discussing how the observed alignment arises consistently from the combination of the encoder architecture, random initialization, and data-driven hyperparameter selection. We have also included results from ten independent training runs with different random seeds, showing that the alignment with angle and angular velocity is robust (with alignment error remaining below a stated threshold after optimal affine matching). revision: yes
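For readers following this exchange, the invariance both parties accept is the standard fact that mutual information is unchanged by invertible reparametrization of either argument (a textbook statement, not a quotation from the paper):

```latex
I\bigl(g(Z_{\text{past}});\, h(Z_{\text{future}})\bigr)
  \;=\; I\bigl(Z_{\text{past}};\, Z_{\text{future}}\bigr)
  \quad \text{for any invertible maps } g,\, h,
```

so every diffeomorphic image of an optimal representation attains the same objective value, and the bottleneck alone cannot single out the canonical (angle, angular-velocity) gauge.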
Referee: [Results] Methods and results: the abstract reports successful recovery but provides no quantitative metrics (e.g., alignment error between learned and canonical coordinates, topological invariants such as winding numbers, or geometry measures such as curvature or metric distortion). Without these, or explicit external benchmarks independent of the learned representation, it is impossible to verify that the recovered 2D space matches the true phase space beyond qualitative visual inspection.
Authors: We concur that quantitative metrics are necessary to move beyond qualitative visual assessment. The revised manuscript now includes three explicit quantitative measures: (i) the root-mean-square alignment error between the learned coordinates and the canonical angle/angular-velocity after determining the optimal affine transformation, (ii) the winding numbers of closed orbits in the latent space to confirm topological equivalence, and (iii) a local geometry comparison that quantifies metric distortion relative to the known pendulum phase-space metric. These statistics are reported in a new table and are computed on held-out test trajectories independent of the training data. revision: yes
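Of the three proposed measures, the winding number is the most self-contained to illustrate; below is a minimal sketch under the assumption that a closed orbit is available as an (N, 2) array of latent points (hypothetical names, not the paper's code).

```python
# Illustrative winding-number estimate (hypothetical arrays, not the paper's code):
# orbit  : (N, 2) latent points sampled along one (nearly) closed trajectory
# center : (2,)   a reference point enclosed by the orbit
import numpy as np

def winding_number(orbit, center):
    v = orbit - center                               # vectors from the reference point
    ang = np.arctan2(v[:, 1], v[:, 0])               # polar angle of each sample
    dang = np.diff(ang)
    dang = (dang + np.pi) % (2 * np.pi) - np.pi      # wrap increments to the principal branch
    return dang.sum() / (2 * np.pi)                  # ~ +/-1 for a simple closed loop

# A librating pendulum orbit that is mapped faithfully into the latent plane
# should wind once around an interior point, matching the closed orbits of the
# true (angle, angular-velocity) phase portrait.
```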
Circularity Check
No significant circularity in empirical demonstration of DySIB
full rationale
The paper defines DySIB via maximization of predictive mutual information between past and future latent windows (with complexity penalty) and applies it to experimental pendulum video data. The central result is an empirical match between the learned 2D latent representation and the known pendulum phase space (dimension, topology, geometry, and smooth alignment with angle/velocity). This match is validated against an external, independently known ground truth rather than being derived from the training objective by construction. Hyperparameter selection is described as self-consistent with the data, but this is a standard model-selection step and does not reduce the reported alignment to a tautology. No load-bearing self-citations, uniqueness theorems, or ansatzes that presuppose the target coordinates appear in the abstract or claimed derivation; the method remains falsifiable against the physical system.
Reference graph
Works this paper leans on
- [1] (Excerpt from the paper's methods, "Delayed embeddings and the shared encoder": observations are a sequence of high-dimensional frames {F_1, F_2, ...} with each F_t ∈ R^D, where D is the dimensionality of the observation space; at each time t a past-future pair is constructed from consecutive segments of this trajectory in the observation space ...)
- [2] A. L. Hodgkin and A. F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology 117, 500 (1952).
- [3] J. Toner and Y. Tu, Long-range order in a two-dimensional dynamical XY model: how birds fly together, Physical Review Letters 75, 4326 (1995).
- [4] N. Goldenfeld, Lectures on Phase Transitions and the Renormalization Group (CRC Press, 2018).
- [5] A. Cavagna, L. Di Carlo, I. Giardina, T. S. Grigera, S. Melillo, L. Parisi, G. Pisegna, and M. Scandolo, Natural swarms in 3.99 dimensions, Nature Physics 19, 1043 (2023).
- [6] B. C. Daniels, W. S. Ryu, and I. Nemenman, Automated, predictive, and interpretable inference of Caenorhabditis elegans escape dynamics, Proceedings of the National Academy of Sciences 116, 7226 (2019).
- [7] V. Bapst, T. Keck, A. Grabska-Barwinska, C. Donner, E. D. Cubuk, S. S. Schoenholz, A. Obika, A. W. R. Nelson, T. Back, D. Hassabis, and P. Kohli, Unveiling the predictive power of static structure in glassy systems, Nature Physics 16, 448 (2020).
- [8] M. S. Schmitt, J. Colen, S. Sala, J. Devany, S. Seetharaman, A. Caillier, M. L. Gardel, P. W. Oakes, and V. Vitelli, Machine learning interpretable models of cell mechanics from protein images, Cell 187, 481 (2024).
- [9] W. Yu, E. Abdelaleem, I. Nemenman, and J. C. Burton, Physics-tailored machine learning reveals unexpected physics in dusty plasmas, Proceedings of the National Academy of Sciences 122, e2505725122 (2025).
- [10] G. J. Stephens, B. Johnson-Kerner, W. Bialek, and W. S. Ryu, Dimensionality and dynamics in the behavior of C. elegans, PLOS Computational Biology 4, e1000028 (2008).
- [11] J. P. Cunningham and B. M. Yu, Dimensionality reduction for large-scale neural recordings, Nature Neuroscience 17, 1500 (2014).
- [12] S. S. Schoenholz, E. D. Cubuk, D. M. Sussman, E. Kaxiras, and A. J. Liu, A structural approach to relaxation in glassy liquids, Nature Physics 12, 469 (2016).
- [13] F. Noé and C. Clementi, Collective variables for the study of long-time kinetics from molecular trajectories: theory and methods, Current Opinion in Structural Biology 43, 141 (2017).
- [14] E. D. Cubuk, R. J. S. Ivancic, S. S. Schoenholz, D. J. Strickland, A. Basu, Z. S. Davidson, J. Fontaine, J. L. Hor, Y.-R. Huang, Y. Jiang, N. C. Keim, K. D. Koshigan, J. A. Lefever, T. Liu, X.-G. Ma, D. J. Magagnosc, E. Morrow, C. P. Ortiz, J. M. Rieser, A. Shavit, T. Still, Y. Xu, Y. Zhang, K. N. Nordstrom, P. E. Arratia, R. W. Carpick, D. J. Durian, Z. ... (2017).
- [15] C. Pandarinath, D. J. O'Shea, J. Collins, R. Jozefowicz, S. D. Stavisky, J. C. Kao, E. M. Trautmann, M. T. Kaufman, S. I. Ryu, L. R. Hochberg, J. M. Henderson, K. V. Shenoy, L. F. Abbott, and D. Sussillo, Inferring single-trial neural population dynamics using sequential auto-encoders, Nature Methods 15, 805 (2018).
- [16] T. Ahamed, A. C. Costa, and G. J. Stephens, Capturing the continuous complexity of behaviour in Caenorhabditis elegans, Nature Physics 17, 275 (2021).
- [17] J. Colen, M. Han, R. Zhang, S. A. Redford, L. M. Lemma, L. Morgan, P. V. Ruijgrok, R. Adkins, Z. Bryant, Z. Dogic, M. L. Gardel, J. J. de Pablo, and V. Vitelli, Machine learning active-nematic hydrodynamics, Proceedings of the National Academy of Sciences 118, e2016708118 (2021).
- [18] R. Supekar, B. Song, A. Hastewell, G. P. T. Choi, A. Mietke, and J. Dunkel, Learning hydrodynamic equations for active matter from particle simulations and experiments, Proceedings of the National Academy of Sciences 120, e2206994120 (2023).
- [19] M. Schmidt and H. Lipson, Distilling free-form natural laws from experimental data, Science 324, 81 (2009).
- [20] B. C. Daniels and I. Nemenman, Automated adaptive inference of phenomenological dynamical models, Nature Communications 6, 8133 (2015).
- [21] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113, 3932 (2016).
- [22] N. M. Mangan, T. Askham, S. L. Brunton, J. N. Kutz, and J. L. Proctor, Model selection for hybrid dynamical systems via sparse regression, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 475, 20180534 (2019).
- [23] A. Frishman and P. Ronceray, Learning force fields from stochastic trajectories, Physical Review X 10, 021009 (2020).
- [24] P. A. K. Reinbold, L. M. Kageorge, M. F. Schatz, and R. O. Grigoriev, Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression, Nature Communications 12, 3219 (2021).
- [25] D. R. Gurevich, M. R. Golden, P. A. K. Reinbold, and R. O. Grigoriev, Learning fluid physics from highly turbulent data using sparse physics-informed discovery of empirical relations (SPIDER), Journal of Fluid Mechanics 996, A25 (2024).
- [26] B. Lusch, J. N. Kutz, and S. L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics, Nature Communications 9, 4950 (2018).
- [27] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Data-driven discovery of coordinates and governing equations, Proceedings of the National Academy of Sciences 116, 22445 (2019).
- [28] A. J. Linot and M. D. Graham, Deep learning to discover and predict dynamics on an inertial manifold, Physical Review E 101, 062209 (2020).
- [29] J. Page, M. P. Brenner, and R. R. Kerswell, Revealing the state space of turbulence using machine learning, Physical Review Fluids 6, 034402 (2021).
- [30] B. Chen, K. Huang, S. Raghupathi, I. Chandratreya, Q. Du, and H. Lipson, Automated discovery of fundamental variables hidden in experimental data, Nature Computational Science 2, 433 (2022).
- [31] P. R. Vlachas, G. Arampatzis, C. Uhler, and P. Koumoutsakos, Multiscale simulations of complex systems by learning their effective dynamics, Nature Machine Intelligence 4, 359 (2022).
- [32] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. ..., arXiv (2020).
- [33]
- [34] R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, A. Merose, S. Hoyer, G. Holland, O. Vinyals, J. Stott, A. Pritzel, S. Mohamed, and P. Battaglia, Learning skillful medium-range global weather forecasting, Science 382, 1416 (2023).
- [35] E. Abdelaleem, A. Roman, K. M. Martini, and I. Nemenman, Simultaneous dimensionality reduction: A data efficient approach for multimodal representations learning, Transactions on Machine Learning Research (2024), arXiv:2310.04458.
- [36] A. Swain, S. A. Ridout, and I. Nemenman, Better together: Cross and joint covariances enhance signal detectability in undersampled data, arXiv:2507.22207 [cond-mat.dis-nn] (2025).
- [37] P. Mergny and L. Zdeborová, Spectral thresholds in correlated spiked models and fundamental limits of partial least squares, in Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) (2026), arXiv:2510.17561 [math.ST].
- [38]
- [39] L. Wiskott and T. J. Sejnowski, Slow feature analysis: Unsupervised learning of invariances, Neural Computation 14, 715 (2002).
- [40] R. Balestriero and Y. LeCun, LeJEPA: Provable and scalable self-supervised learning without the heuristics, arXiv:2511.08544 [cs.LG] (2025).
- [41] L. Maes, Q. Le Lidec, D. Scieur, Y. LeCun, and R. Balestriero, LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels, arXiv:2603.19312 [cs.LG] (2026).
- [42] K. M. Martini and I. Nemenman, Data efficiency, dimensionality reduction, and the generalized symmetric information bottleneck, Neural Computation 36, 1353 (2024).
- [43] H. Van Assel, M. Ibrahim, T. Biancalani, A. Regev, and R. Balestriero, Joint embedding vs reconstruction: Provable benefits of latent space prediction for self-supervised learning, arXiv:2505.12477 [cs.LG] (2025).
- [44] A. van den Oord, Y. Li, and O. Vinyals, Representation learning with contrastive predictive coding, arXiv:1807.03748 [cs.LG] (2018).
- [45]
- [46] M. S. Schmitt, M. Koch-Janusz, M. Fruchart, D. S. Seara, M. Rust, and V. Vitelli, Information theory for data-driven model reduction in physics and biology, bioRxiv:2024.04.19.590281 (2024).
- [47] R. Meng and K. E. Bouchard, Bayesian inference of structured latent spaces from neural population activity with the orthogonal stochastic linear mixing model, PLOS Computational Biology 20, e1011975 (2024).
- [48] E. Abdelaleem, I. Nemenman, and K. M. Martini, Deep variational multivariate information bottleneck - a framework for variational losses, Journal of Machine Learning Research 26, 1 (2025).
- [49] N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method, in 37th Annual Allerton Conference on Communication, Control, and Computing (1999), pp. 368-377.
- [50] A. Alemi, I. Fischer, J. Dillon, and K. Murphy, Deep variational information bottleneck, in International Conference on Learning Representations (2017), arXiv:1612.00410 [cs.LG].
- [51] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (Wiley-Interscience, 2006).
- [52] N. Friedman, O. Mosenzon, N. Slonim, and N. Tishby, Multivariate information bottleneck, in Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI) (2001), pp. 152-161.
- [53] M. Studený and J. Vejnarová, The multiinformation function as a tool for measuring stochastic dependence, in Learning in Graphical Models, edited by M. I. Jordan (Springer, 1998), pp. 261-297.
- [54] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Machine Learning 37, 183 (1999).
- [55] D. P. Kingma and M. Welling, Auto-encoding variational Bayes, in International Conference on Learning Representations (2014), arXiv:1312.6114 [stat.ML].
- [56] E. Abdelaleem, K. M. Martini, and I. Nemenman, Accurate estimation of mutual information in high dimensional data, arXiv:2506.00330 [physics.data-an] (2025).
- [57] F. Takens, Detecting strange attractors in turbulence, in Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, Vol. 898 (Springer, 1981), pp. 366-381.
- [58] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems 31 (2018), arXiv:1806.07366 [cs.LG].
- [59] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770-778.
- [60]
- [61] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, Geometry from a time series, Physical Review Letters 45, 712 (1980).
- [62] J.-P. Eckmann and D. Ruelle, Ergodic theory of chaos and strange attractors, Reviews of Modern Physics 57, 617 (1985).
- [63] J. P. Crutchfield and B. S. McNamara, Equations of motion from a data series, Complex Systems 1, 417 (1987).
- [64] G. Sugihara and R. M. May, Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series, Nature 344, 734 (1990).
- [65] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Physical Review A 45, 3403 (1992).
- [66] M. Ushio, C.-H. Hsieh, R. Masuda, E. R. Deyle, H. Ye, C.-W. Chang, G. Sugihara, and M. Kondoh, Fluctuating interaction network and time-varying stability of a natural fish community, Nature 554, 360 (2018).
- [67] P. Grassberger, Toward a quantitative theory of self-generated complexity, International Journal of Theoretical Physics 25, 907 (1986).
- [68] W. Bialek, I. Nemenman, and N. Tishby, Predictability, complexity, and learning, Neural Computation 13, 2409 (2001).
- [69] F. Creutzig, A. Globerson, and N. Tishby, Past-future information bottleneck in dynamical systems, Physical Review E 79, 041925 (2009).
- [70] M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, Self-supervised learning from images with a joint-embedding predictive architecture, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 15619-15629.
- [71] M. M. Peixoto, Structural stability on two-dimensional manifolds, Topology 1, 101 (1962).
- [72] A. Hyvärinen and P. Pajunen, Nonlinear independent component analysis: Existence and uniqueness results, Neural Networks 12, 429 (1999).
- [73] O. Yair, R. Talmon, R. R. Coifman, and I. G. Kevrekidis, Reconstruction of normal forms by learning informed observation geometries from data, Proceedings of the National Academy of Sciences 114, E7865 (2017).
- [74] S.-H. Li, C.-X. Dong, L. Zhang, and L. Wang, Neural canonical transformation with symplectic flows, Physical Review X 10, 021020 (2020).
- [75] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. IV, Communications on Pure and Applied Mathematics 36, 183 (1983).
- [76] B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, and G. Tucker, On variational bounds of mutual information, in International Conference on Machine Learning (2019), pp. 5171-5180.
- [77] E. Levina and P. Bickel, Maximum likelihood estimation of intrinsic dimension, Advances in Neural Information Processing Systems 17 (2004).
- [78] E. Facco, M. d'Errico, A. Rodriguez, and A. Laio, Estimating the intrinsic dimension of datasets by a minimal neighborhood information, Scientific Reports 7, 12140 (2017).
- [79] (Excerpt from the paper's appendix, "Architecture": the shared encoder Φ is a three-layer MLP with hidden width 256 and ReLU activations that maps each frame F_t ∈ R^784 to a per-frame embedding of dimension d_F = 32; the concatenated delayed embedding [Φ(F_t), ..., Φ(F_{t+n_F-1})] ∈ R^{32 n_F} is passed through two parallel linear heads W_μ and W_ℓ producing the mean μ(x) ∈ R^{k_z} and log-variance ℓ(x) ∈ R^{k_z} ...)
- [80] (Excerpt from the paper's appendix, "Training": the model is trained on the experimental pendulum dataset [29], using up to the first 1000 videos for training and the final 200 for held-out evaluation; the original 128×128 RGB frames are downsampled to 28×28 grayscale (D = 784); each video contains T = 60 frames, giving T - 2n_F + 1 valid past-future pairs per trajectory; training uses the Adam optimizer ...)
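The architecture excerpt in [79] pins down the encoder closely enough to sketch. The PyTorch fragment below is an illustrative reading of that description, with the latent dimension k_z, the number of concatenated frames n_F, and everything the excerpt does not state treated as placeholders rather than the authors' implementation.

```python
# Sketch of the per-frame encoder and variational heads described in [79]:
# a three-layer MLP (hidden width 256, ReLU) maps a 28x28 frame (784 values)
# to a 32-dimensional embedding; n_F consecutive embeddings are concatenated
# and fed to two linear heads for the latent mean and log-variance.
# k_z, n_frames, and all unstated details are assumptions, not the paper's code.
import torch
import torch.nn as nn

class DelayedEmbeddingEncoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, d_frame=32, n_frames=4, k_z=2):
        super().__init__()
        self.phi = nn.Sequential(                  # shared per-frame encoder Φ
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_frame),
        )
        self.W_mu = nn.Linear(d_frame * n_frames, k_z)    # mean head μ(x)
        self.W_ell = nn.Linear(d_frame * n_frames, k_z)   # log-variance head ℓ(x)

    def forward(self, frames):
        # frames: (batch, n_frames, 784); Φ acts on the last dimension
        emb = self.phi(frames)                     # (batch, n_frames, d_frame)
        x = emb.reshape(emb.shape[0], -1)          # concatenated delayed embedding
        return self.W_mu(x), self.W_ell(x)         # μ(x), ℓ(x)
```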