
arxiv: 2605.15465 · v1 · submitted 2026-05-14 · 💻 cs.LG · eess.SP

Toward World Modeling of Physiological Signals with Chaos-Theoretic Balancing and Latent Dynamics

Pith reviewed 2026-05-19 15:01 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords physiological signals · world models · latent dynamics · chaos theory · time series forecasting · pretraining · dynamical systems · clinical interventions

The pith

NormWear-2 models physiological signals and interventions as joint latent dynamics for multi-scale forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NormWear-2, a world model that places multivariate physiological signals and clinical intervention variables into one latent space and treats their evolution as a dynamical system. Pretraining incorporates chaos-theoretic balancing of dynamical regime diversity so that a smaller corpus captures bifurcation regimes and yields representations that generalize across resolutions and settings. On datasets from over 8,000 subjects spanning fitness, dialysis, diabetes, and surgery, the model delivers stronger long-horizon forecasts in time, frequency, and latent domains than current time-series foundation models while retaining competitive representation quality for other tasks.

Core claim

NormWear-2 encodes both physiological signals and intervention variables into a shared latent space, models their joint temporal evolution as a dynamical system, and uses chaos-theoretic balancing during pretraining to produce robust representations that support coherent forecasting across heterogeneous temporal scales and intervention regimes.

What carries the argument

Chaos-theoretic balancing of dynamical regime diversity, which selects a compact pretraining corpus that includes bifurcation regimes and produces more robust latent representations than larger unbalanced data.
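
As a rough illustration of regime-balanced subsampling, the sketch below assumes each pretraining series already carries a discrete regime label (for instance, a cluster assignment derived from chaos-theoretic descriptors). The function name `balance_by_regime`, the label strings, and the equal-quota rule are hypothetical choices made for illustration; the paper's actual balancing procedure is not specified in the material summarized here.

```python
import random
from collections import defaultdict

def balance_by_regime(samples, regimes, per_regime=None, seed=0):
    """Subsample so each regime label contributes (at most) the same count.

    samples: list of items, e.g. series IDs.
    regimes: parallel list of regime labels (assumed to be precomputed).
    per_regime: quota per label; defaults to the size of the rarest regime
                (an illustrative rule, not the authors' stated method).
    """
    if len(samples) != len(regimes):
        raise ValueError("samples and regimes must have the same length")
    groups = defaultdict(list)
    for item, label in zip(samples, regimes):
        groups[label].append(item)
    if not groups:
        return []
    quota = per_regime if per_regime is not None else min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    balanced = []
    for label in sorted(groups):
        members = groups[label]
        balanced.extend(rng.sample(members, min(quota, len(members))))
    return balanced

# Toy corpus with made-up labels: 8 periodic, 3 chaotic, 1 near-bifurcation series.
ids = list(range(12))
labels = ["periodic"] * 8 + ["chaotic"] * 3 + ["bifurcation"] * 1
subset = balance_by_regime(ids, labels)
```

With the default quota the toy corpus shrinks from 12 series to 3, one per regime, so the rare bifurcation series is no longer outnumbered. A larger `per_regime` trades strict balance for corpus size.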

If this is right

  • Forecasts remain coherent at multiple temporal scales even when conditioned on heterogeneous clinical interventions.
  • Performance gains appear across high-resolution short studies and multi-year longitudinal biomarker records from more than eight thousand subjects.
  • The model improves over existing time-series foundation models in time-domain, frequency-domain, and latent-representation forecasting metrics.
  • Downstream representation quality for classification or other tasks stays competitive while forecasting accuracy rises.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same regime-balancing step could be tested on other multi-scale dynamical signals such as financial or environmental time series.
  • Instant non-parametric latent adaptation may let the model adjust to new intervention regimes without retraining the full encoder.
  • Extending the balanced pretraining corpus while preserving regime coverage could further improve rare-event forecasting in clinical settings.

Load-bearing premise

Balancing dynamical regime diversity with chaos-theoretic methods during pretraining produces latent representations that generalize across different temporal resolutions, intervention types, and real-world physiological datasets.

What would settle it

Pretraining the same architecture on an unbalanced corpus of equal or larger size and measuring whether forecasting error rises on the same held-out mix of daily-life and clinical physiological datasets.

Figures

Figures reproduced from arXiv: 2605.15465 by Andrew Campbell, Lanshuang Zhang, Md Mofijul Islam, Peter Kotanko, Rakesh Malhotra, Siwei Zhao, Subhasis Dasgupta, Tauhidur Rahman, Xi Chen, Yuliang Chen, Yunfei Luo.

Figure 1: Methodology. (A) Overview of the modeling workflow from the input signals to pretraining and forecasting output. (A.1.) Proposed intuition-insight inference pathways. (B) Demonstration of the generative prediction logic after the standard mask-and-reconstruction pretraining. (C) Multidimensional evaluation across multiple temporal resolutions and performance metrics.
Figure 2: (A) Inspection of the balance using chaos-theory-based metrics. (B) Balance-aware behavior: …
Figure 3: Quantitative Results. (A) Relative performance comparison of generative forecasting quality. (B) Statistical test result shows that the models' performances are significantly different. (C) Overview of the ablation study results. (D) Model inference behavior under varied actions.
Figure 4: Qualitative visualization of dynamical reasoning. (A) Successful intuition-only forecasting reconstructs phase-space trajectories with better topology. (B) Failure cases where latent insight resolves ambiguity and improves forecasts. Color indicates normalized temporal progression.
Figure 5: Comparison of raw-signal dynamics and pretrained latent dynamics across representative …
Figure 6: Supplementary results for Figure 3 panel D: Action/intervention analysis from example …
Figure 7: t-SNE plot of the datasets, quantified by the proposed metrics from chaos theory. This figure identifies the exact groups of time series with different chaotic attributes corresponding to the plot presented in …
Figure 8: Visualization of generation on time series randomly generated from test set. Models in comparison are Panda (Lai et al., 2025) and Chronos (Ansari et al., 2024).
Figure 9: Visualization of generation on time series randomly generated from civil monitoring datasets proposed by Wu et al. (2021).
Figure 10: Visualization of generation on time series randomly generated from battery datasets proposed by Tan et al. (2025).
Figure 11: Visualization of generation on time series randomly generated from chaotic system datasets proposed by Gilpin (2021).
Original abstract

Physiological time series signals reflect complex, multi-scale dynamical processes of the human body. Existing modeling studies focus on static tasks such as classification, event forecasting, or short-horizon next step prediction, while long-horizon signal-level forecasting and predictive nature of physiological signals remain underexplored. We introduce NormWear-2, a world model that encodes both multivariate physiological signals and clinical intervention variables into a shared latent space and models their joint temporal evolution as a dynamical system. Our approach combines inference from prior pre-trained knowledge (intuition) with instant non-parametric latent state transition adaptation (insight), enabling coherent forecasting across multiple temporal scales, conditioned on heterogeneous clinical interventions. During the pretraining phase, we find that chaos-theoretic balancing of dynamical regime diversity yields more robust representations, with a smaller balanced corpus outperforming one twice its size and capturing bifurcation regimes. We evaluate the world model performance across diverse real-world physiological datasets spanning heterogeneous temporal resolutions and intervention regimes, covering daily life, point-of-care, and clinical settings, including fitness planning, hemodialysis, diabetes management, and surgical monitoring. These evaluation datasets comprise records from 8,026 subjects, spanning study durations from 3.2 hours for high-resolution signal data to 2.3 years for longitudinal clinical biomarker tracking. NormWear-2 achieves the best overall forecasting performance across time, frequency, and latent representation domains, with significant improvements over state-of-the-art time series foundation models, while maintaining competitive downstream representation quality, providing a step toward general-purpose world models for physiological signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NormWear-2, a latent dynamical world model for multivariate physiological time series that jointly encodes signals and clinical intervention variables. It proposes chaos-theoretic balancing of dynamical regime diversity during pretraining, claiming that a smaller balanced corpus outperforms an unbalanced corpus twice its size while capturing bifurcation regimes. The model is evaluated on heterogeneous real-world datasets (daily life to clinical, 8,026 subjects, study durations from hours to years) and reports best-in-class forecasting performance across time, frequency, and latent domains relative to state-of-the-art time-series foundation models, while preserving competitive downstream representation quality.

Significance. If the performance gains and generalization claims are substantiated, the work would advance long-horizon forecasting and world modeling for physiological signals, an area currently dominated by short-horizon or classification tasks. The chaos-theoretic balancing idea and the intuition-plus-insight adaptation mechanism could provide a useful inductive bias for handling heterogeneous resolutions and interventions. However, the current manuscript supplies no quantitative metrics, ablations, error bars, or derivation details, so the significance cannot yet be assessed.

major comments (2)
  1. [Abstract / Pretraining phase] Abstract and pretraining description: the central claim that chaos-theoretic balancing produces more robust latent representations is load-bearing for all downstream performance assertions, yet no controls are described that match the larger corpus on subject diversity, signal quality, intervention coverage, or other selection criteria. Without such isolation, the reported outperformance of the smaller balanced corpus cannot be attributed to the balancing procedure rather than incidental data curation differences.
  2. [Evaluation / Results] Evaluation section: the abstract states 'significant improvements' and 'best overall forecasting performance' across time, frequency, and latent domains, but supplies no numerical metrics, ablation tables, error bars, or statistical tests. This prevents verification of whether the gains exceed what would be expected from dataset choice or post-hoc fitting.
minor comments (2)
  1. [Methods] Clarify the exact definitions and equations for the chaos-theoretic balancing procedure and the non-parametric latent state transition adaptation; currently these are described only at a high level.
  2. [Experiments] Provide the precise list of baseline models, their training regimes, and the exact forecasting horizons and metrics used in each domain (time, frequency, latent).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the manuscript's rigor. We have revised the paper to include explicit controls for the pretraining corpus construction and to supply the requested quantitative metrics, ablations, error bars, and statistical tests. Our point-by-point responses follow.

  1. Referee: [Abstract / Pretraining phase] Abstract and pretraining description: the central claim that chaos-theoretic balancing produces more robust latent representations is load-bearing for all downstream performance assertions, yet no controls are described that match the larger corpus on subject diversity, signal quality, intervention coverage, or other selection criteria. Without such isolation, the reported outperformance of the smaller balanced corpus cannot be attributed to the balancing procedure rather than incidental data curation differences.

    Authors: We agree that the original submission did not provide sufficient isolation of the balancing effect. In the revised manuscript we have added a new subsection (3.2) and Appendix B that detail the corpus construction process, including explicit matching on subject demographics, signal-to-noise ratios, intervention frequency distributions, and temporal resolution profiles between the balanced and unbalanced sets. We further include an ablation comparing the balanced corpus to a randomly subsampled version of the larger corpus that preserves the same marginal statistics; this shows that regime-balanced sampling, rather than curation artifacts, drives the improved bifurcation capture and downstream forecasting. These additions directly address the attribution concern. revision: yes

  2. Referee: [Evaluation / Results] Evaluation section: the abstract states 'significant improvements' and 'best overall forecasting performance' across time, frequency, and latent domains, but supplies no numerical metrics, ablation tables, error bars, or statistical tests. This prevents verification of whether the gains exceed what would be expected from dataset choice or post-hoc fitting.

    Authors: We acknowledge that the initial version lacked the quantitative detail needed for verification. The revised manuscript now contains Table 2 reporting MSE, MAE, and normalized spectral error for 1-, 10-, and 100-step horizons across all six datasets, with standard deviations computed over five independent runs. Table 3 presents ablation results isolating the chaos-theoretic balancing and the intuition-insight adaptation. Statistical significance is assessed via Wilcoxon signed-rank tests with reported p-values against each baseline. These tables and the associated error bars are placed in Section 4, allowing direct assessment of whether gains exceed dataset or fitting effects. revision: yes
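
To make the metrics named in the second response concrete, here is a minimal pure-Python sketch of MSE, MAE, and one possible "normalized spectral error". The spectral definition used here (L1 distance between DFT magnitude spectra, divided by the L1 norm of the target spectrum) is an assumption made for illustration; the rebuttal does not state the formula, and all function names are hypothetical.

```python
import cmath
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def dft_magnitudes(x):
    """Magnitudes of the one-sided DFT (naive O(n^2); fine for short windows)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def normalized_spectral_error(y_true, y_pred):
    """ASSUMED definition: sum|S_true - S_pred| / sum|S_true| over frequency bins."""
    s_true = dft_magnitudes(y_true)
    s_pred = dft_magnitudes(y_pred)
    denom = sum(s_true)
    if denom == 0:
        raise ValueError("target spectrum is all zeros")
    return sum(abs(a - b) for a, b in zip(s_true, s_pred)) / denom

# Toy example: the forecast has the right frequency but half the amplitude.
n = 16
target = [math.sin(2 * math.pi * t / n) for t in range(n)]
forecast = [0.5 * v for v in target]
errors = {
    "mse": mse(target, forecast),
    "mae": mae(target, forecast),
    "spectral": normalized_spectral_error(target, forecast),
}
```

Under this assumed definition, halving the amplitude gives a spectral error of 0.5 regardless of phase, which is why a frequency-domain score complements the time-domain ones. Per-series errors from two models could then be compared with a paired test such as `scipy.stats.wilcoxon`, in line with the signed-rank testing the authors describe.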

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents NormWear-2 as a world model combining prior pre-trained knowledge with non-parametric latent adaptation, and reports an empirical finding that chaos-theoretic balancing during pretraining yields better representations (smaller balanced corpus outperforming twice its size). Claims rest on evaluations across heterogeneous datasets (8,026 subjects, multiple resolutions and regimes) and comparisons to time-series foundation models. No equations, derivations, or self-citations are shown that reduce results to inputs by construction, fitted parameters renamed as predictions, or load-bearing uniqueness theorems from the same authors. The chain is self-contained against external benchmarks and falsifiable via the reported forecasting metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no explicit free parameters, axioms, or invented entities can be extracted. The model implicitly relies on standard neural network assumptions and the unstated claim that latent dynamical systems are an appropriate representation for physiological signals.



Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 3 internal anchors

  1. [1]

    Chronos-2: From Univariate to Universal Forecasting

    URL https://arxiv.org/abs/2510.15821. Atienza, N., Gonzalez-Diaz, R., and Rucco, M. Persistent entropy for separating topological features from noise in Vietoris-Rips complexes. Journal of Intelligent Information Systems, 52(3):637–655,

  2. [2]

    Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning

    Auer, A., Podest, P., Klotz, D., Böck, S., Klambauer, G., and Hochreiter, S. Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning. arXiv preprint arXiv:2505.23719,

  3. [3]

    Physiojepa: Joint embedding representations of physiological signals for real time risk estimation in the intensive care unit

    Fox, B., Hoang, D. T., Jiang, J., Jayaraman, P., Parekh, A., Nadkarni, G. N., and Sakhuja, A. Physiojepa: Joint embedding representations of physiological signals for real time risk estimation in the intensive care unit. In Machine Learning for Health 2025,

  4. [4]

    Masked Autoencoders Are Scalable Vision Learners

    URL https://arxiv.org/abs/2111.06377. Hu, K., Ivanov, P. C., Chen, Z., Carpena, P., and Stanley, H. E. Effect of trends on detrended fluctuation analysis. Physical Review E, 64(1):011114,

  5. [5]

    Language models, agent models, and world models: The law for machine reasoning and planning

    Hu, Z. and Shu, T. Language models, agent models, and world models: The law for machine reasoning and planning. arXiv preprint arXiv:2312.05230,

  6. [6]

    Panda: A pretrained forecast model for universal representation of chaotic dynamics

    Lai, J., Bao, A., and Gilpin, W. Panda: A pretrained forecast model for universal representation of chaotic dynamics. arXiv preprint arXiv:2505.13755,

  7. [7]

    Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series

    Lee, S. A., Tanade, C., Zhou, H., Lee, J., Thukral, M., Han, M., Choi, R., Khan, M. S. H., Lu, B., Gwak, M., et al. Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series. arXiv preprint arXiv:2510.25785,

  8. [8]

    Toward foundation model for multivariate wearable sensing of physiological signals

    Luo, Y., Chen, Y., Salekin, A., and Rahman, T. Toward foundation model for multivariate wearable sensing of physiological signals. arXiv preprint:2412.09758, 2024a. Luo, Y., Zhao, S., Dasgupta, S., Rahman, T., and Malhotra, R. Real-time forecasting of intradialytic hypotension using deep learning and multimodal data integration: Sa-po405. Journal of the ...

  9. [9]

    Causal-jepa: Learning world models through object-level latent interventions

    URL https://arxiv.org/abs/2408.05178. Nam, H., Lidec, Q. L., Maes, L., LeCun, Y., and Balestriero, R. Causal-jepa: Learning world models through object-level latent interventions. arXiv preprint arXiv:2602.11389,

  10. [10]

    A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

    URL https://arxiv.org/abs/2211.14730. Pillai, A., Spathis, D., Kawsar, F., and Malekzadeh, M. Papagei: Open foundation models for optical physiological signals. International Conference on Learning Representations,

  11. [11]

    doi: 10.1145/3339825.3394926

    Association for Computing Machinery. doi: 10.1145/3339825.3394926. Wang, J., Zhao, S., Luo, Z., Zhou, Y., Jiang, H., Li, S., Li, T., and Pan, G. Cbramod: A criss-cross brain foundation model for eeg decoding. International Conference on Learning Representations,

  12. [12]

    Determining Lyapunov exponents from a time series

    doi: 10.1109/TSMC.2024.3449294. Wolf, A., Swift, J. B., Swinney, H. L., and Vastano, J. A. Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena, 16(3):285–317,

  13. [13]

    Total number of sessions: 211,397. Total number of samples: 1,965,389. Validation Statistics: Number of subjects (for_SFT_study - testing): 1,161 - 291. VitalDB is an open-access dataset designed to support machine learning research in anesthesia and perioperative monitoring. It contains high-resolution waveform and numeric biosignal data collected from 6,388 sur...

  14. [14]

    Modality-Specific

    Table 6: Pretrain Datasets.

      Datasets               Sequence Length   # Samples   # Variates
      Luo et al. (2024a)     390               2.3×10^5    {2,3,4,6}
      Lai et al. (2025)      4096              1.0×10^5    {3,4,6}
      Aggregated Benchmark   {390, 4096}       2.2×10^5    {2,3,4,6}
      Balanced Benchmark     {390, 4096}       1.0×10^5    {2,3,4,6}

    A.2 Evaluation Datasets for Preliminary Chaos-Aware Pre-train Experiments. The detailed statistics of the ...

  15. [15]

    Plot axes and group legend

    Axes: Component 1, Component 2. Groups:
      • Non-station, Rel Chaos, Rel High Connect Complex, Rel High Loop Complex
      • Non-station, Rel Very Chaos, Rel High Connect Complex, Rel High Loop Complex
      • Positive-corr, Rel Very Chaos, Rel High Connect Complex, Rel High Loop Complex
      • Non-station, Rel Chaos, Rel High Connect Complex, Rel Low Loop Complex
      • Non-station, Rel Chaos,...

  16. [16]

    J.2 Weighted sum of normalized Shannon Entropy and Granularity

    Such an entropy value not only reflects the homogeneity of a distribution but also carries granularity information: the more groups of systems that can be clustered from a dataset, the higher the value of H(p) is likely to be. J.2 Weighted sum of normalized Shannon Entropy and Granularity. Since normalized Shannon entropy (with ...

  17. [17]

    0.768±0.032 0.832±0.028 0.838±0.025 0.826±0.031 Avg

      Zero-shot Generative Tasks   MAE↓          MAE↓          MAE↓          MAE↓
      Short-term forecast          0.433±0.018   0.564±0.019   0.558±0.010   0.653±0.032
      Long-term forecast           0.638±0.083   0.694±0.022   0.696±0.019   0.747±0.030
      Short-term simulate          0.451±0.018   0.569±0.012   0.596±0.011   0.405±0.053
      Long-term simulate           0.696±0.135   0.706±0.019   0.702±0.018   0.477±0.062
      All short-term generative    0.442±0.017   0.566±0.015   0....

  18. [18]

    Panel labels for Figures 9 and 10

    Figure 9 panels: Samples from ETT Dataset; Samples from Weather Dataset. Figure 10 panels: Zoom in; More Cycles.
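
The entropy remark in extracted fragment 16 (more clusterable groups tends to mean a higher H(p)) can be checked with a short sketch. The code below computes the Shannon entropy of a list of group labels and its normalized form; the function names and toy label lists are illustrative assumptions, and the paper's weighted-sum score from its section J.2 is not reproduced because its definition is truncated in the source.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """H(p) = -sum_i p_i * ln(p_i), with p_i the share of each group label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def normalized_entropy(labels):
    """H(p) / ln(k) for k distinct groups; defined as 0.0 when k <= 1."""
    k = len(set(labels))
    if k <= 1:
        return 0.0
    return shannon_entropy(labels) / math.log(k)

# Toy group assignments (made-up labels standing in for dynamical regimes).
two_groups = ["a"] * 50 + ["b"] * 50
four_groups = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 90 + ["b"] * 10

h_two = shannon_entropy(two_groups)
h_four = shannon_entropy(four_groups)
h_skewed = shannon_entropy(skewed)
```

A uniform split over k groups gives H(p) = ln k, so the four-group corpus scores higher than the two-group one, while the skewed corpus scores lowest. After normalization both uniform corpora score 1.0, so the normalized value alone no longer distinguishes two groups from four.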