
arxiv: 2602.12828 · v2 · submitted 2026-02-13 · 💻 cs.LG · cs.AI

Risk Horizons: Structured Hypothesis Spaces for Longitudinal Clinical Prediction

Pith reviewed 2026-05-15 22:37 UTC · model grok-4.3

keywords: longitudinal EHR prediction, hyperbolic geometry, hypothesis spaces, clinical event prediction, next-visit forecasting, structured retrieval, risk cones, MIMIC-IV

The pith

Risk Horizons builds patient-specific hypothesis spaces in hyperbolic geometry to predict future clinical events from sparse records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Risk Horizons, a framework that constructs compact sets of plausible next clinical events for each patient by merging standard medical coding hierarchies with data-inferred lagged connections across diagnoses, procedures, and medications. It embeds the resulting graph in hyperbolic space and retrieves candidates using directional risk cones, turning an open-ended prediction task into ranking inside a small, medically coherent collection. Experiments on MIMIC-IV and eICU show competitive accuracy with stronger alignment to medical hierarchies than baseline approaches. The work indicates that the structured hyperbolic retrieval step accounts for most gains, while language models add value mainly by reranking the retrieved candidates at inference time.

Core claim

Risk Horizons reframes longitudinal clinical prediction as ranking within patient-specific hypothesis spaces formed by combining deterministic coding hierarchies with data-driven lagged cross-modal associations. These spaces are embedded in hyperbolic geometry and queried with directional risk cones to select plausible futures, yielding competitive next-visit performance on MIMIC-IV and eICU while improving hierarchy consistency across diagnoses, procedures, and medications.

What carries the argument

Hyperbolic embedding of clinical graphs built from coding hierarchies and lagged associations, queried via directional risk cones to retrieve candidates from structured hypothesis spaces.
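
To make the retrieval step concrete, here is a minimal sketch of cone-restricted nearest-neighbour lookup in the Poincaré ball. The distance formula is the standard Poincaré-ball geodesic distance. Everything else is an assumption made for illustration: the summary does not specify how the paper defines its risk cones, so `in_cone` uses a simple angular test in ball coordinates, and the names `retrieve_candidates`, `apex`, `axis`, and `half_angle`, along with the toy embeddings, are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def in_cone(apex, axis, point, half_angle):
    """Hypothetical cone test: is `point` within `half_angle` radians of `axis`,
    as seen from `apex`? Uses the Euclidean angle in ball coordinates."""
    direction = [p - a for p, a in zip(point, apex)]
    norm_d = math.sqrt(sum(x * x for x in direction))
    norm_a = math.sqrt(sum(x * x for x in axis))
    if norm_d == 0.0 or norm_a == 0.0:
        return False
    cos_angle = sum(x * y for x, y in zip(direction, axis)) / (norm_d * norm_a)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.acos(cos_angle) <= half_angle

def retrieve_candidates(apex, axis, embeddings, half_angle, k):
    """Return up to k codes inside the cone, nearest first by hyperbolic distance."""
    inside = [
        (poincare_distance(apex, point), code)
        for code, point in embeddings.items()
        if in_cone(apex, axis, point, half_angle)
    ]
    inside.sort()
    return [code for _, code in inside[:k]]

# Toy 2-D embeddings (made-up coordinates, not from the paper).
embeddings = {
    "dx_A": (0.5, 0.05),
    "dx_B": (0.8, 0.1),
    "rx_C": (-0.5, 0.2),
    "px_D": (0.1, 0.7),
}
candidates = retrieve_candidates((0.1, 0.0), (1.0, 0.0), embeddings, math.pi / 6, k=2)
print(candidates)  # ['dx_A', 'dx_B']
```

The point of the sketch is the two-stage shape: the cone discards most of the vocabulary first, and ranking then happens only among the survivors.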

If this is right

  • Hyperbolic structured candidate retrieval is the primary driver of performance gains.
  • Language models function effectively as rerankers when restricted to clinically grounded candidate sets.
  • Predictions maintain greater consistency with established medical coding hierarchies.
  • The approach handles sparse observations by inferring temporal cross-modal links from historical data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar geometry-aware pruning of candidate spaces could make large-vocabulary prediction feasible in other hierarchical domains such as legal outcomes or financial events.
  • Replacing the data-inferred associations with expert-curated clinical links might further strengthen or alter the quality of the hypothesis spaces.
  • The method points toward a general pattern for scaling prediction models by first building compact, geometry-respecting candidate sets before applying heavier inference.

Load-bearing premise

Data-driven lagged cross-modal associations inferred from sparse longitudinal records accurately reflect the true clinical relationships needed to build useful patient hypothesis spaces.
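
One way such lagged associations could be estimated is sketched below. This is an illustration under stated assumptions, not the paper's procedure: it scores pairs of codes from consecutive visits with pointwise mutual information (PMI), keeps only cross-modal pairs, and drops pairs seen fewer than `min_count` times. The function name `lagged_associations`, the `modality:code` string convention, the lag of one visit, and the PMI score are all hypothetical choices.

```python
import math
from collections import Counter

def lagged_associations(patients, min_count=2):
    """Score (earlier_code -> later_code) pairs across consecutive visits.

    patients: list of patients; each patient is a list of visits in time order;
              each visit is a set of strings like "dx:I50" (modality prefix assumed).
    Returns {(src, dst): pmi} for cross-modal pairs observed at least min_count times.
    """
    pair_counts = Counter()
    src_counts = Counter()
    dst_counts = Counter()
    n_transitions = 0
    for visits in patients:
        for earlier, later in zip(visits, visits[1:]):
            n_transitions += 1
            src_counts.update(earlier)
            dst_counts.update(later)
            for src in earlier:
                for dst in later:
                    # Keep only pairs whose modality prefixes differ.
                    if src.split(":")[0] != dst.split(":")[0]:
                        pair_counts[(src, dst)] += 1
    scores = {}
    for (src, dst), count in pair_counts.items():
        if count < min_count:
            continue  # too rare to trust under sparse observation
        # PMI = log( P(src, dst) / (P(src) * P(dst)) ), estimated over transitions.
        scores[(src, dst)] = math.log(
            count * n_transitions / (src_counts[src] * dst_counts[dst])
        )
    return scores
```

The `min_count` threshold is exactly the kind of knob on which the premise depends: with sparse records, a low threshold admits documentation artifacts and a high one discards rare but real transitions.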

What would settle it

A controlled experiment showing that swapping the learned lagged associations for random connections or replacing hyperbolic embedding with Euclidean space produces no gain in accuracy or hierarchy consistency would falsify the central claim.
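
The random-connection control described above could be set up roughly as follows. This is only a sketch of one possible null model, assumed here for illustration: each learned edge keeps its source code but has its target resampled uniformly from the vocabulary. The helper name `randomize_targets` and the edge representation are hypothetical, and the sketch does not deduplicate edges or preserve target degrees.

```python
import random

def randomize_targets(edges, nodes, seed=0):
    """Return a control edge list: same sources, targets drawn uniformly at random.

    edges: list of (src, dst) pairs standing in for the learned lagged associations.
    nodes: iterable of all codes (needs at least two distinct codes).
    """
    rng = random.Random(seed)       # seeded for reproducible controls
    ordered = sorted(nodes)         # fixed order so the seed fully determines output
    control = []
    for src, _dst in edges:
        choices = [n for n in ordered if n != src]  # avoid self-loops
        control.append((src, rng.choice(choices)))
    return control
```

Rebuilding the hypothesis spaces on the control edges and re-running the same evaluation would then show how much of the reported gain depends on the learned associations themselves.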

original abstract

Predicting future clinical events from longitudinal electronic health records (EHRs) requires selecting plausible outcomes from a large and structured event space under sparse observations. While clinical coding systems provide hierarchical organization of events, cross-modal and temporal relationships are not explicitly specified and must instead be inferred from data, making prediction difficult for weakly observed longitudinal transitions. We introduce Risk Horizons, a geometry-aware framework for constructing patient-specific candidate spaces for multi-modal next-visit prediction. Risk Horizons combines deterministic coding hierarchies with data-driven lagged cross-modal associations, embeds the resulting clinical graph in hyperbolic space, and retrieves candidate futures using directional risk cones. This reframes longitudinal prediction as ranking within a compact, clinically coherent hypothesis space rather than scoring an unconstrained vocabulary. Experiments on MIMIC-IV and eICU demonstrate competitive next-visit prediction performance, with consistently improved hierarchy consistency across diagnoses, procedures, and medications. Further analysis suggests that hyperbolic structured candidate retrieval is the primary driver of performance, while LLMs are effective as constrained inference-time rerankers operating over clinically grounded candidate sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Risk Horizons, a geometry-aware framework for longitudinal EHR prediction. It builds patient-specific hypothesis spaces by combining deterministic coding hierarchies with data-driven lagged cross-modal associations inferred from co-occurrence statistics, embeds the resulting graph in hyperbolic space, and retrieves candidate futures via directional risk cones. This reframes next-visit prediction as ranking within a compact, clinically coherent space rather than an unconstrained vocabulary. Experiments on MIMIC-IV and eICU report competitive performance with improved hierarchy consistency across diagnoses, procedures, and medications; further analysis attributes gains primarily to the hyperbolic structured retrieval, with LLMs serving as constrained inference-time rerankers.

Significance. If the central claims hold, the work could advance sparse longitudinal prediction by reducing the effective output space to clinically grounded candidates while preserving hierarchical structure. The use of hyperbolic geometry for clinical hierarchies is a natural fit given the tree-like nature of coding systems, and the separation of candidate retrieval from reranking offers a modular approach. However, because the manuscript reports no quantitative metrics, ablations, or external validation, its significance can only be assessed as tentative at present.

major comments (2)
  1. [§3] The construction of directional risk cones depends on lagged cross-modal associations inferred via co-occurrence statistics on sparse sequences. No external validation of these edges against curated clinical knowledge graphs or expert review is reported. This is load-bearing for the claim that the resulting hypothesis spaces are clinically useful rather than artifacts of documentation patterns.
  2. [Experiments] The claims of competitive next-visit prediction performance and that 'hyperbolic structured candidate retrieval is the primary driver' are unsupported by any reported metrics, ablation results, error bars, or direct comparisons (e.g., full model vs. hierarchy-only baseline). Without these, the assertion that the geometric component drives gains cannot be evaluated.
minor comments (1)
  1. [Abstract] Statements of 'competitive performance' and 'improved hierarchy consistency' are made without numerical values or pointers to specific tables/figures, reducing clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript. We address each major point below and outline specific revisions.

point-by-point responses
  1. Referee: §3: The construction of directional risk cones depends on lagged cross-modal associations inferred via co-occurrence statistics on sparse sequences. No external validation of these edges against curated clinical knowledge graphs or expert review is reported. This is load-bearing for the claim that the resulting hypothesis spaces are clinically useful rather than artifacts of documentation patterns.

    Authors: We agree that the absence of external validation against curated knowledge graphs or expert review is a limitation. The lagged associations are derived from co-occurrence statistics to capture temporal and cross-modal patterns not present in standard coding hierarchies. In the revised manuscript we will add an explicit limitations paragraph discussing the risk of documentation bias and will include sensitivity analyses on co-occurrence thresholds. We will also outline a clear path for future expert validation or alignment with resources such as UMLS. revision: partial

  2. Referee: Experiments section: The claims of competitive next-visit prediction performance and that 'hyperbolic structured candidate retrieval is the primary driver' are unsupported by any reported metrics, ablation results, error bars, or direct comparisons (e.g., full model vs. hierarchy-only baseline). Without these, the assertion that the geometric component drives gains cannot be evaluated.

    Authors: The referee correctly notes that the submitted manuscript does not contain the quantitative details needed to substantiate these claims. Although the abstract and text assert competitive performance and attribute gains to the hyperbolic component, the Experiments section lacks the supporting tables, ablations, and statistical reporting. In the revision we will expand the Experiments section to include full performance metrics (Recall@K, NDCG, hierarchy consistency) on both MIMIC-IV and eICU, ablation studies comparing the full model to hierarchy-only and Euclidean baselines, error bars across multiple seeds, and direct comparisons isolating the contribution of risk-cone retrieval. revision: yes
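
For reference, two of the metrics named in this response can be computed as in the sketch below. It assumes binary relevance (a code either appears in the true next visit or it does not) and a ranked list without duplicates. The function names are illustrative, and the paper's hierarchy-consistency metric is not defined in this summary, so it is not sketched here.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant codes that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for code in ranked[:k] if code in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)   # position 0 gets discount log2(2) = 1
        for position, code in enumerate(ranked[:k])
        if code in relevant
    )
    ideal_hits = min(k, len(relevant))  # best case: all relevant codes ranked first
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

When candidates come from a restricted hypothesis space, any true code that falls outside the retrieved set counts as a miss in both metrics, so retrieval coverage bounds the reranker's achievable score.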

Circularity Check

0 steps flagged

No circularity: derivation relies on external hierarchies and dataset-inferred associations without self-reduction

full rationale

The paper constructs hypothesis spaces from deterministic coding hierarchies (external clinical coding systems) plus lagged co-occurrence statistics computed on public longitudinal datasets (MIMIC-IV, eICU). This is not a self-definitional loop, a fitted input renamed as a prediction, or a load-bearing self-citation chain. No equations or sections in the provided text reduce the claimed performance or candidate retrieval to a parameter defined by the method itself. The paper states that experiments on held-out data yield competitive results and improved hierarchy consistency, so the chain is checked against external benchmarks rather than being tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that hyperbolic geometry preserves clinical hierarchy structure and that lagged associations extracted from EHR data are reliable signals; no free parameters or invented entities beyond the named components are quantified in the abstract.

axioms (2)
  • [domain assumption] Clinical coding systems supply usable hierarchical organization of events
    Invoked as the deterministic base for candidate construction
  • [domain assumption] Hyperbolic space is appropriate for embedding the resulting clinical graph
    Used to justify the embedding step for hierarchy-aware retrieval
invented entities (1)
  • directional risk cones (no independent evidence)
    purpose: Retrieve candidate futures from the hyperbolic embedding
    New retrieval mechanism introduced by the framework

pith-pipeline@v0.9.0 · 5471 in / 1305 out tokens · 73105 ms · 2026-05-15T22:37:09.653526+00:00 · methodology

