Risk Horizons: Structured Hypothesis Spaces for Longitudinal Clinical Prediction

Michael Färber; Zhan Qu

arxiv: 2602.12828 · v2 · submitted 2026-02-13 · 💻 cs.LG · cs.AI


Zhan Qu, Michael Färber

Pith reviewed 2026-05-15 22:37 UTC · model grok-4.3


keywords longitudinal EHR prediction · hyperbolic geometry · hypothesis spaces · clinical event prediction · next-visit forecasting · structured retrieval · risk cones · MIMIC-IV


The pith

Risk Horizons builds patient-specific hypothesis spaces in hyperbolic geometry to predict future clinical events from sparse records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Risk Horizons, a framework that constructs compact sets of plausible next clinical events for each patient by merging standard medical coding hierarchies with data-inferred lagged connections across diagnoses, procedures, and medications. It embeds the resulting graph in hyperbolic space and retrieves candidates using directional risk cones, turning an open-ended prediction task into ranking inside a small, medically coherent collection. Experiments on MIMIC-IV and eICU show competitive accuracy with stronger alignment to medical hierarchies than baseline approaches. The work indicates that the structured hyperbolic retrieval step accounts for most gains, while language models add value mainly by reranking the retrieved candidates at inference time.

Core claim

Risk Horizons reframes longitudinal clinical prediction as ranking within patient-specific hypothesis spaces formed by combining deterministic coding hierarchies with data-driven lagged cross-modal associations. These spaces are embedded in hyperbolic geometry and queried with directional risk cones to select plausible futures, yielding competitive next-visit performance on MIMIC-IV and eICU while improving hierarchy consistency across diagnoses, procedures, and medications.

What carries the argument

Hyperbolic embedding of clinical graphs built from coding hierarchies and lagged associations, queried via directional risk cones to retrieve candidates from structured hypothesis spaces.
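To make the geometric ingredient concrete, here is a minimal sketch of geodesic distance in a Poincaré ball of curvature −c, the model named in the paper passage quoted further down this page. The function name `poincare_distance` and the toy points are illustrative assumptions, not taken from the paper; only the closed-form distance itself is standard.

```python
import math


def poincare_distance(x, y, c=1.0):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0).

    Uses the closed form
        d(x, y) = arcosh(1 + 2c|x-y|^2 / ((1-c|x|^2)(1-c|y|^2))) / sqrt(c).
    Points must lie strictly inside the ball of radius 1/sqrt(c).
    """
    def sq_norm(v):
        return sum(t * t for t in v)

    fx = 1.0 - c * sq_norm(x)
    fy = 1.0 - c * sq_norm(y)
    if fx <= 0.0 or fy <= 0.0:
        raise ValueError("points must lie strictly inside the ball")
    diff = [a - b for a, b in zip(x, y)]
    arg = 1.0 + 2.0 * c * sq_norm(diff) / (fx * fy)
    return math.acosh(arg) / math.sqrt(c)


# Toy layout (hypothetical): a root at the origin and two leaves near the boundary.
root, leaf_a, leaf_b = [0.0, 0.0], [0.9, 0.0], [0.0, 0.9]
print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

In this toy layout the two leaves are much farther from each other than either is from the root, which is the tree-like behavior that makes hyperbolic space a natural host for coding hierarchies.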

If this is right

Hyperbolic structured candidate retrieval is the primary driver of performance gains.
Language models function effectively as rerankers when restricted to clinically grounded candidate sets.
Predictions maintain greater consistency with established medical coding hierarchies.
The approach handles sparse observations by inferring temporal cross-modal links from historical data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar geometry-aware pruning of candidate spaces could make large-vocabulary prediction feasible in other hierarchical domains such as legal outcomes or financial events.
Replacing the data-inferred associations with expert-curated clinical links might further strengthen or alter the quality of the hypothesis spaces.
The method points toward a general pattern for scaling prediction models by first building compact, geometry-respecting candidate sets before applying heavier inference.

Load-bearing premise

Data-driven lagged cross-modal associations inferred from sparse longitudinal records accurately reflect the true clinical relationships needed to build useful patient hypothesis spaces.
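The paper passage quoted near the end of this page names lagged pointwise mutual information (PMI) as the association statistic. The sketch below shows one plausible way to compute it over visit sequences. The data layout, the function name `lagged_pmi`, the toy codes, and the `min_count` threshold are all assumptions made for illustration; in particular, `min_count` is a crude stand-in and is not the paper's stability filtering, which this page does not describe.

```python
import math
from collections import Counter


def lagged_pmi(patients, lag=1, min_count=2):
    """PMI between a code at visit t and a code at visit t + lag.

    `patients` is a list of visit sequences; each visit is an iterable of codes.
    Probabilities are estimated over visit-to-visit transitions:
        PMI(a -> b) = log( p(a, b) / (p_src(a) * p_tgt(b)) ).
    Pairs seen fewer than `min_count` times are dropped (illustrative filter).
    """
    pair, src, tgt = Counter(), Counter(), Counter()
    n = 0  # number of (visit t, visit t + lag) transitions
    for visits in patients:
        for t in range(len(visits) - lag):
            earlier, later = set(visits[t]), set(visits[t + lag])
            n += 1
            src.update(earlier)
            tgt.update(later)
            pair.update((a, b) for a in earlier for b in later)
    return {
        (a, b): math.log(k * n / (src[a] * tgt[b]))
        for (a, b), k in pair.items()
        if k >= min_count
    }


# Toy records (hypothetical codes, not drawn from MIMIC-IV or eICU).
patients = [
    [{"dx:E11"}, {"rx:metformin"}],
    [{"dx:E11"}, {"rx:metformin"}],
    [{"dx:I10"}, {"rx:lisinopril"}],
    [{"dx:I10"}, {"rx:lisinopril"}],
]
edges = lagged_pmi(patients)
print(edges)
```

The premise above is exactly the worry that such scores can be high for pairs that co-occur because of documentation habits rather than clinical causation; nothing in this computation distinguishes the two.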

What would settle it

A controlled experiment showing that swapping the learned lagged associations for random connections or replacing hyperbolic embedding with Euclidean space produces no gain in accuracy or hierarchy consistency would falsify the central claim.
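One way to build the random-connection control described above is to keep each edge's source and permute the targets, so the randomized graph has the same number of edges and the same per-node in- and out-degree counts. This is a sketch under that assumption; the helper name `shuffle_targets` is hypothetical, and the paper may use a different null model or none at all.

```python
import random


def shuffle_targets(edges, seed=0):
    """Return a control edge list with targets permuted across edges.

    Sources stay in place and the multiset of targets is preserved, so
    degree counts match the input. Duplicate edges or self-loops may
    appear and are not removed in this sketch.
    """
    rng = random.Random(seed)
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return list(zip(sources, targets))


# Toy learned edges (hypothetical codes).
learned = [("dx:E11", "rx:metformin"), ("dx:I10", "rx:lisinopril"),
           ("dx:E11", "proc:hba1c"), ("dx:J18", "rx:amoxicillin")]
control = shuffle_targets(learned, seed=1)
print(control)
```

Running the full pipeline once on `learned` and once on `control` would isolate how much of any gain depends on the specific associations rather than on graph density.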

Original abstract

Predicting future clinical events from longitudinal electronic health records (EHRs) requires selecting plausible outcomes from a large and structured event space under sparse observations. While clinical coding systems provide hierarchical organization of events, cross-modal and temporal relationships are not explicitly specified and must instead be inferred from data, making prediction difficult for weakly observed longitudinal transitions. We introduce Risk Horizons, a geometry-aware framework for constructing patient-specific candidate spaces for multi-modal next-visit prediction. Risk Horizons combines deterministic coding hierarchies with data-driven lagged cross-modal associations, embeds the resulting clinical graph in hyperbolic space, and retrieves candidate futures using directional risk cones. This reframes longitudinal prediction as ranking within a compact, clinically coherent hypothesis space rather than scoring an unconstrained vocabulary. Experiments on MIMIC-IV and eICU demonstrate competitive next-visit prediction performance, with consistently improved hierarchy consistency across diagnoses, procedures, and medications. Further analysis suggests that hyperbolic structured candidate retrieval is the primary driver of performance, while LLMs are effective as constrained inference-time rerankers operating over clinically grounded candidate sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Risk Horizons structures longitudinal EHR prediction by layering coding hierarchies with data-driven lagged associations in hyperbolic space, but the lack of external validation on those associations leaves the clinical usefulness open.


The core contribution is a framework that shrinks the massive output space for next-visit prediction by building patient-specific candidate sets: it starts with deterministic clinical coding hierarchies, adds lagged cross-modal associations inferred from data, embeds the graph hyperbolically, and retrieves futures via directional risk cones. This turns the problem into ranking inside a compact coherent space rather than scoring an unconstrained vocabulary, and the experiments on MIMIC-IV and eICU show competitive performance plus better hierarchy consistency across diagnoses, procedures, and medications. The claim that the structured hyperbolic retrieval drives most of the gains, with LLMs serving mainly as constrained rerankers, is a practical observation worth noting. The approach is new in its specific combination of these elements for multi-modal longitudinal clinical data.

The main soft spot is the reliance on data-driven lagged associations extracted from sparse records; without checks against curated clinical knowledge graphs or expert review, those edges could reflect documentation artifacts rather than real relationships, which would make the risk cones retrieve less useful candidates. The abstract gives no quantitative metrics, ablations, or error bars, so the strength of the improvements is hard to judge from the summary alone.

This paper is for researchers working on structured prediction and geometry-aware methods in healthcare ML. A reader focused on practical ways to constrain large output spaces in EHR models would find the candidate-retrieval step useful. It deserves peer review because the framework is concrete, the datasets are standard, and the experiments exist, even though revisions will need to address validation of the inferred associations.

Referee Report

2 major / 1 minor

Summary. The paper introduces Risk Horizons, a geometry-aware framework for longitudinal EHR prediction. It builds patient-specific hypothesis spaces by combining deterministic coding hierarchies with data-driven lagged cross-modal associations inferred from co-occurrence statistics, embeds the resulting graph in hyperbolic space, and retrieves candidate futures via directional risk cones. This reframes next-visit prediction as ranking within a compact, clinically coherent space rather than an unconstrained vocabulary. Experiments on MIMIC-IV and eICU report competitive performance with improved hierarchy consistency across diagnoses, procedures, and medications; further analysis attributes gains primarily to the hyperbolic structured retrieval, with LLMs serving as constrained inference-time rerankers.

Significance. If the central claims hold, the work could advance sparse longitudinal prediction by reducing the effective output space to clinically grounded candidates while preserving hierarchical structure. The use of hyperbolic geometry for clinical hierarchies is a natural fit given the tree-like nature of coding systems, and the separation of candidate retrieval from reranking offers a modular approach. However, the absence of quantitative metrics, ablations, and external validation in the manuscript limits the assessed significance to tentative at present.

major comments (2)

[§3] §3: The construction of directional risk cones depends on lagged cross-modal associations inferred via co-occurrence statistics on sparse sequences. No external validation of these edges against curated clinical knowledge graphs or expert review is reported. This is load-bearing for the claim that the resulting hypothesis spaces are clinically useful rather than artifacts of documentation patterns.
[Experiments] Experiments section: The claims of competitive next-visit prediction performance and that 'hyperbolic structured candidate retrieval is the primary driver' are unsupported by any reported metrics, ablation results, error bars, or direct comparisons (e.g., full model vs. hierarchy-only baseline). Without these, the assertion that the geometric component drives gains cannot be evaluated.

minor comments (1)

[Abstract] Abstract: Statements of 'competitive performance' and 'improved hierarchy consistency' are made without numerical values or pointers to specific tables/figures, reducing clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript. We address each major point below and outline specific revisions.


Referee: §3: The construction of directional risk cones depends on lagged cross-modal associations inferred via co-occurrence statistics on sparse sequences. No external validation of these edges against curated clinical knowledge graphs or expert review is reported. This is load-bearing for the claim that the resulting hypothesis spaces are clinically useful rather than artifacts of documentation patterns.

Authors: We agree that the absence of external validation against curated knowledge graphs or expert review is a limitation. The lagged associations are derived from co-occurrence statistics to capture temporal and cross-modal patterns not present in standard coding hierarchies. In the revised manuscript we will add an explicit limitations paragraph discussing the risk of documentation bias and will include sensitivity analyses on co-occurrence thresholds. We will also outline a clear path for future expert validation or alignment with resources such as UMLS.

revision: partial

Referee: Experiments section: The claims of competitive next-visit prediction performance and that 'hyperbolic structured candidate retrieval is the primary driver' are unsupported by any reported metrics, ablation results, error bars, or direct comparisons (e.g., full model vs. hierarchy-only baseline). Without these, the assertion that the geometric component drives gains cannot be evaluated.

Authors: The referee correctly notes that the submitted manuscript does not contain the quantitative details needed to substantiate these claims. Although the abstract and text assert competitive performance and attribute gains to the hyperbolic component, the Experiments section lacks the supporting tables, ablations, and statistical reporting. In the revision we will expand the Experiments section to include full performance metrics (Recall@K, NDCG, hierarchy consistency) on both MIMIC-IV and eICU, ablation studies comparing the full model to hierarchy-only and Euclidean baselines, error bars across multiple seeds, and direct comparisons isolating the contribution of risk-cone retrieval.

revision: yes
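For readers unfamiliar with two of the ranking metrics named in the authors' response, the following sketch gives the usual binary-relevance definitions of Recall@K and NDCG@K. The function names and the toy ranking are assumptions for illustration, and the ranked list is assumed to contain no duplicates. Hierarchy consistency is left out because this page does not say how the paper defines it.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of `ranked`."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items first.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


# Toy next-visit example (hypothetical codes).
ranked = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}
print(recall_at_k(ranked, relevant, k=3))
print(ndcg_at_k(ranked, relevant, k=3))
```

Note that when ranking happens inside a retrieved candidate set, any relevant code the retrieval step missed caps both metrics, so reporting retrieval recall separately from reranking quality would help isolate each stage's contribution.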

Circularity Check

0 steps flagged

No circularity: derivation relies on external hierarchies and dataset-inferred associations without self-reduction


The paper constructs hypothesis spaces from deterministic coding hierarchies (external clinical coding systems) plus lagged co-occurrence statistics computed on public longitudinal datasets (MIMIC-IV, eICU). This is not a self-definitional loop, fitted input renamed as prediction, or load-bearing self-citation chain. No equations or sections in the provided text reduce the claimed performance or candidate retrieval to a parameter defined by the method itself. Experiments report competitive results and improved hierarchy consistency on held-out data, keeping the chain self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that hyperbolic geometry preserves clinical hierarchy structure and that lagged associations extracted from EHR data are reliable signals; no free parameters or invented entities beyond the named components are quantified in the abstract.

axioms (2)

domain assumption: Clinical coding systems supply usable hierarchical organization of events. Invoked as the deterministic base for candidate construction.
domain assumption: Hyperbolic space is appropriate for embedding the resulting clinical graph. Used to justify the embedding step for hierarchy-aware retrieval.

invented entity (1)

directional risk cones (no independent evidence)
purpose: Retrieve candidate futures from the hyperbolic embedding
New retrieval mechanism introduced by the framework

pith-pipeline@v0.9.0 · 5471 in / 1305 out tokens · 73105 ms · 2026-05-15T22:37:09.653526+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We embed nodes into the d-dimensional Poincaré ball of curvature −c … risk cones … $\cos\angle(u; v_T) = \dfrac{\langle \log_0^c(z_u),\, d_T \rangle}{\lVert \log_0^c(z_u) \rVert \, \lVert d_T \rVert}$

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

unclear: Relation between the paper passage and the cited Recognition theorem.

lagged pointwise mutual information (PMI) … stability filtering … typed edge reconstruction loss
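The risk-cone passage quoted under the first Lean entry scores a node by the cosine between its origin log-map and a direction vector d_T. A minimal sketch of that quantity follows, using the standard log map at the origin of the Poincaré ball. The function names, the `half_angle` membership rule in `in_cone`, and the toy vectors are assumptions; this page does not say how the paper chooses d_T or thresholds the cosine.

```python
import math


def log_map_origin(z, c=1.0):
    """Log map at the origin of the Poincare ball of curvature -c:
    log_0^c(z) = artanh(sqrt(c)|z|) * z / (sqrt(c)|z|)."""
    norm = math.sqrt(sum(t * t for t in z))
    if norm == 0.0:
        return [0.0 for _ in z]
    scaled = math.sqrt(c) * norm
    if scaled >= 1.0:
        raise ValueError("point must lie strictly inside the ball")
    factor = math.atanh(scaled) / scaled
    return [factor * t for t in z]


def cone_cosine(z_u, d_T, c=1.0):
    """Cosine of the angle between log_0^c(z_u) and the direction d_T."""
    v = log_map_origin(z_u, c)
    nv = math.sqrt(sum(t * t for t in v))
    nd = math.sqrt(sum(t * t for t in d_T))
    if nv == 0.0 or nd == 0.0:
        return 0.0  # angle undefined at the origin; treated as orthogonal here
    return sum(a * b for a, b in zip(v, d_T)) / (nv * nd)


def in_cone(z_u, d_T, half_angle, c=1.0):
    """Assumed membership rule: inside if the angle is at most half_angle."""
    return cone_cosine(z_u, d_T, c) >= math.cos(half_angle)


# Toy check (hypothetical embedding and direction).
print(in_cone([0.3, 0.3], [1.0, 0.0], half_angle=math.pi / 3))
print(in_cone([0.0, 0.5], [1.0, 0.0], half_angle=math.pi / 6))
```

Because the origin log map only rescales z_u by a positive factor, this cosine equals the ordinary cosine between z_u and d_T; the hyperbolic structure enters through where the embedding places the nodes, not through the angle computation itself.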

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.