Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation
Pith reviewed 2026-05-09 22:50 UTC · model grok-4.3
The pith
Quadratic Bezier surrogates replace full SGD trajectories to overcome representability limits in clinical dataset condensation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Geometric analysis establishes that a fixed synthetic dataset spans only a limited portion of the parameter changes induced by training on real data, imposing a conditional representability bottleneck whenever the SGD supervision signal is spectrally broad. Quadratic Bezier trajectory surrogates, optimized to minimize average loss along the path between initial and final model states, replace full SGD trajectories and supply a lower-rank, structured signal that aligns more closely with the optimization constraints of the synthetic set.
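The span limitation and the role of spectral breadth can be illustrated with a toy computation. This is a sketch under stated assumptions, not the paper's derivation. It treats the rows of `increments` as training-induced parameter changes and measures the representability gap as the mean squared residual left after projecting the centred increments onto a k-dimensional subspace. It then checks the Eckart-Young-style fact, echoed in the rebuttal's proposed theorem, that this residual can never fall below the sum of covariance eigenvalues beyond the top k. The helper names and the synthetic spectra are hypothetical.

```python
import numpy as np

def centred(increments):
    # Remove the mean increment so the covariance describes spread, not drift.
    return increments - increments.mean(axis=0)

def covariance(increments):
    c = centred(increments)
    return c.T @ c / len(c)

def tail_eigen_sum(increments, k):
    # Sum of covariance eigenvalues beyond the k largest (eigvalsh sorts ascending).
    eigvals = np.linalg.eigvalsh(covariance(increments))
    return float(eigvals[: len(eigvals) - k].sum())

def projection_residual(increments, basis):
    # Mean squared residual after orthogonal projection onto span(basis), basis: (d, k).
    c = centred(increments)
    q, _ = np.linalg.qr(basis)
    resid = c - (c @ q) @ q.T
    return float(np.mean(np.sum(resid ** 2, axis=1)))

def top_k_basis(increments, k):
    # Leading k eigenvectors of the increment covariance (eigh returns ascending order).
    _, vecs = np.linalg.eigh(covariance(increments))
    return vecs[:, -k:]

rng = np.random.default_rng(0)
n, d, k = 500, 20, 3
narrow = rng.normal(size=(n, d)) * (0.5 ** np.arange(d))  # fast-decaying spectrum
broad = rng.normal(size=(n, d))                          # flat, "spectrally broad" spectrum

for name, inc in [("narrow", narrow), ("broad", broad)]:
    best = projection_residual(inc, top_k_basis(inc, k))
    rand = projection_residual(inc, rng.normal(size=(d, k)))
    print(name, round(tail_eigen_sum(inc, k), 4), round(best, 4), round(rand, 4))
```

With a fast-decaying spectrum, almost nothing is lost at rank 3. With a flat spectrum, most of the variance sits beyond any 3-dimensional subspace, which is the "conditional" part of the bottleneck.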
What carries the argument
Quadratic Bezier trajectory surrogates: curves between initial and final model states, optimized to reduce average loss along the path, that replace broad SGD supervision with a lower-rank signal aligned to fixed synthetic data constraints.
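One way to see why a quadratic Bezier surrogate yields a lower-rank signal is to look at its displacement from θ0, which is 2t(1−t)(ϕ − θ0) + t²(θT − θ0). Every point on the curve therefore lies in a plane through θ0 spanned by two fixed directions. The NumPy sketch below checks this and compares it with a random walk that stands in for noisy SGD updates. The random walk is a hypothetical proxy, not real training, and `effective_rank` and the dimensions are illustrative choices.

```python
import numpy as np

def effective_rank(m, rel_tol=1e-10):
    # Count singular values above a tolerance relative to the largest one.
    s = np.linalg.svd(m, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
d, n_points = 100, 50
theta0, phi, thetaT = rng.normal(size=(3, d))

# Displacements from theta0 at 50 points along the quadratic Bezier surrogate (t = 0 skipped).
t = np.linspace(0.0, 1.0, n_points + 1)[1:, None]
bezier_disp = (1 - t) ** 2 * theta0 + 2 * t * (1 - t) * phi + t ** 2 * thetaT - theta0

# Stand-in for a stochastic trajectory: cumulative sum of independent noisy steps.
walk_disp = np.cumsum(rng.normal(size=(n_points, d)), axis=0)

print(effective_rank(bezier_disp), effective_rank(walk_disp))
```

The surrogate's displacements have rank 2 regardless of how many points are sampled. The noisy walk's displacements have rank equal to the number of steps.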
If this is right
- BTM (Bezier Trajectory Matching) matches or exceeds standard trajectory matching on five clinical datasets.
- Gains are largest in low-prevalence and low-synthetic-budget regimes.
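- Trajectory storage requirements drop substantially, because each surrogate is specified by three parameter vectors (initial state, control point, final state) instead of a sequence of SGD checkpoints.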
- Effective trajectory matching relies on structuring the supervision signal rather than reproducing stochastic optimization paths.
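The storage saving is simple arithmetic. The sketch below assumes, hypothetically, that standard TM keeps `n_checkpoints` saved parameter vectors per expert run and that BTM keeps one quadratic Bezier per run (θ0, ϕ⋆, θT). It uses the single-hidden-layer MLP of Appendix D.1 with an invented input width. All counts are illustrative, not figures reported by the paper.

```python
def mlp_param_count(n_features, hidden, n_outputs=1):
    # Weights and biases of Linear(n_features -> hidden) and Linear(hidden -> n_outputs).
    return n_features * hidden + hidden + hidden * n_outputs + n_outputs

def tm_stored_floats(n_experts, n_checkpoints, n_params):
    # Standard TM keeps every saved checkpoint of every expert run.
    return n_experts * n_checkpoints * n_params

def btm_stored_floats(n_experts, n_params):
    # One quadratic Bezier per expert: theta_0, control point phi*, theta_T.
    return n_experts * 3 * n_params

n_params = mlp_param_count(n_features=100, hidden=64)  # hypothetical input width
tm = tm_stored_floats(n_experts=10, n_checkpoints=51, n_params=n_params)
btm = btm_stored_floats(n_experts=10, n_params=n_params)
print(n_params, tm, btm, tm / btm)
```

Under these assumptions, the saving factor is `n_checkpoints / 3`, independent of model size.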
Where Pith is reading between the lines
- The same geometric bottleneck may limit other trajectory-based condensation techniques that rely on full SGD histories.
- The method could be tested on non-clinical datasets to determine whether the benefit of structured surrogates generalizes beyond healthcare.
- Optimizing low-rank path surrogates might extend to compressing optimization histories in broader machine-learning settings.
Load-bearing premise
Quadratic Bezier trajectory surrogates optimized to reduce average loss along the path will replace broad SGD-derived supervision with a more structured, lower-rank signal better aligned with the optimisation constraints of a fixed synthetic dataset.
What would settle it
If models trained on BTM-generated synthetic data show no accuracy improvement over standard trajectory matching in low-prevalence clinical tasks with small synthetic budgets, the claim that the structured surrogates overcome the representability bottleneck would be falsified.
Original abstract
Dataset condensation constructs compact synthetic datasets that retain the training utility of large real-world datasets, enabling efficient model development and potentially supporting downstream research in governed domains such as healthcare. Trajectory matching (TM) is a widely used condensation approach that supervises synthetic data using changes in model parameters observed during training on real data, yet the structure of this supervision signal remains poorly understood. In this paper, we provide a geometric characterisation of trajectory matching, showing that a fixed synthetic dataset can only reproduce a limited span of such training-induced parameter changes. When the resulting supervision signal is spectrally broad, this creates a conditional representability bottleneck. Motivated by this mismatch, we propose Bezier Trajectory Matching (BTM), which replaces SGD trajectories with quadratic Bezier trajectory surrogates between initial and final model states. These surrogates are optimised to reduce average loss along the path while replacing broad SGD-derived supervision with a more structured, lower-rank signal that is better aligned with the optimisation constraints of a fixed synthetic dataset, and they substantially reduce trajectory storage. Experiments on five clinical datasets demonstrate that BTM consistently matches or improves upon standard trajectory matching, with the largest gains in low-prevalence and low-synthetic-budget settings. These results indicate that effective trajectory matching depends on structuring the supervision signal rather than reproducing stochastic optimisation paths.
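The abstract's description of trajectory matching can be made concrete with the normalised parameter-matching loss used in standard trajectory matching (MTT). A student initialised at an expert checkpoint trains on the synthetic set, and its end point is compared with a later checkpoint, normalised by how far the expert moved. The sketch below assumes, hypothetically, that BTM draws both checkpoints from the Bezier surrogate at t = s and t = s + Δ. The toy least-squares student, the sampling values, and all function names are illustrative. Only the forward computation is shown; in actual condensation the loss is differentiated with respect to the synthetic data.

```python
import numpy as np

def bezier_point(theta0, phi, thetaT, t):
    # Quadratic Bezier surrogate evaluated at a single t in [0, 1].
    return (1 - t) ** 2 * theta0 + 2 * t * (1 - t) * phi + t ** 2 * thetaT

def normalized_matching_loss(student_end, start, target):
    # Distance to the target, normalised by the expert's own displacement.
    return float(np.sum((student_end - target) ** 2) / np.sum((start - target) ** 2))

def train_on_synthetic(theta, x_syn, y_syn, lr=0.1, n_steps=10):
    # Toy student: gradient descent on least squares over the synthetic set.
    for _ in range(n_steps):
        grad = x_syn.T @ (x_syn @ theta - y_syn) / len(y_syn)
        theta = theta - lr * grad
    return theta

rng = np.random.default_rng(0)
d = 8
theta0, phi, thetaT = rng.normal(size=(3, d))
s, delta = 0.2, 0.3  # hypothetical segment of the surrogate
start = bezier_point(theta0, phi, thetaT, s)
target = bezier_point(theta0, phi, thetaT, s + delta)

x_syn = rng.normal(size=(5, d))  # 5 hypothetical synthetic examples
y_syn = rng.normal(size=5)
student_end = train_on_synthetic(start, x_syn, y_syn)
loss = normalized_matching_loss(student_end, start, target)
print(loss)
```

In this linear toy, each student update is a combination of the rows of `x_syn`. The student's total displacement is therefore confined to a subspace whose dimension is at most the number of synthetic examples, a miniature version of the span limitation the paper describes.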
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper provides a geometric characterisation of trajectory matching (TM) for dataset condensation, arguing that fixed synthetic datasets can only span a limited portion of SGD-induced parameter changes, creating a representability bottleneck when the supervision signal is spectrally broad. It proposes Bezier Trajectory Matching (BTM), which replaces SGD trajectories with quadratic Bezier curve surrogates between initial and final model states; these surrogates are optimised to minimise average loss along the path, yielding a more structured, lower-rank signal aligned with synthetic data constraints and substantially reducing storage. Experiments across five clinical datasets show BTM matching or exceeding standard TM, with the largest gains reported in low-prevalence and low-synthetic-budget regimes.
Significance. If the geometric argument and empirical gains hold, the work offers a principled way to improve dataset condensation for clinical applications, where privacy, scarcity, and low-prevalence conditions are common. The reduction in trajectory storage and the shift to structured supervision are practical strengths that could aid efficient model development in governed domains. The approach builds on TM while addressing its structural limitations, and the reported regime-specific improvements suggest targeted utility.
major comments (3)
- [§3] Geometric Characterisation: The central claim that a fixed synthetic dataset reproduces only a limited span of training-induced parameter changes, leading to a conditional representability bottleneck, lacks an explicit derivation, spectral analysis, or theorem establishing the span limitation. This claim is load-bearing for motivating BTM over standard TM.
- [§4] Bezier Trajectory Matching: The optimisation of quadratic Bezier surrogates to reduce average loss along the path is described only at a high level, with no details on the loss formulation, optimisation procedure, hyperparameters, or validation against SGD trajectories. This gap directly affects the claim that the surrogates supply a better-aligned, lower-rank signal.
- [§5] Experiments: Results are summarised as 'consistent improvements' and 'largest gains' in low-prevalence/low-budget settings across five clinical datasets, yet no quantitative metrics, error bars, statistical tests, dataset characteristics (e.g., prevalence rates), or ablations on the Bezier order are reported. Without these, it is impossible to judge whether the gains are robust or whether BTM merely matches TM.
minor comments (2)
- [Abstract] The abstract states that BTM 'substantially reduce[s] trajectory storage' but supplies no quantitative comparison (e.g., bytes or number of points) relative to standard TM.
- [§4] Notation for the quadratic Bezier curves (control points, parameterisation) would benefit from an explicit equation in §4 to clarify how the surrogates are constructed and optimised.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments highlight opportunities to strengthen the rigor of the geometric analysis, implementation details, and experimental reporting. We address each major comment below and will incorporate the suggested additions in the revised manuscript.
Point-by-point responses
- Referee [§3] (Geometric Characterisation): The central claim that a fixed synthetic dataset reproduces only a limited span of training-induced parameter changes, leading to a conditional representability bottleneck, lacks an explicit derivation, spectral analysis, or theorem establishing the span limitation. This claim is load-bearing for motivating BTM over standard TM.
Authors: We agree that §3 would benefit from greater formality. In the revision we will add an explicit derivation: we model the synthetic dataset's effect on parameter updates as a linear map whose image has rank at most k (where k equals the number of synthetic samples times the output dimension), while SGD trajectories span a higher-dimensional subspace. We will include a spectral analysis of the covariance of trajectory increments and a theorem stating that the representability gap is bounded below by the sum of eigenvalues beyond rank k. This material will appear as a new subsection with supporting lemmas. Revision: yes.
- Referee [§4] (Bezier Trajectory Matching): The optimisation of quadratic Bezier surrogates to reduce average loss along the path is described only at a high level, with no details on the loss formulation, optimisation procedure, hyperparameters, or validation against SGD trajectories. This gap directly affects the claim that the surrogates supply a better-aligned, lower-rank signal.
Authors: We will expand §4 with the precise loss $L(\phi) = \int_0^1 \ell(\gamma_\phi(t))\,dt$, approximated by 10-point quadrature, where $\gamma_\phi(t) = (1-t)^2\theta_0 + 2t(1-t)\phi + t^2\theta_T$ is the quadratic Bezier curve from $\theta_0$ to $\theta_T$ with middle control point $\phi$. Optimisation uses Adam (lr = 0.01, 200 epochs) on the control-point coordinates only. A hyperparameter table and pseudocode will be added. We will also include a validation subsection comparing Bezier surrogate gradients to SGD trajectories on a 2-D toy problem, demonstrating lower effective rank (via singular-value decay) and closer alignment with the synthetic-data constraint. Revision: yes.
- Referee [§5] (Experiments): Results are summarised as 'consistent improvements' and 'largest gains' in low-prevalence/low-budget settings across five clinical datasets, yet no quantitative metrics, error bars, statistical tests, dataset characteristics (e.g., prevalence rates), or ablations on the Bezier order are reported. Without these, it is impossible to judge whether the gains are robust or whether BTM merely matches TM.
Authors: We will augment §5 with a results table reporting mean accuracy ± std over five independent runs, error bars on all figures, and p-values from Wilcoxon signed-rank tests against TM. A supplementary table will list prevalence rates, class imbalance ratios, and sample sizes for each of the five clinical datasets. Finally, we will add an ablation study varying Bezier order (linear, quadratic, cubic) and report the corresponding condensation performance, confirming quadratic as the best trade-off. Revision: yes.
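The planned Wilcoxon signed-rank tests over five independent runs carry an arithmetic consequence worth checking in advance. With only five paired differences, the exact two-sided test has 2^5 = 32 equally likely sign patterns under the null. The smallest attainable p-value is therefore 2/32 = 0.0625, above the conventional 0.05 threshold. The pure-Python sketch below computes the exact two-sided p-value by enumeration. It follows one common convention: zero differences are dropped and tied ranks are averaged. The function name and the example differences are hypothetical.

```python
from itertools import product

def exact_wilcoxon_two_sided(diffs):
    # Drop zero differences (Wilcoxon's original convention).
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Assign the average rank to each run of tied absolute differences.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Under H0 every sign pattern is equally likely; count those at least as extreme.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n

# Hypothetical BTM-minus-TM accuracy differences over five runs.
print(exact_wilcoxon_two_sided([0.01, 0.02, 0.03, 0.04, 0.05]))
print(exact_wilcoxon_two_sided([-0.01, 0.02, 0.03, 0.04, 0.05]))
```

Even when BTM wins every one of five runs, the exact two-sided p-value is 0.0625. More runs, or pooling comparisons across datasets and budgets, would be needed for this test to reject at 0.05.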
Circularity Check
Minor self-citation present but derivation remains independent
Full rationale
The paper derives a geometric characterisation of trajectory matching by arguing that fixed synthetic datasets span only a limited portion of SGD-induced parameter trajectories, then introduces quadratic Bezier surrogates optimised for average loss along the path. This leads to BTM as a lower-rank supervision signal. Experiments on five clinical datasets compare BTM directly to standard trajectory matching, with gains reported in low-prevalence and low-budget regimes. No equation reduces the claimed performance to a quantity fitted from the same evaluation data, and no load-bearing premise collapses to a self-citation chain. The single minor self-citation (likely to prior trajectory-matching work) is not used to justify uniqueness or forbid alternatives. The central claim is therefore supported by an independent geometric argument plus external empirical comparison.
Appendix excerpts
… Therefore
$$\sup_{t\in[0,1]} \|\Phi(t) - c(t)\|_2 = \frac{\kappa}{8}. \tag{55}$$
B.5 Proof of Theorem 4
Proof. We prove the two claims separately. (i) Smooth curvature. Consider the optimised quadratic Bézier surrogate
$$\Phi(t) = (1-t)^2\,\theta_0 + 2t(1-t)\,\phi^\star + t^2\,\theta_T. \tag{56}$$
Differentiating twice with respect to $t$ yields
$$\Phi''(t) = 2(\theta_0 - 2\phi^\star + \theta_T), \tag{57}$$
which is constant in $t$. Hence $\sup_{t\in[0,1]} \|\Phi''(t)\|_2 = 2\|\theta_0 - 2\phi^\star + \theta\ldots$
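Equations (55)-(57) can be checked numerically. The excerpt does not define c(t) or κ. This sketch assumes c(t) is the straight chord from θ0 to θT and κ = ‖Φ''‖₂. Under that reading, Φ(t) − c(t) = t(1−t)(2ϕ⋆ − θ0 − θT), whose norm peaks at t = 1/2 with value κ/8, matching (55). The random parameter vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
theta0, phi_star, thetaT = rng.normal(size=(3, d))

def Phi(t):
    # Eq. (56): quadratic Bezier surrogate between theta0 and thetaT.
    return (1 - t) ** 2 * theta0 + 2 * t * (1 - t) * phi_star + t ** 2 * thetaT

def chord(t):
    # Assumed c(t): the straight line from theta0 to thetaT.
    return (1 - t) * theta0 + t * thetaT

second_derivative = 2 * (theta0 - 2 * phi_star + thetaT)  # Eq. (57)
kappa = float(np.linalg.norm(second_derivative))          # assumed kappa = ||Phi''||_2

# Central finite differences are exact for a quadratic, up to rounding.
h = 1e-2
fd = [(Phi(t + h) - 2 * Phi(t) + Phi(t - h)) / h ** 2 for t in (0.1, 0.5, 0.9)]

# Largest deviation from the chord on a fine grid that includes t = 0.5.
ts = np.linspace(0.0, 1.0, 1001)
sup_dev = max(float(np.linalg.norm(Phi(t) - chord(t))) for t in ts)
print(kappa / 8, sup_dev)
```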
Feature names (partial list):
- Capillary refill rate-0.0, Capillary refill rate-1.0
- Diastolic blood pressure
- Fraction inspired oxygen
- Glascow coma scale eye opening: 0 None, 1 No Response, 2 To Pain, 3 To speech, 4 Spontaneously
- Glascow coma scale motor response: 1 No Movement, 2 Abnormal extension, 3 Abnormal flexion, 4 Flex-withdraws, 5 Localizes Pain, 6 Obeys Commands
- Glascow coma scale total: 3 to 15 (one level per integer score)
- Glascow coma scale verbal response: 1 No Response, 2 Incomprehensible sounds, 3 Inappropriate Words, 4 Confused, 5 Oriented
- Systolic blood pressure
- Masks: mask-Capillary refill rate, mask-Diastolic blood pressure, mask-Fraction inspired oxygen, mask-Glascow coma scale eye opening, mask-Glascow coma scale motor response, mask-Glascow coma scale total, mask-Glascow coma scale verbal response, mask-Mean blood pressure, mask-Oxygen saturation, mask-Respiratory rate, mask-Systolic blood pressure, mask-pH

List of the 25 patient disorders used for phenotyping in the MIMIC-III dataset:
- Acute and unspecified renal failure
- Acute cerebrovascular disease
- Acute myocardial infarction
- Cardiac dysrhythmias
- Chronic kidney disease
- Chronic obstructive pulmonary disease
- Complications of surgical/medical care
- Conduction disorders
- Congestive heart failure; non hypertensive
- Coronary atherosclerosis and related
- Diabetes mellitus with complications
- Diabetes mellitus without complication
- Disorders of lipid metabolism
- Essential hypertension
- Fluid and electrolyte disorders
- Gastrointestinal haemorrhage
- Hypertension with complications
- Other liver diseases
- Other lower respiratory disease
- Other upper respiratory disease
- Pleurisy; pneumothorax; pulmonary collapse
- Respiratory failure; insufficiency; arrest
- Septicemia (except in labour)
- Shock

D Implementation Details
D.1 Model Architectures
NHS Cohorts and eICU datasets. For tabular datasets, we use a multi-layer perceptron (MLP) (Rumelhart et al., 1986) with a single hidden layer of h units, ReLU activation, and a sigmoid output layer. Dropout (0.25) is applied after the hidden layer. We set h = 256 for eICU and h = 64 for the NHS dataset...
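The tabular backbone can be sketched in NumPy as follows. The layer structure follows the text: one hidden layer of h units, ReLU, dropout 0.25 after the hidden layer, and a sigmoid output, with h = 256 for eICU and h = 64 for NHS. The input width, the single output unit, the initialisation scheme, and the inverted-dropout formulation are assumptions. `init_mlp` and `mlp_forward` are hypothetical names.

```python
import numpy as np

def init_mlp(n_features, hidden, rng):
    # He-style initialisation for the ReLU layer (an assumption, not from the paper).
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_features), size=(n_features, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def mlp_forward(params, x, p_drop=0.25, rng=None):
    # Hidden layer with ReLU activation.
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    # Dropout after the hidden layer; active only when an rng is passed (training mode).
    if rng is not None:
        keep = rng.random(h.shape) >= p_drop
        h = h * keep / (1.0 - p_drop)
    logits = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid output

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))               # 8 rows, 20 hypothetical input features
params = init_mlp(20, hidden=64, rng=rng)  # NHS width; use hidden=256 for eICU
probs_eval = mlp_forward(params, x)
probs_train = mlp_forward(params, x, rng=rng)
```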
… as the backbone for dataset condensation. The model consists of a single residual temporal block with 64 channels, kernel size 9, dilation 1, BatchNorm, PReLU activations, and dropout (0.75). The network processes a 48 × 60 multivariate time series, with temporal features mean-pooled and passed through a linear output layer. For in-hospital mortality predict...
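The temporal backbone can be sketched at the level of tensor shapes. The text fixes 64 channels, kernel size 9, dilation 1, BatchNorm, PReLU, dropout 0.75, mean pooling over time, and a linear output. Everything else is an assumption:

- the input is read as 48 time steps by 60 channels;
- the convolution uses non-causal "same" zero padding;
- the block holds one convolution, ordered conv, BatchNorm, PReLU, then dropout;
- the residual path uses a 1×1 projection from 60 to 64 channels;
- BatchNorm uses batch statistics without a learned affine transform;
- a single shared PReLU slope is used.

All function names are hypothetical.

```python
import numpy as np

def conv1d_same(x, w, b, dilation=1):
    # x: (B, T, C_in); w: (K, C_in, C_out). Zero padding keeps the output length T.
    T, K = x.shape[1], w.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    out = np.zeros(x.shape[:2] + (w.shape[2],))
    for k in range(K):
        start = k * dilation
        out += xp[:, start:start + T, :] @ w[k]
    return out + b

def batch_norm(h, eps=1e-5):
    # Normalise each channel with statistics over batch and time.
    mean = h.mean(axis=(0, 1), keepdims=True)
    var = h.var(axis=(0, 1), keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def prelu(h, a):
    return np.where(h > 0, h, a * h)

def tcn_forward(x, params, rng=None, p_drop=0.75):
    h = conv1d_same(x, params["w_conv"], params["b_conv"])
    h = prelu(batch_norm(h), params["a"])
    if rng is not None:  # training-mode inverted dropout
        keep = rng.random(h.shape) >= p_drop
        h = h * keep / (1.0 - p_drop)
    h = h + x @ params["w_res"]       # residual path with 1x1 projection 60 -> 64
    pooled = h.mean(axis=1)           # mean-pool over time
    return pooled @ params["w_out"] + params["b_out"]

def init_params(c_in=60, channels=64, kernel=9, n_out=1, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "w_conv": rng.normal(0.0, 0.05, size=(kernel, c_in, channels)),
        "b_conv": np.zeros(channels),
        "a": 0.25,
        "w_res": rng.normal(0.0, 0.05, size=(c_in, channels)),
        "w_out": rng.normal(0.0, 0.05, size=(channels, n_out)),
        "b_out": np.zeros(n_out),
    }

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 48, 60))  # batch of 4 hypothetical 48 x 60 inputs
params = init_params()
logits = tcn_forward(x, params)
print(logits.shape)
```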