
arxiv: 2607.01039 · v1 · pith:FTFXWXLLnew · submitted 2026-07-01 · 💻 cs.CV · cs.AI

EchoRisk: A Multicentre Echocardiography Dataset and Benchmark for Cardio-Oncology

Pith reviewed 2026-07-02 13:56 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords echocardiography · cardiotoxicity · cardio-oncology · dataset · benchmark · machine learning · left ventricular ejection fraction · breast cancer

The pith

EchoRisk supplies the first multicentre echocardiography dataset with explicit cardiotoxicity labels for breast cancer patients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EchoRisk as the first curated multicentre longitudinal echocardiography dataset, drawn from 422 patients across five European sites. It supplies 2159 videos from 1123 exams at up to five time points, plus a dedicated 280-patient baseline cohort. Three tasks are defined: ejection-fraction estimation from cine loops, LV-dysfunction classification from longitudinal sequences, and early cardiotoxicity prediction from pre-therapy scans alone. Baselines using an R(2+1)D+LSTM model perform well on the first two tasks but leave the third open: the best reported configuration reaches a test AUC of only 0.541 (Figure 4). The public dataset and evaluation protocol are positioned as a shared benchmark for cardio-oncology imaging research.

Core claim

EchoRisk is the first curated, multicentre, longitudinal echocardiography dataset with explicit cardiotoxicity labels from the CARDIOCARE study, released to support three tasks where video models achieve strong performance on ejection-fraction estimation and dysfunction classification but early prediction from a single pre-therapy video remains a significant open problem.

What carries the argument

The EchoRisk dataset of 2159 videos from 422 patients across five sites, together with the three clinically defined tasks and their evaluation protocols.

If this is right

  • Models can now be trained and compared on a standardised multicentre set for ejection-fraction estimation from echocardiography videos.
  • Longitudinal sequences enable classification of left-ventricular dysfunction with the supplied baseline performance as reference.
  • Early cardiotoxicity prediction from baseline scans alone is established as an unsolved problem requiring new methods.
  • Public code and data release supports direct comparison of future task-specific architectures in cardio-oncology.
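For orientation, left ventricular ejection fraction is the share of end-diastolic volume ejected per beat: LVEF = (EDV − ESV) / EDV × 100. The sketch below computes it and scores predictions with mean absolute error. MAE appears here only as a plausible regression metric, since this page does not name the challenge's primary metric, and both function names are hypothetical.

```python
def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """LVEF in percent from end-diastolic and end-systolic volumes (mL)."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected 0 <= ESV <= EDV and EDV > 0")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute gap between reference and predicted LVEF values.

    ASSUMPTION: MAE is an illustrative choice, not the challenge's stated metric.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A model evaluated this way would be fed one reference LVEF per exam and one prediction per video or per exam, depending on how the organisers aggregate views.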

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Wider adoption of the benchmark could accelerate development of tools that flag cardiotoxicity risk before therapy begins, reducing unplanned treatment interruptions.
  • Validation studies comparing the supplied labels against independent expert panels would be needed to confirm cross-site consistency.
  • Adding data from non-European populations would test whether performance generalises beyond the current cohort.

Load-bearing premise

The cardiotoxicity labels assigned at the five sites are clinically accurate and applied consistently.

What would settle it

Independent clinical re-review of the labels showing substantial disagreement with the provided annotations, or a different baseline architecture achieving markedly higher accuracy on the early-prediction task.
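One way to quantify such a re-review is Cohen's kappa between the released labels and an independent reading. The sketch below is illustrative only: the function name is hypothetical, and the source does not say which agreement statistic, if any, the study collected.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Chance-corrected agreement between two label lists of equal length."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Fraction of cases where the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled independently at their own rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical class throughout
    return (observed - expected) / (1.0 - expected)
```

Values near 1 would support the load-bearing premise; values near 0 would mean the labels agree with the re-review no better than chance.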

Figures

Figures reproduced from arXiv: 2607.01039 by Anastasia Constantinidou, Andri Papakonstantinou, Dimitrios Fotiadis, Dorothea Tsekoura, Georgia Karanasiou, Georgios Manikis, Grigorios Kalliatakis, Kalliopi Keramida, Katerina Naka, Kostas Marias, Lampros Lakkas, Manolis Tsiknakis, Vasileios Bouratzis.

Figure 1: Clinical formulation of the EchoRisk-MICCAI 2026 challenge tasks. Task 1 focuses on cross-sectional estimation of cardiac function at any given timepoint. Task 2 tracks these parameters across multiple timepoints to classify longitudinal LV dysfunction. Task 3 utilises only the baseline (pre-therapy) echocardiogram to predict the future risk of developing cardiotoxicity.
Figure 2: Representative echocardiography frames from a single patient (ECHORISK_0001, baseline T1 visit). Each row shows six of 32 evenly sampled frames spanning one full cardiac cycle. Top pair: apical four-chamber (A4C) view. Bottom pair: apical two-chamber (A2C) view. Within each pair, raw DICOM frames (native resolution, upper row) are shown alongside preprocessed frames (lower row) after resizing to 112×112…
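The "32 evenly sampled frames" in the Figure 2 caption can be illustrated with a short index-selection sketch. This is an assumption about how such sampling might be done, not the authors' preprocessing code: `evenly_spaced_indices` is a hypothetical helper, and it presumes the clip has already been trimmed to one cardiac cycle.

```python
def evenly_spaced_indices(num_frames: int, num_samples: int = 32) -> list[int]:
    """Pick num_samples frame indices spread evenly over [0, num_frames - 1].

    ASSUMPTION: the paper's exact sampling rule is not given on this page.
    Clips shorter than num_samples repeat indices instead of failing.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    # Fractional stride that maps the first sample to frame 0 and the last
    # sample to the final frame.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]
```

The selected frames would then be resized to 112×112 before being stacked into the clip passed to the video backbone.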
Figure 3: EchoRisk baseline architecture. A shared R2+1D ResNet-18 backbone (Kinetics-400 pretrained [14]) encodes a 32-frame echocardiography clip into a temporal feature sequence, which a single-layer LSTM aggregates into a fixed-length representation. Three task-specific linear heads branch from the LSTM terminal hidden state for LVEF regression (Task 1), LV dysfunction classification (Task 2), and cardiotoxic…
Figure 4: Task 3 receiver operating characteristic (ROC) curves evaluated on the test set. The dual-view model enhanced with test-time augmentation (TTA-10) achieves the highest discriminative performance (AUC 0.541). The shaded region indicates the clinical target false positive rate (FPR) of 0.10–0.20, highlighting the operating constraints necessary for viable early risk screening.
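To put the reported AUC of 0.541 in context, here is a minimal sketch of how AUC and the best sensitivity available under an FPR cap (such as the 0.10–0.20 band) can be computed from model scores. The function names are hypothetical and this is not the challenge's evaluation code.

```python
def _split(labels: list[int], scores: list[float]):
    """Separate scores of positives (label 1) from negatives (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    return pos, neg


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count one half. Quadratic in cohort size, fine for a few hundred cases.
    """
    pos, neg = _split(labels, scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(labels: list[int], scores: list[float], max_fpr: float) -> float:
    """Highest sensitivity of any threshold whose FPR is at most max_fpr.

    A case is called positive when its score is >= the threshold.
    """
    pos, neg = _split(labels, scores)
    best = 0.0  # a threshold above every score gives TPR 0 at FPR 0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best
```

An AUC of 0.5 corresponds to random ranking, so a value of 0.541 leaves little sensitivity to be had at any FPR inside the shaded band.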
Original abstract

Therapy-induced cardiotoxicity is the leading non-oncological cause of treatment interruption in breast cancer patients, yet early, automated risk stratification from routine cardiac imaging remains an unsolved problem. We present EchoRisk, the first curated, multicentre, longitudinal echocardiography dataset with explicit cardiotoxicity labels, released as the primary technical reference for the EchoRisk-MICCAI 2026 challenge. The dataset comprises 422 patients enrolled in the EU-funded CARDIOCARE prospective study across five European sites, yielding 2,159 echocardiography videos across 1,123 clinical exams acquired at up to five longitudinal timepoints, alongside a dedicated cohort of 280 patients with baseline imaging for early cardiotoxicity prediction. Three clinically grounded tasks are defined: automated estimation of left ventricular ejection fraction from cine video (Task 1), classification of LV dysfunction from longitudinal imaging (Task 2), and early prediction of therapy-induced cardiotoxicity from pre-therapy baseline echocardiography alone (Task 3). For each task we specify the evaluation protocol, primary and secondary metrics, and ranking procedure. We establish baseline performance using an R(2+1)D video backbone with LSTM aggregation trained from Kinetics-400 pretrained weights, demonstrating strong discriminative performance for cardiac functional assessment and LV dysfunction classification, while early cardiotoxicity prediction from a single pre-therapy video remains a significant open problem for the community. The dataset, evaluation code, and baseline implementations are publicly available to serve as a benchmark for further collaboration, comparison, and the creation of task-specific architectures in cardio-oncology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces EchoRisk, a multicentre longitudinal echocardiography dataset from the CARDIOCARE study (422 patients, 2159 videos from 1123 exams across five European sites, plus a 280-patient baseline cohort). It defines three tasks for the EchoRisk-MICCAI 2026 challenge—Task 1: automated LVEF estimation from cine video; Task 2: classification of LV dysfunction from longitudinal imaging; Task 3: early prediction of therapy-induced cardiotoxicity from pre-therapy baseline video alone—along with evaluation protocols, metrics, and ranking procedures. Baselines using an R(2+1)D video backbone with LSTM aggregation (Kinetics-400 pretrained) are reported, showing strong performance on functional assessment and dysfunction classification but highlighting early prediction as an open problem. The dataset, code, and baselines are released publicly.

Significance. If the cardiotoxicity labels prove clinically accurate and consistent, this would be the first publicly available multicentre echo dataset with explicit cardiotoxicity annotations for breast cancer patients, directly supporting development of automated early-risk tools in cardio-oncology where such data have been scarce. The release as a challenge benchmark with defined tasks, metrics, and reproducible baselines strengthens its utility for community progress on an unsolved clinical problem.

major comments (1)
  1. [Abstract] Abstract: The central claim that EchoRisk supplies 'explicit cardiotoxicity labels' and 'clinically grounded' tasks is load-bearing for the benchmark's validity (especially Tasks 2 and 3), yet no definition of cardiotoxicity (e.g., specific LVEF drop threshold, timing relative to therapy, or composite clinical events), no adjudication process, no exclusion criteria, and no inter-site agreement statistics are supplied. Without these, reported baseline discriminative performance and the conclusion that early prediction remains open cannot be verified or reproduced.
minor comments (1)
  1. [Abstract] The abstract states patient numbers and video counts but does not specify the exact primary/secondary metrics or ranking procedure for each task; these should be enumerated explicitly even if detailed later.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need for explicit details on cardiotoxicity label definitions to support the benchmark's validity. We address the major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that EchoRisk supplies 'explicit cardiotoxicity labels' and 'clinically grounded' tasks is load-bearing for the benchmark's validity (especially Tasks 2 and 3), yet no definition of cardiotoxicity (e.g., specific LVEF drop threshold, timing relative to therapy, or composite clinical events), no adjudication process, no exclusion criteria, and no inter-site agreement statistics are supplied. Without these, reported baseline discriminative performance and the conclusion that early prediction remains open cannot be verified or reproduced.

    Authors: We agree this information is essential for reproducibility and clinical grounding of Tasks 2 and 3. The labels derive from the CARDIOCARE study protocol, but the current manuscript does not detail the exact criteria (e.g., LVEF thresholds, timing, or composite events), adjudication, exclusions, or inter-rater/site agreement. In the revision we will add a dedicated subsection in Methods describing: the precise cardiotoxicity definition used (including any LVEF drop thresholds and timing relative to therapy), the adjudication process, exclusion criteria applied, and any available inter-site agreement statistics. If certain statistics were not collected in the original study we will explicitly note this as a limitation. These additions will allow readers to verify the baselines and the claim that early prediction (Task 3) remains challenging. revision: yes
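The kind of definition the referee asks for can be written down as a small, parameterised rule. The sketch below is purely illustrative: the thresholds (an absolute LVEF drop of at least 10 percentage points to a value below 50%) are placeholder assumptions chosen for the example, not the CARDIOCARE criteria, which this page does not state. `label_cardiotoxicity` and its parameters are hypothetical names.

```python
def label_cardiotoxicity(
    lvef_by_visit: list[float],
    min_drop: float = 10.0,    # ASSUMED absolute drop, in LVEF percentage points
    lvef_floor: float = 50.0,  # ASSUMED lower bound of acceptable LVEF (%)
) -> bool:
    """Return True if any follow-up visit satisfies the assumed rule.

    lvef_by_visit[0] is the pre-therapy baseline; later entries are follow-ups
    in chronological order.
    """
    if len(lvef_by_visit) < 2:
        return False  # no follow-up exam, so the rule cannot fire
    baseline = lvef_by_visit[0]
    return any(
        (baseline - lvef) >= min_drop and lvef < lvef_floor
        for lvef in lvef_by_visit[1:]
    )
```

Publishing the rule in this form, with its thresholds and timing window spelled out, would let others recompute the Task 2 and Task 3 labels from the released LVEF measurements.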

Circularity Check

0 steps flagged

No circularity: dataset release and benchmark with no derivations or self-referential claims

full rationale

The paper is a data release and benchmark definition for the EchoRisk-MICCAI 2026 challenge. It describes the CARDIOCARE-derived dataset, defines three tasks (LVEF estimation, LV dysfunction classification, early cardiotoxicity prediction), specifies evaluation protocols, and reports baseline results from a standard R(2+1)D+LSTM model pretrained on Kinetics-400. No equations, fitted parameters presented as predictions, self-citations used for uniqueness theorems, ansatzes, or renamings of known results appear in the provided text. The central claims rest on external clinical study data and standard ML baselines rather than any internal derivation chain that reduces to its own inputs. This is the expected non-finding for a benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical dataset and benchmark paper. No mathematical model, free parameters, axioms, or invented entities are introduced or required.

pith-pipeline@v0.9.1-grok · 5886 in / 1031 out tokens · 29388 ms · 2026-07-02T13:56:14.870463+00:00 · methodology


Reference graph

Works this paper leans on

15 extracted references · 4 canonical work pages

  1. [1]

    Cardinale, D., Colombo, A., Bacchiani, G., Tedeschi, I., Meroni, C.A., Veglia, F., Civelli, M., Lamantia, G., Colombo, N., Curigliano, G., et al.: Early detection of anthracycline cardiotoxicity and improvement with heart failure therapy. Circulation 131(22), 1981–1988 (2015)

  2. [2]

    CARDIOCARE Consortium: An interdisciplinary approach for the management of the elderly multimorbid patient with breast cancer therapy induced cardiac toxicity. https://cordis.europa.eu/project/id/945175 (2021), EU Horizon 2020, Grant Agreement No. 945175

  3. [3]

    Ghorbani, A., Ouyang, D., Abid, A., He, B., Chen, J.H., Harrington, R.A., Liang, D.H., Ashley, E.A., Zou, J.Y.: Deep learning interpretation of echocardiograms. npj Digital Medicine 3(1), 10 (2020). https://doi.org/10.1038/s41746-019-0216-8

  4. [4]

    European Heart Journal-Cardiovascular Imaging 16(3), 233–271 (2015)

    Lang, R.M., Badano, L.P., Mor-Avi, V., Afilalo, J., Armstrong, A., Ernande, L., Flachskampf, F.A., Foster, E., Goldstein, S.A., Kuznetsova, T., et al.: Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. Europea...

  5. [5]

    IEEE Transactions on Medical Imaging 38(9), 2198–2210 (2019). https://doi.org/10.1109/TMI.2019.2900516

    Leclerc, S., Smistad, E., Pedrosa, J., Østvik, A., Cervenansky, F., Espinosa, F., Espeland, T., Berg, E.A.R., Jodoin, P.M., Grenier, T., Lartizien, C., D’hooge, J., Lovstakken, L., Bernard, O.: Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Transactions on Medical Imaging 38(9), 2198–2210 (2019). https://doi.o...

  6. [6]

    Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988 (2017)

  7. [7]

    European Journal of Heart Failure 22(11), 1945–1960 (2020). https://doi.org/10.1002/ejhf.1920

    Lyon, A.R., Dent, S., Stanway, S., Earl, H., Brezden-Masley, C., Cohen-Solal, A., Tocchetti, C.G., Moslehi, J.J., Melloni, C., Herrmann, J., et al.: Baseline cardiovascular risk assessment in cancer patients scheduled to receive cardiotoxic cancer therapies: A position statement and new risk assessment tools from the Cardio-Oncology study group of the H...

  8. [8]

    European Heart Journal-Cardiovascular Imaging 23(10), e333–e465 (2022)

    Lyon, A.R., Lopez-Fernandez, T., Couch, L.S., Asteggiano, R., Aznar, M.C., Bergler-Klein, J., Boriani, G., Cardinale, D., Cordoba, R., Cosyns, B., et al.: 2022 ESC guidelines on cardio-oncology developed in collaboration with the European Hematology Association (EHA), the European Society for Therapeutic Radiology and Oncology (ESTRO) and the internationa...

  9. [9]

    European Heart Journal-Cardiovascular Imaging 26(Supplement_1), jeae333–028 (2025)

    Manikis, G., Kalliatakis, G., Marias, K., Bouratzis, V., Lakkas, L., Naka, A., Karanasiou, G., Tsekoura, D., Kampouroglou, E., Keramida, K., et al.: Association of echocardiographic radiomics-based features with cardiotoxicity effect in breast cancer patients from the CARDIOCARE project. European Heart Journal-Cardiovascular Imaging 26(Supplement_1), je...

  10. [10]

    Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 29 (2015)

  11. [11]

    Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C.P., Heidenreich, P.A., Harrington, R.A., Liang, D.H., Ashley, E.A., et al.: Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (2020)

  12. [12]

    Steyerberg, E.W., et al.: Clinical prediction models, vol. 201. Springer (2019)

  13. [13]

    Thavendiranathan, P., Grant, A.D., Negishi, T., Plana, J.C., Popović, Z.B., Marwick, T.H.: Reproducibility of echocardiographic techniques for sequential assessment of left ventricular ejection fraction and volumes: application to patients undergoing cancer chemotherapy. Journal of the American College of Cardiology 61(1), 77–84 (2013)

  14. [14]

    Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., Paluri, M.: A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. pp. 6450–6459 (2018)

  15. [15]

    European Heart Journal 37(36), 2768–2801 (2016). https://doi.org/10.1093/eurheartj/ehw211

    Zamorano, J.L., Lancellotti, P., Rodriguez Muñoz, D., Aboyans, V., Asteggiano, R., Galderisi, M., Habib, G., Lenihan, D.J., Lip, G.Y.H., Lyon, A.R., Lopez Fernandez, T., Mohty, D., Piepoli, M.F., Tamargo, J., Torbicki, A., Suter, T.M., ESC Scientific Document Group: 2016 ESC position paper on cancer treatments and cardiovascular toxicity developed under...