Improved ICH classification using task-dependent learning

Amir Bar; Eldad Elnekave; Michal Mauda; Michal Safadi; Yoni Turner

arxiv: 1907.00148 · v1 · pith:ZEUTABRInew · submitted 2019-06-29 · 💻 cs.CV


Pith reviewed 2026-05-25 13:04 UTC · model grok-4.3

classification 💻 cs.CV

keywords: intracranial hemorrhage · head CT · deep learning · segmentation · classification · task dependency · triage · BloodNet


The pith

BloodNet improves ICH classification on head CT by making segmentation and classification depend on each other.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BloodNet, a deep learning model for triaging head CT scans to detect intracranial hemorrhage. It treats segmentation and classification as linked tasks instead of independent ones, with the goal of producing more accurate classifications faster. Tests on over 1400 studies from more than 10 hospitals yield AUCs of 0.9493 on a positive-enriched set and 0.9566 on a random sample. These results are comparable to earlier reports that used fewer labeled studies, while drawing on larger and more diverse data. A reader would care because quicker, reliable detection could shorten the time between scan and treatment for a critical emergency finding.

Core claim

BloodNet is a deep learning architecture designed for head CT triaging that incorporates dependency between the otherwise independent tasks of segmentation and classification. This linkage produces improved classification performance for intracranial hemorrhage detection, with reported AUCs of 0.9493 and 0.9566 on held-out positive-enriched and randomly sampled sets of over 1400 studies from over 10 hospitals.

What carries the argument

The BloodNet architecture, which enforces dependency between segmentation and classification outputs to improve the classification task.
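The abstract does not say how the dependency is implemented. As a minimal sketch under that gap, the Python below shows one hypothetical form of coupling: the classification logit also receives a pooled summary of the segmentation head's per-pixel bleed probabilities, so the two outputs can no longer vary freely of each other. The names `independent_score` and `coupled_score`, the max-plus-mean pooling rule, and the weights are all illustrative assumptions, not BloodNet's design.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def independent_score(cls_logit: float) -> float:
    # Independent design: the study-level probability ignores the segmentation head.
    return sigmoid(cls_logit)


def coupled_score(cls_logit: float, seg_probs: List[List[float]],
                  w_cls: float = 1.0, w_seg: float = 4.0, bias: float = -2.0) -> float:
    # Hypothetical coupling: the classification logit also receives evidence pooled
    # from the segmentation head's per-pixel bleed probabilities for one slice.
    flat = [p for row in seg_probs for p in row]
    peak = max(flat)                  # strongest single-pixel evidence
    extent = sum(flat) / len(flat)    # crude proxy for how much of the slice is flagged
    seg_evidence = 0.5 * peak + 0.5 * extent
    # Illustrative constants; in a trained model these weights would be learned.
    return sigmoid(w_cls * cls_logit + w_seg * seg_evidence + bias)


empty_mask = [[0.0, 0.0], [0.0, 0.0]]
focal_bleed = [[0.9, 0.1], [0.0, 0.0]]
print(independent_score(0.0))           # 0.5, whatever the mask shows
print(coupled_score(0.0, empty_mask))   # lower: no segmentation evidence
print(coupled_score(0.0, focal_bleed))  # higher: the mask pushes the score up
```

In a real network the pooled mask features would be learned jointly with both heads; the sketch only shows that, once coupled, a confident segmentation moves the study-level score.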

If this is right

Classification accuracy for ICH rises when segmentation and classification share information.
The model can triage head CTs across data from multiple hospitals without retraining per site.
Results remain comparable to previously reported methods even with larger and more varied test sets.
Task dependency offers a route to better performance on other paired medical imaging problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same linkage idea could be tested on other radiology tasks that combine localization and diagnosis.
Real-world deployment would require checking whether the AUCs hold when scans arrive in live emergency workflows.
Multi-hospital data reduces the chance that gains are tied to one scanner type or protocol.

Load-bearing premise

Any measured improvement in classification comes from the modeled dependency between the two tasks rather than from other modeling decisions or properties of the datasets.

What would settle it

An ablation that removes the dependency link, retrains on the same data, and shows no drop in AUC on the identical held-out sets would falsify the claim that the dependency drives the gain.
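The comparison described above can be sketched as a paired bootstrap on the difference in AUC. This is a minimal illustration, assuming both models (with and without the dependency link) have produced one score per study on the identical held-out set; the function names are hypothetical and the paper does not state which test, if any, it used.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive study outscores a
    random negative one (Mann-Whitney form; ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative study")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_bootstrap_delta_auc(labels: Sequence[int],
                               scores_coupled: Sequence[float],
                               scores_ablated: Sequence[float],
                               n_boot: int = 2000,
                               seed: int = 0) -> Tuple[float, float, float]:
    """Observed AUC(coupled) - AUC(ablated) plus a 95% percentile interval.
    Both models are scored on the same resampled studies, so the test is paired."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(labels, scores_coupled) - auc(labels, scores_ablated)  # raises if a class is missing
    deltas: List[float] = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            deltas.append(auc(ys, [scores_coupled[i] for i in idx])
                          - auc(ys, [scores_ablated[i] for i in idx]))
    deltas.sort()
    lo = deltas[round(0.025 * (n_boot - 1))]
    hi = deltas[round(0.975 * (n_boot - 1))]
    return observed, lo, hi


labels = [0, 0, 0, 1, 1, 1]
coupled = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # toy scores, not data from the paper
ablated = [0.1, 0.8, 0.3, 0.7, 0.2, 0.9]
print(paired_bootstrap_delta_auc(labels, coupled, ablated, n_boot=200, seed=1))
```

If the interval for the AUC difference comfortably includes zero on the real held-out sets, the ablation shows no reliable drop, which is the falsifying outcome described above.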

Original abstract

Head CT is one of the most commonly performed imaging studies in the Emergency Department setting, and Intracranial hemorrhage (ICH) is among the most critical and time-sensitive findings to be detected on Head CT. We present BloodNet, a deep learning architecture designed for optimal triaging of Head CTs, with the goal of decreasing the time from CT acquisition to accurate ICH detection. The BloodNet architecture incorporates dependency between the otherwise independent tasks of segmentation and classification, achieving improved classification results. AUCs of 0.9493 and 0.9566 are reported on held out positive-enriched and randomly sampled sets comprising over 1400 studies acquired from over 10 different hospitals. These results are comparable to previously reported results obtained with a smaller number of tagged studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note private letter to a colleague

BloodNet adds task dependency between segmentation and classification for ICH triage and reports solid AUCs on multi-hospital data, but the abstract supplies no ablations or split details to show the dependency is what drives the gains.


The paper's core move is to build a network in which segmentation and classification are linked rather than independent, and the authors test it on head CT for intracranial hemorrhage. They report an AUC of 0.9493 on a positive-enriched held-out set and 0.9566 on a randomly sampled one, both drawn from more than 1400 studies across over ten hospitals. That data scale and the two test regimes are the parts worth noticing right away. The architecture itself is presented as new relative to the cited prior work on ICH detection. Multi-site collection and the positive-enriched versus random split are practical choices that match real triage needs. The numbers sit in the same range as earlier reports that used fewer cases, so the advance is incremental rather than a jump in the performance ceiling.

The main gap is that nothing in the abstract shows the performance lift comes from the modeled dependency rather than from other architectural or training decisions. No baseline comparisons or ablation results are mentioned, and the validation protocol is not described. If the train-test split was not strictly at the patient or study level across sites, leakage or site-specific effects could explain part of the result. The claim that task dependency is responsible therefore rests on evidence the abstract does not show.

This work is aimed at groups already building medical imaging pipelines for emergency triage. Someone looking for concrete multi-task examples in radiology might draw on the architecture description or the data-handling approach. It is not a foundational methods paper. The data volume and the clinical framing are enough to justify sending the full manuscript to referees rather than desk-rejecting it, provided the methods section actually contains the missing controls and split details.
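The two test regimes answer different questions. AUC is computed from how positives rank against negatives, so changing prevalence alone does not move it, although enrichment can also change the case mix. Positive predictive value, which governs how many triage alerts are real, does depend on prevalence $\pi$ through Bayes' rule. A worked example at a hypothetical operating point of sensitivity $\mathrm{Se} = 0.9$ and specificity $\mathrm{Sp} = 0.9$ (illustrative values, not figures from the paper):

```latex
\mathrm{PPV}(\pi) = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)},
\qquad
\mathrm{PPV}(0.5) = \frac{0.45}{0.45 + 0.05} = 0.90,
\qquad
\mathrm{PPV}(0.1) = \frac{0.09}{0.09 + 0.09} = 0.50
```

So a threshold that looks precise on an enriched set can yield one false alert per true alert at a lower prevalence; the 10% figure here is illustrative, not a reported emergency-department rate.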

Referee Report

3 major / 1 minor

Summary. The paper presents BloodNet, a deep learning architecture for triaging head CT scans that models dependency between the tasks of ICH segmentation and classification. It claims this dependency yields improved classification performance, reporting AUCs of 0.9493 on a positive-enriched held-out set and 0.9566 on a randomly sampled held-out set, drawn from over 1400 studies from more than 10 hospitals. These results are stated to be comparable to prior work using fewer labeled studies.

Significance. If the AUC gains are shown to arise specifically from the modeled task dependency (rather than other architectural or training choices) and the multi-hospital held-out evaluation is free of leakage, the work could support faster emergency ICH detection. The scale of the dataset across sites is a potential strength, but the current manuscript supplies no ablations, baselines, or protocol details that would allow assessment of whether the central innovation drives the reported numbers.

major comments (3)

[Abstract] Abstract: The claim that 'the BloodNet architecture incorporates dependency between the otherwise independent tasks of segmentation and classification, achieving improved classification results' is unsupported by any ablation, baseline model, or architectural comparison. No evidence is supplied that the reported AUCs of 0.9493/0.9566 are attributable to task dependency rather than other modeling decisions.
[Abstract] Abstract: No validation protocol details are provided (patient- vs. study-level splits, positive-enrichment procedure, or cross-site stratification), so it is impossible to assess leakage risk or selection effects in the held-out sets of >1400 studies from >10 hospitals.
[Abstract] Abstract: No statistical tests, confidence intervals, or comparison to independent-task baselines are reported, leaving the 'improved' claim unverified.
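The leakage concern in the second comment can be made concrete. A minimal sketch, assuming each study carries a patient ID (or a hospital ID, to hold out whole sites): assign entire groups, never individual studies, to train or test. The function name `grouped_split` and the toy IDs are hypothetical; note that `test_fraction` counts groups, not studies.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def grouped_split(study_to_group: Dict[str, str],
                  test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split studies so that every group (a patient ID, or a hospital ID to hold
    out whole sites) lands entirely in train or entirely in test."""
    studies_by_group: Dict[str, List[str]] = defaultdict(list)
    for study, group in study_to_group.items():
        studies_by_group[group].append(study)
    groups = sorted(studies_by_group)  # sort first so the seed alone fixes the split
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = sorted(s for g in groups if g not in test_groups for s in studies_by_group[g])
    test = sorted(s for g in test_groups for s in studies_by_group[g])
    return train, test


# Hypothetical IDs: the two studies of patient p1 must never be split across sets.
study_to_patient = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p3", "s5": "p3"}
train, test = grouped_split(study_to_patient, test_fraction=0.34, seed=0)
print(train, test)
```

scikit-learn's `GroupShuffleSplit` and `GroupKFold` implement the same idea if a library dependency is acceptable.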

minor comments (1)

[Abstract] The abstract refers to 'previously reported results' without citations or quantitative comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We will revise the manuscript to address the concerns raised regarding the lack of supporting evidence for our claims.


Referee: [Abstract] Abstract: The claim that 'the BloodNet architecture incorporates dependency between the otherwise independent tasks of segmentation and classification, achieving improved classification results' is unsupported by any ablation, baseline model, or architectural comparison. No evidence is supplied that the reported AUCs of 0.9493/0.9566 are attributable to task dependency rather than other modeling decisions.

Authors: The current manuscript presents the BloodNet architecture and reports the AUCs but does not include the requested ablations. In the revision we will add baselines that treat the two tasks independently, along with other architectural variants, to demonstrate that the performance gains are due to the modeled task dependency. revision: yes
Referee: [Abstract] Abstract: No validation protocol details are provided (patient- vs. study-level splits, positive-enrichment procedure, or cross-site stratification), so it is impossible to assess leakage risk or selection effects in the held-out sets of >1400 studies from >10 hospitals.

Authors: We agree that additional details are necessary. The revised manuscript will include a comprehensive description of the data splitting protocol, enrichment procedure, and measures taken to ensure no leakage across sites. revision: yes
Referee: [Abstract] Abstract: No statistical tests, confidence intervals, or comparison to independent-task baselines are reported, leaving the 'improved' claim unverified.

Authors: We will incorporate statistical significance tests, confidence intervals for the AUCs, and explicit comparisons to independent-task models in the updated version to verify the improvement. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out data with no derivation chain


The paper reports an empirical deep learning model (BloodNet) and measured AUCs (0.9493/0.9566) on held-out multi-hospital datasets. No equations, derivations, or first-principles claims appear; performance is presented as direct measurement rather than a reduction to fitted inputs or self-citations. The central claim of task dependency improving results is an empirical hypothesis tested on external data splits, not a self-definitional or fitted-input construction. This is the normal non-circular outcome for an applied ML paper whose validity rests on data and ablations rather than algebraic identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available, so the full set of modeling assumptions, hyperparameters, and data-processing choices cannot be audited. The central claim rests on the unexamined premise that task dependency produces the reported AUC lift.

invented entities (1)

BloodNet · no independent evidence
purpose: Deep learning architecture that couples segmentation and classification for ICH detection
New model name and design introduced in the abstract; no independent evidence supplied.

pith-pipeline@v0.9.0 · 5657 in / 1317 out tokens · 61717 ms · 2026-05-25T13:04:58.981691+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

[1] Jonathan Laserson, Christine Dan Lantsman, Michal Cohen-Sfady, Itamar Tamir, Eli Goz, Chen Brestel, Shir Bar, Maya Atar, and Eldad Elnekave, “TextRay: Mining clinical reports to gain a broad understanding of chest x-rays,” arXiv preprint arXiv:1806.02121, 2018.

[2] Ran Shadmi, Victoria Mazo, Orna Bregman-Amitai, and Eldad Elnekave, “Fully-convolutional deep-learning based system for coronary calcium score prediction from non-contrast chest ct,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 24–28.

[3] Chen Brestel, Ran Shadmi, Itamar Tamir, Michal Cohen-Sfaty, and Eldad Elnekave, “Radbot-cxr: Classification of four clinical finding categories in chest x-ray using deep learning,” 2018.

[4] Amir Bar, Lior Wolf, Orna Bergman Amitai, Eyal Toledano, and Eldad Elnekave, “Compression fractures detection on ct,” in Medical Imaging 2017: Computer-Aided Diagnosis. International Society for Optics and Photonics, 2017, vol. 10134, p. 1013440.

[5] Kamal Jnawali, Mohammad R Arbabshirani, Navalgund Rao, and Alpen A Patel, “Deep 3d convolution neural network for ct brain hemorrhage classification,” in Medical Imaging 2018: Computer-Aided Diagnosis. International Society for Optics and Photonics, 2018, vol. 10575, p. 105751C.

[6] Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, and Prashant Warier, “Development and validation of deep learning algorithms for detection of critical findings in head ct scans,” arXiv preprint arXiv:1803.05854, 2018.

[7] Monika Grewal, Muktabh Mayank Srivastava, Pulkit Kumar, and Srikrishna Varadarajan, “Radnet: Radiologist level accuracy using deep learning for hemorrhage detection in ct scans,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 281–284.

[8] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu, “3d convolutional neural networks for human action recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 1, pp. 221–231, 2013.

[9] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.

[10] Sepp Hochreiter and Jürgen Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[11] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, “Mask r-cnn,” in Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017, pp. 2980–2988.

[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
