arxiv: 2604.11927 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

A Workflow to Efficiently Generate Dense Tissue Ground Truth Masks for Digital Breast Tomosynthesis

Bruno Barufaldi, Guilherme Muniz de Oliveira, Juhun Lee, Luana de Mero Omena, Margarita Zuley, Oleg Kruglov, Robert Nishikawa, Tamerlan Mustafaev, Vitor de Sousa Franca

Pith reviewed 2026-05-10 16:33 UTC · model grok-4.3

keywords digital breast tomosynthesis · dense tissue segmentation · ground truth generation · semi-automated annotation · fibroglandular tissue · DBT · Dice similarity coefficient

The pith

Annotating only the central slice generates accurate dense tissue masks for entire DBT volumes

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a workflow that lets users create binary masks of fibroglandular dense tissue in digital breast tomosynthesis by annotating only the central slice of each volume. A user draws a rough region of interest around the dense tissue and picks a threshold on that middle slice. The algorithm projects the region to every other slice and iteratively tweaks the threshold per slice to keep the masks consistent through the full 3D volume. This approach cuts annotation time and labor while still producing results that match radiologist manual segmentations at a median Dice score of 0.83 across 176 slices from 44 volumes. The method targets the shortage of labeled data that currently limits development of models for personalized breast cancer risk estimation.

Core claim

The framework enables a user to outline a rough ROI enclosing dense tissue on the central reconstructed slice of a DBT volume and select a segmentation threshold. The algorithm then projects the ROI to the remaining slices and iteratively adjusts slice-specific thresholds to maintain consistent dense tissue delineation across the DBT volume. Evaluation on 44 volumes from the DBTex dataset yields a median Dice score of 0.83 against manual segmentations on 176 slices and shows inter-reader agreement of 0.84.

What carries the argument

Projection of a central-slice rough ROI combined with iterative per-slice threshold adjustment, which propagates the initial annotation while enforcing consistency across the reconstructed volume.
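
A minimal sketch of the propagation idea, under stated assumptions. The paper does not publish the exact update rule, so this sketch substitutes a plain grid search that picks, for each slice, the threshold whose in-ROI dense area is closest to the reference area on the central slice. The function names (`normalize01`, `best_threshold`, `propagate_mask`), the 101-point candidate grid, and the use of `n_slices // 2` as the central slice are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def normalize01(slice_2d):
    """Min-max normalize one slice to the [0, 1] range."""
    s = slice_2d.astype(np.float64)
    lo, hi = s.min(), s.max()
    return np.zeros_like(s) if hi == lo else (s - lo) / (hi - lo)

def best_threshold(slice01, roi, target_area, candidates):
    """Return the candidate threshold whose in-ROI area is closest to target_area."""
    areas = np.array([np.count_nonzero((slice01 >= t) & roi) for t in candidates])
    return float(candidates[np.argmin(np.abs(areas - target_area))])

def propagate_mask(volume, roi, central_threshold, candidates=None):
    """volume: (n_slices, H, W) array; roi: (H, W) bool mask drawn on the central slice.

    Assumption: a grid search over thresholds stands in for the paper's
    unspecified iterative adjustment.
    """
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)  # hypothetical search grid
    norm = np.stack([normalize01(s) for s in volume])
    center = volume.shape[0] // 2  # assumed definition of "central slice"
    # Reference dense area, measured on the user-annotated central slice.
    target_area = np.count_nonzero((norm[center] >= central_threshold) & roi)
    thresholds = np.array(
        [best_threshold(s, roi, target_area, candidates) for s in norm]
    )
    thresholds[center] = central_threshold  # keep the user's choice unchanged
    # Project the same ROI onto every slice and apply the per-slice threshold.
    mask = (norm >= thresholds[:, None, None]) & roi[None, :, :]
    return mask, thresholds
```

Matching area alone says nothing about whether the boundary lands in the right place on a non-central slice, which is the same gap the load-bearing premise on this page points to.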

If this is right

Annotation labor is confined to a single central slice rather than the entire stack of slices.
The generated masks reach a median Dice overlap of 0.83 with expert manual labels.
This performance level is comparable to the 0.84 median Dice agreement observed between two radiologists.
Much larger collections of labeled DBT data become practical to produce for training segmentation algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same central-slice projection idea could transfer to other volumetric medical imaging tasks that need tissue masks.
Integration into annotation software would let radiologists generate usable ground truth far faster than full manual outlining.
Further automation of the initial threshold selection step could reduce user involvement even more.
Testing the workflow on broader screening populations would reveal whether slice-to-slice consistency holds outside the current evaluation set.

Load-bearing premise

Projecting the central-slice ROI and iteratively adjusting thresholds slice by slice will produce accurate dense tissue boundaries on non-central slices without introducing major errors or inconsistencies.

What would settle it

Full manual segmentation of every slice in a new set of DBT volumes, followed by calculation of Dice scores between those complete labels and the workflow outputs to check whether agreement stays near 0.83 or falls sharply away from the center.
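
The check described above could be scripted roughly as follows. This sketch uses the standard definition of the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), and reports it per slice, keyed by signed distance from the central slice. The helper names (`dice`, `dice_by_offset`), the `n_slices // 2` center, and the convention that two empty masks score 1.0 are assumptions made here, not details taken from the paper.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

def dice_by_offset(manual, generated):
    """Per-slice Dice for two (n_slices, H, W) masks, keyed by offset from center."""
    center = manual.shape[0] // 2  # assumed definition of "central slice"
    return {
        i - center: dice(manual[i], generated[i])
        for i in range(manual.shape[0])
    }
```

Plotting the returned values against the offset would show directly whether agreement stays flat or drops toward the edge slices.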

Figures

Figures reproduced from arXiv: 2604.11927 by Bruno Barufaldi, Guilherme Muniz de Oliveira, Juhun Lee, Luana de Mero Omena, Margarita Zuley, Oleg Kruglov, Robert Nishikawa, Tamerlan Mustafaev, Vitor de Sousa Franca.

**Figure 1.** Proposed workflow for dense tissue mask generation. A GUI was developed to implement this workflow. Accompanying text (Section 2.3, "Breast density masks generation using the proposed framework"): In this study, Radiologist 1 annotated the central slice of each DBT study by creating a polygon mask and selecting an appropriate threshold value for that slice. Using our proposed method (section 2.2), we then generated a comprehensive densit…

**Figure 2.** Boxplots showing patient-level Dice values stratified by Visual Assessment of Density (VAD). Boxes represent the interquartile range (Q1–Q3; IQR), the horizontal line within each box indicates the median, and overlaid points correspond to individual case Dice values. The whiskers show 1.5 × IQR.

**Figure 3.** Boxplots showing patient-level Dice values comparing manual segmentations and the proposed method at the 20th and 80th slice indices, stratified by Visual Assessment of Density (VAD). Boxes represent the interquartile range (Q1–Q3; IQR), the horizontal line within each box indicates the median, and overlaid points correspond to individual case Dice values. The whiskers show 1.5 × IQR.

**Figure 4.** Example segmentation results from two radiologists for 8 different patients. The figure demonstrates strong agreement across four different VAD values. For consistency, the 20th slice from each volume is displayed. The volume-wise Dice scores for each pair are shown in the upper right corner.

**Figure 5.** An outlier case from a single patient (VAD < 25%). For consistency, the 20th slice from each volume is displayed. The volume-wise Dice values for each pair are shown in the upper right corner. The Dice value for the LMLO view was lower than for the other views from the same patient. Furthermore, Radiologist 2 segmented a greater amount of tissue in the LMLO view compared to the other views, suggesting a…

Original abstract

Digital breast tomosynthesis (DBT) is now the standard of care for breast cancer screening in the USA. Accurate segmentation of fibroglandular tissue in DBT images is essential for personalized risk estimation, but algorithm development is limited by scarce human-delineated training data. In this study we introduce a time- and labor-saving framework to generate a human-annotated binary segmentation mask for dense tissue in DBT. Our framework enables a user to outline a rough region of interest (ROI) enclosing dense tissue on the central reconstructed slice of a DBT volume and select a segmentation threshold to generate the dense tissue mask. The algorithm then projects the ROI to the remaining slices and iteratively adjusts slice-specific thresholds to maintain consistent dense tissue delineation across the DBT volume. By requiring annotation only on the central slice, the framework substantially reduces annotation time and labor. We used 44 DBT volumes from the DBTex dataset for evaluation. Inter-reader agreement was assessed by computing patient-wise Dice similarity coefficients between segmentation masks produced by two radiologists, yielding a median of 0.84. Accuracy of the proposed method was evaluated by having a radiologist manually segment the 20th and 80th percentile slices from each volume (CC and MLO views; 176 slices total) and calculate Dice scores between the manual and proposed segmentations, yielding a median of 0.83.
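
The abstract does not say how a "20th percentile slice" maps to a slice index. One plausible reading, sketched here as an assumption, treats the percentile as a fraction of the way through the stack and rounds to the nearest 0-based index. The function names (`percentile_slice_index`, `validation_slices`) are hypothetical.

```python
def percentile_slice_index(n_slices: int, pct: int) -> int:
    """Map an integer percentile in [0, 100] to a 0-based slice index.

    Assumption: the percentile is taken along the slice axis and rounded
    to the nearest index (halves round up). Integer arithmetic avoids
    floating-point rounding surprises.
    """
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return (pct * (n_slices - 1) + 50) // 100

def validation_slices(n_slices: int, pcts=(20, 80)):
    """Indices of the slices a reader would segment manually for validation."""
    return [percentile_slice_index(n_slices, p) for p in pcts]
```

Under this reading, both validation slices sit well inside the stack, so the first and last slices of a volume are never sampled.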

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The workflow cuts annotation labor for DBT dense tissue masks by limiting manual work to the central slice and achieves reasonable Dice scores, but validation is restricted to only the 20th and 80th percentile slices.

The main thing here is a practical workflow that lets a user draw a rough ROI on the central DBT slice, pick a threshold, and then project it while tweaking per-slice thresholds to generate dense tissue masks across the volume. They report a median Dice of 0.83 against manual labels on 176 slices and 0.84 between two radiologists, using 44 volumes from the DBTex dataset. This directly targets the shortage of labeled data for breast density models in tomosynthesis, which matters for risk estimation work in oncology imaging. The approach is a straightforward extension of existing semi-automated segmentation ideas, applied to the slice-by-slice nature of DBT reconstructions, and the inter-reader baseline is a sensible check. It does well at showing how central-slice focus can reduce effort without obvious collapse in the tested cases.

The soft spots are clear and worth noting. Validation stays limited to the 20th and 80th percentile slices, so there is no direct evidence on edge slices or full-volume consistency when tissue patterns shift away from the center. The iterative threshold adjustment rule itself is not described in enough detail to judge reproducibility or failure cases. The dataset is modest, which fits an early practical paper but leaves broader claims thin.

This is for researchers building or training segmentation models in breast imaging who need faster ground-truth generation. It is not a theoretical advance, but the problem it solves is real enough that a serious referee could usefully push for fuller evaluation details and clearer algorithm steps. I would send it to peer review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The paper introduces a workflow for efficiently creating dense tissue ground truth masks in DBT volumes. Users annotate a rough ROI and select a threshold only on the central slice; the method projects this ROI to other slices and iteratively adjusts per-slice thresholds for consistent segmentation. On 44 DBT volumes, it reports a median Dice score of 0.83 against manual annotations on 176 selected slices and 0.84 inter-reader agreement.

Significance. If the approach proves robust across entire volumes, it would meaningfully lower the barrier to creating large annotated datasets for training dense tissue segmentation models in DBT, supporting personalized breast cancer risk assessment. The close alignment with inter-reader variability (0.84) is a positive indicator of practical utility, and the direct empirical comparison to manual labels on a non-trivial number of slices provides moderate evidence of feasibility.

major comments (2)

[Evaluation] Accuracy is assessed only via Dice scores on the 20th and 80th percentile slices (176 slices total across CC/MLO views), with no reported quantitative checks on edge slices, boundary smoothness across the volume, or cases where tissue distribution deviates from the central slice. This sampling directly limits support for the central claim of consistent dense tissue delineations across the full DBT volume.
[Methods] The iterative adjustment algorithm for slice-specific thresholds is described only at a high level without the exact rule, convergence criteria, or handling of variations away from the central slice. This omission prevents assessment of reproducibility and whether the 0.83 median Dice generalizes without systematic errors in non-central slices.

minor comments (1)

[Abstract] Inclusion criteria for the 44 DBT volumes from DBTex and the split between CC and MLO views could be stated more explicitly to allow better assessment of generalizability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating the revisions we will make to improve the paper.

Referee: [Evaluation] Accuracy is assessed only via Dice scores on the 20th and 80th percentile slices (176 slices total across CC/MLO views), with no reported quantitative checks on edge slices, boundary smoothness across the volume, or cases where tissue distribution deviates from the central slice. This sampling directly limits support for the central claim of consistent dense tissue delineations across the full DBT volume.

Authors: We agree that evaluating only the 20th and 80th percentile slices provides limited direct quantitative evidence for consistency on edge slices or in cases of atypical tissue distribution, and that this constrains the strength of the claim for full-volume consistency. The sampling strategy was chosen because full manual annotation of entire volumes is prohibitively time-consuming (precisely the problem our workflow addresses), while the 20th/80th percentiles still represent slices distant from the central annotated slice. The median Dice of 0.83 being nearly identical to inter-reader agreement (0.84) offers indirect support for practical utility. In the revised manuscript we will (i) explicitly acknowledge this as a limitation in the Discussion, (ii) add qualitative visualizations of full-volume segmentations across multiple cases to demonstrate boundary smoothness and handling of edge slices, and (iii) report quantitative Dice scores on edge slices for a small additional subset of volumes if radiologist time permits.

revision: partial

Referee: [Methods] The iterative adjustment algorithm for slice-specific thresholds is described only at a high level without the exact rule, convergence criteria, or handling of variations away from the central slice. This omission prevents assessment of reproducibility and whether the 0.83 median Dice generalizes without systematic errors in non-central slices.

Authors: We accept this criticism; the original Methods section intentionally kept the description concise but thereby omitted necessary implementation details. In the revision we will expand the Methods to include: the precise iterative rule (how the per-slice threshold is updated based on projected ROI statistics), the convergence criterion (e.g., change in segmented area or intensity threshold below a fixed epsilon), and explicit handling of tissue-distribution deviations (e.g., fallback to neighboring-slice thresholds or manual override). We will also supply pseudocode for the full procedure to enable reproducibility.

revision: yes
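
To make concrete what an "iterative rule with a convergence criterion" could look like, here is one hypothetical instance: bisection on the threshold, stopping when the search interval is narrower than a fixed epsilon. This is an editorial illustration only, not the pseudocode the simulated rebuttal promises and not the authors' algorithm. The name `bisect_threshold` and the `eps` and `max_iter` defaults are assumptions; the sketch relies on the fact that the in-ROI area can only shrink or stay the same as the threshold rises.

```python
import numpy as np

def bisect_threshold(slice01, roi, target_area, eps=1e-3, max_iter=50):
    """Find a threshold in [0, 1] whose in-ROI area approximates target_area.

    slice01: 2D array already normalized to [0, 1]; roi: 2D bool mask.
    Assumption: bisection stands in for the paper's unspecified update rule.
    """
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        area = np.count_nonzero((slice01 >= mid) & roi)
        if area > target_area:
            lo = mid  # too much tissue selected: move the threshold up
        else:
            hi = mid  # target met or undershot: move the threshold down
        if hi - lo < eps:  # convergence: threshold change below epsilon
            break
    return 0.5 * (lo + hi)
```

A rule of this kind converges quickly but inherits the limitation raised by the referee: it matches area, so it cannot detect a slice where the dense tissue has genuinely grown or shrunk relative to the center.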

Circularity Check

0 steps flagged

No circularity: procedural workflow with direct empirical validation

The paper presents a user-assisted procedural workflow for generating dense-tissue masks in DBT volumes: a radiologist draws a rough ROI and chooses a threshold on the central slice only; the algorithm then projects the ROI and iteratively adjusts per-slice thresholds. All performance claims (median Dice 0.83 vs. manual labels on 176 slices, inter-reader Dice 0.84) rest on direct comparison to independently drawn manual segmentations rather than any mathematical derivation, fitted parameter, or self-referential prediction. No equations, uniqueness theorems, or ansatzes appear; the method is not claimed to be derived from first principles. Consequently there is no load-bearing step that reduces to its own inputs by construction. The evaluation design (limited to 20th/80th-percentile slices) raises questions of coverage but does not constitute circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that intensity thresholding after ROI projection can maintain consistent delineations and on user-chosen thresholds as inputs; no free parameters are fitted to data and no new entities are postulated.

free parameters (1)

user-selected segmentation threshold
Chosen manually by the user for each volume to separate dense tissue; directly affects the output mask but is not derived or fitted from data.

axioms (1)

domain assumption: Dense tissue appearance is sufficiently consistent across slices that a projected ROI plus per-slice threshold tweaks can produce accurate masks
Invoked in the description of the projection and iterative adjustment steps.

pith-pipeline@v0.9.0 · 5589 in / 1436 out tokens · 68778 ms · 2026-05-10T16:33:00.119560+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 6 canonical work pages

[1]

It provides quasi three-dimensional (3D) breast tissue characterization and can improve lesion detection compared with conventional 2D mammography [1], [2]

Background Digital breast tomosynthesis (DBT) is increasingly used for breast cancer screening in the US. It provides quasi three-dimensional (3D) breast tissue characterization and can improve lesion detection compared with conventional 2D mammography [1], [2]. High breast density is associated with increased breast cancer risk [3], [4]. Lack of publicly...
[2]

for-presentation

Methods 2.1 Dataset This study utilized the DBTex dataset [5] which includes DBT volumes of 5,060 patients acquired by Hologic systems, to develop and evaluate the proposed time and labor saving framework. We randomly selected 176 “for-presentation” views (88 R/L CC and 88 R/L MLO) of 44 patients with BI-RADS 1 (normal) assessment for this study. 2.2 Prop...
[3]

The user imports the complete DBT series through the GUI, which automatically displays the central slice for initial interaction (Fig

Load the DBT volume. The user imports the complete DBT series through the GUI, which automatically displays the central slice for initial interaction (Fig. 1a)
[4]

Using a polygon-drawing tool, the user delineates the fibroglandular area on the central slice (Fig

Define the dense-tissue region. Using a polygon-drawing tool, the user delineates the fibroglandular area on the central slice (Fig. 1b). The GUI enables fine-tuning of the contour and excludes irrelevant regions such as the skin fold, nipple, pectoral muscle, or lymph nodes
[5]

The pixel values in each slice are normalized to a [0-1] range to ensure consistent thresholding across the stack in the subsequent steps

Normalize image intensities. The pixel values in each slice are normalized to a [0-1] range to ensure consistent thresholding across the stack in the subsequent steps
[6]

Within the GUI (Fig

Interactive threshold selection. Within the GUI (Fig. 1c), the user adjusts a slider to select the threshold that best separates dense from fatty tissue inside the polygon mask. The system provides real-time visual feedback and quantitative readouts of the segmented dense area on the central slice (i.e., the measurements). Once confirmed, this initial th...
[7]

ground truth

Iterative propagation and optimization. The algorithm propagates the polygon mask and thresholds the data iteratively to all slices in the DBT volume (Fig. 1d). For each slice, an automated search determines the threshold that yields a segmented dense-tissue area most consistent with the reference area from the central slice. The resulting 3D mask repres...

2024
[8]

Effectiveness of Digital Breast Tomosynthesis Compared With Digital Mammography: Outcomes Analysis From 3 Years of Breast Cancer Screening,

E. S. McDonald, A. Oustimov, S. P. Weinstein, M. B. Synnestvedt, M. Schnall, and E. F. Conant, “Effectiveness of Digital Breast Tomosynthesis Compared With Digital Mammography: Outcomes Analysis From 3 Years of Breast Cancer Screening,” JAMA Oncol., vol. 2, no. 6, pp. 737–743, Jun. 2016, doi: 10.1001/jamaoncol.2015.5536

work page doi:10.1001/jamaoncol.2015.5536 2016
[9]

Association of Digital Breast Tomosynthesis vs Digital Mammography With Cancer Detection and Recall Rates by Age and Breast Density,

E. F. Conant et al., “Association of Digital Breast Tomosynthesis vs Digital Mammography With Cancer Detection and Recall Rates by Age and Breast Density,” JAMA Oncol., vol. 5, no. 5, pp. 635–642, May 2019, doi: 10.1001/jamaoncol.2018.7078

work page doi:10.1001/jamaoncol.2018.7078 2019
[10]

Mammographic Density and the Risk and Detection of Breast Cancer | New England Journal of Medicine

“Mammographic Density and the Risk and Detection of Breast Cancer | New England Journal of Medicine.” Accessed: Mar. 06, 2026. [Online]. Available: https://www.nejm.org/doi/full/10.1056/NEJMoa062790

work page doi:10.1056/nejmoa062790 2026
[11]

Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model,

J. A. Tice, S. R. Cummings, R. Smith-Bindman, L. Ichikawa, W. E. Barlow, and K. Kerlikowske, “Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model,” Ann. Intern. Med., vol. 148, no. 5, pp. 337–347, Mar. 2008, doi: 10.7326/0003-4819-148-5-200803040-00004

work page doi:10.7326/0003-4819-148-5-200803040-00004 2008
[12]

BREAST-CANCER-SCREENING-DBT,

“BREAST-CANCER-SCREENING-DBT,” The Cancer Imaging Archive (TCIA). Accessed: Mar. 06, 2026. [Online]. Available: https://www.cancerimagingarchive.net/collection/breast-cancer-screening-dbt/

2026
[13]

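U.S Mammography and Breast Imaging Market Outlook Report 2022: Mammography Department Budgets Anticipated to Grow at an Estimated 9.7% Each Year from 2023 Through 2025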

“U.S Mammography and Breast Imaging Market Outlook Report 2022: Mammography Department Budgets Anticipated to Grow at an Estimated 9.7% Each Year from 2023 Through 2025 - ResearchAndMarkets.com.” Accessed: Mar. 06, 2026. [Online]. Available: https://www.businesswire.com/news/home/20230405005454/en/U.S-Mammography-and- Breast-Imaging-Market-Outlook-Report-...

work page arXiv 2022
[14]

Adaptive thresholding technique for segmenting breast dense tissue in digital breast tomosynthesis: a preliminary study,

T. Mustafaev, R. M. Nishikawa, and J. Lee, “Adaptive thresholding technique for segmenting breast dense tissue in digital breast tomosynthesis: a preliminary study,” in 17th International Workshop on Breast Imaging (IWBI 2024), SPIE, May 2024, pp. 240–245. doi: 10.1117/12.3027017

work page doi:10.1117/12.3027017 2024
[15]

França, V-kr0pt/density_segmentation_gui

V. França, V-kr0pt/density_segmentation_gui. (Feb. 15, 2026). Python. Accessed: Mar. 26,

2026
[16]

Available: https://github.com/V-kr0pt/density_segmentation_gui

[Online]. Available: https://github.com/V-kr0pt/density_segmentation_gui