Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention
Pith reviewed 2026-05-10 12:48 UTC · model grok-4.3
The pith
Encoding class difficulty at the representation level provides a principled alternative to conventional loss reweighting for imbalanced segmentation
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dynamic Focal Attention (DFA) introduces a learnable per-class bias to the cross-attention logits within query-based mask decoders for semantic segmentation. This bias is initialized from a log-frequency prior and optimized end-to-end to capture class-specific difficulty arising from morphological variability, boundary ambiguity, and contextual similarity. By performing reweighting at the representation level prior to prediction, DFA unifies frequency-based and difficulty-aware approaches. Experiments on the BDSA, BCSS, and CRAG benchmarks demonstrate consistent improvements in Dice and IoU, matching or exceeding a difficulty-aware baseline without requiring a separate difficulty estimator or an additional training stage.
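The mechanism can be sketched in a few lines. This is an illustrative, logit-adjustment-style sketch, not the paper's exact formulation: the function name `biased_class_attention` and the choice of taking the softmax over the class-query axis per image token are our assumptions. Note that a uniform per-class shift only changes the attention weights when it varies along the softmax axis; a constant added within a softmax row is a no-op.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def biased_class_attention(logits_per_token, class_bias):
    """logits_per_token[j][c]: affinity of image token j to class query c.
    class_bias[c]: learnable scalar for class c (one parameter per class),
    added to the logits before the softmax over classes, so difficult or
    rare classes can claim more attention mass at the representation
    level, before any prediction or loss is computed."""
    return [softmax([x + b for x, b in zip(row, class_bias)])
            for row in logits_per_token]
```

For one token with equal raw affinities to two classes, a bias of log 2 on the second class shifts its attention weight from 1/2 to 2/3, illustrating how the bias reweights representations rather than gradients.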
What carries the argument
Dynamic Focal Attention (DFA), a mechanism that introduces a learnable per-class bias to cross-attention logits in query-based mask decoders to enable representation-level reweighting.
If this is right
- Models can adaptively capture difficulty signals through training without needing a separate difficulty estimator.
- It achieves matching or better Dice and IoU scores on three histopathology benchmarks without additional training stages.
- It unifies frequency-based and difficulty-aware approaches under a common attention-bias framework.
- Reweighting occurs at the representation level prior to prediction rather than at the gradient level after prediction.
Where Pith is reading between the lines
- This attention-bias approach could be tested on other imbalanced segmentation domains such as remote sensing or cell microscopy to check generalization.
- The method might simplify training pipelines by removing the need for custom loss functions or two-stage estimators in imbalance settings.
- Further work could examine whether the learned biases primarily reflect boundary ambiguity or other contextual factors on specific tissue types.
Load-bearing premise
That a learnable per-class bias added to cross-attention logits will capture morphological variability, boundary ambiguity, and contextual similarity signals beyond the log-frequency initialization, rather than collapsing to frequency-based reweighting.
What would settle it
If fixing the per-class bias to its log-frequency initialization produces identical Dice and IoU scores to the version where the bias is optimized end-to-end, the claim that it captures additional difficulty signals would be false.
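The settling experiment reduces to a single ablation toggle. A framework-agnostic sketch with hypothetical names; in an autodiff framework the frozen variant would simply mark the bias parameter as non-trainable (e.g. requires_grad=False in PyTorch).

```python
def sgd_step_bias(bias, grad, lr, frozen=False):
    """Falsification test for the load-bearing premise: train one model
    with the per-class bias frozen at its log-frequency initialisation
    (frozen=True) and one with the bias updated end-to-end
    (frozen=False). If both reach the same Dice/IoU, the learned bias
    adds nothing beyond frequency-based reweighting."""
    if frozen:
        return list(bias)  # bias stays at its initial (prior) values
    return [b - lr * g for b, g in zip(bias, grad)]
```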
Original abstract
Semantic segmentation of histopathology images under class imbalance is typically addressed through frequency-based loss reweighting, which implicitly assumes that rare classes are difficult. However, true difficulty also arises from morphological variability, boundary ambiguity, and contextual similarity, factors that frequency cannot capture. We propose Dynamic Focal Attention (DFA), a simple and efficient mechanism that learns class-specific difficulty directly within the cross-attention of query-based mask decoders. DFA introduces a learnable per-class bias to attention logits, enabling representation-level reweighting prior to prediction rather than gradient-level reweighting after prediction. Initialised from a log-frequency prior to prevent gradient starvation, the bias is optimised end-to-end, allowing the model to adaptively capture difficulty signals through training, effectively unifying frequency-based and difficulty-aware approaches under a common attention-bias framework. On three histopathology benchmarks (BDSA, BCSS, CRAG), DFA consistently improves Dice and IoU, matching or exceeding a difficulty-aware baseline without a separate estimator or additional training stage. These results demonstrate that encoding class difficulty at the representation level provides a principled alternative to conventional loss reweighting for imbalanced segmentation.
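A minimal sketch of the log-frequency initialisation the abstract describes. The sign convention used here (negative log frequency, so rarer classes start with a larger bias) and the function name are our assumptions; the abstract specifies only "a log-frequency prior" intended to prevent gradient starvation on rare classes.

```python
import math

def log_frequency_prior(class_pixel_counts):
    """Initialise the per-class bias from class frequencies.
    Assumed form: b_c = -log(n_c / N), so a rare class starts with a
    larger bias than a frequent one, keeping it visible to the attention
    from the first gradient step instead of starting from zero."""
    total = sum(class_pixel_counts)
    return [-math.log(n / total) for n in class_pixel_counts]
```

On a two-class toy count of (90, 10) pixels, the rare class receives the larger starting bias; training then adjusts these values end-to-end.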
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dynamic Focal Attention (DFA) for semantic segmentation of histopathology images under class imbalance. DFA adds a learnable per-class bias to the cross-attention logits of query-based mask decoders; this bias is initialized from a log-frequency prior (to avoid gradient starvation) and optimized end-to-end. The authors claim that the resulting representation-level reweighting captures morphological variability, boundary ambiguity, and contextual similarity beyond what frequency alone can explain, thereby unifying frequency-based loss reweighting and difficulty-aware methods. Experiments on the BDSA, BCSS, and CRAG benchmarks are reported to yield consistent Dice and IoU gains that match or exceed a difficulty-aware baseline without requiring a separate estimator or extra training stage.
Significance. If the central claim holds—that the optimized bias meaningfully deviates from its log-frequency initialization and produces robust gains—this offers a lightweight, integrated mechanism for encoding class difficulty directly inside attention rather than post-hoc loss reweighting. Such an approach could simplify pipelines for imbalanced medical-image segmentation while still benefiting from end-to-end training. The absence of quantitative metrics, ablations, or bias-value reporting, however, prevents a clear assessment of practical impact or generalizability.
major comments (2)
- [Abstract and Method (DFA formulation)] The central claim that DFA captures difficulty signals beyond the log-frequency prior (Abstract, §3) is load-bearing yet unsupported. The manuscript provides neither the learned per-class bias values after optimization nor an ablation that freezes the bias at its initialization versus allowing end-to-end updates. Without these, it remains possible that DFA collapses to standard frequency reweighting inside attention, exactly as the stress-test concern anticipates.
- [Experimental evaluation] §4: The abstract asserts “consistent improvements” and “matching or exceeding” a difficulty-aware baseline on BDSA, BCSS, and CRAG, yet §4 reports no numerical Dice/IoU values, standard deviations, ablation tables, or statistical tests. This omission makes it impossible to verify the magnitude, reliability, or statistical significance of the claimed gains.
minor comments (2)
- [Abstract] The abstract introduces “Dynamic Focal Attention” without a brief equation or diagram clarifying how the per-class bias is added to the attention logits; a short illustrative equation would improve immediate readability.
- [Related Work / Experiments] Ensure the difficulty-aware baseline is fully described (architecture, training protocol, and reference) so that the “matching or exceeding” claim can be reproduced.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the requested evidence and details.
Point-by-point responses
-
Referee: The central claim that DFA captures difficulty signals beyond the log-frequency prior (Abstract, §3) is load-bearing yet unsupported. The manuscript provides neither the learned per-class bias values after optimization nor an ablation that freezes the bias at its initialization versus allowing end-to-end updates. Without these, it remains possible that DFA collapses to standard frequency reweighting inside attention, exactly as the stress-test concern anticipates.
Authors: We agree that the central claim requires explicit support. In the revised manuscript we will add a table reporting the per-class bias values at log-frequency initialization and after end-to-end optimization for each of the three benchmarks. We will also include an ablation that freezes the bias parameters at their initial values and directly compares performance against the full DFA model. These additions will allow readers to evaluate whether the optimized biases deviate meaningfully from the frequency prior and capture additional difficulty signals. revision: yes
-
Referee: The abstract asserts “consistent improvements” and “matching or exceeding” a difficulty-aware baseline on BDSA, BCSS, and CRAG, yet reports no numerical Dice/IoU values, standard deviations, ablation tables, or statistical tests. This omission makes it impossible to verify the magnitude, reliability, or statistical significance of the claimed gains.
Authors: We acknowledge that the current manuscript does not present the full numerical results, standard deviations, or statistical tests in §4. In the revision we will expand the experimental section with complete tables showing mean Dice and IoU scores plus standard deviations across repeated runs for DFA and all compared methods on BDSA, BCSS, and CRAG. We will also include ablation tables and report statistical significance (e.g., paired t-test p-values) to substantiate the claimed gains. revision: yes
Circularity Check
No significant circularity; empirical results independent of bias initialization
Full rationale
The paper proposes DFA by adding a learnable per-class bias to cross-attention logits, initialized from a log-frequency prior but then optimized end-to-end on segmentation tasks. Performance is evaluated via Dice and IoU on the external BDSA, BCSS, and CRAG benchmarks. No step in the described chain reduces the reported improvements or the unification claim to a quantity defined solely by the initialization or by self-citation; the adaptation is learned from data and measured independently. No self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations appear. The derivation remains self-contained against standard benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Per-class bias added to the cross-attention logits (one scalar per class, initialized from the log-frequency prior)
axioms (1)
- Domain assumption: cross-attention logits in query-based mask decoders can be modified by per-class biases to achieve representation-level reweighting that captures difficulty beyond frequency.
invented entities (1)
- Dynamic Focal Attention (DFA): no independent evidence
Reference graph
Works this paper leans on
- [1] Amgad, M., Elfandy, H., Hussein, H., Atteya, L.A., Elsebaie, M.A.T., Abo Elnasr, L.S., Sakr, R.A., Salem, H.S.E., Ismail, A.F., Saad, A.M., Ahmed, J., Rahman, M., Ruhban, I.A., Elgazar, N.M., Alagha, Y., Osman, M.H., Alhusseiny, A.M., Khalaf, M.M., Younes, A.F., Abdulkarim, A., Younes, D.M., Gadallah, A.M., Elkashash, A.M., Fala, S.Y., Zaki, B.M., et al.: Bioinformatics 35(18), 3461–3467 (Sep 2019)
- [2] Buda, M., Maki, A., Mazurowski, M.A.: A systematic study of the class imbalance problem in convolutional neural networks. CoRR abs/1710.05381 (2017), http://arxiv.org/abs/1710.05381
- [3] Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Werneck Krauss Silva, V., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301–1309 (2019). https://doi.org/10.1038/s41591-019-0508-1
- [4] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. CoRR abs/2005.12872 (2020), https://arxiv.org/abs/2005.12872
- [5] Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., Jaume, G., Song, A.H., Chen, B., Zhang, A., Shao, D., Shaban, M., Williams, M., Oldenburg, L., Weishaupt, L.L., Wang, J.J., Vaidya, A., Le, L.P., Gerber, G., Sahai, S., Williams, W., Mahmood, F.: Towards a general-purpose foundation model for computational pathology. Nature Medicine 30, 850–862 (2024). https://doi.org/10.1038/s41591-024-02857-3
- [6] Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1290–1299 (June 2022)
- [7] Cheng, B., Schwing, A.G., Kirillov, A.: Per-pixel classification is not all you need for semantic segmentation. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS '21, Curran Associates Inc., Red Hook, NY, USA (2021)
- [8] Cui, Y., Jia, M., Lin, T.Y., Song, Y., Belongie, S.: Class-balanced loss based on effective number of samples. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9260–9269 (2019). https://doi.org/10.1109/CVPR.2019.00949
- [9] Flanagan, M.E., Gutman, D., Dugger, B.N., Cooper, L., Pearce, T.M., Kovacs, G.G., Kukull, W.W., Crary, J.F., Manthey, D., Biber, S., Keene, C.D., Suemoto, C.K., Bumgardner, C., Nelson, P.T.: Brain Digital Slide Archive: An open source whole slide image sharing platform for AD/ADRD research and diagnostics. Alzheimer's & Dementia 21(Suppl 8), e109898 (2025)
- [10] Graham, S., Chen, H., Gamper, J., Dou, Q., Heng, P.A., Snead, D., Cheung, Y.W., Rajpoot, N.: MILD-Net: Minimal information loss dilated network for gland instance segmentation in colon histology images. Medical Image Analysis 52, 199–211 (2019). https://doi.org/10.1016/j.media.2018.12.001
- [11] Graham, S., Vu, Q.D., Jahanifar, M., Weigert, M., Schmidt, U., Zhang, W., Zhang, J., Yang, S., Xiang, J., Wang, X., Rumberger, J.L., Baumann, E., Hirsch, P., Wang, X., Schürch, C.M., Pizzagalli, D.U., Matos, P., Rosa, I., Narayanan, P.L., Shephard, A.J., Bhatt, D., Zacharias, H.V., Chan, Y.B., Albrecht, T., Liao, Z., Rajpoot, N.M.: CoNIC: Colon nuclei identification and countin... Medical Image Analysis 96, 103196 (2024). https://doi.org/10.1016/j.media.2024.103196
- [12] Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T., Rajpoot, N.: Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis 58, 101563 (2019). https://doi.org/10.1016/j.media.2019.101563
- [13] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollar, P., Girshick, R.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 4015–4026 (October 2023)
- [14] Kumari, L.N., Bandara, C.T., Chuah, C.N., Cheung, S.C.S.: A warmer start to active learning with adaptive Gaussian mixture models for skin lesion segmentation. In: 2025 IEEE International Conference on Image Processing (ICIP). pp. 2247–2252 (2025). https://doi.org/10.1109/ICIP55913.2025.11084666
- [15] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42(2), 318–327 (2020). https://doi.org/10.1109/TPAMI.2018.2858826
- [16] Lu, M.Y., Williamson, D.F.K., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5(6), 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w
- [17] Menon, A.K., Jayasumana, S., Rawat, A.S., Jain, H., Veit, A., Kumar, S.: Long-tail learning via logit adjustment. In: International Conference on Learning Representations (ICLR) (2021), https://openreview.net/forum?id=37nvvqkCo5
- [18] Press, O., Smith, N.A., Lewis, M.: Train short, test long: Attention with linear biases enables input length extrapolation. CoRR abs/2108.12409 (2021), https://arxiv.org/abs/2108.12409
- [19] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21(140), 1–67 (2020), http://jmlr.org/papers/v21/20-074.html
- [20] Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 761–769 (2016). https://doi.org/10.1109/CVPR.2016.89
- [21] Yeung, M., Sala, E., Schönlieb, C.B., Rundo, L.: Unified focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Computerized Medical Imaging and Graphics 95, 102026 (2022). https://doi.org/10.1016/j.compmedimag.2021.102026
- [22] Zhang, J., Ma, K., Kapse, S., Saltz, J.H., Vakalopoulou, M., Prasanna, P., Samaras, D.: SAM-Path: A segment anything model for semantic segmentation in digital pathology. ArXiv abs/2307.09570 (2023), https://api.semanticscholar.org/CorpusID:259982521
- [23] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. CoRR abs/2010.04159 (2020), https://arxiv.org/abs/2010.04159