pith. sign in

arxiv: 2606.08121 · v1 · pith:TBRFPIHPnew · submitted 2026-06-06 · 💻 cs.CV

Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation

Pith reviewed 2026-06-27 20:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords visual predicatesmanipulation understandingrobustness under degradationpredicate reliabilityegocentric visionaction recognitionneuro-symbolic modelsconfidence-aware estimation
0
0 comments X

The pith

Visual predicates fail in structured ways under image degradation rather than uniformly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to measure how reliably different visual predicates can be recovered from images when those images suffer blur, occlusion, low resolution, frame drops, or detection noise. It defines a vocabulary of predicates used in manipulation understanding, then tracks how each one holds up or collapses using new reliability metrics that include preservation rate, sensitivity to each degradation type, temporal consistency, and effect on downstream task accuracy. The central finding is that failures are not random: static spatial predicates stay relatively stable while contact, motion-coupling, grasp, and release predicates degrade fastest. This matters because these predicates are the relational building blocks inside event-chain and neuro-symbolic models, so knowing which ones are trustworthy under real conditions lets those models be made more robust.

Core claim

Experiments on controlled videos and on VISOR/EPIC-KITCHENS, H2O, and ARCTIC show that predicate failures are structured rather than uniform. Static spatial predicates remain comparatively robust, whereas contact-sensitive, dynamic, and derived predicates such as grasp and release are more fragile. Under severe degradation, detection noise, occlusion, and frame dropping cause the strongest reliability losses. Downstream analysis shows that degraded predicates reduce manipulation-understanding accuracy from 0.89 to 0.58, while removing confidence weighting under moderate degradation reduces accuracy from 0.74 to 0.64.

What carries the argument

A predicate-level reliability framework that supplies a structured predicate vocabulary, confidence-aware estimation, and five metrics (preservation, degradation sensitivity, temporal consistency, confidence-weighted stability, downstream impact) to diagnose which predicates survive which degradations.

If this is right

  • Static spatial predicates can be used with higher trust in degraded conditions for downstream reasoning.
  • Contact-sensitive and dynamic predicates require additional safeguards or alternative evidence sources.
  • Confidence weighting in predicate estimation measurably improves downstream accuracy under moderate degradation.
  • Detection noise, occlusion, and frame dropping are the degradations that produce the largest reliability losses.
  • Manipulation-understanding pipelines lose roughly one-third of their accuracy when predicates are left unfiltered under severe degradation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Perception modules could monitor image quality in real time and down-weight or replace fragile predicates accordingly.
  • The same reliability metrics could be applied to action-recognition pipelines that also rely on contact and grasp predicates.
  • Future datasets collected from physical robots under uncontrolled lighting and motion could be used to validate or refine the synthetic-degradation results.
  • Designers of neuro-symbolic systems might add explicit uncertainty propagation from predicate confidence scores into higher-level planning.

Load-bearing premise

The chosen public datasets together with the applied synthetic degradations are representative of the visual failures that occur in real deployed manipulation systems.

What would settle it

A controlled test on real-world robot videos containing naturally occurring blur, occlusion, and frame drops in which all predicate types show statistically indistinguishable failure rates would falsify the structured-failure claim.

read the original abstract

Manipulation understanding requires reliable relational evidence, such as contact, support, containment, motion coupling, grasp, release, and active-hand involvement. Although these visual predicates are widely used in event-chain, graph-based, and neuro-symbolic models, their reliability under visual degradation is rarely analyzed directly. This paper introduces a predicate-level reliability framework for robust manipulation understanding under blur, occlusion, illumination change, low resolution, frame dropping, and detection noise. The framework defines a structured predicate vocabulary, confidence-aware predicate estimation, and reliability metrics for predicate preservation, degradation sensitivity, temporal consistency, confidence-weighted stability, and downstream impact. Experiments on controlled manipulation videos and public egocentric or bimanual datasets, including VISOR/EPIC-KITCHENS, H2O, and ARCTIC, show that predicate failures are structured rather than uniform. Static spatial predicates remain comparatively robust, whereas contact-sensitive, dynamic, and derived predicates such as grasp and release are more fragile. Under severe degradation, detection noise, occlusion, and frame dropping cause the strongest reliability losses. Downstream analysis shows that degraded predicates reduce manipulation-understanding accuracy from 0.89 to 0.58, while removing confidence weighting under moderate degradation reduces accuracy from 0.74 to 0.64. These results show that predicate reliability provides a diagnostic layer between visual perception and structured manipulation reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces a predicate-level reliability framework for manipulation understanding that defines a structured vocabulary of visual predicates (contact, support, grasp, release, etc.), confidence-aware estimation, and metrics including predicate preservation, degradation sensitivity, temporal consistency, confidence-weighted stability, and downstream impact. Experiments apply synthetic degradations (blur, occlusion, illumination, low-res, frame drop, detection noise) to controlled videos and public datasets (VISOR/EPIC-KITCHENS, H2O, ARCTIC) and report that failures are structured rather than uniform: static spatial predicates are comparatively robust while contact-sensitive, dynamic, and derived predicates are fragile. Detection noise, occlusion, and frame dropping produce the largest reliability losses, with downstream manipulation-understanding accuracy dropping from 0.89 to 0.58 and removal of confidence weighting reducing accuracy from 0.74 to 0.64 under moderate degradation.

Significance. If the reported structure of predicate failures and the quantitative impact numbers hold after statistical validation and real-world testing, the framework would supply a practical diagnostic layer between low-level perception and structured reasoning models, allowing systems to weight or replace fragile predicates under known degradation conditions and thereby improve robustness in deployed manipulation pipelines.

major comments (3)
  1. [Abstract] Abstract: the accuracy reductions (0.89 to 0.58 and 0.74 to 0.64) are stated without error bars, dataset sizes, number of trials, or any statistical significance tests, so it is impossible to determine whether the claimed distinction between robust static-spatial predicates and fragile contact/dynamic predicates is supported by the data.
  2. [Abstract] Abstract: the central claim that confidence weighting improves downstream accuracy (0.74 to 0.64) cannot be evaluated because the paper provides no description of how confidence scores are computed, how they are integrated into predicate estimation, or how the weighted versus unweighted pipelines differ.
  3. [Abstract] Abstract / Experiments: the observed predicate-failure structure rests on synthetic degradations applied to the listed public datasets; without any comparison to real degraded manipulation footage (e.g., actual camera motion blur coupled with hand occlusion), it remains possible that the reported robustness ordering is an artifact of the chosen degradation model rather than an intrinsic property of the predicates.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment point by point below, indicating where revisions will be made to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the accuracy reductions (0.89 to 0.58 and 0.74 to 0.64) are stated without error bars, dataset sizes, number of trials, or any statistical significance tests, so it is impossible to determine whether the claimed distinction between robust static-spatial predicates and fragile contact/dynamic predicates is supported by the data.

    Authors: The abstract condenses the primary findings; the experimental results section reports the underlying dataset sizes (across VISOR/EPIC-KITCHENS, H2O, and ARCTIC), number of trials, error bars on all metrics, and statistical tests supporting the static-vs-dynamic distinction. To ensure the abstract is self-contained, we will revise it to reference the statistical validation and key dataset details. revision: yes

  2. Referee: [Abstract] Abstract: the central claim that confidence weighting improves downstream accuracy (0.74 to 0.64) cannot be evaluated because the paper provides no description of how confidence scores are computed, how they are integrated into predicate estimation, or how the weighted versus unweighted pipelines differ.

    Authors: We agree the abstract omits these implementation details. The methods section defines confidence scores from predicate detector outputs, their integration into the stability metric, and the weighted vs. unweighted ablation. We will revise the abstract to include a concise description of the confidence computation and pipeline difference so the claim can be evaluated directly from the abstract. revision: yes

  3. Referee: [Abstract] Abstract / Experiments: the observed predicate-failure structure rests on synthetic degradations applied to the listed public datasets; without any comparison to real degraded manipulation footage (e.g., actual camera motion blur coupled with hand occlusion), it remains possible that the reported robustness ordering is an artifact of the chosen degradation model rather than an intrinsic property of the predicates.

    Authors: Synthetic degradations enable controlled isolation of individual factors on real manipulation videos from the public datasets. We acknowledge that real-world degradations may include unmodeled correlations. In revision we will expand the discussion and limitations sections to explicitly note this possibility and state that the reported ordering requires future validation against real degraded footage. revision: partial

Circularity Check

0 steps flagged

No circularity: framework definitions and empirical results are independent

full rationale

The paper introduces a predicate reliability framework by defining vocabulary, confidence-aware estimation, and metrics (preservation, sensitivity, consistency, stability, impact) directly from first principles, then reports empirical observations on public datasets under synthetic degradations. No equations, derivations, or fitted parameters are described that reduce predictions to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims rest on experimental measurements rather than any self-referential reduction, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the framework itself is the primary contribution but rests on unstated assumptions about predicate definitions and dataset representativeness.

pith-pipeline@v0.9.1-grok · 5768 in / 1068 out tokens · 20564 ms · 2026-06-27T20:00:16.930851+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    Robotics and Autonomous Systems57(5), 469–483 (2009) https://doi.org/10.1016/j.robot.2008.10.024

    Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and Autonomous Systems57(5), 469–483 (2009) https://doi.org/10.1016/j.robot.2008.10.024

  2. [2]

    Interna- tional Journal of Computer Vision130(1), 33–55 (2022) https://doi.org/10.1007/ s11263-021-01531-2

    Damen, D., Doughty, H., Farinella, G.M., Furnari, A., Kazakos, E., Ma, J., Moltisanti, D., Munro, J., Perrett, T., Price, W., Wray, M.: Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. Interna- tional Journal of Computer Vision130(1), 33–55 (2022) https://doi.org/10.1007/ s11263-021-01531-2

  3. [3]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Sener, F., Chatterjee, D., Shelepov, D., He, K., Singhania, D., Wang, R., Yao, A.: Assembly101: A large-scale multi-view video dataset for understanding procedural activities. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21096–21106 (2022)

  4. [4]

    In: Proceedings 47 of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Grauman, K., Westbury, A., Torresani, L., Kitani, K., Malik, J., Afouras, T., Ashutosh, K., Baiyya, V., Bansal, S., Boote, B.,et al.: Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives. In: Proceedings 47 of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19383–19400 (2024)

  5. [5]

    The International Journal of Robotics Research30(10), 1229–1249 (2011) https://doi.org/10.1177/ 0278364911410459

    Aksoy, E.E., Abramov, A., D¨ orr, J., Ning, K., Dellen, B., W¨ org¨ otter, F.: Learn- ing the semantics of object–action relations by observation. The International Journal of Robotics Research30(10), 1229–1249 (2011) https://doi.org/10.1177/ 0278364911410459

  6. [6]

    In: Proceedings of the IEEE International Conference on Robotics and Automation, pp

    Ziaeetabar, F., Aksoy, E.E., W¨ org¨ otter, F., Tamosiunaite, M.: Semantic analy- sis of manipulation actions using spatial relations. In: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4612–4619 (2017). https://doi.org/10.1109/ICRA.2017.7989536

  7. [7]

    Tsagarakis, and Enrico Mingo Hoffman

    Ziaeetabar, F., Kulvicius, T., Tamosiunaite, M., W¨ org¨ otter, F.: Recognition and prediction of manipulation actions using enriched semantic event chains. Robotics and Autonomous Systems110, 173–188 (2018) https://doi.org/10.1016/j.robot. 2018.10.005

  8. [8]

    Shamma, Michael S

    Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D.A., Bernstein, M.S., Fei-Fei, L.: Visual genome: Connecting language and vision using crowdsourced dense image anno- tations. International Journal of Computer Vision123(1), 32–73 (2017) https: //doi.org/10.1007/s11263-016-0981-7

  9. [9]

    In: Proceedings of the 38th International Conference on Machine Learning

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 8748–8763...

  10. [10]

    https://doi.org/10.48550/ arXiv.2303.05499

    Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Jiang, Q., Li, C., Yang, J., Su, H., Zhu, J., Zhang, L.: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (2023). https://doi.org/10.48550/ arXiv.2303.05499

  11. [11]

    In: IEEE/CVF International Conference on Computer Vision

    Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., Doll´ ar, P., Girshick, R.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026 (2023). https://doi.org/10.1109/ICCV51070.2023.00371

  12. [12]

    In: International Conference on Learning Representations (2019)

    Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to com- mon corruptions and perturbations. In: International Conference on Learning Representations (2019)

  13. [13]

    In: International Conference on Learning 48 Representations (2019)

    Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Bren- del, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning 48 Representations (2019)

  14. [14]

    In: NeurIPS Workshop on Machine Learning for Autonomous Driving (2019)

    Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bringmann, O., Ecker, A.S., Bethge, M., Brendel, W.: Benchmarking robustness in object detection: Autonomous driving when winter is coming. In: NeurIPS Workshop on Machine Learning for Autonomous Driving (2019)

  15. [15]

    PLOS ONE15(12), 0243829 (2020) https://doi.org/10.1371/journal.pone.0243829

    Ziaeetabar, F., Pomp, J., Pfeiffer, S., El-Sourani, N., Schubotz, R.I., Tamosiu- naite, M., W¨ org¨ otter, F.: Using enriched semantic event chains to model human action prediction based on minimal spatial information. PLOS ONE15(12), 0243829 (2020) https://doi.org/10.1371/journal.pone.0243829

  16. [16]

    Scientific reports 10(1), 3999 (2020)

    W¨ org¨ otter, F., Ziaeetabar, F., Pfeiffer, S., Kaya, O., Kulvicius, T., Tamosiu- naite, M.: Humans predict action using grammar-like structures. Scientific reports 10(1), 3999 (2020)

  17. [17]

    In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

    Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018). https://doi.org/10.1609/aaai.v32i1.12328

  18. [18]

    IEEE Access (2024) https://doi.org/10.1109/ACCESS.2024.3509674

    Ziaeetabar, F., Tamosiunaite, M., W¨ org¨ otter, F.: A hierarchical graph-based approach for recognition and description generation of bimanual actions in videos. IEEE Access (2024) https://doi.org/10.1109/ACCESS.2024.3509674

  19. [19]

    IEEE Access13, 201990–202009 (2025) https://doi.org/10.1109/ACCESS.2025.3637990

    Ziaeetabar, F., W¨ org¨ otter, F.: Adaptive multimodal graph reasoning with founda- tion models for fine-grained action recognition. IEEE Access13, 201990–202009 (2025) https://doi.org/10.1109/ACCESS.2025.3637990

  20. [20]

    Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

    Ziaeetabar, F.: Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains (2026). https://doi.org/10.48550/arXiv.2604.21053

  21. [21]

    In: Advances in Neural Information Processing Systems, vol

    Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, vol. 27, pp. 568–576 (2014)

  22. [22]

    Deep Residual Learning for Image Recognition

    Carreira, J., Zisserman, A.: Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4733 (2017). https://doi.org/10.1109/CVPR. 2017.502

  23. [23]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

    Lin, J., Gan, C., Han, S.: TSM: Temporal shift module for efficient video understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7083–7093 (2019)

  24. [24]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

    Feichtenhofer, C., Fan, H., Malik, J., He, K.: Slowfast networks for video recog- nition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6202–6211 (2019) 49

  25. [25]

    Proceedings of Machine Learning Research, vol

    Bertasius, G., Wang, H., Torresani, L.: Is space-time attention all you need for video understanding? In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 813–

  26. [26]

    PMLR, Virtual Event (2021)

  27. [27]

    In: Advances in Neural Information Processing Systems, vol

    Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)

  28. [28]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., Liu, X., Martin, M., Nagarajan, T.,et al.: Ego4D: Around the world in 3,000 hours of egocentric video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18973–18990 (2022)

  29. [29]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

    Kwon, T., Tekin, B., Stuhmer, J., Bogo, F., Pollefeys, M.: H2O: Two hands manipulating objects for first person interaction recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10138–10148 (2021)

  30. [30]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

    Fan, Z., Taheri, O., Tzionas, D., Kocabas, M., Kaufmann, M., Black, M.J., Hilliges, O.: ARCTIC: A dataset for dexterous bimanual hand-object manipu- lation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

  31. [31]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

    Cho, H., Kim, C., Kim, J., Lee, S., Ismayilzada, E., Baek, S.: Transformer-based unified recognition of two hands manipulating objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4769– 4778 (2023)

  32. [32]

    In: Proceedings of the British Machine Vision Conference (2023)

    Roh, W., Lee, S.H., Ryoo, W.J., Lee, J., Oh, G., Hwang, S., Chi, H.-g., Kim, S.: Functional hand type prior for 3d hand pose estimation and action recognition from egocentric view monocular videos. In: Proceedings of the British Machine Vision Conference (2023)

  33. [33]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Ji, J., Krishna, R., Fei-Fei, L., Niebles, J.C.: Action genome: Actions as com- positions of spatio-temporal scene graphs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10236–10247 (2020). https://doi.org/10.1109/CVPR42600.2020.01025

  34. [34]

    In: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track (2022)

    Darkhalil, A., Shan, D., Zhu, B., Ma, J., Kar, A., Higgins, R., Fidler, S., Fouhey, D., Damen, D.: EPIC-KITCHENS VISOR benchmark: VIdeo segmentations and object relations. In: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track (2022)

  35. [35]

    In: European Conference 50 on Computer Vision, pp

    Brahmbhatt, S., Tang, C., Twigg, C.D., Kemp, C.C., Hays, J.: ContactPose: A dataset of grasps with object contact and hand pose. In: European Conference 50 on Computer Vision, pp. 361–378. Springer, Cham (2020)

  36. [36]

    In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

    Ziaeetabar, F., Kulvicius, T., Tamosiunaite, M., W¨ org¨ otter, F.: Prediction of manipulation action classes using semantic spatial reasoning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3350– 3357 (2018). IEEE

  37. [37]

    In: Proceedings of the 3rd ACM International Conference on Multimedia in Asia, pp

    Hirata, T., Mukuta, Y., Harada, T.: Making video recognition models robust to common corruptions with supervised contrastive learning. In: Proceedings of the 3rd ACM International Conference on Multimedia in Asia, pp. 1–6 (2021). https://doi.org/10.1145/3469877.3497692

  38. [38]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

    Zeng, R., Xu, Q., Huang, W., Chen, P., Tan, M., Gan, C.: Benchmarking the robustness of temporal action detection models against temporal corruptions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

  39. [39]

    Medical Image Analysis48, 117–130 (2018)

    Parisot, S., Ktena, S.I., Ferrante, E., Lee, M., Guerrero, R., Glocker, B., Rueckert, D.: Disease prediction using graph convolutional networks: Application to autism spectrum disorder and alzheimer’s disease. Medical Image Analysis48, 117–130 (2018)

  40. [40]

    Computers in Biology and Medicine149, 106079 (2022)

    Ma, Q., Zhou, S., Li, C., Liu, F., Liu, Y., Hou, M., Zhang, Y.: Dgrunit: Dual graph reasoning unit for brain tumor segmentation. Computers in Biology and Medicine149, 106079 (2022)

  41. [41]

    arXiv preprint arXiv:2508.01465 (2025) 51

    Ziaeetabar, F.: Efficientgformer: Multimodal brain tumor segmentation via pruned graph-augmented transformer. arXiv preprint arXiv:2508.01465 (2025) 51