pith. machine review for the scientific record.

arxiv: 2605.09366 · v1 · submitted 2026-05-10 · 💻 cs.AI

Recognition: no theorem link

Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration

Authors on Pith · no claims yet


Pith reviewed 2026-05-12 03:13 UTC · model grok-4.3

classification 💻 cs.AI
keywords multi-agent systems · neuroimaging analysis · autonomous workflows · ADHD-200 · ADNI · code synthesis · quality control

The pith

Multi-agent AI collaboration enables autonomous construction of neuroimaging analysis workflows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors seek to demonstrate that specialist AI agents can work together to create and improve entire neuroimaging data processing programs from start to finish, adapting as they go based on intermediate results. Current standardized tools are rigid and require experts to manually fix problems and adjust settings repeatedly, which slows down the creation of useful brain biomarkers for diseases. By focusing on writing code rather than just calling tools and using both statistical checks and visual reviews to verify quality, the system can explore different strategies and refine them without outside help. This would matter if true because it could remove the bottleneck of human trial-and-error, allowing faster and larger-scale analysis of brain scans for clinical insights.

Core claim

NIAgent introduces a code-centric multi-agent system in which specialized agents collaboratively synthesize, execute, and optimize executable programs built from domain-specific neuroimaging primitives, paired with a hierarchical verification framework that combines cohort-level metric screening and agent-driven visual inspection to enable evidence-based remediation and adaptive workflow construction.

What carries the argument

Code-centric multi-agent synthesis of executable programs over composable primitives, augmented by hierarchical verification of cohort metrics and agentic visual inspection.
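The distinction between flat tool-calling and a code-centric action space can be made concrete with a toy sketch. Everything below is hypothetical: the primitive names (`skull_strip`, `normalize_to_mni`), the registry, and the execution harness are illustrative stand-ins, not NIAgent's actual API. The point is only that an agent emits a *program* over composable primitives, which it can later revise after inspecting intermediate results, rather than issuing one fixed tool call at a time.

```python
# Hypothetical sketch of a code-centric action space: agents synthesize small
# programs over a registry of domain primitives. All names are illustrative,
# not drawn from the paper's implementation.
from typing import Callable

PRIMITIVES: dict[str, Callable] = {}

def primitive(fn):
    """Register a domain-specific primitive that agent programs may compose."""
    PRIMITIVES[fn.__name__] = fn
    return fn

@primitive
def skull_strip(img):
    # Stand-in: a real primitive would wrap e.g. a brain-extraction tool.
    return {"step": "skull_strip", "input": img}

@primitive
def normalize_to_mni(img):
    # Stand-in: a real primitive would wrap a registration tool.
    return {"step": "normalize_to_mni", "input": img}

def run_program(source: str, subject: str):
    """Execute an agent-synthesized program in a namespace exposing only
    the registered primitives plus the subject identifier."""
    env = dict(PRIMITIVES)
    env["subject"] = subject
    exec(source, env)          # the agent's code, not a fixed call graph
    return env["result"]

# A program an agent might synthesize, then rewrite after seeing QC output:
agent_code = "result = normalize_to_mni(skull_strip(subject))"
out = run_program(agent_code, "sub-001")
```

Because the action is a program, the agent can restructure the whole pipeline (reorder steps, add branches, change parameters) in one revision, which is the flexibility the core claim rests on.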

If this is right

  • Workflows adapt dynamically to runtime observations during execution.
  • Reduces reliance on manual trial-and-error for parameter tuning and error remediation.
  • Improves predictive performance on datasets like ADHD-200 and ADNI compared to static workflow baselines.
  • Exhibits agentic behaviors such as strategy exploration and adaptive refinement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such systems might extend to other scientific fields where data pipelines require custom adaptation, like genomics or materials science.
  • Integrating this with larger reasoning models could eventually allow agents to generate new hypotheses about brain disorders.
  • Testing on more varied clinical datasets would reveal how well the adaptive behaviors generalize beyond the tested cases.

Load-bearing premise

That combining code-centric multi-agent synthesis with hierarchical verification will consistently yield robust and generalizable workflows without requiring human intervention or post-hoc tuning.
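The two-tier verification idea behind this premise can be illustrated with a toy sketch: cheap cohort-level screening ranks subjects by how far a quality metric deviates from the cohort, and only the flagged tail receives the costlier visual inspection. The metric (SNR), the 15% flag fraction, and the verdict function below are all hypothetical, chosen to mirror the paper's high-level description rather than its actual thresholds.

```python
# Toy sketch of hierarchical QC: metric screening narrows the cohort before
# visual inspection. Metric names and thresholds are illustrative only.
import statistics

def screen_cohort(iqms: dict[str, float], flag_fraction: float = 0.15):
    """Flag the subjects whose quality metric deviates most from the cohort."""
    mu = statistics.mean(iqms.values())
    sd = statistics.pstdev(iqms.values())
    scored = {s: abs(v - mu) / sd if sd else 0.0 for s, v in iqms.items()}
    n_flag = max(1, round(flag_fraction * len(iqms)))
    return sorted(scored, key=scored.get, reverse=True)[:n_flag]

def visual_qc(subject: str) -> str:
    """Placeholder for the agentic visual-inspection verdict."""
    return "reject" if subject == "sub-07" else "pass"

# Nine synthetic subjects, one with a clearly degraded scan (low SNR).
snr = {f"sub-{i:02d}": 20.0 for i in range(1, 10)}
snr["sub-07"] = 4.0

flagged = screen_cohort(snr)              # only these get visual inspection
verdicts = {s: visual_qc(s) for s in flagged}
```

The premise is that this funnel reliably catches the failures that matter; the referee's second major comment is precisely that the real metrics and thresholds are not yet specified.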

What would settle it

Running NIAgent on a previously unseen neuroimaging dataset from a different scanner or population: lower accuracy than human-designed baselines, or failure to remediate pipeline issues without intervention, would undercut the generalization claim; matching or exceeding those baselines would support it.

Figures

Figures reproduced from arXiv: 2605.09366 by Carl Yang, Keqi Han, Lifang He, Songlin Zhao, Yao Su.

Figure 1: Overview of the NIAgent framework.
Figure 2: Ablation study results. Stacked bars show total execution errors across five independent …
Figure 3: Evaluation of the closed-loop autonomous QC module.
Figure 4: Example questionnaire page used for human evaluation in the QC agreement study, shown …
Figure 5: Example visualization used for raw T1w visual QC. The figure is a mosaic view from the …
Figure 6: Example visualization used for T1w skull-stripping QC. The red contour shows the extracted …
Figure 7: Example visualization used for T1w tissue-segmentation QC. Red indicates the brain mask, …
Figure 8: Example visualization used for T1w-to-MNI normalization QC. The red outlines correspond …
Figure 9: Example visualization used for raw fMRI visual QC. The figure shows the MRIQC mosaic …
Figure 10: Example visualization used for fMRI-to-T1w co-registration QC. The red contours …
Figure 11: Example visualization used for fMRI-to-MNI normalization QC. The red contours …
read the original abstract

Transforming neuroimaging data into clinically actionable biomarkers is a knowledge-intensive and labor-intensive process. Standardized workflows such as fMRIPrep have improved robustness and efficiency, but they are statically configured and cannot reason about downstream objectives, deliberate over alternative strategies, or close the loop between intermediate evidence and subsequent decisions in the way a human researcher would. This lack of closed-loop adaptation often leaves domain experts trapped in a cycle of manual trial-and-error to tune parameters and remediate pipeline failures, severely constraining the scalability of clinical biomarker development. To bridge this gap, we introduce NIAgent, a multi-agent system for autonomous end-to-end neuroimaging analysis. Unlike conventional flat tool-calling agents, NIAgent adopts a code-centric execution paradigm where specialist agents collaboratively synthesize and optimize executable programs over composable domain-specific primitives. This design enables robust, long-horizon workflow construction that adapts dynamically to runtime observations. Furthermore, we propose a hierarchical verification framework for autonomous quality control, integrating cohort-level metric screening with agentic visual inspection to drive evidence-grounded workflow remediation. Experiments on ADHD-200 and ADNI demonstrate that NIAgent outperforms standard workflow-based baselines in predictive performance while exhibiting sophisticated agentic behaviors, including strategy exploration and adaptive refinement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces NIAgent, a multi-agent system for autonomous end-to-end neuroimaging analysis. It employs a code-centric paradigm in which specialist agents collaboratively synthesize and optimize executable workflows from composable domain-specific primitives, enabling dynamic adaptation to runtime observations. A hierarchical verification framework integrates cohort-level metric screening with agentic visual inspection for autonomous quality control. Experiments on the ADHD-200 and ADNI datasets are claimed to demonstrate that NIAgent outperforms standard workflow-based baselines in predictive performance while exhibiting agentic behaviors such as strategy exploration and adaptive refinement.

Significance. If the empirical results hold under rigorous evaluation, the work could meaningfully advance automated neuroimaging pipelines by addressing the limitations of static workflows like fMRIPrep. The code-centric multi-agent design and hierarchical verification represent a concrete step toward closed-loop, reasoning-based analysis that reduces manual trial-and-error, with potential implications for scalable biomarker discovery in clinical settings.

major comments (2)
  1. [Experiments/Results] Experiments/Results section: The central claim that NIAgent 'outperforms standard workflow-based baselines in predictive performance' is presented without any quantitative metrics, error bars, specific baseline implementations, ablation studies, or statistical tests. This absence prevents evaluation of effect sizes or robustness and is load-bearing for the primary empirical contribution.
  2. [Method/Hierarchical verification] Hierarchical verification framework description (likely §3.2): The integration of 'cohort-level metric screening with agentic visual inspection' is described at a high level but lacks concrete definitions of the metrics used, thresholds for remediation, or how visual inspection is operationalized as an agentic process, making reproducibility and assessment of the 'evidence-grounded' claim difficult.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'predictive performance' is used without specifying the downstream task (e.g., ADHD classification accuracy, ADNI biomarker prediction) or the exact nature of the 'standard workflow-based baselines'.
  2. [Introduction/Method] Notation and terminology: The term 'code-centric execution paradigm' is introduced without a clear contrast to 'flat tool-calling agents' or a diagram illustrating the agent interaction graph and primitive library.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which helps clarify the presentation of our empirical results and methodological details. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Experiments/Results] Experiments/Results section: The central claim that NIAgent 'outperforms standard workflow-based baselines in predictive performance' is presented without any quantitative metrics, error bars, specific baseline implementations, ablation studies, or statistical tests. This absence prevents evaluation of effect sizes or robustness and is load-bearing for the primary empirical contribution.

    Authors: We acknowledge that the current Experiments section presents the performance claims at a summary level without sufficient quantitative detail. In the revised manuscript, we will expand this section to include specific predictive performance metrics (e.g., accuracy, AUC-ROC) with error bars from repeated runs, explicit descriptions of the baseline implementations (including fMRIPrep configurations and other standard workflows), ablation studies isolating the contributions of the code-centric multi-agent collaboration and hierarchical verification, and statistical tests (e.g., paired t-tests or Wilcoxon tests with p-values) to quantify effect sizes and robustness. These additions will directly address the load-bearing nature of the empirical claims. revision: yes

  2. Referee: [Method/Hierarchical verification] Hierarchical verification framework description (likely §3.2): The integration of 'cohort-level metric screening with agentic visual inspection' is described at a high level but lacks concrete definitions of the metrics used, thresholds for remediation, or how visual inspection is operationalized as an agentic process, making reproducibility and assessment of the 'evidence-grounded' claim difficult.

    Authors: We agree that the description of the hierarchical verification framework in §3.2 is currently high-level and requires greater specificity for reproducibility. In the revised manuscript, we will expand this section to define the exact cohort-level metrics (e.g., motion displacement thresholds, signal-to-noise ratio cutoffs, and other image quality indices), the precise remediation thresholds that trigger workflow adjustments, and the operational details of the agentic visual inspection process, including the agent's input prompts, visual analysis criteria, decision logic, and how it integrates with the metric screening to produce evidence-grounded remediations. This will make the framework fully concrete and assessable. revision: yes
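The paired tests the rebuttal promises are simple to state. The sketch below computes a paired t statistic over per-fold scores using only the standard library; the accuracy values are invented for illustration and are not results from the paper, and a real analysis would use `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon` and report p-values alongside effect sizes.

```python
# Stdlib sketch of a paired comparison between two pipelines' per-fold scores.
# The numbers are hypothetical, for illustration only.
import math
import statistics

def paired_t(xs, ys):
    """Paired t statistic for matched per-fold scores of two pipelines."""
    d = [x - y for x, y in zip(xs, ys)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

niagent  = [0.71, 0.69, 0.73, 0.70, 0.72]  # hypothetical per-fold accuracy
baseline = [0.66, 0.67, 0.68, 0.65, 0.69]  # hypothetical baseline accuracy
t_stat = paired_t(niagent, baseline)       # compare against t distribution, df = 4
```

With only five folds, nonparametric alternatives (Wilcoxon signed-rank) and corrections for multiple comparisons matter as much as the test statistic itself.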

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces NIAgent as a multi-agent system for autonomous neuroimaging analysis and evaluates it empirically on ADHD-200 and ADNI datasets against workflow baselines. No equations, fitted parameters, or self-referential definitions appear in the derivation; claims of outperformance and agentic behaviors rest on experimental comparisons and hierarchical verification rather than reducing to inputs by construction. No load-bearing self-citations or ansatz smuggling are present that would force the central results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The abstract relies on unstated assumptions about the reliability of agent collaboration and the sufficiency of the proposed verification layer; no free parameters or invented physical entities are described.

axioms (2)
  • domain assumption Multi-agent systems can reliably synthesize and debug executable neuroimaging workflows from domain primitives
    Implicit in the design of NIAgent and the claim of robust long-horizon construction
  • domain assumption Hierarchical verification (cohort metrics plus visual inspection) provides sufficient evidence for autonomous remediation
    Central to the quality-control framework described
invented entities (1)
  • NIAgent multi-agent system · no independent evidence
    purpose: Autonomous end-to-end neuroimaging analysis via code synthesis
    New system introduced to address limitations of static pipelines

pith-pipeline@v0.9.0 · 5523 in / 1120 out tokens · 41845 ms · 2026-05-12T03:13:50.151873+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 3 internal anchors

  1. [1]

    The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1):1–9, 2016

    Krzysztof J Gorgolewski, Tibor Auer, Vince D Calhoun, R Cameron Craddock, Samir Das, Eugene P Duff, Guillaume Flandin, Satrajit S Ghosh, Tristan Glatard, Yaroslav O Halchenko, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1):1–9, 2016

  2. [2]

    fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods, 16(1):111–116, 2019

    Oscar Esteban, Christopher J Markiewicz, Ross W Blair, Craig A Moodie, A Ilkay Isik, Asier Erramuzpe, James D Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods, 16(1):111–116, 2019

  3. [3]

    A comprehensive overview of large language models. ACM Transactions on Intelligent Systems and Technology, 16(5):1–72, 2025

    Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models. ACM Transactions on Intelligent Systems and Technology, 16(5):1–72, 2025

  4. [4]

    Agentic AI for scientific discovery: a survey of progress, challenges, and future directions. arXiv preprint arXiv:2503.08979, 2025

    Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic AI for scientific discovery: a survey of progress, challenges, and future directions. arXiv preprint arXiv:2503.08979, 2025

  5. [5]

    The ADHD-200 Consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. Frontiers in Systems Neuroscience, 6:62, 2012

    ADHD-200 Consortium. The ADHD-200 Consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. Frontiers in Systems Neuroscience, 6:62, 2012

  6. [6]

    Alzheimer's Disease Neuroimaging Initiative (ADNI) clinical characterization. Neurology, 74(3):201–209, 2010

    Ronald Carl Petersen, Paul S Aisen, Laurel A Beckett, Michael C Donohue, Anthony Collins Gamst, Danielle J Harvey, Clifford R Jack Jr, William J Jagust, Leslie M Shaw, Arthur W Toga, et al. Alzheimer's Disease Neuroimaging Initiative (ADNI) clinical characterization. Neurology, 74(3):201–209, 2010

  7. [7]

    FreeSurfer. NeuroImage, 62(2):774–781, 2012

    Bruce Fischl. FreeSurfer. NeuroImage, 62(2):774–781, 2012

  8. [8]

    BrainSuite: an automated cortical surface identification tool. Medical Image Analysis, 6(2):129–142, 2002

    David W Shattuck and Richard M Leahy. BrainSuite: an automated cortical surface identification tool. Medical Image Analysis, 6(2):129–142, 2002

  9. [9]

    Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python. Frontiers in Neuroinformatics, 5:13, 2011

    Krzysztof Gorgolewski, Christopher D Burns, Cindee Madison, Dav Clark, Yaroslav O Halchenko, Michael L Waskom, and Satrajit S Ghosh. Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python. Frontiers in Neuroinformatics, 5:13, 2011

  10. [10]

    MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites. PLoS ONE, 12(9):e0184661, 2017

    Oscar Esteban, Daniel Birman, Marie Schaer, Oluwasanmi O Koyejo, Russell A Poldrack, and Krzysztof J Gorgolewski. MRIQC: advancing the automatic prediction of image quality in MRI from unseen sites. PLoS ONE, 12(9):e0184661, 2017

  11. [11]

    ReAct: synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

  12. [12]

    Executable code actions elicit better LLM agents. In Forty-first International Conference on Machine Learning, 2024

    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. Executable code actions elicit better LLM agents. In Forty-first International Conference on Machine Learning, 2024

  13. [13]

    The AI Scientist: towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024

  14. [14]

    ChemCrow: augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023

    Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. ChemCrow: augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023

  15. [15]

    Biomni: a general-purpose biomedical AI agent. bioRxiv, 2025

    Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, et al. Biomni: a general-purpose biomedical AI agent. bioRxiv, 2025

  16. [16]

    MedRAX: medical reasoning agent for chest X-ray. In International Conference on Machine Learning, pages 15661–15676. PMLR, 2025

    Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, and Bo Wang. MedRAX: medical reasoning agent for chest X-ray. In International Conference on Machine Learning, pages 15661–15676. PMLR, 2025

  17. [17]

    Neura: an agentic system for autonomous neuroimaging workflows. bioRxiv, 2026

    Jun Xie, Jing Wang, Xiumei Wu, Xinyuan Liu, Yiqi Mi, Qinjin Liu, Tong Xu, Chen Liu, Huafu Chen, and Jing Guo. Neura: an agentic system for autonomous neuroimaging workflows. bioRxiv, 2026

  18. [18]

    NeuroClaw technical report. arXiv preprint arXiv:2604.24696, 2026

    Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Lichao Sun, Xiang Li, and Yixuan Yuan. NeuroClaw technical report. arXiv preprint arXiv:2604.24696, 2026

  19. [19]

    Agentic large language models for training-free neuro-radiological image analysis. arXiv preprint arXiv:2604.16729, 2026

    Ayhan Can Erdur, Daniel Scholz, Jiazhen Pan, Benedikt Wiestler, Daniel Rueckert, and Jan C Peeken. Agentic large language models for training-free neuro-radiological image analysis. arXiv preprint arXiv:2604.16729, 2026

  20. [20]

    AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3):162–173, 1996

    Robert W Cox. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3):162–173, 1996

  21. [21]

    FSL. NeuroImage, 62(2):782–790, 2012

    Mark Jenkinson, Christian F Beckmann, Timothy E J Behrens, Mark W Woolrich, and Stephen M Smith. FSL. NeuroImage, 62(2):782–790, 2012

  22. [22]

    SPM12 manual. Wellcome Trust Centre for Neuroimaging, London, UK, 2464(4):53, 2014

    John Ashburner, Gareth Barnes, Chun-Chuan Chen, Jean Daunizeau, Guillaume Flandin, Karl Friston, Stefan Kiebel, James Kilner, Vladimir Litvak, Rosalyn Moran, et al. SPM12 manual. Wellcome Trust Centre for Neuroimaging, London, UK, 2464(4):53, 2014

  23. [23]

    Advanced normalization tools (ANTs). Insight Journal, 2(365):1–35, 2009

    Brian B Avants, Nick Tustison, Gang Song, et al. Advanced normalization tools (ANTs). Insight Journal, 2(365):1–35, 2009

  24. [24]

    Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48, 2008

    Kilem Li Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48, 2008

  25. [25]

    Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2):825–841, 2002

    Mark Jenkinson, Peter Bannister, Michael Brady, and Stephen Smith. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2):825–841, 2002

  26. [26]

    Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, 2011

    William D Penny, Karl J Friston, John T Ashburner, Stefan J Kiebel, and Thomas E Nichols. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, 2011

  27. [27]

    Fast robust automated brain extraction. Human Brain Mapping, 17(3):143–155, 2002

    Stephen M Smith. Fast robust automated brain extraction. Human Brain Mapping, 17(3):143–155, 2002

  28. [28]

    Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1):45–57, 2001

    Yongyue Zhang, Michael Brady, and Stephen Smith. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1):45–57, 2001

  29. [29]

    Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12(1):26–41, 2008

    Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12(1):26–41, 2008

  30. [30]

    A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage, 54(3):2033–2044, 2011

    Brian B Avants, Nicholas J Tustison, Gang Song, Philip A Cook, Arno Klein, and James C Gee. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage, 54(3):2033–2044, 2011

  31. [31]

    N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6):1310–1320, 2010

    Nicholas J Tustison, Brian B Avants, Philip A Cook, Yuanjie Zheng, Alexander Egan, Paul A Yushkevich, and James C Gee. N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 29(6):1310–1320, 2010

  32. [32]

    An open source multivariate framework for n-tissue segmentation with evaluation on public data. Neuroinformatics, 9(4):381–400, 2011

    Brian B Avants, Nicholas J Tustison, Jue Wu, Philip A Cook, and James C Gee. An open source multivariate framework for n-tissue segmentation with evaluation on public data. Neuroinformatics, 9(4):381–400, 2011

  33. [33]

    Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 2002

    Nathalie Tzourio-Mazoyer, Brigitte Landeau, Dimitri Papathanassiou, Fabrice Crivello, Octave Etard, Nicolas Delcroix, Bernard Mazoyer, and Marc Joliot. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 2002

  34. [34]

    Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex, 2017

    Alexander Schaefer, Ru Kong, Evan Gordon, Timothy Laumann, Xinian Zuo, Avram Holmes, Simon Eickhoff, and T Thomas Yeo. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex, 2017

  35. [35]

    A multi-modal parcellation of human cerebral cortex. Nature, 2016

    Matthew F Glasser, Timothy S Coalson, Emma C Robinson, Carl Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F Beckmann, Mark Jenkinson, et al. A multi-modal parcellation of human cerebral cortex. Nature, 2016

  36. [36]

    Brain network transformer. In NeurIPS, 2022

    Xuan Kan, Wei Dai, Hejie Cui, Zilong Zhang, Ying Guo, and Carl Yang. Brain network transformer. In NeurIPS, 2022

  37. [37]

    NeuroGraph: benchmarks for graph machine learning in brain connectomics. In NeurIPS, 2023

    Anwar Said, Roza G. Bayrak, Tyler Derr, Mudassir Shabbir, Daniel Moyer, Catie Chang, and Xenofon Koutsoukos. NeuroGraph: benchmarks for graph machine learning in brain connectomics. In NeurIPS, 2023
