GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents

Qun Liu; Sean McSweeney; Xi Yu; Yang Yang; Yonghua Du; Yuewei Lin

arXiv: 2510.13896 · v2 · submitted 2025-10-14 · 🧬 q-bio.QM · cs.AI · cs.CV · cs.MA


Xi Yu, Yang Yang, Qun Liu, Yonghua Du, Sean McSweeney, Yuewei Lin

Pith reviewed 2026-05-18 07:20 UTC · model grok-4.3

classification 🧬 q-bio.QM · cs.AI · cs.CV · cs.MA

keywords: cellular image segmentation · multi-agent systems · large language models · training-free methods · microscopy analysis · out-of-distribution generalization · text-guided refinement · self-evolving workflows


The pith

A multi-agent LLM system routes cellular images to the best segmentation tool on the fly and matches or beats fixed models across diverse benchmarks without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenCellAgent, a training-free framework that deploys large language model agents to coordinate specialist segmenters and vision-language models for cellular microscopy images. It operates through a planner-executor-evaluator loop with long-term memory that selects tools, adapts them using a few reference images when conditions change, evaluates output quality, and refines results via text prompts for new structures. If this holds, segmentation becomes practical for varied microscope types and cell morphologies without retraining or large annotation sets. A sympathetic reader would care because it addresses the core bottlenecks of modality heterogeneity and limited labeled data in quantitative biology.

Core claim

GenCellAgent orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop with long-term memory. The system automatically routes each image to the most suitable tool, adapts tool behavior on the fly with a small number of reference images when imaging conditions differ, supports text-guided segmentation of organelles not covered by existing models, and stores expert edits in memory to enable self-evolution and personalized workflows. Across seven cell-segmentation benchmarks spanning diverse microscopy modalities and totaling 4,718 images, this routing consistently matches or exceeds the best individual tool on every dataset while outperforming all baselines in overall accuracy.

What carries the argument

planner-executor-evaluator loop with long-term memory that routes images to tools, executes them, checks quality, and adapts via references or text prompts
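
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Memory` class, the `run_loop` function, the toy threshold tools, the toy evaluator, and the 0.8 score threshold are all hypothetical stand-ins. In the paper the planner is an LLM and the evaluator is a vision-language model; here both are replaced by plain Python functions so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[float]]   # toy grayscale image
Mask = List[List[int]]      # toy binary mask


@dataclass
class Memory:
    """Hypothetical long-term memory: the tool that last worked per modality."""
    best_tool: Dict[str, str] = field(default_factory=dict)


def run_loop(
    image: Image,
    modality: str,
    tools: Dict[str, Callable[[Image], Mask]],
    evaluate: Callable[[Image, Mask], float],
    memory: Memory,
    threshold: float = 0.8,  # assumed acceptance score, not from the paper
) -> Tuple[Optional[str], Optional[Mask], float]:
    # Planner (stand-in): try the remembered tool first, then the others
    # in registration order. sorted() is stable, so that order is preserved.
    remembered = memory.best_tool.get(modality)
    order = sorted(tools, key=lambda name: name != remembered)

    best_name, best_mask, best_score = None, None, float("-inf")
    for name in order:
        mask = tools[name](image)        # Executor: run the chosen tool
        score = evaluate(image, mask)    # Evaluator: judge the result
        if score > best_score:
            best_name, best_mask, best_score = name, mask, score
        if score >= threshold:           # good enough, stop trying tools
            break

    if best_name is not None:
        memory.best_tool[modality] = best_name  # commit the outcome to memory
    return best_name, best_mask, best_score


def threshold_tool(cut: float) -> Callable[[Image], Mask]:
    """Toy 'segmenter': marks pixels brighter than `cut` as foreground."""
    return lambda img: [[1 if px > cut else 0 for px in row] for row in img]


def toy_evaluator(img: Image, mask: Mask) -> float:
    """Stand-in for a VLM judge: rewards masks covering about half the image."""
    flat = [v for row in mask for v in row]
    return 1.0 - abs(sum(flat) / len(flat) - 0.5) * 2


image = [[0.1, 0.6], [0.7, 0.95]]
tools = {"loose": threshold_tool(0.05), "tight": threshold_tool(0.65)}
memory = Memory()
name, mask, score = run_loop(image, "brightfield", tools, toy_evaluator, memory)
print(name, score, memory.best_tool)
```

On a second call with the same modality, the remembered tool is tried first, which is the only form of "self-evolution" this toy version captures. Reference-image adaptation and text-guided refinement are left out.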

Load-bearing premise

The LLM-based planner can reliably choose the optimal tool and the evaluator can accurately judge segmentation quality without introducing systematic errors or biases across heterogeneous modalities.

What would settle it

A new microscopy modality or organelle type where the agent repeatedly selects a tool that underperforms the best baseline even after supplying reference images and text guidance.

Original abstract

Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across seven cell-segmentation benchmarks spanning diverse microscopy modalities (4,718 images), this routing consistently matches or exceeds the best individual tool on every dataset and outperforms all baselines in overall accuracy. On out-of-distribution organelle data, GenCellAgent substantially outperforms specialist models that were not trained on the target domain, recovering structures that dedicated tools fail to detect. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GenCellAgent routes cell images through existing tools plus VLM feedback and memory to get training-free adaptation, but the evaluator step looks like the weakest link.


The paper's core idea is a closed planner-executor-evaluator loop that picks among specialist segmenters, runs them, then uses a vision-language model to score the output and decide on text-guided fixes or tool switches, with a memory store that keeps successful edits for later use. That specific combination for cellular microscopy is new in the literature they cite. It targets a real pain point: labs often face new modalities or structures where no single pretrained model works well, and retraining is expensive. The claim that it matches or beats the best single tool on seven benchmarks covering 4,718 images, plus better recovery on out-of-distribution organelles, is the kind of practical result that could matter if the numbers are clean. Text-guided refinement for things like the Golgi is a straightforward way to extend coverage without new labels. The memory-based self-evolution part is also a reasonable attempt at personalization.

The soft spot is exactly the one the stress-test flags. Microscopy images have low-contrast edges, staining artifacts, and textures that current VLMs were not trained on, so the quality scorer could systematically over- or under-rate results and lock in bad choices or spurious memory updates. The abstract gives no error bars, no per-dataset breakdowns, no failure cases, and no ablation on how often the evaluator actually triggers useful changes versus noise. Without those details the headline result is hard to trust.

This is for imaging biologists who want something that works out of the box across setups rather than for theorists. A serious referee should see it to check the implementation details and the actual statistical comparisons, even if the current evidence is thin.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces GenCellAgent, a training-free multi-agent system that uses a planner-executor-evaluator loop with long-term memory to route cellular images to specialist segmentation tools, adapt via few-shot references, perform text-guided refinement for novel structures, and commit edits for self-evolution. It claims that this routing matches or exceeds the best single tool on each of seven benchmarks (4718 images across modalities) and substantially outperforms specialist models on out-of-distribution organelle data by recovering missed structures.

Significance. If the central empirical claims hold after validation of the evaluator component, the work would be significant for demonstrating a practical, annotation-light approach to generalizable segmentation that combines existing tools without retraining and handles modality shifts and novel objects via LLM orchestration. The training-free and self-evolving aspects address real pain points in quantitative biology where new imaging conditions frequently arise.

major comments (3)

[Methods (planner-executor-evaluator loop description)] The headline result that routing matches or exceeds the best individual tool on all seven benchmarks (4718 images) and recovers structures on OOD organelle data is load-bearing for the paper's contribution, yet the manuscript provides no validation that the VLM-based evaluator produces quality scores that correlate with ground-truth metrics such as IoU or Dice on microscopy images. Low-contrast boundaries and staining artifacts common in these data could lead to systematic mis-scoring, undermining both tool selection and refinement decisions.
[§4] §4 (Results on benchmarks): the reported outperformance lacks accompanying statistical tests, error bars across multiple runs, or failure-mode analysis, making it impossible to determine whether observed gains are robust or driven by particular image subsets.
[§4.3 (OOD organelle experiments)] The text-guided refinement for organelles such as the Golgi apparatus is presented as a key capability, but the manuscript does not quantify how many iterations are typically required, the success rate of the VLM evaluator in triggering correct edits, or direct comparisons against baselines on the same OOD dataset.
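
The check requested in the first major comment can be sketched as follows: compute Dice and IoU for each mask against ground truth, then measure how well the evaluator's scores track them. This is a pure-Python illustration under stated assumptions. The function names and the example numbers are hypothetical and are not results from the paper; Spearman correlation is computed here as the Pearson correlation of average ranks.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def dice_and_iou(pred: List[List[int]], truth: List[List[int]]) -> Tuple[float, float]:
    """Dice and IoU for two binary masks of the same shape."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    inter = sum(1 for a, b in zip(p, t) if a and b)
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:                       # both masks empty: treat as perfect
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical numbers for illustration only.
dice, iou = dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
evaluator_scores = [20, 55, 70, 90]        # made-up evaluator outputs (0-100)
dice_scores = [0.30, 0.50, 0.65, 0.92]     # made-up Dice per image
rho = spearman(evaluator_scores, dice_scores)
print(dice, iou, rho)
```

A rank correlation near 1 on held-out annotated images would support using the evaluator as a proxy for Dice or IoU; a low or unstable value on low-contrast images would confirm the referee's concern.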

minor comments (2)

[Methods] The notation for the long-term memory update rule and the exact prompt templates used for the evaluator are not fully specified, which would aid reproducibility.
[Figures] Figure captions for the qualitative examples should include the specific tool selected by the planner and the evaluator score for each panel to allow readers to trace the decision process.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which highlight important aspects for strengthening the empirical validation of GenCellAgent. We appreciate the recognition of the work's potential significance for training-free, generalizable segmentation in quantitative biology. Below we respond point-by-point to the major comments, indicating where revisions will be made to address the concerns.


Referee: [Methods (planner-executor-evaluator loop description)] The headline result that routing matches or exceeds the best individual tool on all seven benchmarks (4718 images) and recovers structures on OOD organelle data is load-bearing for the paper's contribution, yet the manuscript provides no validation that the VLM-based evaluator produces quality scores that correlate with ground-truth metrics such as IoU or Dice on microscopy images. Low-contrast boundaries and staining artifacts common in these data could lead to systematic mis-scoring, undermining both tool selection and refinement decisions.

Authors: We agree that explicit validation of the VLM evaluator's scores against ground-truth metrics would strengthen the claims. While the end-to-end benchmark results provide indirect support for the evaluator's utility in tool selection and refinement, we acknowledge the risk of mis-scoring due to low-contrast boundaries or artifacts. In the revised manuscript we will add a dedicated analysis (in Methods or Supplementary Information) that computes correlation (Pearson and Spearman) between the evaluator's quality scores and IoU/Dice on a representative subset of images with available ground truth. This will directly address potential systematic biases.
Revision: yes

Referee: [§4] §4 (Results on benchmarks): the reported outperformance lacks accompanying statistical tests, error bars across multiple runs, or failure-mode analysis, making it impossible to determine whether observed gains are robust or driven by particular image subsets.

Authors: We thank the referee for this observation on statistical rigor. The current results report mean performance across the 4,718 images, but we recognize that statistical tests and variability measures would better demonstrate robustness. In revision we will add paired statistical tests (e.g., Wilcoxon signed-rank) comparing GenCellAgent to the best single tool per benchmark, report standard deviations or error bars where multiple LLM runs are feasible, and include a concise failure-mode analysis highlighting image characteristics associated with underperformance.
Revision: yes

Referee: [§4.3 (OOD organelle experiments)] The text-guided refinement for organelles such as the Golgi apparatus is presented as a key capability, but the manuscript does not quantify how many iterations are typically required, the success rate of the VLM evaluator in triggering correct edits, or direct comparisons against baselines on the same OOD dataset.

Authors: We concur that additional quantitative details on the refinement loop would clarify the practical value of this capability. In the revised §4.3 we will report (i) the average and distribution of iterations needed for convergence on the OOD organelle data, (ii) the success rate of the evaluator in correctly triggering edits that improve segmentation (measured against ground truth where available), and (iii) direct side-by-side comparisons with the specialist baselines on the identical OOD test set. These metrics will be presented in the main text or a supplementary table.
Revision: yes
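
The paired test named in the response to the §4 comment (Wilcoxon signed-rank) can be illustrated with a small exact version. This is a sketch under stated assumptions: the function names are hypothetical, the scores are made-up numbers rather than results from the paper, and the exact p-value is obtained by enumerating all sign assignments, which is only practical for a handful of pairs. For per-image comparisons at benchmark scale, a library routine such as `scipy.stats.wilcoxon` would be the practical choice.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon signed-rank test for a small paired sample.

    Returns (W+, p), where W+ is the sum of ranks of positive differences.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2                         # mean of W+ under the null
    observed = abs(w_plus - expected)

    # Under the null, each difference is equally likely to be positive or
    # negative, so enumerate all 2**n sign patterns and count those at least
    # as far from the expected value as the observed statistic.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-dataset scores, for illustration only.
agent = [0.82, 0.79, 0.91, 0.66, 0.74, 0.88]
best_single_tool = [0.80, 0.75, 0.85, 0.58, 0.64, 0.76]
w_plus, p_value = wilcoxon_signed_rank(agent, best_single_tool)
print(w_plus, p_value)
```

With six pairs that all favor one method, only 2 of the 64 sign patterns are as extreme as the observed one, so the smallest attainable two-sided p-value is 2/64. This shows why a test over a handful of benchmark-level means has limited power, and why per-image paired comparisons are more informative.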

Circularity Check

0 steps flagged

No circularity: empirical system evaluation on external benchmarks


The paper describes a multi-agent framework for image segmentation and reports its performance via direct empirical comparisons against independent specialist tools and baselines across seven external datasets totaling 4718 images. No mathematical derivation, parameter fitting presented as prediction, or self-referential equations appear in the provided text. All central claims rest on observable routing accuracy, adaptation results, and out-of-distribution performance measured against ground-truth annotations from the benchmarks themselves, with no load-bearing self-citation chains or ansatz smuggling required to support the reported outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that LLM-based agents can accurately evaluate segmentation quality and guide refinement across unseen modalities using only generalist vision-language capabilities and limited references.

axioms (1)

domain assumption Existing specialist segmentation tools can be effectively selected and adapted by LLM agents for heterogeneous cellular images without domain-specific fine-tuning.
Invoked throughout the description of the planner-executor-evaluator loop and on-the-fly adaptation mechanism.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag describes the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: planner–executor–evaluator loop (choose tool → run → quality-check) with long-term memory... style-aware matching... iterative prompt refinement

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Paper passage: test-time scaling... N trials per iteration... evaluator score threshold

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models
cs.LG 2026-05 unverdicted novelty 6.0

CellScientist introduces a dual-space hierarchical orchestration system that enables closed-loop refinement of virtual cell models by routing execution discrepancies back to hypothesis or implementation updates, yield...
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
cs.AI 2026-04 unverdicted novelty 5.0

AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · cited by 2 Pith papers · 5 internal anchors

[1]

In: 2012 Fifth International Symposium on Computational Intelligence and Design, vol

Xie, J., Yu, X., Zheng, X.: Biological cell image segmentation using novel hybrid morphology-based method. In: 2012 Fifth International Symposium on Computational Intelligence and Design, vol. 2, pp. 202–205 (2012). IEEE

work page 2012
[2]

In: 2015 International Conference on Automation, Mechanical Control and Computational Engineering, pp

Wang, B., Chen, M.: Application research on the analysis of biological detection image segmentation using pde. In: 2015 International Conference on Automation, Mechanical Control and Computational Engineering, pp. 749–753 (2015). Atlantis Press

work page 2015
[3]

In: TENCON 2003

Humnabadkar, K., Singh, S., Ghosh, D., Bora, P.: Unsupervised active contour model for biological image segmentation and analysis. In: TENCON 2003. Con- ference on Convergent Technologies for Asia-Pacific Region, vol. 2, pp. 538–542 (2003). IEEE

work page 2003
[4]

Nature Methods18(1), 100–106 (2021)

Stringer, C., Wang, T., Michaelos, M., Pachitariu, M.: Cellpose: a generalist algorithm for cellular segmentation. Nature Methods18(1), 100–106 (2021)

work page 2021
[5]

Nature Methods19(12), 1634–1641 (2022)

Pachitariu, M., Stringer, C.: Cellpose 2.0: how to train your own model. Nature Methods19(12), 1634–1641 (2022)

work page 2022
[6]

Nature Methods, 1–8 (2025)

Stringer, C., Pachitariu, M.: Cellpose3: one-click image restoration for improved cellular segmentation. Nature Methods, 1–8 (2025)

work page 2025
[7]

Communications Biology8(1), 962 (2025)

Zhang, X., Lin, Z., Wang, L., Chu, Y.S., Yang, Y., Xiao, X., Lin, Y., Liu, Q.: Swin- cell: a 3d transformer and flow-based framework for improved cell segmentation. Communications Biology8(1), 962 (2025)

work page 2025
[8]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y.,et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)

work page 2023
[9]

20 BioRxiv, 2023–11 (2025)

Israel, U., Marks, M., Dilip, R., Li, Q., Yu, C., Laubscher, E., Iqbal, A., Pradhan, E., Ates, A., Abt, M., et al.: Cellsam: a foundation model for cell segmentation. 20 BioRxiv, 2023–11 (2025)

work page 2023
[10]

Nature Methods, 1–13 (2025)

Archit, A., Freckmann, L., Nair, S., Khalid, N., Hilt, P., Rajashekar, V., Freitag, M., Teuber, C., Buckley, G., Haaren, S., et al.: Segment anything for microscopy. Nature Methods, 1–13 (2025)

work page 2025
[11]

Nature Communications15(1), 654 (2024)

Ma, J., He, Y., Li, F., Han, L., You, C., Wang, B.: Segment anything in medical images. Nature Communications15(1), 654 (2024)

work page 2024
[12]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp

Zhao, Y., Bian, H., Mu, M., Uddin, M.R., Li, Z., Li, X., Wang, T., Xu, M.: Cryosam: Training-free cryoet tomogram segmentation with foundation models. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 124–134 (2024). Springer

work page 2024
[13]

Nature Methods (2025)

Jones, D.C., Elz, A.E., Hadadianpour, A., Ryu, H., Glass, D.R., Newell, E.W.: Cell simulation as cell segmentation. Nature Methods (2025)

work page 2025
[14]

Nature Methods, 1–13 (2025)

Lefebvre, A.E., Sturm, G., Lin, T.-Y., Stoops, E., López, M.P., Kaufmann-Malaga, B., Hake, K.: Nellie: automated organelle segmentation, tracking and hierarchical feature extraction in 2d/3d live-cell microscopy. Nature Methods, 1–13 (2025)

work page 2025
[15]

Nature Methods20(4), 569–579 (2023)

Lu, M., Christensen, C.N., Weber, J.M., Konno, T., Läubli, N.F., Scherer, K.M., Avezov, E., Lio, P., Lapkin, A.A., Kaminski Schierle, G.S.,et al.: Ernet: a tool for the semantic segmentation and quantitative analysis of endoplasmic reticulum topology. Nature Methods20(4), 569–579 (2023)

work page 2023
[16]

Cell Systems14(1), 7–8 (2023)

Glancy, B.: Mitonet: A generalizable model for segmentation of individual mitochondria within electron microscopy datasets. Cell Systems14(1), 7–8 (2023)

work page 2023
[17]

Nature Methods21(8), 1371–1373 (2024)

Royer, L.A.: Omega—harnessing the power of large language models for bioimage analysis. Nature Methods21(8), 1371–1373 (2024)

work page 2024
[18]

Microscopy and Microanalysis28(S1), 1576–1577 (2022)

Chiu, C.-L., Clack, N.,et al.: Napari: a python multi-dimensional image viewer platform for the research community. Microscopy and Microanalysis28(S1), 1576–1577 (2022)

work page 2022
[19]

io chatbot: a community-driven ai assistant for integrative computational bioimaging

Lei, W., Fuster-Barceló, C., Reder, G., Muñoz-Barrutia, A., Ouyang, W.: Bioim- age. io chatbot: a community-driven ai assistant for integrative computational bioimaging. nature methods21(8), 1368–1370 (2024)

work page 2024
[20]

arXiv preprint arXiv:2407.09811 (2024)

Xiao, Y., Liu, J., Zheng, Y., Xie, X., Hao, J., Li, M., Wang, R., Ni, F., Li, Y., Luo, J., et al.: Cellagent: An llm-driven multi-agent framework for automated single-cell data analysis. arXiv preprint arXiv:2407.09811 (2024)

work page arXiv 2024
[21]

Nature methods18(9), 1038–1045 (2021) 21

Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.: Livecell—a large-scale dataset for label-free live cell segmentation. Nature methods18(9), 1038–1045 (2021) 21

work page 2021
[22]

Nature biotechnology40(4), 555–565 (2022)

Greenwald, N.F., Miller, G., Moen, E., Kong, A., Kagel, A., Dougherty, T., Fullaway, C.C., McIntosh, B.J., Leow, K.X., Schwartz, M.S.,et al.: Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nature biotechnology40(4), 555–565 (2022)

work page 2022
[23]

Elife9, 57613 (2020)

Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A.V., Louveaux, M., Wenzl, C., Strauss, S., Wilson-Sánchez, D., Lymbouridou, R.,et al.: Accurate and versatile 3d segmentation of plant tissues at cellular resolution. Elife9, 57613 (2020)

work page 2020
[24]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Graham, S., Jahanifar, M., Azam, A., Nimir, M., Tsang, Y.-W., Dodd, K., Hero, E., Sahota, H., Tank, A., Benes, K.,et al.: Lizard: a large-scale dataset for colonic nuclear instance segmentation and classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 684–693 (2021)

work page 2021
[25]

International journal of computer vision88(2), 303–338 (2010)

Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. International journal of computer vision88(2), 303–338 (2010)

work page 2010
[26]

Janelia Research Campus (2024)

Team, C.P., Ackerman, D., Ahrens, M.B., Aso, Y., Avetissian, E., Bennett, D., et al.: CellMap 2024 Segmentation Challenge. Janelia Research Campus (2024). https://doi.org/10.25378/janelia.c.7456966

work page doi:10.25378/janelia.c.7456966 2024
[27]

Seggpt: Segmenting everything in context,

Wang, X., Zhang, X., Cao, Y., Wang, W., Shen, C., Huang, T.: Seggpt: Segmenting everything in context. arXiv preprint arXiv:2304.03284 (2023)

work page arXiv 2023
[28]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Lai, X., Tian, Z., Chen, Y., Li, Y., Yuan, Y., Liu, S., Jia, J.: Lisa: Reasoning seg- mentation via large language model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9579–9589 (2024)

work page 2024
[29]

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Luo, J., Zhang, W., Yuan, Y., Zhao, Y., Yang, J., Gu, Y., Wu, B., Chen, B., Qiao, Z., Long, Q., et al.: Large language model agent: A survey on methodology, applications and challenges. arXiv preprint arXiv:2503.21460 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

ACM Transactions on Information Systems (2024)

Zhang, Z., Dai, Q., Bo, X., Ma, C., Li, R., Chen, X., Zhu, J., Dong, Z., Wen, J.-R.: A survey on the memory mechanism of large language model based agents. ACM Transactions on Information Systems (2024)

work page 2024
[31]

Understanding the planning of LLM agents: A survey

Huang, X., Liu, W., Chen, X., Wang, X., Wang, H., Lian, D., Wang, Y., Tang, R., Chen, E.: Understanding the planning of llm agents: A survey, 2024. URL https://arxiv. org/abs/2402.02716

work page internal anchor Pith review Pith/arXiv arXiv 2024
[32]

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., Zhang, X.: Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024) 22

work page internal anchor Pith review Pith/arXiv arXiv 2024
[33]

NEJM AI2(1), 2400555 (2025)

Ifargan, T., Hafner, L., Kern, M., Alcalay, O., Kishony, R.: Autonomous llm- driven research—from data to human-verifiable research papers. NEJM AI2(1), 2400555 (2025)

work page 2025
[34]

arXiv preprint arXiv:2505.13259 , year =

Zheng, T., Deng, Z., Tsang, H.T., Wang, W., Bai, J., Wang, Z., Song, Y.: From automation to autonomy: A survey on large language models in scientific discovery. arXiv preprint arXiv:2505.13259 (2025)

work page arXiv 2025
[35]

arXiv preprint arXiv:2502.06111 (2025)

Xiao, Y., Wang, R., Kong, L., Golac, D., Wang, W.: Csr-bench: Benchmarking llm agents in deployment of computer science research repositories. arXiv preprint arXiv:2502.06111 (2025)

work page arXiv 2025
[36]

Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

Gridach, M., Nanavati, J., Abidine, K.Z.E., Mendes, L., Mack, C.: Agentic ai for scientific discovery: A survey of progress, challenges, and future directions. arXiv preprint arXiv:2503.08979 (2025)

work page arXiv 2025
[37]

Journal of the American Chemical Society147(15), 12534–12545 (2025)

Song, T., Luo, M., Zhang, X., Chen, L., Huang, Y., Cao, J., Zhu, Q., Liu, D., Zhang, B., Zou, G.,et al.: A multiagent-driven robotic ai chemist enabling autonomous chemical research on demand. Journal of the American Chemical Society147(15), 12534–12545 (2025)

work page 2025
[38]

Agent Laboratory: Using LLM Agents as Research Assistants

Schmidgall, S., Su, Y., Wang, Z., Sun, X., Wu, J., Yu, X., Liu, J., Moor, M., Liu, Z., Barsoum, E.: Agent laboratory: Using llm agents as research assistants. arXiv preprint arXiv:2501.04227 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[39]

Advanced Materials37(22), 2413523 (2025)

Ghafarollahi, A., Buehler, M.J.: Sciagents: automating scientific discovery through bioinspired multi-agent intelligent graph reasoning. Advanced Materials37(22), 2413523 (2025)

work page 2025
[40]

arXiv preprint arXiv:2409.00054 (2024)

Hu, Y., Liu, D., Wang, Q., Yu, C., Xu, C., Zheng, Q., Ji, H., Xiong, J.: Automating intervention discovery from scientific literature: A progressive ontology prompting and dual-llm framework. arXiv preprint arXiv:2409.00054 (2024)

work page arXiv 2024
[41]

A vision for auto research with llm agents.arXiv preprint arXiv:2504.18765, 2025

Liu, C., Wang, C., Cao, J., Ge, J., Wang, K., Zhang, L., Cheng, M.-M., Zhao, P., Li, T., Jia, X., et al.: A vision for auto research with llm agents. arXiv preprint arXiv:2504.18765 (2025)

work page arXiv 2025
[42]

Nature Biomedical Engineering, 1–14 (2025)

Qu, Y., Huang, K., Yin, M., Zhan, K., Liu, D., Yin, D., Cousins, H.C., Johnson, W.A., Wang, X., Shah, M., et al.: Crispr-gpt for agentic automation of gene-editing experiments. Nature Biomedical Engineering, 1–14 (2025)

work page 2025
[43]

biorxiv, 2025–05 (2025)

Huang, K., Zhang, S., Wang, H., Qu, Y., Lu, Y., Roohani, Y., Li, R., Qiu, L., Li, G., Zhang, J., et al.: Biomni: A general-purpose biomedical ai agent. biorxiv, 2025–05 (2025)

work page 2025
[44]

In: Proceedings of the IEEE Conference on Computer Vision 23 and Pattern Recognition, pp

Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision 23 and Pattern Recognition, pp. 2414–2423 (2016)

work page 2016
[45]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[46]

Advances in neural information processing systems 25(2012) 24 Appendix A Tools Repository The repository integrates multiple categories of tools

Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25(2012) 24 Appendix A Tools Repository The repository integrates multiple categories of tools. The primary LLMused is Gemini-2.0-Flash, while the evaluation agent employs Gemini-2.5-Flash-Preview-...

work page 2012
[47]

Analyze the query, previous reasoning steps, and observations. 26

work page
[48]

Decide on the next action: use a tool or provide a final answer

work page
[49]

thought":

Respond in the following JSON format: If you need to use a tool: {{ "thought": "Your detailed reasoning about what to do next", "action": {{ "name": "Tool name (google, imagesegmentation, oneshotsegmentation, segmentationevaluation)", "reason": "Explanation of why you chose this tool", "input": "Specific input for the tool, if different from the original ...

work page
[50]

**Reference Image 1** A poor segmentation mask (score = 0)

work page
[51]

**Reference Image 2** Another poor segmentation mask (score = 0)

work page
[52]

33 Use the first two images as context examples to understand what poor segmentation looks like

**Evaluation Image** A new segmentation mask that needs to be evaluated. 33 Use the first two images as context examples to understand what poor segmentation looks like. Then evaluate the third image according to the criteria below. --- ### Evaluation Criteria (0100 scoring scale, with weights):

work page
[53]

**Stacked Morphology** (Weight: 0.35) Assess how well the membrane layers are organized and stacked in the segmentation

work page
[54]

**Cisternae Definition** (Weight: 0.25) Evaluate the clarity, separation, and recognizable structure of cisternae in the segmentation

work page
[55]

**Overall Cohesion** (Weight: 0.2) Does the segmentation appear connected, logical, and anatomically plausible as a whole?

work page
[56]

ReviewScore

**Segmentation Cleanliness** (Weight: 0.2) Check for artifacts, stray regions, or noise that detracts from the clarity of the segmentation. --- ### Reference Image 1 (Score = 0) This segmentation mask performs poorly across all evaluation criteria, as it incorrectly labels the entire image area as segmented, without distinguishing relevant structures from...

1. Describe the segmentation process used in the current run.

2. List the tools used in order and what each contributed.

3. Summarize the user's interaction behavior:
   - Use of automatic vs manual tools
   - Use of references (e.g. one-shot segmentation)
   - Feedback frequency
   - Number of iterations

4. Based on this run alone, recommend:

[CURRENT RUN]
Recommended HITL Mode: <Fully Automatic | Reference Guided | Human Interaction>
Reason: <why this HITL mode fits this specific run>

---

## PART 2: Long-Term User Profile and Final Recommendation

1. Review the historical HITL recommendations and detect **behavioral trends**:
   - Is the user becoming more or less interactive over time?
   - Are they consistently using the same tools or exploring new ones?
   - Are they gradually shifting from automation to correction (or vice versa)?

2. Generate a long-term **User Profile** considering both the current and past sessions. Example profiles:
   - "Consistently prefers fully automated workflows with minimal feedback."
   - "Has evolved from reference-based guidance to more manual correction."
   - "Initially used correction tools but now prefers faster automatic approaches."

3. Provide the final recommendation:

[OVERALL RECOMMENDATION]
Recommended HITL Mode: <Fully Automatic | Reference Guided | Human Interaction>
User Profile: <summary across runs that includes progression or consistency>
Reason: <why this mode is appropriate based on the pattern across sessions>

---

## Guidance:

- If the tool `oneshotsegmentation` was used in ...
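The `[OVERALL RECOMMENDATION]` template in this prompt is regular enough that downstream code could read the reply back into a structured record. The following is a minimal sketch under that assumption: the three mode names and the field labels come from the prompt, while the `Recommendation` dataclass, the `parse_overall` helper, and the sample reply are hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass

# The three modes named in the prompt's output template.
HITL_MODES = ("Fully Automatic", "Reference Guided", "Human Interaction")


@dataclass
class Recommendation:
    mode: str
    user_profile: str
    reason: str


def parse_overall(text):
    """Extract the [OVERALL RECOMMENDATION] fields from a model reply."""
    marker = "[OVERALL RECOMMENDATION]"
    if marker not in text:
        raise ValueError("missing overall recommendation block")
    # Only look at text after the marker so [CURRENT RUN] fields are ignored.
    block = text.split(marker, 1)[1]
    fields = {}
    for label in ("Recommended HITL Mode", "User Profile", "Reason"):
        match = re.search(rf"^{re.escape(label)}:\s*(.+)$", block, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"missing field: {label}")
        fields[label] = match.group(1).strip()
    mode = fields["Recommended HITL Mode"]
    if mode not in HITL_MODES:
        raise ValueError(f"unexpected HITL mode: {mode}")
    return Recommendation(mode, fields["User Profile"], fields["Reason"])


# Made-up reply used only to exercise the parser.
sample_reply = """[CURRENT RUN]
Recommended HITL Mode: Fully Automatic
Reason: No feedback was given in this run.

[OVERALL RECOMMENDATION]
Recommended HITL Mode: Reference Guided
User Profile: Has evolved from automatic runs to reference-based guidance.
Reason: Recent sessions repeatedly used a reference image.
"""
rec = parse_overall(sample_reply)
```

Validating the mode against the three allowed values catches replies that drift from the template before they are written to long-term memory.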

1. Summarize the Visual Characteristics described across the search content.

2. Generate a Segmentation Prompt that could be used to guide a visual segmentation tool based on those characteristics.

{search content}

Output format:
### Visual Characteristics Summary ###
[your summary here]
### Segmentation Prompt ###
[your segmentation prompt here]

Listing 6: Search Summarize Prompt

Appendix D More Details for Human Interactions

The...

