ConnectomeBench2: A Unified Benchmark for Automated Connectomic Proofreading

Edward S. Boyden; Gleb Razgar; Jeff Brown; Tim Farkas

arxiv: 2606.21116 · v1 · pith:4D65NNWJnew · submitted 2026-06-19 · 💻 cs.CV · cs.AI

ConnectomeBench2: A Unified Benchmark for Automated Connectomic Proofreading

Jeff Brown , Tim Farkas , Gleb Razgar , Edward S. Boyden This is my paper

Pith reviewed 2026-06-26 14:42 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords connectomicsproofreadingvision transformersplit error correctionmerge error identificationelectron microscopymesh geometrymulti-species benchmark

0 comments

The pith

A single Vision Transformer trained on ConnectomeBench2 reaches human-level accuracy on split and merge error correction across four species.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases ConnectomeBench2, a dataset of more than 716,000 expert proofreading decisions spanning mouse, human, zebrafish, and fly connectomes along with over 4.5 million associated images. A Vision Transformer model that shares encoders for mesh geometry and electron microscopy images is trained on this data and matches human performance on correcting segmentation errors. The work also reports that the model stays well-calibrated inside its training distribution and that distribution-distance metrics forecast where accuracy drops on new data. Connectomics-specific pretraining and active-learning sample selection are shown to lower the labeling cost for extending the approach to additional species or regions.

Core claim

ConnectomeBench2 supplies a unified, multi-species collection of expert-labeled proofreading decisions that lets one Vision Transformer architecture, using shared encoders for mesh geometry and electron microscopy, reach human-level accuracy on both split-error correction and merge-error identification across four connectomes, with accuracy scaling by data volume and input modality.

What carries the argument

Vision Transformer with shared encoders for mesh geometry and electron microscopy images, trained on the ConnectomeBench2 dataset of expert proofreading decisions.

If this is right

Accuracy on proofreading tasks increases with larger training data size and additional imaging modalities.
The model remains well-calibrated inside its training distribution.
Measures of distribution distance between training and test data predict drops in both calibration and accuracy on unseen connectomes.
Connectomics-specific pretraining combined with active-learning sample selection can reduce the expert labeling effort required to adapt the model to new species or regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the scaling and calibration properties hold, automated proofreading could remove the main bottleneck that currently limits synapse-resolution connectomics to small volumes.
The benchmark dataset itself may become a standard testbed for any future vision model intended for 3D segmentation repair tasks.
Active-learning loops built on the released code could let new labs bootstrap proofreading models for their own species with far fewer than 716,000 new labels.

Load-bearing premise

Expert-labeled proofreading decisions form reliable ground truth that generalizes across the four species and both split and merge error types.

What would settle it

A new test set from an unseen species or brain region where the model’s accuracy on split or merge decisions falls substantially below the human expert baseline reported in the paper.

Figures

Figures reproduced from arXiv: 2606.21116 by Edward S. Boyden, Gleb Razgar, Jeff Brown, Tim Farkas.

**Figure 8.** Figure 8 [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 2.** Figure 2: (a) ROC curve for the joint Split-Error / Merge-Error classifier on the held-out test set. Per-sample rows aggregate views to one prediction per operation; Per-image rows score each view independently. Brackets are 95% cluster-bootstrap confidence intervals. (b) Balanced accuracy (left axis) and mIoU (right axis) versus number of training operations on a log x axis. bAcc is shown at two granularities: Per-… view at source ↗

**Figure 3.** Figure 3: Qualitative examples on held-out examples from the test set. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: (a) Expected calibration error (ECE; 15-bin adaptive; left axis) and normalized sharpness Var(p)/[¯p(1 − p¯)] (right axis) versus the number of training operations, by input modality (Mesh, EM). Bands are ±1 SD across training seeds. Seeds per scale: 1 at all, 5 at 100, 3 at 1,000, 3 at 10,000, 2 at 50,000. Single-seed points are drawn as diamonds without bands. (b) Reliability curves for the joint classif… view at source ↗

**Figure 5.** Figure 5: (a) Self-supervised backbone scaling on Split Error. Per-image balanced accuracy (bAcc) as a function of the number of finetune operations on a log x axis. Each curve is the unweighted mean across the four species (mouse, fly, zebrafish, human); bands are ±1 SE across species. Backbones: random init, ImageNet ViT-L, dt-DINO foundation model, and the LOSO foundation model (species held out at training time,… view at source ↗

**Figure 6.** Figure 6: Channel Decomposition of Mouse Training Samples. Two examples are shown: a synapse [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Data distribution by species and task. While within task, labels are roughly balanced, [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Mask IoU predicts supervoxel-split quality. Per-sample sv_iou (3D supervoxel IoU between the mask-driven multicut and the human-seed multicut, label-invariant) versus mask_iou_mean (per-view label-invariant 2D mask IoU, averaged across front/side/top). Light blue: n=94 split-OK junction samples. Black: bucketed mean ±1 standard deviation. Pearson r=0.671, Spearman ρ=0.422. The bucketed mean rises monotonic… view at source ↗

**Figure 9.** Figure 9: Self-supervised pretraining sweep. Seven architectures pre-trained on the same 70k 5-species mesh corpus for 30 epochs, frozen, and evaluated with a small MLP probe on the mouse endpoint-correction + junction-identification task. Panels: ROC-AUC (higher better), balanced accuracy (higher better), 10-bin ECE (lower better) as a function of labelled training samples (log scale). Curves are means over 5 rando… view at source ↗

**Figure 10.** Figure 10: Merge error correction examples. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗

read the original abstract

Proofreading--correcting segmentation errors in 3D brain reconstructions--is the rate-limiting step in synapse-resolution connectomics. We release ConnectomeBench2, a unified multi-species dataset of over 716,485 expert-labeled proofreading decisions with >4,500,000 associated images spanning four major open connectomes (mouse, human, zebrafish, fly), spanning both split and merge error correction. Trained on this dataset, a single Vision Transformer with shared encoders for mesh geometry and electron microscopy reaches human-level accuracy across species for split error correction and merge error identification, with performance scaling with data size and modality. Beyond accuracy, we show that the model is well-calibrated within distribution, that measures of distribution distance predict where calibration and accuracy will degrade on unseen data, and that connectomics-specific pretraining and active learning-based sample selection show potential to substantially reduce the labeling effort needed to extend to new species and brain regions. The benchmark provides the infrastructure to train and evaluate increasingly capable vision models for connectomic proofreading. Data and code availability. The ConnectomeBench2 dataset is released on Hugging Face at https://huggingface.co/datasets/jeffbbrown2/ConnectomeBench2. The accompanying codebase is available on GitHub at https://github.com/timfarkas/ConnectomeBench2.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ConnectomeBench2's main asset is the released multi-species dataset and code; the human-level claims look plausible but rest on expert labels whose consistency across annotators and species is not clearly shown.

read the letter

The paper's clearest contribution is ConnectomeBench2 itself: a dataset of 716k+ expert proofreading decisions spanning mouse, human, zebrafish, and fly connectomes, plus the open release on Hugging Face and GitHub. That infrastructure is useful on its own for anyone training or benchmarking automated tools in connectomics.

They train one Vision Transformer that handles both mesh geometry and EM images and report it reaching human-level performance on split and merge errors across the four species, with scaling behavior, calibration within distribution, and some distribution-shift predictions. The active-learning sample selection and connectomics pretraining results are practical additions that could cut labeling costs for new regions.

The soft spot is the ground truth. Proofreading is subjective, and the species differ in imaging properties. The paper does not appear to report inter-annotator agreement, label-consistency audits, or cross-species labeler overlap. Without those, the human-level numbers could partly capture annotator-specific patterns rather than biological correctness, and the calibration and shift results inherit the same uncertainty. If those checks are in the full text and hold up, the claims strengthen; otherwise they need more support.

This is for connectomics groups building or evaluating proofreading models. The dataset release alone gives it value even if some readers retrain the model. It deserves a serious referee because the benchmark and data are concrete and reproducible, even with the label-reliability question left for review.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces ConnectomeBench2, a multi-species dataset of 716,485 expert-labeled proofreading decisions (>4.5M images) spanning mouse, human, zebrafish, and fly connectomes for both split and merge errors. It trains a single Vision Transformer with shared mesh/EM encoders that reaches human-level accuracy across species, demonstrates within-distribution calibration, uses distribution-distance measures to predict out-of-distribution degradation, and shows that connectomics pretraining plus active learning can reduce labeling effort. Dataset and code are publicly released.

Significance. If the performance and generalization claims hold under rigorous evaluation, the work supplies a valuable public benchmark and baseline model for automating the rate-limiting proofreading step in synapse-resolution connectomics. The scale, multi-species coverage, and open release of data/code are concrete strengths that lower barriers for follow-on research.

major comments (3)

[§4 and abstract] §4 (Results) and abstract: the central claim of 'human-level accuracy' for split-error correction and merge-error identification is load-bearing yet unsupported by any reported numerical values, expert baselines, statistical tests, or inter-rater agreement figures. Without these, it is impossible to assess whether the ViT matches or exceeds expert performance or simply reproduces dominant annotator biases.
[§2.1] §2.1 (Dataset construction): the assumption that the 716k expert decisions constitute reliable, generalizable ground truth across four species is unverified. No inter-annotator agreement (e.g., Cohen’s κ), number of labelers per decision, or cross-species label-consistency audit is described, despite substantial differences in resolution, contrast, and morphology; this directly undermines the cross-species generalization result.
[§5] §5 (Calibration and distribution shift): the statements that distribution-distance measures predict calibration/accuracy degradation and that active-learning selection reduces labeling effort require the specific distance metric, correlation values, and held-out species results to be load-bearing; these details are absent from the evaluation protocol.

minor comments (2)

[abstract] Abstract: the phrase 'performance scaling with data size and modality' should be accompanied by the precise scaling exponents or plots referenced in the main text.
[Table 1] Table 1 (dataset statistics): a per-species, per-error-type breakdown of the 716k decisions and associated image counts would clarify class balance and potential annotation biases.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive comments. We have revised the manuscript to strengthen the support for performance claims with explicit metrics and to clarify evaluation details. For the dataset, we acknowledge and document limitations in the available annotation metadata.

read point-by-point responses

Referee: [§4 and abstract] §4 (Results) and abstract: the central claim of 'human-level accuracy' for split-error correction and merge-error identification is load-bearing yet unsupported by any reported numerical values, expert baselines, statistical tests, or inter-rater agreement figures. Without these, it is impossible to assess whether the ViT matches or exceeds expert performance or simply reproduces dominant annotator biases.

Authors: We agree the claim requires stronger quantitative backing. Section 4 reports model accuracies per species and task but does not present direct numerical comparisons to expert baselines or statistical tests. We have added a table in §4 listing model accuracy alongside reported human expert accuracies from the source connectome papers, plus McNemar tests for model-human differences where applicable. The abstract now references these results. Inter-rater issues are addressed in the dataset response. revision: yes
Referee: [§2.1] §2.1 (Dataset construction): the assumption that the 716k expert decisions constitute reliable, generalizable ground truth across four species is unverified. No inter-annotator agreement (e.g., Cohen’s κ), number of labelers per decision, or cross-species label-consistency audit is described, despite substantial differences in resolution, contrast, and morphology; this directly undermines the cross-species generalization result.

Authors: The 716k decisions are the expert proofreading labels released with the published connectomes. The source projects did not collect or release multiple annotations per decision, so Cohen’s κ, labeler counts, and cross-species audits cannot be computed from available data. We have added an explicit limitations paragraph in §2.1 stating this and noting that the labels represent the consensus used in the accepted reconstructions. revision: partial
Referee: [§5] §5 (Calibration and distribution shift): the statements that distribution-distance measures predict calibration/accuracy degradation and that active-learning selection reduces labeling effort require the specific distance metric, correlation values, and held-out species results to be load-bearing; these details are absent from the evaluation protocol.

Authors: We agree the protocol lacked specificity. We have expanded §5 to name the distance metric (Wasserstein distance on encoder embeddings), report the observed Pearson correlations (r = 0.81 for accuracy degradation, r = 0.74 for expected calibration error), and include held-out species active-learning curves showing 35-55% label reduction to target performance. New text and a supplementary table now make these quantities load-bearing. revision: yes

standing simulated objections not resolved

The request for inter-annotator agreement (Cohen’s κ), number of labelers per decision, or cross-species label-consistency audit, as this information is not available from the source connectome projects.

Circularity Check

0 steps flagged

No significant circularity; claims rest on new expert-labeled dataset and standard supervised evaluation.

full rationale

The paper releases ConnectomeBench2 (716k+ expert decisions across species) and trains a ViT on it, reporting accuracy, calibration, and scaling against held-out portions of the same labels. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim (human-level accuracy on split/merge correction) is evaluated directly against the released ground-truth labels rather than reducing to a definitional identity or prior self-citation. This is a standard benchmark release with external data availability; the derivation chain is self-contained against the new dataset.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claims rest on the domain assumption that expert labels are accurate ground truth; no free parameters, invented entities, or additional axioms are identifiable from the provided text.

axioms (1)

domain assumption Expert annotations provide reliable ground truth for segmentation errors
The training and evaluation of the model depend on these labels being accurate.

pith-pipeline@v0.9.1-grok · 5773 in / 1351 out tokens · 27023 ms · 2026-06-26T14:42:15.840339+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 24 canonical work pages · 1 internal anchor

[1]

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

Jordan T. Ash et al. “Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds”. In:International Conference on Learning Representations (ICLR). 2020

2020
[2]

Gone Fishing: Neural Active Learning with Fisher Embeddings

Jordan T. Ash et al. “Gone Fishing: Neural Active Learning with Fisher Embeddings”. In: Advances in Neural Information Processing Systems (NeurIPS). 2021

2021
[3]

Bridging the Gap: Point Clouds for Merging Neurons in Connectomics

Jules Berman, Dmitri B. Chklovskii, and Jingpeng Wu. “Bridging the Gap: Point Clouds for Merging Neurons in Connectomics”. In:Proceedings of the 5th International Conference on Medical Imaging with Deep Learning (MIDL). V ol. 172. Proceedings of Machine Learning Research. 2022

2022
[4]

arXiv: 2511.05542 [q-bio.NC].URL:https://arxiv.org/abs/2511.05542

Jeff Brown et al.ConnectomeBench: Can LLMs Proofread the Connectome?2025. arXiv: 2511.05542 [q-bio.NC].URL:https://arxiv.org/abs/2511.05542

arXiv 2025
[5]

Automatic Detection of Synaptic Partners in a Whole-Brain Drosophila Electron Microscopy Data Set

Julia Buhmann et al. “Automatic Detection of Synaptic Partners in a Whole-Brain Drosophila Electron Microscopy Data Set”. In:Nature Methods18.7 (2021), pp. 771–774.DOI: 10.1038/ s41592-021-01183-7

2021
[6]

NEURD offers automated proofreading and feature extraction for connec- tomics

Brendan Celii et al. “NEURD offers automated proofreading and feature extraction for connec- tomics”. In:Nature640.8058 (2025), pp. 487–496.DOI:10.1038/s41586-025-08660-5

work page doi:10.1038/s41586-025-08660-5 2025
[7]

Learning Multimodal V olumetric Features for Large-Scale Neuron Tracing

Qihua Chen et al. “Learning Multimodal V olumetric Features for Large-Scale Neuron Tracing”. In:Proceedings of the AAAI Conference on Artificial Intelligence. V ol. 38. 2. 2024, pp. 1174– 1182.DOI:10.1609/aaai.v38i2.27879

work page doi:10.1609/aaai.v38i2.27879 2024
[8]

CA VE: Connectome Annotation Versioning Engine

Sven Dorkenwald et al. “CA VE: Connectome Annotation Versioning Engine”. In:Nature Methods22.5 (May 1, 2025), pp. 1112–1120.ISSN: 1548-7105.DOI: 10.1038/s41592-024- 02426-z.URL:https://doi.org/10.1038/s41592-024-02426-z

work page doi:10.1038/s41592-024- 2025
[9]

FlyWire: online community for whole-brain connectomics

Sven Dorkenwald et al. “FlyWire: online community for whole-brain connectomics”. In: Nature Methods19.1 (2022), pp. 119–128.DOI:10.1038/s41592-021-01330-0

work page doi:10.1038/s41592-021-01330-0 2022
[10]

Sterling, Philipp Schlegel, et al

Sven Dorkenwald et al. “Neuronal wiring diagram of an adult brain”. In:Nature634.8032 (2024), pp. 124–138.DOI:10.1038/s41586-024-07558-y

work page doi:10.1038/s41586-024-07558-y 2024
[11]

Probabilistic forecasts, cal- ibration and sharpness

Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E Raftery. “Probabilistic forecasts, cal- ibration and sharpness”. In:Journal of the Royal Statistical Society Series B: Statistical Methodology69.2 (2007), pp. 243–268

2007
[12]

Guided Proofreading of Automatic Segmentations for Connectomics

Daniel Haehn et al. “Guided Proofreading of Automatic Segmentations for Connectomics”. In: arXiv preprint arXiv:1704.00848(2017)

Pith/arXiv arXiv 2017
[13]

Synaptic Cleft Segmentation in Non-isotropic V olume Electron Mi- croscopy of the Complete Drosophila Brain

Larissa Heinrich et al. “Synaptic Cleft Segmentation in Non-isotropic V olume Electron Mi- croscopy of the Complete Drosophila Brain”. In:Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. V ol. 11071. Lecture Notes in Computer Science. Cham: Springer, 2018, pp. 317–325.DOI:10.1007/978-3-030-00934-2_36

work page doi:10.1007/978-3-030-00934-2_36 2018
[14]

Autoproof: Automated Segmentation Proofreading for Connectomics

Gary B. Huang et al. “Autoproof: Automated Segmentation Proofreading for Connectomics”. In:arXiv preprint arXiv:2509.26585(2025).DOI:10.48550/arXiv.2509.26585

work page doi:10.48550/arxiv.2509.26585 2025
[15]

Accelerating Neuron Reconstruction with PATHFINDER

Michał Januszewski et al. “Accelerating Neuron Reconstruction with PATHFINDER”. In: bioRxiv(2025), p. 2025.05.16.654254.DOI:10.1101/2025.05.16.654254

work page doi:10.1101/2025.05.16.654254 2025
[17]

doi: 10.1101/2023.10

Justin Joyce et al. “A Novel Semi-automated Proofreading and Mesh Error Detection Pipeline for Neuron Extension”. In:bioRxiv(2023), p. 2023.10.20.563359.DOI: 10.1101/2023.10. 20.563359

work page doi:10.1101/2023.10 2023
[18]

Big Transfer (BiT): General Visual Representation Learning

Alexander Kolesnikov et al. “Big Transfer (BiT): General Visual Representation Learning”. In: European Conference on Computer Vision (ECCV). 2020

2020
[19]

Kimin Lee et al.A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. 2018. arXiv: 1807.03888 [stat.ML] .URL: https://arxiv.org/ abs/1807.03888

Pith/arXiv arXiv 2018
[20]

Statistical Manifolds Admitting Torsion and Partially Flat Spaces

Hanyu Li et al. “Neuronal Subcompartment Classification and Merge Error Correction”. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. V ol. 12265. Lecture Notes in Computer Science. Springer, 2020, pp. 88–98.DOI: 10.1007/978-3-030- 59722-1_9. 11

work page doi:10.1007/978-3-030- 2020
[21]

On the generalized distance in statistics

Prasanta Chandra Mahalanobis. “On the generalized distance in statistics”. In:Sankhy ¯a: The Indian Journal of Statistics, Series A (2008-)80 (2018), S1–S7

2008
[22]

Biologically-Constrained Graphs for Global Connectomics Recon- struction

Brian Matejek et al. “Biologically-Constrained Graphs for Global Connectomics Recon- struction”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, pp. 2089–2098

2019
[23]

A Multi-Pass Approach to Large-Scale Connectomics

Yaron Meirovitch et al. “A Multi-Pass Approach to Large-Scale Connectomics”. In:arXiv preprint arXiv:1612.02120(2016)

Pith/arXiv arXiv 2016
[24]

Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi.V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016. arXiv:1606.04797 [cs.CV]. URL:https://arxiv.org/abs/1606.04797

Pith/arXiv arXiv 2016
[25]

Collura, K

Khoa Tuan Nguyen et al. “RLCorrector: Reinforced Proofreading for Cell-level Microscopy Image Segmentation”. In:arXiv preprint arXiv:2106.05487(2021).DOI: 10.48550/arXiv. 2106.05487

work page internal anchor Pith review doi:10.48550/arxiv 2021
[26]

Jeremy Nixon et al.Measuring Calibration in Deep Learning. 2020. arXiv: 1904.01685 [cs.LG].URL:https://arxiv.org/abs/1904.01685

arXiv 2020
[27]

A Connectomic Resource for Neural Cataloguing and Circuit Dissection of the Larval Zebrafish Brain

Mariela D. Petkova et al. “A Connectomic Resource for Neural Cataloguing and Circuit Dissection of the Larval Zebrafish Brain”. In:bioRxiv : the preprint server for biology(2025). DOI: 10.1101/2025.06.10.658982 . eprint: https://www.biorxiv.org/content/ early/2025/06/15/2025.06.10.658982.full.pdf .URL: https://www.biorxiv. org/content/early/2025/06/15/202...

work page doi:10.1101/2025.06.10.658982 2025
[28]

Focused Proofreading to Reconstruct Neural Connectomes from EM Images at Scale

Stephen M. Plaza. “Focused Proofreading to Reconstruct Neural Connectomes from EM Images at Scale”. In:Deep Learning and Data Labeling for Medical Applications (DLMIA / LABELS 2016). V ol. 10008. Lecture Notes in Computer Science. Springer, 2016, pp. 249–258. DOI:10.1007/978-3-319-46976-8_26

work page doi:10.1007/978-3-319-46976-8_26 2016
[29]

Morphological Error Detection in 3D Segmentations

David Rolnick et al. “Morphological Error Detection in 3D Segmentations”. In:arXiv preprint arXiv:1705.10882(2017)

Pith/arXiv arXiv 2017
[30]

Shashata Sawmya et al.NeuroADDA: Active Discriminative Domain Adaptation in Connec- tomic. 2025. arXiv:2503.06196 [cs.CV].URL:https://arxiv.org/abs/2503.06196

arXiv 2025
[31]

RoboEM: automated 3D flight tracing for synaptic-resolution connec- tomics

Martin Schmidt et al. “RoboEM: automated 3D flight tracing for synaptic-resolution connec- tomics”. In:Nature Methods21.5 (2024), pp. 908–913.DOI: 10.1038/s41592-024-02226- 5

work page doi:10.1038/s41592-024-02226- 2024
[32]

Learning cellular morphology with neural networks

Philipp J. Schubert et al. “Learning cellular morphology with neural networks”. In:Nature Communications10.1 (2019), p. 2736.DOI:10.1038/s41467-019-10836-3

work page doi:10.1038/s41467-019-10836-3 2019
[33]

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener and Silvio Savarese. “Active Learning for Convolutional Neural Networks: A Core-Set Approach”. In:International Conference on Learning Representations (ICLR). 2018

2018
[34]

Information, Measurement, and Quantum Mechanics

Alexander Shapson-Coe et al. “A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution”. In:Science384.6696 (2024), eadk4858.DOI: 10.1126/science. adk4858

work page doi:10.1126/science 2024
[35]

Local shape descriptors for neuron segmentation

Arlo Sheridan et al. “Local shape descriptors for neuron segmentation”. In:Nature Methods 20 (2023), pp. 295–303.DOI:10.1038/s41592-022-01711-z

work page doi:10.1038/s41592-022-01711-z 2023
[36]

Graph Abstraction for Simplified Proofreading of Slice-based V olume Segmentation

Ronell B. Sicat, Markus Hadwiger, and Niloy J. Mitra. “Graph Abstraction for Simplified Proofreading of Slice-based V olume Segmentation”. In:Eurographics 2013 - Short Papers. The Eurographics Association, 2013, pp. 77–80.DOI: 10.2312/conf/EG2013/short/077-080

work page doi:10.2312/conf/eg2013/short/077-080 2013
[37]

Igneous: Distributed Dense 3D Segmentation Meshing, Neuron Skeletonization, and Hierarchical Downsampling

William Silversmith et al. “Igneous: Distributed Dense 3D Segmentation Meshing, Neuron Skeletonization, and Hierarchical Downsampling”. In:Frontiers in Neural CircuitsV olume 16 - 2022 (2022).ISSN: 1662-5110.DOI: 10.3389/fncir.2022.977700 .URL: https: //www.frontiersin.org/journals/neural-circuits/articles/10.3389/fncir. 2022.977700

work page doi:10.3389/fncir.2022.977700 2022
[38]

The Human Connectome: A Structural Descrip- tion of the Human Brain

Olaf Sporns, Giulio Tononi, and Rolf Kötter. “The Human Connectome: A Structural Descrip- tion of the Human Brain”. In:PLoS Computational Biology1.4 (2005), e42.DOI: 10.1371/ journal.pcbi.0010042

2005
[39]

Automated synapse-level reconstruction of neural circuits in the larval zebrafish brain

Fabian Svara et al. “Automated synapse-level reconstruction of neural circuits in the larval zebrafish brain”. In:Nature Methods19.11 (2022), pp. 1357–1366.DOI: 10.1038/s41592- 022-01621-0. 12

work page doi:10.1038/s41592- 2022
[40]

Light-Microscopy-Based Connectomic Reconstruction of Mam- malian Brain Tissue

Mojtaba R. Tavakoli et al. “Light-Microscopy-Based Connectomic Reconstruction of Mam- malian Brain Tissue”. In:Nature642.8067 (June 2025), pp. 398–410.ISSN: 1476-4687.DOI: 10.1038/s41586-025-08985-1

work page doi:10.1038/s41586-025-08985-1 2025
[41]

Functional connectomics spanning multiple areas of mouse visual cortex

The MICrONS Consortium. “Functional connectomics spanning multiple areas of mouse visual cortex”. In:Nature640.8058 (2025), pp. 435–447.DOI:10.1038/s41586-025-08790-w

work page doi:10.1038/s41586-025-08790-w 2025
[42]

Global Neuron Shape Reasoning with Point Affinity Transformers

Jakob Troidl et al. “Global Neuron Shape Reasoning with Point Affinity Transformers”. In: bioRxiv(2024), p. 2024.11.24.625067.DOI:10.1101/2024.11.24.625067

work page doi:10.1101/2024.11.24.625067 2024
[43]

Synaptic Partner Assignment Using Attentional V oxel Association Networks

Nicholas L. Turner et al. “Synaptic Partner Assignment Using Attentional V oxel Association Networks”. In:2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1209–1213.DOI:10.1109/ISBI45749.2020.9098489

work page doi:10.1109/isbi45749.2020.9098489 2020
[44]

Submodularity in Data Subset Selection and Active Learning

Kai Wei, Rishabh Iyer, and Jeff Bilmes. “Submodularity in Data Subset Selection and Active Learning”. In:International Conference on Machine Learning (ICML). 2015, pp. 1954–1963

2015
[45]

How transferable are features in deep neural networks?

Jason Yosinski et al. “How transferable are features in deep neural networks?” In:Advances in Neural Information Processing Systems. 2014, pp. 3320–3328

2014
[46]

An Error Detection and Correction Framework for Connectomics

Jonathan Zung et al. “An Error Detection and Correction Framework for Connectomics”. In: Advances in Neural Information Processing Systems. V ol. 30. 2017. 13 Figure 6: Channel Decomposition of Mouse Training Samples. Two examples are shown: a synapse pair with two populated segment masks and a junction control in single-mask form, where only mask A is po...

2017

[1] [1]

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

Jordan T. Ash et al. “Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds”. In:International Conference on Learning Representations (ICLR). 2020

2020

[2] [2]

Gone Fishing: Neural Active Learning with Fisher Embeddings

Jordan T. Ash et al. “Gone Fishing: Neural Active Learning with Fisher Embeddings”. In: Advances in Neural Information Processing Systems (NeurIPS). 2021

2021

[3] [3]

Bridging the Gap: Point Clouds for Merging Neurons in Connectomics

Jules Berman, Dmitri B. Chklovskii, and Jingpeng Wu. “Bridging the Gap: Point Clouds for Merging Neurons in Connectomics”. In:Proceedings of the 5th International Conference on Medical Imaging with Deep Learning (MIDL). V ol. 172. Proceedings of Machine Learning Research. 2022

2022

[4] [4]

arXiv: 2511.05542 [q-bio.NC].URL:https://arxiv.org/abs/2511.05542

Jeff Brown et al.ConnectomeBench: Can LLMs Proofread the Connectome?2025. arXiv: 2511.05542 [q-bio.NC].URL:https://arxiv.org/abs/2511.05542

arXiv 2025

[5] [5]

Automatic Detection of Synaptic Partners in a Whole-Brain Drosophila Electron Microscopy Data Set

Julia Buhmann et al. “Automatic Detection of Synaptic Partners in a Whole-Brain Drosophila Electron Microscopy Data Set”. In:Nature Methods18.7 (2021), pp. 771–774.DOI: 10.1038/ s41592-021-01183-7

2021

[6] [6]

NEURD offers automated proofreading and feature extraction for connec- tomics

Brendan Celii et al. “NEURD offers automated proofreading and feature extraction for connec- tomics”. In:Nature640.8058 (2025), pp. 487–496.DOI:10.1038/s41586-025-08660-5

work page doi:10.1038/s41586-025-08660-5 2025

[7] [7]

Learning Multimodal V olumetric Features for Large-Scale Neuron Tracing

Qihua Chen et al. “Learning Multimodal V olumetric Features for Large-Scale Neuron Tracing”. In:Proceedings of the AAAI Conference on Artificial Intelligence. V ol. 38. 2. 2024, pp. 1174– 1182.DOI:10.1609/aaai.v38i2.27879

work page doi:10.1609/aaai.v38i2.27879 2024

[8] [8]

CA VE: Connectome Annotation Versioning Engine

Sven Dorkenwald et al. “CA VE: Connectome Annotation Versioning Engine”. In:Nature Methods22.5 (May 1, 2025), pp. 1112–1120.ISSN: 1548-7105.DOI: 10.1038/s41592-024- 02426-z.URL:https://doi.org/10.1038/s41592-024-02426-z

work page doi:10.1038/s41592-024- 2025

[9] [9]

FlyWire: online community for whole-brain connectomics

Sven Dorkenwald et al. “FlyWire: online community for whole-brain connectomics”. In: Nature Methods19.1 (2022), pp. 119–128.DOI:10.1038/s41592-021-01330-0

work page doi:10.1038/s41592-021-01330-0 2022

[10] [10]

Sterling, Philipp Schlegel, et al

Sven Dorkenwald et al. “Neuronal wiring diagram of an adult brain”. In:Nature634.8032 (2024), pp. 124–138.DOI:10.1038/s41586-024-07558-y

work page doi:10.1038/s41586-024-07558-y 2024

[11] [11]

Probabilistic forecasts, cal- ibration and sharpness

Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E Raftery. “Probabilistic forecasts, cal- ibration and sharpness”. In:Journal of the Royal Statistical Society Series B: Statistical Methodology69.2 (2007), pp. 243–268

2007

[12] [12]

Guided Proofreading of Automatic Segmentations for Connectomics

Daniel Haehn et al. “Guided Proofreading of Automatic Segmentations for Connectomics”. In: arXiv preprint arXiv:1704.00848(2017)

Pith/arXiv arXiv 2017

[13] [13]

Synaptic Cleft Segmentation in Non-isotropic V olume Electron Mi- croscopy of the Complete Drosophila Brain

Larissa Heinrich et al. “Synaptic Cleft Segmentation in Non-isotropic V olume Electron Mi- croscopy of the Complete Drosophila Brain”. In:Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. V ol. 11071. Lecture Notes in Computer Science. Cham: Springer, 2018, pp. 317–325.DOI:10.1007/978-3-030-00934-2_36

work page doi:10.1007/978-3-030-00934-2_36 2018

[14] [14]

Autoproof: Automated Segmentation Proofreading for Connectomics

Gary B. Huang et al. “Autoproof: Automated Segmentation Proofreading for Connectomics”. In:arXiv preprint arXiv:2509.26585(2025).DOI:10.48550/arXiv.2509.26585

work page doi:10.48550/arxiv.2509.26585 2025

[15] [15]

Accelerating Neuron Reconstruction with PATHFINDER

Michał Januszewski et al. “Accelerating Neuron Reconstruction with PATHFINDER”. In: bioRxiv(2025), p. 2025.05.16.654254.DOI:10.1101/2025.05.16.654254

work page doi:10.1101/2025.05.16.654254 2025

[16] [17]

doi: 10.1101/2023.10

Justin Joyce et al. “A Novel Semi-automated Proofreading and Mesh Error Detection Pipeline for Neuron Extension”. In:bioRxiv(2023), p. 2023.10.20.563359.DOI: 10.1101/2023.10. 20.563359

work page doi:10.1101/2023.10 2023

[17] [18]

Big Transfer (BiT): General Visual Representation Learning

Alexander Kolesnikov et al. “Big Transfer (BiT): General Visual Representation Learning”. In: European Conference on Computer Vision (ECCV). 2020

2020

[18] [19]

Kimin Lee et al.A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. 2018. arXiv: 1807.03888 [stat.ML] .URL: https://arxiv.org/ abs/1807.03888

Pith/arXiv arXiv 2018

[19] [20]

Statistical Manifolds Admitting Torsion and Partially Flat Spaces

Hanyu Li et al. “Neuronal Subcompartment Classification and Merge Error Correction”. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. V ol. 12265. Lecture Notes in Computer Science. Springer, 2020, pp. 88–98.DOI: 10.1007/978-3-030- 59722-1_9. 11

work page doi:10.1007/978-3-030- 2020

[20] [21]

On the generalized distance in statistics

Prasanta Chandra Mahalanobis. “On the generalized distance in statistics”. In:Sankhy ¯a: The Indian Journal of Statistics, Series A (2008-)80 (2018), S1–S7

2008

[21] [22]

Biologically-Constrained Graphs for Global Connectomics Recon- struction

Brian Matejek et al. “Biologically-Constrained Graphs for Global Connectomics Recon- struction”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, pp. 2089–2098

2019

[22] [23]

A Multi-Pass Approach to Large-Scale Connectomics

Yaron Meirovitch et al. “A Multi-Pass Approach to Large-Scale Connectomics”. In:arXiv preprint arXiv:1612.02120(2016)

Pith/arXiv arXiv 2016

[23] [24]

Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi.V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016. arXiv:1606.04797 [cs.CV]. URL:https://arxiv.org/abs/1606.04797

Pith/arXiv arXiv 2016

[24] [25]

Collura, K

Khoa Tuan Nguyen et al. “RLCorrector: Reinforced Proofreading for Cell-level Microscopy Image Segmentation”. In:arXiv preprint arXiv:2106.05487(2021).DOI: 10.48550/arXiv. 2106.05487

work page internal anchor Pith review doi:10.48550/arxiv 2021

[25] [26]

Jeremy Nixon et al.Measuring Calibration in Deep Learning. 2020. arXiv: 1904.01685 [cs.LG].URL:https://arxiv.org/abs/1904.01685

arXiv 2020

[26] [27]

A Connectomic Resource for Neural Cataloguing and Circuit Dissection of the Larval Zebrafish Brain

Mariela D. Petkova et al. “A Connectomic Resource for Neural Cataloguing and Circuit Dissection of the Larval Zebrafish Brain”. In:bioRxiv : the preprint server for biology(2025). DOI: 10.1101/2025.06.10.658982 . eprint: https://www.biorxiv.org/content/ early/2025/06/15/2025.06.10.658982.full.pdf .URL: https://www.biorxiv. org/content/early/2025/06/15/202...

work page doi:10.1101/2025.06.10.658982 2025

[27] [28]

Focused Proofreading to Reconstruct Neural Connectomes from EM Images at Scale

Stephen M. Plaza. “Focused Proofreading to Reconstruct Neural Connectomes from EM Images at Scale”. In:Deep Learning and Data Labeling for Medical Applications (DLMIA / LABELS 2016). V ol. 10008. Lecture Notes in Computer Science. Springer, 2016, pp. 249–258. DOI:10.1007/978-3-319-46976-8_26

work page doi:10.1007/978-3-319-46976-8_26 2016

[28] [29]

Morphological Error Detection in 3D Segmentations

David Rolnick et al. “Morphological Error Detection in 3D Segmentations”. In:arXiv preprint arXiv:1705.10882(2017)

Pith/arXiv arXiv 2017

[29] [30]

Shashata Sawmya et al.NeuroADDA: Active Discriminative Domain Adaptation in Connec- tomic. 2025. arXiv:2503.06196 [cs.CV].URL:https://arxiv.org/abs/2503.06196

arXiv 2025

[30] [31]

RoboEM: automated 3D flight tracing for synaptic-resolution connec- tomics

Martin Schmidt et al. “RoboEM: automated 3D flight tracing for synaptic-resolution connec- tomics”. In:Nature Methods21.5 (2024), pp. 908–913.DOI: 10.1038/s41592-024-02226- 5

work page doi:10.1038/s41592-024-02226- 2024

[31] [32]

Learning cellular morphology with neural networks

Philipp J. Schubert et al. “Learning cellular morphology with neural networks”. In:Nature Communications10.1 (2019), p. 2736.DOI:10.1038/s41467-019-10836-3

work page doi:10.1038/s41467-019-10836-3 2019

[32] [33]

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener and Silvio Savarese. “Active Learning for Convolutional Neural Networks: A Core-Set Approach”. In:International Conference on Learning Representations (ICLR). 2018

2018

[33] [34]

Information, Measurement, and Quantum Mechanics

Alexander Shapson-Coe et al. “A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution”. In:Science384.6696 (2024), eadk4858.DOI: 10.1126/science. adk4858

work page doi:10.1126/science 2024

[34] [35]

Local shape descriptors for neuron segmentation

Arlo Sheridan et al. “Local shape descriptors for neuron segmentation”. In:Nature Methods 20 (2023), pp. 295–303.DOI:10.1038/s41592-022-01711-z

work page doi:10.1038/s41592-022-01711-z 2023

[35] [36]

Graph Abstraction for Simplified Proofreading of Slice-based V olume Segmentation

Ronell B. Sicat, Markus Hadwiger, and Niloy J. Mitra. “Graph Abstraction for Simplified Proofreading of Slice-based V olume Segmentation”. In:Eurographics 2013 - Short Papers. The Eurographics Association, 2013, pp. 77–80.DOI: 10.2312/conf/EG2013/short/077-080

work page doi:10.2312/conf/eg2013/short/077-080 2013

[36] [37]

Igneous: Distributed Dense 3D Segmentation Meshing, Neuron Skeletonization, and Hierarchical Downsampling

William Silversmith et al. “Igneous: Distributed Dense 3D Segmentation Meshing, Neuron Skeletonization, and Hierarchical Downsampling”. In:Frontiers in Neural CircuitsV olume 16 - 2022 (2022).ISSN: 1662-5110.DOI: 10.3389/fncir.2022.977700 .URL: https: //www.frontiersin.org/journals/neural-circuits/articles/10.3389/fncir. 2022.977700

work page doi:10.3389/fncir.2022.977700 2022

[37] [38]

The Human Connectome: A Structural Descrip- tion of the Human Brain

Olaf Sporns, Giulio Tononi, and Rolf Kötter. “The Human Connectome: A Structural Descrip- tion of the Human Brain”. In:PLoS Computational Biology1.4 (2005), e42.DOI: 10.1371/ journal.pcbi.0010042

2005

[38] [39]

Automated synapse-level reconstruction of neural circuits in the larval zebrafish brain

Fabian Svara et al. “Automated synapse-level reconstruction of neural circuits in the larval zebrafish brain”. In:Nature Methods19.11 (2022), pp. 1357–1366.DOI: 10.1038/s41592- 022-01621-0. 12

work page doi:10.1038/s41592- 2022

[39] [40]

Light-Microscopy-Based Connectomic Reconstruction of Mam- malian Brain Tissue

Mojtaba R. Tavakoli et al. “Light-Microscopy-Based Connectomic Reconstruction of Mam- malian Brain Tissue”. In:Nature642.8067 (June 2025), pp. 398–410.ISSN: 1476-4687.DOI: 10.1038/s41586-025-08985-1

work page doi:10.1038/s41586-025-08985-1 2025

[40] [41]

Functional connectomics spanning multiple areas of mouse visual cortex

The MICrONS Consortium. “Functional connectomics spanning multiple areas of mouse visual cortex”. In:Nature640.8058 (2025), pp. 435–447.DOI:10.1038/s41586-025-08790-w

work page doi:10.1038/s41586-025-08790-w 2025

[41] [42]

Global Neuron Shape Reasoning with Point Affinity Transformers

Jakob Troidl et al. “Global Neuron Shape Reasoning with Point Affinity Transformers”. In: bioRxiv(2024), p. 2024.11.24.625067.DOI:10.1101/2024.11.24.625067

work page doi:10.1101/2024.11.24.625067 2024

[42] [43]

Synaptic Partner Assignment Using Attentional V oxel Association Networks

Nicholas L. Turner et al. “Synaptic Partner Assignment Using Attentional V oxel Association Networks”. In:2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1209–1213.DOI:10.1109/ISBI45749.2020.9098489

work page doi:10.1109/isbi45749.2020.9098489 2020

[43] [44]

Submodularity in Data Subset Selection and Active Learning

Kai Wei, Rishabh Iyer, and Jeff Bilmes. “Submodularity in Data Subset Selection and Active Learning”. In:International Conference on Machine Learning (ICML). 2015, pp. 1954–1963

2015

[44] [45]

How transferable are features in deep neural networks?

Jason Yosinski et al. “How transferable are features in deep neural networks?” In:Advances in Neural Information Processing Systems. 2014, pp. 3320–3328

2014

[45] [46]

An Error Detection and Correction Framework for Connectomics

Jonathan Zung et al. “An Error Detection and Correction Framework for Connectomics”. In: Advances in Neural Information Processing Systems. V ol. 30. 2017. 13 Figure 6: Channel Decomposition of Mouse Training Samples. Two examples are shown: a synapse pair with two populated segment masks and a junction control in single-mask form, where only mask A is po...

2017