
arxiv: 1906.10886 · v1 · pith:NEQQIA7W · submitted 2019-06-26 · 💻 cs.CV · cs.GR · eess.IV

Joint Multi-frame Detection and Segmentation for Multi-cell Tracking

Pith reviewed 2026-05-25 16:08 UTC · model grok-4.3

classification 💻 cs.CV · cs.GR · eess.IV
keywords multi-cell tracking · cell detection · mitosis detection · cell segmentation · UNet · spatio-temporal features · cell lineage · dense cell populations

The pith

A multi-frame UNet extracts spatio-temporal cell features to improve centroid detection during mitosis and enable joint segmentation for tracking in dense populations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a tracking-by-detection pipeline for living cells that feeds multiple video frames into a UNet to capture both motion across frames and appearance within frames, raising detection accuracy, especially when cells divide. A separate mitosis detector then links parent and daughter cells into lineages, while a second UNet produces an initial (primary) segmentation that is refined by combining it with the detections. The authors argue that this joint use of detection and segmentation overcomes two problems: cell shapes change over time, and neighboring cells look nearly identical. A reader would care because reliable automated tracking would let biologists measure division rates and migration patterns in crowded live-cell videos without manual annotation.
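The pith describes feeding several consecutive frames into one UNet. One common way to do this is to stack the current frame and its n_frames - 1 predecessors as input channels. The minimal NumPy sketch below takes that approach. The channel-stacking layout, the hypothetical function name `stack_previous_frames`, and the edge padding used at the start of the video are all assumptions. This summary does not say how the paper handles them.

```python
import numpy as np

def stack_previous_frames(video: np.ndarray, t: int, n_frames: int) -> np.ndarray:
    """Build a multi-frame network input for frame t (hypothetical helper).

    video: array of shape (T, H, W).
    Returns frames t - n_frames + 1 .. t stacked along a new channel axis,
    oldest first, with shape (n_frames, H, W). Indices before 0 are clamped
    to frame 0 (edge padding), which is an assumption, not the paper's rule.
    """
    if not 0 <= t < video.shape[0]:
        raise IndexError("frame index out of range")
    idx = [max(i, 0) for i in range(t - n_frames + 1, t + 1)]
    return video[idx]  # fancy indexing along the time axis
```

For frame 1 with `n_frames=3`, the input is frames [0, 0, 1]. From frame n_frames - 1 onward, every input holds distinct consecutive frames.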

Core claim

The authors claim three things: multi-frame input to a UNet improves detection of cells in the mitotic phase; a dedicated mitosis detection algorithm constructs cell lineages; and combining these detections with a primary segmentation from a second UNet produces accurate fine segmentation even in highly dense cell populations. Together, they argue, these yield state-of-the-art multi-cell tracking performance.
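This summary does not describe the mitosis detection algorithm itself. The hedged sketch below illustrates what "building a lineage" produces. It uses a hypothetical rule: a track flagged as mitotic adopts the (up to) two nearest parentless tracks that start in the next frame within a distance `radius`. The result is written as `L B E P` lines (track label, begin frame, end frame, parent label, with 0 meaning no parent), which is the text format the Cell Tracking Challenge uses for lineages. The `Track` class, the `link_daughters` rule and `radius` are illustrative assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    label: int        # track ID (> 0)
    begin: int        # first frame in which the track appears
    end: int          # last frame in which the track appears
    first_pos: tuple  # (x, y) centroid at frame `begin`
    last_pos: tuple   # (x, y) centroid at frame `end`
    parent: int = 0   # 0 means "no parent", as in Cell Tracking Challenge track files

def link_daughters(tracks, mitotic_labels, radius):
    """Hypothetical lineage rule, applied in place.

    A track flagged as mitotic that ends at frame e becomes the parent of
    the (up to) two nearest parentless tracks that begin at frame e + 1
    within `radius` of its last centroid.
    """
    by_label = {t.label: t for t in tracks}
    for label in mitotic_labels:
        mother = by_label[label]
        candidates = []
        for t in tracks:
            d = math.dist(t.first_pos, mother.last_pos)
            if t.begin == mother.end + 1 and t.parent == 0 and d <= radius:
                candidates.append((d, t.label, t))  # label breaks distance ties
        for _, _, child in sorted(candidates)[:2]:
            child.parent = mother.label
    return tracks

def to_ctc_lines(tracks):
    """Format tracks as 'L B E P' lines, sorted by label."""
    return [f"{t.label} {t.begin} {t.end} {t.parent}"
            for t in sorted(tracks, key=lambda t: t.label)]
```

The rule runs after tracking because it needs whole tracks. A real mitosis detector would also have to decide which tracks to flag, and this sketch takes that decision as given.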

What carries the argument

A multi-frame UNet that jointly extracts inter-frame and intra-frame spatio-temporal information for centroid detection, a second UNet that produces the primary segmentation, and a mitosis detection algorithm that builds lineages.
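Figure 2 states that pixels are sorted into three classes (mitotic cells, normal cells, background), and Figure 3 marks mitotic and normal centroids separately. This summary does not say how centroids are read off the detection UNet's output. One plausible sketch assumes a 3-channel per-pixel probability map and places a centroid at the centre of mass of each connected region of the argmax map, using SciPy's `ndimage.label` and `ndimage.center_of_mass`. The class indices, the `min_area` filter and the name `centroids_from_probs` are assumptions.

```python
import numpy as np
from scipy import ndimage

# Assumed class indices for the three pixel categories named in Figure 2.
BACKGROUND, NORMAL, MITOTIC = 0, 1, 2

def centroids_from_probs(probs, min_area=1):
    """Turn a (3, H, W) class-probability map into centroid detections.

    Each connected region of the argmax map labelled NORMAL or MITOTIC
    becomes one detection at its centre of mass; regions with fewer than
    `min_area` pixels are dropped. Returns a list of (row, col, class_index).
    """
    class_map = np.asarray(probs).argmax(axis=0)
    detections = []
    for cls in (NORMAL, MITOTIC):
        is_cls = class_map == cls
        regions, n = ndimage.label(is_cls)  # 4-connected components
        if n == 0:
            continue
        weights = is_cls.astype(float)
        index = np.arange(1, n + 1)
        areas = ndimage.sum(weights, regions, index)
        centres = ndimage.center_of_mass(weights, regions, index)
        for (row, col), area in zip(centres, areas):
            if area >= min_area:
                detections.append((float(row), float(col), cls))
    return detections
```

Keeping the class label with each centroid lets the tracker and the mitosis step treat mitotic detections differently downstream.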

Load-bearing premise

The performance of the detector has a high impact on tracking performance, so better detection directly produces better tracking.
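The premise is easiest to see in a single tracking-by-detection step. Each frame's centroids are matched to the previous frame's, so a missed or spurious detection directly becomes a broken or spurious track. The sketch below is a generic optimal-assignment linker. It uses SciPy's `linear_sum_assignment` on Euclidean centroid distances with a gate `max_dist`. It is an assumed stand-in, not the paper's primary tracker, which this summary does not describe (the reference list cites IOU-based trackers [8] and [19]).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_pts, curr_pts, max_dist):
    """Match centroids between two consecutive frames.

    Returns (matches, unmatched_prev, unmatched_curr), where `matches` is a
    list of (i, j) index pairs. Pairs farther apart than `max_dist` are
    rejected and reported as unmatched on both sides.
    """
    prev_pts = np.asarray(prev_pts, dtype=float).reshape(-1, 2)
    curr_pts = np.asarray(curr_pts, dtype=float).reshape(-1, 2)
    if len(prev_pts) == 0 or len(curr_pts) == 0:
        return [], list(range(len(prev_pts))), list(range(len(curr_pts)))
    # Pairwise Euclidean distances, shape (len(prev), len(curr)).
    cost = np.linalg.norm(prev_pts[:, None, :] - curr_pts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # minimises total distance
    matches = [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] <= max_dist]
    matched_prev = {i for i, _ in matches}
    matched_curr = {j for _, j in matches}
    unmatched_prev = [i for i in range(len(prev_pts)) if i not in matched_prev]
    unmatched_curr = [j for j in range(len(curr_pts)) if j not in matched_curr]
    return matches, unmatched_prev, unmatched_curr
```

Suppose the detector misses the cell at (10, 10) in the current frame. Then `associate([(0, 0), (10, 10)], [(1, 0)], max_dist=5)` leaves previous index 1 unmatched, and its track ends or fragments. That is how a detection error becomes a tracking error.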

What would settle it

Consider a controlled comparison on the same video sequences in which single-frame detection matches or exceeds multi-frame detection accuracy, yet the single-frame pipeline still tracks worse. That result would falsify the claim that multi-frame detection is the key driver of tracking performance.

Figures

Figures reproduced from arXiv: 1906.10886 by Chengkang He, Fei Wang, Huaying Chen, Peng Gao, Wenjuan Xi, Zibin Zhou.

Figure 1. Overview of our proposed tracking framework. (a) Input. (b) UNet for primary cell segmentation. (c) UNet for cell centroid detection with multi-frame images. (d) Primary multi-cell tracker. (e) Fine segmentation. (f) Final tracking results.
Figure 2. Morphological changes in mitosis. Pixels are categorized into three classes: mitotic cells, normal cells and background. If information from nearby previous frames is included, the network can learn to identify mitotic cells more accurately [17]. Unlike the usual single-frame input method, we feed consecutive pre-N input frames into the network. This approach does improve cell centroid det…
Figure 3. Dense cell segmentation results. Cross: mitotic cells. Dot: normal cells. (a) Original image and cell centroid detection results. (b) Primary cell segmentation results. (c) Fine segmentation results. Section 3.4 (Fine Segmentation) notes that results from primary segmentation may contain many connected regions.
Figure 4. Multi-cell tracking performance of our method on multiple datasets. For clarity, only a portion of the field of view is selected and enlarged. Different kinds of cells have different morphology. We track the trajectories of cells and obtain a segmentation of each cell. Fine segmentation results on a highly dense cell population are shown in …
Figure 5. Cell spatio-temporal trajectories of Phc-PSC. Evaluations compare our method with other methods on datasets from the Cell Tracking Challenge. Because detection and segmentation are used jointly, our method performs well and achieves new state-of-the-art performance on the Fluo-Hela dataset. Performance on some datasets is still not ideal. In future work, fine segmentation will be further …
Original abstract

Tracking living cells in video sequences is difficult because of cell morphology and the high similarity between cells. Tracking-by-detection methods are widely used in multi-cell tracking. We perform multi-cell tracking based on cell centroid detection, and the performance of the detector has a high impact on tracking performance. In this paper, a UNet is used to extract inter-frame and intra-frame spatio-temporal information of cells. Detection performance for cells in the mitotic phase is improved by multi-frame input. Good detection results facilitate multi-cell tracking. A mitosis detection algorithm is proposed to detect cell mitosis, and the cell lineage is built up. Another UNet is used to acquire the primary segmentation. By jointly using detection and primary segmentation, cells can be finely segmented in highly dense cell populations. Experiments are conducted to evaluate the effectiveness of our method, and the results show its state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a joint multi-frame detection and segmentation pipeline for multi-cell tracking in video. It employs a UNet to extract spatio-temporal features from multiple frames for centroid detection (with emphasis on improved mitotic-phase detection), introduces a mitosis detection algorithm to construct cell lineages, and uses a second UNet for primary segmentation that is combined with detection outputs to refine segmentation in dense populations. The central claim is that this approach yields state-of-the-art tracking performance.

Significance. If the experimental results hold, the work could advance automated analysis of live-cell imaging by better handling mitosis events and high-density scenarios through explicit use of inter-frame information. The joint detection-segmentation strategy and lineage construction are reasonable extensions of tracking-by-detection paradigms.

major comments (2)
  1. [Abstract] The assertion that 'results show its state-of-the-art performance' is not backed by any datasets, metrics (MOTA, TRA, etc.), baselines, or quantitative results, so the central claim cannot be evaluated from the provided text.
  2. [Introduction / Method] The manuscript states that detector performance has a high impact on tracking but provides no ablation or sensitivity analysis quantifying this dependence (e.g., tracking metrics as a function of detection precision).
minor comments (2)
  1. [Abstract] The sentence 'Good detection results facilitate multi-cell tracking' restates the earlier point that detector performance has a high impact on tracking; it is redundant and could be cut.
  2. [Method] Neither the notation for the two UNets nor the way their outputs are fused for fine segmentation is specified with explicit equations or pseudocode.
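The second minor comment asks how detections and the primary segmentation are fused. The reference list cites Voronoi tessellations [20] and a flood-fill algorithm [18], which suggests a seed-based split but does not confirm one. The minimal sketch below works under that assumption. Each foreground pixel of the primary segmentation is assigned to its nearest detected centroid. The assignment is a discrete Voronoi partition computed with SciPy's `distance_transform_edt` (`return_indices=True`). As a result, a connected region that contains two centroids is split into two cells. `split_by_centroids` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

def split_by_centroids(foreground, centroids):
    """Split a primary foreground mask into cells using detected centroids.

    foreground: (H, W) boolean mask from the primary segmentation.
    centroids:  iterable of (row, col) detections.
    Returns an int32 label image: 0 is background, k is the cell seeded by
    the k-th centroid (1-based).
    """
    foreground = np.asarray(foreground, dtype=bool)
    centroids = list(centroids)
    if not centroids:
        return np.zeros(foreground.shape, dtype=np.int32)
    seeds = np.zeros(foreground.shape, dtype=bool)
    seed_ids = np.zeros(foreground.shape, dtype=np.int32)
    for k, (row, col) in enumerate(centroids, start=1):
        r, c = int(round(row)), int(round(col))
        seeds[r, c] = True
        seed_ids[r, c] = k
    # distance_transform_edt measures distance to the nearest zero pixel;
    # zeros are the seeds here, and return_indices gives that seed's
    # coordinates for every pixel, i.e. a discrete Voronoi partition.
    _, nearest = ndimage.distance_transform_edt(~seeds, return_indices=True)
    labels = seed_ids[nearest[0], nearest[1]]
    labels[~foreground] = 0  # keep the primary segmentation's background
    return labels
```

The partition is computed over the whole image rather than per connected region. A foreground region that contains no centroid is therefore absorbed by the nearest cell elsewhere. A per-component flood fill would avoid that.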

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'results show its state-of-the-art performance' is not backed by any datasets, metrics (MOTA, TRA, etc.), baselines, or quantitative results, so the central claim cannot be evaluated from the provided text.

    Authors: We agree that the abstract should provide sufficient detail for readers to evaluate the central claim without needing to consult the full text. The experiments section of the manuscript reports results on standard cell-tracking benchmarks using MOTA, TRA, and other metrics with explicit baseline comparisons. In the revised version we will expand the abstract to include the primary datasets, key quantitative results, and the main baselines. revision: yes

  2. Referee: [Introduction / Method] The manuscript states that detector performance has a high impact on tracking but provides no ablation or sensitivity analysis quantifying this dependence (e.g., tracking metrics as a function of detection precision).

    Authors: The manuscript cites the well-established dependence of tracking-by-detection performance on detector quality and demonstrates improved tracking when mitotic detection is enhanced. We acknowledge that an explicit sensitivity analysis would make this dependence more transparent. We will add a new subsection reporting tracking metrics (MOTA, TRA) under controlled variations in detection precision to quantify the relationship. revision: yes
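The sensitivity analysis promised in the second response could start from the CLEAR MOT definition of MOTA. In that definition, missed detections (FN) and false detections (FP) enter the score directly, alongside identity switches (IDSW): MOTA = 1 - (FN + FP + IDSW) / GT, with each count summed over all frames. The sketch below computes MOTA and then sweeps a hypothetical detection recall with fixed, made-up FP and IDSW counts. The numbers are illustrative only and are not results from the paper. The Cell Tracking Challenge TRA measure is not computed here.

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt

def mota_vs_recall(recalls, num_gt, fp, idsw):
    """Hypothetical sweep: FN = (1 - recall) * num_gt, rounded to an integer."""
    return [(r, mota(round((1 - r) * num_gt), fp, idsw, num_gt)) for r in recalls]

if __name__ == "__main__":
    # Made-up counts: 1000 ground-truth cell instances, 20 FP, 10 ID switches.
    for recall, score in mota_vs_recall([0.99, 0.95, 0.90], num_gt=1000, fp=20, idsw=10):
        print(f"recall={recall:.2f}  MOTA={score:.3f}")
```

In this toy setting, each percentage point of lost recall costs one point of MOTA. A real analysis would also track how missed detections raise IDSW, since broken tracks are re-linked under new identities.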

Circularity Check

0 steps flagged

No circularity: the method is an empirical pipeline with no derivation chain.

Full rationale

The manuscript presents a UNet-based joint detection and segmentation pipeline for cell tracking. Its claims rest on experimental results rather than on any mathematical derivation, fitted parameters renamed as predictions, or self-citation chains. No equations, ansatzes, or load-bearing uniqueness theorems appear in the abstract or the described approach. The central claim of state-of-the-art performance is an empirical assertion that the abstract does not back with numbers. It is not circular by construction, however: it is measured against external benchmarks, and no result reduces internally to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5686 in / 953 out tokens · 41499 ms · 2026-05-25T16:08:02.134742+00:00


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 3 internal anchors

  [1] Ren Y, Xu B, Zhang J, et al. A generalized data association approach for cell tracking in high-density population[C]//2015 International Conference on Control, Automation and Information Sciences (ICCAIS). IEEE, 2015: 502-507

  [2] He T, Mao H, Guo J, et al. Cell tracking using deep neural networks with multi-task learning[J]. Image and Vision Computing, 2017, 60: 142-153

  [3] Yang F, Mackey M A, Ianzini F, et al. Cell Segmentation, Tracking, and Mitosis Detection Using Temporal Context[J]. Lecture Notes in Computer Science (LNCS), 2005, 8(Pt 1): 302-309

  [4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2016: 770-778

  [5] Payer C, Štern D, Neff T, et al. Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2018: 3-11

  [6] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2015: 234-241

  [7] Maška M, Ulman V, Svoboda D, et al. A benchmark for comparison of cell tracking algorithms[J]. Bioinformatics, 2014, 30(11): 1609-1617

  [8] Bochinski E, Eiselein V, Sikora T. High-speed tracking-by-detection without using image information[C]//2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017: 1-6

  [9] Ulman V, Maška M, Magnusson K E G, et al. An objective comparison of cell-tracking algorithms[J]. Nature Methods, 2017, 14(12): 1141

  [10] Luo W, Xing J, Milan A, et al. Multiple object tracking: A literature review[J]. arXiv preprint arXiv:1409.7618v4, 2017

  [11] Ciresan D, Giusti A, Gambardella L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[C]//Advances in neural information processing systems. 2012: 2843-2851

  [12] Zhou Z, Siddiquee M M R, Tajbakhsh N, et al. Unet++: A nested u-net architecture for medical image segmentation[M]//Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA). Springer, Cham, 2018: 3-11

  [13] Ballas N, Yao L, Pal C, et al. Delving deeper into convolutional networks for learning video representations[J]. arXiv preprint arXiv:1511.06432, 2015

  [14] Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision (ECCV). Springer, Cham, 2016: 483-499

  [15] Arbelle A, Raviv T R. Microscopy Cell Segmentation via Convolutional LSTM Networks[J]. arXiv preprint arXiv:1805.11247, 2018

  [16] Shi X, Chen Z, Wang H, et al. Convolutional LSTM network: A machine learning approach for precipitation nowcasting[C]//Advances in neural information processing systems. 2015: 802-810

  [17] Sadeghian A, Alahi A, Savarese S. Tracking the untrackable: Learning to track multiple cues with long-term dependencies[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 300-311

  [18] Khudeev R. A new flood-fill algorithm for closed contour[C]//2005 Siberian Conference on Control and Communications. IEEE, 2005: 172-176

  [19] Bochinski E, Senst T, Sikora T. Extending IOU based multi-object tracking by visual information[C]//2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2018: 1-6

  [20] Okabe A, Boots B, Sugihara K, et al. Spatial tessellations: concepts and applications of Voronoi diagrams[M]. John Wiley & Sons, 2009

  [21] Kingma D P, Ba J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014

  [22] Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry[J]. Nature Methods, 2019, 16(1): 67