
arxiv: 2509.25969 · v1 · pith:2EV3KGEPnew · submitted 2025-09-30 · 💻 cs.CV

A Multi-purpose Tracking Framework for Salmon Welfare Monitoring in Challenging Environments

Pith reviewed 2026-05-21 21:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords salmon tracking · welfare monitoring · pose estimation · underwater tracking · body part tracking · aquaculture · ID switches · tail beat analysis

The pith

A tracking framework using salmon body parts outperforms pedestrian trackers in underwater scenes and supports automated tail-beat welfare monitoring.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a flexible tracking framework for continuous salmon monitoring in net pens. It starts with a pose estimation network that locates each salmon along with its body parts, then routes those details through specialized modules built to handle underwater-specific problems such as occlusion, similar appearance, and turning motions. The resulting high-detail tracks feed directly into welfare calculations, including tail-beat wavelength, so that multiple indicators can be derived from a single pipeline rather than separate detectors. Experiments on newly collected datasets for ID transfers in crowds and ID switches during turns show better results than BoostTrack, the current state-of-the-art pedestrian tracker. The same body-part tracks also prove suitable for automated tail-beat analysis, demonstrating a practical path to multi-purpose welfare monitoring.

Core claim

The central claim is that a pose-estimation-based tracker augmented with body-part-specific modules can resolve salmon ID transfers in crowded scenes and ID switches during turns more reliably than existing pedestrian trackers, while simultaneously producing tracks accurate enough to compute tail-beat wavelength for welfare assessment.

What carries the argument

Pose estimation network that supplies bounding boxes and body-part locations, combined with specialized modules that exploit those locations to correct tracking errors.
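To make the idea concrete, here is a minimal sketch of one check such a module might perform when deciding whether a detection plausibly continues an existing track. It is an illustration only, not the paper's implementation: the function names, the part labels `head` and `tail`, and both thresholds (`max_dist`, `max_turn`) are assumptions chosen for this example.

```python
import math


def heading(head, tail):
    """Angle (radians) of the tail-to-head direction in image coordinates."""
    return math.atan2(head[1] - tail[1], head[0] - tail[0])


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)


def is_consistent(track_parts, det_parts,
                  max_dist=40.0, max_turn=math.radians(60)):
    """Hypothetical part-level consistency test between a track and a detection.

    Both arguments map a part label to an (x, y) keypoint in pixels.
    The thresholds are illustrative values, not taken from the paper.
    """
    shared = track_parts.keys() & det_parts.keys()
    if not {"head", "tail"} <= shared:
        return False  # cannot judge orientation without both ends
    # Mean displacement of the parts seen in both the track and the detection.
    mean_dist = sum(math.dist(track_parts[k], det_parts[k]) for k in shared) / len(shared)
    # Change in body orientation between the two observations.
    turn = angle_diff(heading(track_parts["head"], track_parts["tail"]),
                      heading(det_parts["head"], det_parts["tail"]))
    return mean_dist <= max_dist and turn <= max_turn
```

A check of this shape rejects a candidate whose head and tail are swapped relative to the track, which is the kind of error a box-only tracker cannot see. A real system would need to relax the orientation test for salmon that are genuinely turning, which is the case the paper's turn handling targets.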

If this is right

  • The method outperforms BoostTrack on both salmon-specific tracking challenges.
  • Body-part tracks enable direct calculation of tail-beat wavelength without additional detectors.
  • A single pipeline can supply multiple welfare indicators instead of running separate systems for each metric.
  • New datasets are provided for evaluating crowded-scene ID transfers and turning-induced ID switches.
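As a rough illustration of how part-level tracks could yield a wavelength, the sketch below treats the body undulation as a traveling wave and reads the wavelength from the phase lag between the lateral motion of two tracked body parts. This is an assumed approach for illustration, not the method described in the paper; the function names, the known tail-beat frequency `freq_hz`, and the requirement that the two parts lie less than one wavelength apart are all assumptions.

```python
import math


def phase_at(signal, freq_hz, fps):
    """Phase lag (radians) of the `freq_hz` component of a sampled signal.

    For signal[i] = A * cos(2*pi*freq_hz*i/fps - phi), sampled over a whole
    number of periods, this returns phi.
    """
    step = 2.0 * math.pi * freq_hz / fps
    re = sum(x * math.cos(step * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(step * i) for i, x in enumerate(signal))
    return math.atan2(im, re)


def wavelength_from_phase_lag(sig_front, sig_rear, separation, freq_hz, fps):
    """Wavelength of a traveling body wave, in the units of `separation`.

    Assumes the rear part lags the front part by less than one full cycle.
    """
    lag = (phase_at(sig_rear, freq_hz, fps)
           - phase_at(sig_front, freq_hz, fps)) % (2.0 * math.pi)
    if lag == 0.0:
        raise ValueError("no measurable phase lag between the two body parts")
    return 2.0 * math.pi * separation / lag


# Synthetic check: a wave with a known wavelength, sampled at two body positions.
fps, freq_hz, true_wavelength = 50.0, 2.0, 1.2
k = 2.0 * math.pi / true_wavelength  # wavenumber of the synthetic wave


def lateral(position, frames=100):
    """Lateral displacement over time at a given position along the body."""
    return [math.cos(2.0 * math.pi * freq_hz * i / fps - k * position)
            for i in range(frames)]


estimate = wavelength_from_phase_lag(lateral(0.2), lateral(0.5), 0.3, freq_hz, fps)
# estimate is close to 1.2 for this synthetic input
```

On real tracks the signals would be noisy and the frequency would itself have to be estimated, so an ID switch that splices two fish into one track would corrupt both the phase and the resulting wavelength.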

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same body-part approach might transfer to tracking other fish species or marine animals in turbid water.
  • Real-time deployment of the framework could support continuous welfare dashboards on large aquaculture farms.
  • Extending the modules to additional body-part relations could further reduce ID switches in even denser scenes.
  • Because the tracks already contain part-level detail, the system may lower overall compute cost compared with running independent detectors for each welfare indicator.

Load-bearing premise

The specialized modules can use body-part information to fix tracking problems without creating new errors or needing heavy per-scene tuning.

What would settle it

On the released datasets for ID transfer and ID switch challenges, a head-to-head run showing that the proposed tracker does not exceed BoostTrack performance or produces inaccurate tail-beat wavelengths would falsify the central claim.

Figures

Figures reproduced from arXiv: 2509.25969 by Annette Stahl, Christian Schellewald, Espen Uri Høgstedt, Rudolf Mester.

Figure 1: A visualization of our proposed pipeline
Figure 2: Annotated images from the TailbeatWavelength train
Figure 3: Two consecutive extreme salmon poses, annotated with
Figure 5: Number of matches for turning salmon in the Turn
Figure 6: Visualized results from the salmon tail beat wavelength
Original abstract

Computer Vision (CV)-based continuous, automated and precise salmon welfare monitoring is a key step toward reduced salmon mortality and improved salmon welfare in industrial aquaculture net pens. Available CV methods for determining welfare indicators focus on single indicators and rely on object detectors and trackers from other application areas to aid their welfare indicator calculation algorithm. This comes with a high resource demand for real-world applications, since each indicator must be calculated separately. In addition, the methods are vulnerable to difficulties in underwater salmon scenes, such as object occlusion, similar object appearance, and similar object motion. To address these challenges, we propose a flexible tracking framework that uses a pose estimation network to extract bounding boxes around salmon and their corresponding body parts, and exploits information about the body parts, through specialized modules, to tackle challenges specific to underwater salmon scenes. Subsequently, the high-detail body part tracks are employed to calculate welfare indicators. We construct two novel datasets assessing two salmon tracking challenges: salmon ID transfers in crowded scenes and salmon ID switches during turning. Our method outperforms the current state-of-the-art pedestrian tracker, BoostTrack, for both salmon tracking challenges. Additionally, we create a dataset for calculating salmon tail beat wavelength, demonstrating that our body part tracking method is well-suited for automated welfare monitoring based on tail beat analysis. Datasets and code are available at https://github.com/espenbh/BoostCompTrack.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a multi-purpose tracking framework for salmon in underwater net pens that first runs a pose estimator to produce per-salmon bounding boxes together with head, tail and body keypoints, then feeds these into a set of specialized modules that exploit part-level information to reduce ID transfers in crowded scenes and ID switches during turns. Two new tracking-challenge datasets and one tail-beat-wavelength dataset are introduced; the method is reported to outperform BoostTrack on the tracking tasks and to yield usable tail-beat measurements for welfare monitoring. Code and data are released.

Significance. If the performance gains can be attributed to the body-part modules and shown to be robust, the work would supply a single pipeline capable of supporting multiple welfare indicators, lowering the per-indicator overhead that currently limits deployment. The release of three domain-specific datasets and the open-source implementation are concrete contributions to applied multi-object tracking and aquaculture monitoring.

major comments (3)
  1. [§4.2] §4.2 and Table 2: the headline claim that the specialized modules resolve ID transfers and switches better than BoostTrack rests on an overall MOTA/IDF1 improvement, yet no ablation that removes the body-part modules (or replaces them with a plain SORT/ByteTrack baseline on the same pose boxes) is presented. Without this isolation it is impossible to verify that the reported gains arise from exploitation of tail/head keypoints rather than from other implementation details.
  2. [§3.3] §3.3: the description of the ID-transfer and turn-switch modules is high-level; no equations, pseudocode or decision thresholds are supplied for how part-keypoint consistency is used to reject or correct associations. This makes it difficult to assess whether the modules introduce new failure modes or require per-scene retuning.
  3. [§4.3] §4.3: the tail-beat-wavelength evaluation reports qualitative agreement with manual measurements but supplies neither quantitative error statistics (MAE, bias) nor an analysis of how tracking ID switches propagate into wavelength error. This weakens the claim that the body-part tracks are “well-suited” for automated welfare monitoring.
minor comments (2)
  1. [Figure 3] Figure 3 caption and §4.1: the color coding of tracks in the qualitative results is not explained, making it hard to verify the claimed reduction in ID switches.
  2. [Abstract] The abstract states outperformance “for both salmon tracking challenges” but does not quote the numerical margins; the reader must reach Table 2 to obtain them.
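For readers unfamiliar with the two metrics named in major comment 1, the sketch below writes out their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function and argument names are illustrative, and the counts in the usage lines are made-up numbers, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-object tracking accuracy from per-sequence error counts.

    `num_gt` is the total number of ground-truth boxes over all frames.
    MOTA can be negative when the errors outnumber the ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Made-up counts, for illustration only.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

MOTA weights an ID switch the same as a single missed or spurious box, so it can move very little even when identities are frequently confused. IDF1 is more sensitive to how long each identity is preserved, which is why the two are usually reported together and why an ablation on the same detections matters for attributing any gain.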

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects for strengthening the validation and reproducibility of our work. We address each major comment below and indicate the planned revisions.

  1. Referee: [§4.2] §4.2 and Table 2: the headline claim that the specialized modules resolve ID transfers and switches better than BoostTrack rests on an overall MOTA/IDF1 improvement, yet no ablation that removes the body-part modules (or replaces them with a plain SORT/ByteTrack baseline on the same pose boxes) is presented. Without this isolation it is impossible to verify that the reported gains arise from exploitation of tail/head keypoints rather than from other implementation details.

    Authors: We agree that an explicit ablation isolating the body-part modules would provide clearer evidence for their contribution. The current comparison uses BoostTrack on the pose-derived boxes, but does not include a direct baseline with the specialized modules removed. In the revised manuscript we will add an ablation study that applies a standard SORT/ByteTrack tracker to the identical pose boxes, thereby quantifying the incremental benefit of the part-keypoint consistency logic. revision: yes

  2. Referee: [§3.3] §3.3: the description of the ID-transfer and turn-switch modules is high-level; no equations, pseudocode or decision thresholds are supplied for how part-keypoint consistency is used to reject or correct associations. This makes it difficult to assess whether the modules introduce new failure modes or require per-scene retuning.

    Authors: We acknowledge that the current description in §3.3 remains conceptual. To improve reproducibility and allow evaluation of potential failure modes or tuning requirements, we will include pseudocode for both modules together with the concrete decision thresholds (keypoint distance and orientation consistency criteria) in the revised version. revision: yes

  3. Referee: [§4.3] §4.3: the tail-beat-wavelength evaluation reports qualitative agreement with manual measurements but supplies neither quantitative error statistics (MAE, bias) nor an analysis of how tracking ID switches propagate into wavelength error. This weakens the claim that the body-part tracks are “well-suited” for automated welfare monitoring.

    Authors: The §4.3 evaluation was designed to demonstrate practical feasibility on real underwater footage. We agree that quantitative error metrics would strengthen the welfare-monitoring claim. In revision we will report MAE and bias relative to the manual annotations. A full propagation analysis of ID-switch effects on wavelength error will be added where the existing annotations permit; otherwise we will explicitly discuss this as a limitation of the current dataset. revision: partial
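The error statistics promised in the third response are simple to compute once estimated and manually annotated wavelengths are paired. The following is a minimal sketch with a hypothetical function name and made-up values, not data from the paper.

```python
def mae_and_bias(estimates, references):
    """Mean absolute error and mean signed error (bias) of paired measurements.

    A positive bias means the estimates are, on average, larger than the
    manual reference values.
    """
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [e - r for e, r in zip(estimates, references)]
    mae = sum(abs(x) for x in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return mae, bias


# Made-up wavelengths, for illustration only.
mae, bias = mae_and_bias([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Reporting both values is useful because a small bias can hide large errors of opposite sign, while the MAE alone does not show whether the method systematically over- or under-estimates.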

Circularity Check

0 steps flagged

No significant circularity; empirical framework evaluated on novel datasets


The paper proposes a tracking framework that extends existing pose estimation and tracking methods with specialized modules for salmon-specific challenges, compares it against the external BoostTrack baseline, and evaluates performance on newly constructed datasets for ID transfers, ID switches, and tail-beat analysis. No load-bearing derivations, equations, or predictions reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central claims are direct empirical comparisons against an external baseline on new data, so the approach is self-contained and free of circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard assumptions in computer vision about the performance of pose estimation models and the utility of body part information for tracking disambiguation. No new entities are invented.

axioms (1)
  • domain assumption: A pose estimation network trained on salmon images can provide reliable body part detections in real underwater farm conditions.
    Central to extracting the bounding boxes and body parts used by the specialized modules.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    uses a pose estimation network to extract bounding boxes around salmon and their corresponding body parts, and exploits information about the body parts, through specialized modules, to tackle challenges specific to underwater salmon scenes

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat induction unclear

    Relation between the paper passage and the cited Recognition theorem.

    The TurnModule determines whether a salmon is turning by evaluating a counter c... The CrowdedModule checks whether the small body parts associations suggest that the initial salmon bounding box association needs correction

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
