pith.

arxiv: 2605.30399 · v1 · pith:H23XWRKCnew · submitted 2026-05-28 · 🧬 q-bio.QM · cs.LG · eess.IV

A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture

Pith reviewed 2026-06-28 23:42 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.LG eess.IV
keywords computer vision, aquaculture, fish tracking, caudal fin, stereo vision, behavior analysis, sea cages, YOLO

The pith

A computer vision pipeline tracks fish caudal fins in sea cages to measure responses to intrusive objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests a stereo-vision system that detects, tracks, and reconstructs 3D positions of individual fish by focusing on their caudal fins. It combines YOLOv8 detection, ByteTrack, SuperGlue matching, and triangulation, then applies the output to analyze velocities, accelerations, turning, and pitch angles around objects of varying shapes, sizes, and colors. Datasets come from industrial sea cages. A sympathetic reader would care because the approach aims to give concrete data on how farm structures affect fish behavior, which bears directly on welfare in large-scale aquaculture.

Core claim

The central claim is that a caudal-fin tracking pipeline trained on manually labeled images, using YOLOv8 with ByteTrack for detection and tracking, SuperGlue for stereo matching, and triangulation for 3D reconstruction, produces usable measurements of fish kinematics on industrial-scale datasets and thereby reveals behavioral responses to intrusive objects on both individual and group levels.
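ByteTrack's central idea is to associate detections with existing tracks in two passes: confident boxes first, then the remaining low-confidence boxes, which in a sea cage could be partly occluded tails. The sketch below is a simplified illustration of that idea, not the authors' configuration. It matches boxes greedily by IoU, whereas ByteTrack itself predicts track boxes with a Kalman filter and solves the assignment with the Hungarian method; the thresholds `high=0.6` and `min_iou=0.3` and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Sequence[Box], dets: Sequence[Box], min_iou: float) -> List[Tuple[int, int]]:
    """Pair track and detection indices greedily, highest IoU first."""
    candidates = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, pairs = set(), set(), []
    for score, ti, di in candidates:
        if score < min_iou:
            break  # candidates are sorted, so no later pair can qualify
        if ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        pairs.append((ti, di))
    return pairs


def associate(track_boxes: Sequence[Box], detections: Sequence[Tuple[Box, float]],
              high: float = 0.6, min_iou: float = 0.3):
    """Two-pass association; returns (matches, unmatched_tracks, new_track_dets).

    matches maps a track index to an index into `detections`.
    """
    high_idx = [i for i, (_, s) in enumerate(detections) if s >= high]
    low_idx = [i for i, (_, s) in enumerate(detections) if s < high]

    # Pass 1: confident detections against all existing tracks.
    first = greedy_match(track_boxes, [detections[i][0] for i in high_idx], min_iou)
    matches: Dict[int, int] = {ti: high_idx[k] for ti, k in first}

    # Pass 2: still-unmatched tracks against low-confidence detections.
    remaining = [ti for ti in range(len(track_boxes)) if ti not in matches]
    second = greedy_match([track_boxes[ti] for ti in remaining],
                          [detections[i][0] for i in low_idx], min_iou)
    for k_t, k_d in second:
        matches[remaining[k_t]] = low_idx[k_d]

    unmatched_tracks = [ti for ti in range(len(track_boxes)) if ti not in matches]
    used = set(matches.values())
    # Only unmatched confident detections start new tracks; leftover low ones are dropped.
    new_track_dets = [i for i in high_idx if i not in used]
    return matches, unmatched_tracks, new_track_dets


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.4), ((50, 50, 60, 60), 0.8)]
    print(associate(tracks, dets))  # ({0: 0, 1: 1}, [], [2])
```

In the example, the low-score box (0.4) still extends the second track instead of being discarded, which is the behavior that makes the two-pass scheme attractive when tails are frequently occluded by other fish.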

What carries the argument

The stereo-vision pipeline that detects and tracks caudal fins then triangulates their 3D positions to obtain velocities, accelerations, and orientation angles.
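For a rectified stereo pair (Figure 3 shows the rectification step), triangulating a matched tail point reduces to the pinhole relation Z = f·B/d, where d is the horizontal disparity between the left and right image x-coordinates. The sketch below assumes rectified images, a focal length in pixels and a baseline in metres; the `StereoRig` class and the calibration numbers in the example are hypothetical, not the paper's calibration. For unrectified views, OpenCV's `cv2.triangulatePoints` handles the general case from two projection matrices.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StereoRig:
    """Rectified stereo pair (hypothetical parameters)."""
    f_px: float        # focal length in pixels, shared by both rectified cameras
    baseline_m: float  # distance between the camera centres in metres
    cx: float          # principal point x in pixels
    cy: float          # principal point y in pixels


def triangulate(rig: StereoRig, left_xy: Tuple[float, float],
                right_xy: Tuple[float, float]) -> Optional[Tuple[float, float, float]]:
    """Return (X, Y, Z) in metres in the left-camera frame, or None for invalid disparity."""
    xl, yl = left_xy
    xr, _ = right_xy  # after rectification the two y-coordinates should agree
    disparity = xl - xr
    if disparity <= 0:
        return None  # point at infinity or a wrong match
    z = rig.f_px * rig.baseline_m / disparity
    x = (xl - rig.cx) * z / rig.f_px
    y = (yl - rig.cy) * z / rig.f_px
    return (x, y, z)


rig = StereoRig(f_px=1000.0, baseline_m=0.1, cx=640.0, cy=360.0)
print(triangulate(rig, (740.0, 360.0), (690.0, 360.0)))  # 50 px disparity -> Z = 2 m
```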

If this is right

  • The tracking data can quantify impacts of objects on fish behavior at both individual and group scales.
  • Different image pre-processing and augmentation steps can be compared to improve detection under underwater conditions.
  • The resulting kinematic measures (velocity, acceleration, turning, pitch) can be used to compare responses across object shapes, sizes, and colors.
  • Performance matches or exceeds prior methods while adding 3D reconstruction suited to sea-cage environments.
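The paper derives velocity, acceleration, turning and pitch from the 3D tracks, but this summary does not give the exact definitions. The sketch below assumes common ones: forward differences, acceleration as the rate of change of speed, turning as the change in horizontal heading between steps, and pitch as the elevation of the velocity vector, all in an assumed level camera frame (x right, y down, z forward). The reference list cites Savitzky and Golay [38], so the authors likely smooth before differentiating; raw differences as used here amplify triangulation noise, and `scipy.signal.savgol_filter` (with its `deriv` argument) is one way to do both in a single step.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def velocities(positions: Sequence[Vec3], dt: float) -> List[Vec3]:
    """Forward-difference velocity vectors (m/s) between consecutive 3D positions."""
    return [
        tuple((b - a) / dt for a, b in zip(p0, p1))
        for p0, p1 in zip(positions, positions[1:])
    ]


def kinematics(positions: Sequence[Vec3], dt: float):
    """Return (speed, acceleration, turning_deg, pitch_deg) for one track.

    Assumed camera frame: x right, y down, z forward, camera mounted level so
    that the x-z plane is horizontal.
    """
    vel = velocities(positions, dt)
    speed = [math.hypot(*v) for v in vel]
    # Tangential acceleration: rate of change of speed (m/s^2).
    accel = [(s1 - s0) / dt for s0, s1 in zip(speed, speed[1:])]
    # Pitch: angle of the velocity vector above the horizontal plane (positive = rising).
    pitch = [math.degrees(math.atan2(-vy, math.hypot(vx, vz))) for vx, vy, vz in vel]
    # Turning: change in horizontal heading between steps, wrapped to [-180, 180).
    heading = [math.atan2(vx, vz) for vx, _, vz in vel]
    turning = [
        math.degrees((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)
        for h0, h1 in zip(heading, heading[1:])
    ]
    return speed, accel, turning, pitch


track = [(0, 0, 0), (0, 0, 1), (1, 0, 2), (2, -1, 2)]  # hypothetical positions in metres
print(kinematics(track, dt=1.0))
```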

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The 3D trajectory data could support automated alerts when fish show avoidance patterns around certain structures.
  • The same caudal-fin approach might be extended to track responses to other variables such as feeding or water flow changes.
  • Longer recordings could allow statistical comparison of group-level dynamics across multiple cages.

Load-bearing premise

Manually labeled caudal-fin detections from a limited training set will continue to work accurately under the variable lighting, turbidity, and high fish densities of real industrial sea cages.

What would settle it

A clear drop in detection precision, tracking continuity, or 3D position accuracy when the trained model is run on new sea-cage video recorded under different lighting or density conditions.
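Checking for a drop in detection precision requires annotating a sample of frames from the new recordings and scoring the detector against them. The sketch below computes precision and recall at a single IoU threshold, assuming axis-aligned boxes, the common threshold of 0.5, and greedy confidence-ordered matching in which each annotated tail can be claimed once; mAP would additionally average precision over the ranked detections, which this omits. Box values in the test are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box], thr: float = 0.5):
    """Precision and recall at one IoU threshold; each ground truth matches at most once."""
    matched = set()
    tp = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = thr, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed annotated tails
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

Running the same function on frames from the original and the new recording conditions gives a like-for-like comparison of the two numbers.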

Figures

Figures reproduced from arXiv: 2605.30399 by Eleni Kelasidi, Hanne-Grete Alvheim, Martin Føre, Stian Mjelde Jakobsen.

Figure 1: Illustration of the experimental setup for data acquisition from industrial scale fish farm of SINTEF ACE
Figure 2: A block diagram depicting the entire process from capturing data to tracking and estimating fish positions. Inputs: data…
Figure 3: A non-rectified frame (a) compared to a rectified…
Figure 4: A single left frame altered by different pre-processing approaches.
Figure 5: Object detection on a single left frame
Figure 6: MOT on a single left frame to gather corresponding points and information about these. Associated points in each frame are combined with the center points from the bounding box to estimate the center of the tail of the fish in the right frame. When there are no point correspondences, detections are discarded. Depth estimation is done by testing two different approaches, the first of which uses triangulati…
Figure 7: MOT and stereo matching on a single left frame with multiple different pre-processing approaches
Figure 8: Visualization of the tracked fish and the associated…
Figure 9: A single frame of annotated and detected fishtails for two different object detection approaches. The green-colored…
Figure 11: Comparison of RAFT-Stereo and classical triangu…
Figure 12: Boxplots comparing the average aggregated minimum distance estimates between tracked fish and structures for…
Figure 13: Boxplots comparing the aggregated average minimum distance estimates between tracked fish and colors for the…
Figure 14: Derived features for a single fish derived velocity. The low values observed for acceleration and turning angle in particular indicate low activity levels, and could suggest that the fish did not experience stress-related responses by being in close proximity to the intrusive objects. While the parameters did not present large differences in response in this particular study, the deeper insight into behav…
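The Figure 6 caption says matched points are combined with the bounding-box centre to estimate the tail centre in the right frame, and that detections without correspondences are discarded. The exact combination rule is not stated here; the sketch below is one plausible reading, assuming the median left-to-right offset of the matched keypoints inside the left box is applied to the box centre. The keypoint pairs would come from a matcher such as SuperGlue; the function name and the test values are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def right_frame_centre(left_box: Tuple[float, float, float, float],
                       matches: List[Tuple[Point, Point]]) -> Optional[Point]:
    """Project a left-frame tail box centre into the right frame.

    left_box: (x1, y1, x2, y2) of the detected tail in the left frame.
    matches: keypoint pairs ((xl, yl), (xr, yr)) from a feature matcher.
    Returns None when no matched keypoint falls inside the box.
    """
    x1, y1, x2, y2 = left_box
    inside = [(pl, pr) for pl, pr in matches if x1 <= pl[0] <= x2 and y1 <= pl[1] <= y2]
    if not inside:
        return None  # no correspondence: the detection is discarded
    # The median offset is robust to a few wrong matches inside the box.
    dx = statistics.median(pr[0] - pl[0] for pl, pr in inside)
    dy = statistics.median(pr[1] - pl[1] for pl, pr in inside)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx + dx, cy + dy)
```

The resulting right-frame point, paired with the left box centre, is what a triangulation step would consume; using a median rather than a mean keeps one outlier match from shifting the depth estimate.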
Original abstract

The aquaculture industry needs to address several challenges to secure sustainable seafood production that can serve an increasing global demand. One major challenge is to ensure good fish health and acceptable welfare during production since the improvement of fish welfare is of vital importance in current and future production systems. In this study, this is addressed by developing and implementing methods to identify fish behaviors in response to intrusive objects both on individual and on a group basis. A novel approach for detecting, tracking, and estimating the 3D position of individual fish has thus been developed, and specifically designed to track the caudal fins of farmed fish in industrial sea cages. The tracking data was subjected to a novel stereo-vision method adapted to estimate fish positions, velocities, accelerations, and turning and pitch angles. Datasets obtained from industrial-scale fish farms were then analyzed to identify the impact of structures of varying shapes, sizes, and colors on fish behavior. The method was trained using manually labeled caudal fins, and used YOLOv8 with ByteTrack as an object detector and tracker, SuperGlue for matching detections in the left and right frames, and triangulation to reconstruct the 3D positions of the fish. Different image pre-processing and augmentation methods for enhancing object detection accuracy were tested and their performance compared, while RAFT-Stereo was tested for depth estimation purposes. The obtained results both validate the method's performance against previous research efforts, and demonstrate the novelty and potential of this method in providing more insight into behavioral dynamics in sea-cages.
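The abstract reports comparing pre-processing and augmentation methods, but this summary does not list which augmentations were used. As an illustration only, the sketch shows two common ones in plain Python: a horizontal flip, which must mirror the labels as well (here in the normalised `class cx cy w h` layout that Ultralytics YOLO label files use), and a linear brightness/contrast change loosely mimicking variable underwater light. The function names and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

Label = Tuple[int, float, float, float, float]  # (class_id, cx, cy, w, h), normalised to [0, 1]


def hflip_image(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Mirror a greyscale image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in rows]


def hflip_labels(labels: Sequence[Label]) -> List[Label]:
    """Mirror YOLO-format boxes to match a horizontally flipped image."""
    return [(c, 1.0 - cx, cy, w, h) for c, cx, cy, w, h in labels]


def adjust_brightness_contrast(rows: Sequence[Sequence[int]], alpha: float = 1.2,
                               beta: float = -10.0) -> List[List[int]]:
    """Photometric jitter p -> alpha * p + beta, clipped to the 8-bit range [0, 255].

    Boxes are unchanged because pixel positions do not move.
    """
    return [[min(255, max(0, round(alpha * p + beta))) for p in row] for row in rows]
```

Such augmentations suit training the per-frame tail detector; they should not be applied to the stereo pairs used for matching, where a flip would break the left/right geometry.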

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript develops a computer vision pipeline (YOLOv8 + ByteTrack for caudal-fin detection and tracking, SuperGlue for stereo matching, and triangulation for 3D reconstruction) to quantify individual and group fish responses to intrusive objects of varying shapes, sizes, and colors in industrial sea cages. It reports testing image pre-processing and augmentation strategies plus RAFT-Stereo for depth, and claims that the obtained results validate performance against prior work while demonstrating novelty for behavioral analysis.

Significance. If quantitative validation on industrial data were provided, the approach could supply a practical stereo-vision tool for non-invasive welfare monitoring in aquaculture; the use of off-the-shelf components on real sea-cage video is a pragmatic strength, but the current absence of domain-specific metrics prevents assessment of whether the claimed behavioral insights are supported.

major comments (3)
  1. [Abstract] Abstract: the statement that 'the obtained results both validate the method's performance against previous research efforts' supplies no quantitative metrics (mAP, MOTA, ID-switch rate, 3D triangulation error, or statistical comparisons) on the target industrial sea-cage data, rendering the central validation claim unsupported.
  2. [Methods] Methods (training and generalization paragraph): the pipeline is trained on 'manually labeled caudal fins' from a 'limited' set of images, yet no training-set size, validation protocol, or cross-domain performance numbers are reported for the variable lighting, turbidity, and high-density conditions of the industrial datasets; this directly undermines the weakest assumption that detection and 3D reconstruction accuracy will hold.
  3. [Results] Results (behavioral analysis section): no sample sizes, error bars, or statistical tests are supplied for the reported velocities, accelerations, turning/pitch angles, or group-level responses, so it is impossible to evaluate whether the claimed insights into behavioral dynamics are statistically distinguishable from noise or prior methods.
minor comments (1)
  1. [Abstract] The abstract and methods would benefit from explicit statements of the number of fish, video duration, and number of intrusive-object trials analyzed.
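The referee asks for MOTA and ID-switch counts. Under the CLEAR MOT definition, MOTA = 1 - Σ(FN + FP + IDSW) / ΣGT, summed over all frames. The sketch only aggregates per-frame counts and assumes they come from an evaluation that has already matched tracker output to annotated tails (for instance with the Hungarian method and an IoU gate, not shown); the counts in the test are hypothetical.

```python
from typing import Iterable, Mapping


def mota(frames: Iterable[Mapping[str, int]]) -> float:
    """Multiple Object Tracking Accuracy from per-frame counts.

    Each frame provides 'gt' (annotated objects), 'fn' (misses),
    'fp' (false positives) and 'idsw' (identity switches).
    """
    gt = fn = fp = idsw = 0
    for f in frames:
        gt += f["gt"]
        fn += f["fn"]
        fp += f["fp"]
        idsw += f["idsw"]
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (fn + fp + idsw) / gt
```

Note that MOTA can be negative when errors outnumber ground-truth objects, which is possible in dense cages with many false detections.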

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating revisions where the manuscript requires strengthening to support its claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the statement that 'the obtained results both validate the method's performance against previous research efforts' supplies no quantitative metrics (mAP, MOTA, ID-switch rate, 3D triangulation error, or statistical comparisons) on the target industrial sea-cage data, rendering the central validation claim unsupported.

    Authors: We agree that the abstract claim is not supported by quantitative metrics on the industrial sea-cage data. The manuscript provides qualitative discussion of performance relative to prior work but does not report mAP, MOTA, or 3D error numbers on the target domain. We will revise the abstract to remove the unsupported validation phrasing and focus solely on the demonstrated novelty for behavioral analysis in sea cages. revision: yes

  2. Referee: [Methods] Methods (training and generalization paragraph): the pipeline is trained on 'manually labeled caudal fins' from a 'limited' set of images, yet no training-set size, validation protocol, or cross-domain performance numbers are reported for the variable lighting, turbidity, and high-density conditions of the industrial datasets; this directly undermines the weakest assumption that detection and 3D reconstruction accuracy will hold.

    Authors: The manuscript describes training on manually labeled caudal fins from a limited set but indeed omits exact training-set size, split protocol, and quantitative cross-domain metrics. We will expand the methods section to report the precise training-set size, validation approach used, and any available performance numbers. However, full quantitative evaluation on the unlabeled industrial data was not performed; generalization is supported only by the qualitative results shown in the figures. revision: partial

  3. Referee: [Results] Results (behavioral analysis section): no sample sizes, error bars, or statistical tests are supplied for the reported velocities, accelerations, turning/pitch angles, or group-level responses, so it is impossible to evaluate whether the claimed insights into behavioral dynamics are statistically distinguishable from noise or prior methods.

    Authors: We acknowledge the absence of sample sizes, error bars, and statistical tests in the behavioral results. This prevents rigorous assessment of the reported differences. We will revise the results section to include sample sizes (number of tracked fish and frames), error bars where applicable, and appropriate statistical tests (e.g., t-tests or ANOVA with p-values) for the velocity, acceleration, and angle measurements to substantiate the behavioral claims. revision: yes
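The rebuttal proposes t-tests or ANOVA. Because speed or distance samples near different objects need not share a variance, Welch's unequal-variance t-test is a common default. The sketch computes its statistic and the Welch–Satterthwaite degrees of freedom with the standard library only; a p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. One caveat: consecutive frames of one fish are not independent, so per-fish or per-trial summaries are a safer unit of analysis than raw frames. The sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and non-zero variance.
    """
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    # statistics.variance uses the n - 1 (sample) denominator.
    sa = statistics.variance(a) / len(a)
    sb = statistics.variance(b) / len(b)
    t = (ma - mb) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df


print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```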

Circularity Check

0 steps flagged

No circularity: off-the-shelf CV pipeline applied to new data

Full rationale

The paper applies existing detectors (YOLOv8), trackers (ByteTrack), matchers (SuperGlue), and stereo methods (RAFT-Stereo, triangulation) to manually labeled caudal-fin video from sea cages. No equations, fitted parameters, or derivations are presented that reduce any claimed result to its own inputs by construction. Validation is stated against prior external research; no self-citation chain or ansatz smuggling is load-bearing. This is the common case of an applied-methods paper whose central claims rest on empirical performance rather than self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper applies existing computer-vision components without introducing new fitted parameters or postulated entities; it rests on standard domain assumptions about detector generalization and stereo geometry.

axioms (2)
  • domain assumption YOLOv8 trained on manually labeled caudal fins produces reliable detections under farm conditions
    Central to the detection and tracking stage described in the abstract.
  • domain assumption SuperGlue matching followed by triangulation yields accurate 3D positions from stereo frames
    Required for the velocity, acceleration, and angle estimates.
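The second axiom can be made quantitative. For a rectified pair with focal length f (pixels), baseline B (metres) and disparity d (pixels), differentiating the triangulation relation shows that depth error grows with the square of distance, which matters in a sea cage where fish range over several metres. The relation assumes a calibration valid underwater.

```latex
Z = \frac{fB}{d}, \qquad
\left|\frac{\partial Z}{\partial d}\right| = \frac{fB}{d^{2}} = \frac{Z^{2}}{fB}
\;\Longrightarrow\;
\Delta Z \approx \frac{Z^{2}}{fB}\,\Delta d
```

With hypothetical values f = 1000 px, B = 0.1 m and a 1 px disparity error, ΔZ ≈ 0.04 m at Z = 2 m but ≈ 0.16 m at Z = 4 m, so velocity and acceleration estimates for distant fish inherit much larger noise.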

pith-pipeline@v0.9.1-grok · 5826 in / 1269 out tokens · 24971 ms · 2026-06-28T23:42:54.457572+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 2 canonical work pages

  1. [1]

    National Aquaculture Sector Overview. Norway

    T. Venvik, “National Aquaculture Sector Overview. Norway,” FAO Fisheries and Aquaculture Division, 2023, accessed: 2023-12-19. [Online]. Available: https://firms.fao.org/fi/website/FIRetrieveAction.do?dom=countrysector&xml=naso norway.xml&lang=en

  2. [2]

    Exposed aquaculture in Norway

    H. V. Bjelland, M. Føre, P. Lader, D. Kristiansen, I. M. Holmen, A. Fredheim, E. I. Grøtli, D. E. Fathi, F. Oppedal, I. B. Utne, and I. Schjølberg, “Exposed aquaculture in Norway,” in OCEANS 2015 - MTS/IEEE Washington, 2015, pp. 1–10

  3. [3]

    Reducing risk in aquaculture through autonomous underwater operations

    I. B. Utne, I. Schjølberg, S. Sandøy, X. Yang, and I. M. Holmen, “Reducing risk in aquaculture through autonomous underwater operations,” 2018. [Online]. Available: https://iapsam.org/psam14/proceedings/paper/paper 98 1.pdf

  4. [4]

    Precision fish farming: A new framework to improve production in aquaculture

    M. Føre, K. Frank, T. Norton, E. Svendsen, J. A. Alfredsen, T. Dempster, H. Eguiraun, W. Watson, A. Stahl, L. M. Sunde, C. Schellewald, K. R. Skøien, M. O. Alver, and D. Berckmans, “Precision fish farming: A new framework to improve production in aquaculture,” 2017

  5. [5]

    Advanced technology in aquaculture–smart feeding in marine fish farms

    M. Føre, M. O. Alver, K. Frank, and J. A. Alfredsen, “Advanced technology in aquaculture–smart feeding in marine fish farms,” in Smart Livestock Nutrition. Springer, 2023, pp. 227–268

  6. [6]

    Planning for fish net inspection with an autonomous OSV

    T. X. Lin, Q. Tao, and F. Zhang, “Planning for fish net inspection with an autonomous OSV,” in 2020 International Conference on System Science and Engineering (ICSSE), 2020, pp. 1–5

  7. [7]

    Robotics for Sea-Based Fish Farming

    E. Kelasidi and E. Svendsen, Robotics for Sea-Based Fish Farming. Springer International Publishing, 2023, pp. 1–20

  8. [8]

    Salmon behavioural response to robots in an aquaculture sea cage

    M. Kruusmaa, R. Gkliva, J. A. Tuhtan, A. Tuvikene, and J. A. Alfredsen, “Salmon behavioural response to robots in an aquaculture sea cage,” Royal Society Open Science, vol. 7, no. 3, p. 191220, 2020

  9. [9]

    Farmed Atlantic salmon (Salmo salar L.) avoid intrusive objects in cages: The influence of object shape, size and colour, and fish length

    Q. Zhang, N. Bloecher, L. D. Evjemo, M. Føre, B. Su, E. Eilertsen, M. A. Mulelid, and E. Kelasidi, “Farmed Atlantic salmon (Salmo salar L.) avoid intrusive objects in cages: The influence of object shape, size and colour, and fish length,” Aquaculture, p. 740429, 2023

  10. [10]

    Computer vision and robotics techniques in fish farms. robotica

    J. R. Martinez-de Dios, C. Serna, and A. Ollero, “Computer vision and robotics techniques in fish farms. robotica,” Robotica, vol. 21, pp. 233–243, 06 2003

  11. [13]

    Stereoyolo + deepsort: A framework to track fish from underwater stereo camera in situ

    A. Saad, E. Kelasidi, S. Jakobsen, M. A. Mulelid, and M. S. Bondø, “Stereoyolo + deepsort: A framework to track fish from underwater stereo camera in situ,” 2023, presented at the 16th International Conference on Machine Vision. [Online]. Available: https://www.sintef.no/en/publications/publication/2175266/

  12. [14]

    Aqua3dnet: Real-time 3d pose estimation of livestock in aquaculture by monocular machine vision

    M. E. Koh, M. W. K. Fong, and E. Y. K. Ng, “Aqua3dnet: Real-time 3d pose estimation of livestock in aquaculture by monocular machine vision,” Aquacultural Engineering, vol. 103, p. 102367, 2023

  13. [15]

    Motion trajectory estimation of salmon using stereo vision

    T. A. Nygård, J. H. Jahren, C. Schellewald, and A. Stahl, “Motion trajectory estimation of salmon using stereo vision,” IFAC-PapersOnLine, vol. 55, no. 31, pp. 363–368, 2022, 14th IFAC Conference on Control Applications in Marine Systems, Robotics, and Vehicles CAMS 2022

  14. [16]

    High-speed tracking-by-detection without using image information

    E. Bochinski, V. Eiselein, and T. Sikora, “High-speed tracking-by-detection without using image information,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2017, pp. 1–6

  15. [17]

    Early warning through video monitoring: Dissolved hydrogen sulphide (H2S) affects Atlantic salmon swimming behavior in recirculating aquaculture systems

    E. Ciani, B. Kvæstad, M. Stormoen, I. Mayer, S. Gupta, D. Ribičić, and R. Netzer, “Early warning through video monitoring: Dissolved hydrogen sulphide (H2S) affects Atlantic salmon swimming behavior in recirculating aquaculture systems,” Aquaculture, vol. 581, p. 740201, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0044...

  16. [18]

    YOLOv8 docs

    Ultralytics, “YOLOv8 docs,” accessed on 21.08.2023. [Online]. Available: https://docs.ultralytics.com/

  17. [19]

    ByteTrack: Multi-object tracking by associating every detection box

    Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, “ByteTrack: Multi-object tracking by associating every detection box,” 2022

  18. [20]

    SuperGlue: Learning feature matching with graph neural networks

    P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning feature matching with graph neural networks,” in CVPR, 2020. [Online]. Available: https://arxiv.org/abs/1911.11763

  19. [21]

    RAFT-Stereo: Multilevel recurrent field transforms for stereo matching

    L. Lipson, Z. Teed, and J. Deng, “RAFT-Stereo: Multilevel recurrent field transforms for stereo matching,” 2021

  20. [22]

    “ACE,” https://www.sintef.no/en/all-laboratories/ace/, accessed: 2023-12-19

  21. [23]

    Improving disparity map of a specific object in a stereo image using camera calibration, image rectification, and object segmentation

    H. Syahputra, A. Harjoko, and R. Pulungan, “Improving disparity map of a specific object in a stereo image using camera calibration, image rectification, and object segmentation,” International Journal of Applied Engineering Research, vol. 9, pp. 17939–17949, 01 2014

  22. [24]

    Multiple View Geometry in Computer Vision

    O. Faugeras, R. Hartley, and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004. [Online]. Available: https://www.example.com/book/multiple-view-geometry

  23. [25]

    Introduction to wavelets and wavelet transforms: A primer

    C. S. Burrus, R. A. Gopinath, and H. Guo, “Introduction to wavelets and wavelet transforms: A primer,” 1998

  24. [26]

    Morphological area openings and closings for grey-scale images

    L. Vincent, “Morphological area openings and closings for grey-scale images,” in Shape in Picture, Y.-L. O, A. Toet, D. Foster, H. J. A. M. Heijmans, and P. Meer, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1994, pp. 197–208

  25. [27]

    VIII.5. - Contrast limited adaptive histogram equalization

    K. Zuiderveld, “VIII.5. - Contrast limited adaptive histogram equalization,” in Graphics Gems, P. S. Heckbert, Ed. Academic Press, 1994, pp. 474–485

  26. [28]

    Sea-thru: A method for removing water from underwater images

    D. Akkaynak and T. Treibitz, “Sea-thru: A method for removing water from underwater images,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019-June, pp. 1682–1691, 6 2019

  27. [29]

    “diplib,” https://github.com/DIPlib/diplib, accessed: 2023-09-05

  28. [30]

    Implementation of Sea-thru by Derya Akkaynak and Tali Treibitz

    J. Gibson, “Implementation of Sea-thru by Derya Akkaynak and Tali Treibitz,” https://github.com/hainh/sea-thru, 2020, accessed: 2023-09-05

  29. [31]

    SuperPoint: Self-supervised interest point detection and description

    D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” 2018

  30. [32]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2023

  31. [33]

    The Hungarian method for the assignment problem

    H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955

  32. [34]

    RAFT: Recurrent all-pairs field transforms for optical flow

    Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” 2020

  33. [35]

    Joint 3D estimation of vehicles and scene flow

    M. Menze, C. Heipke, and A. Geiger, “Joint 3D estimation of vehicles and scene flow,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. II-3/W5, pp. 427–434, 08 2015

  34. [36]

    High-resolution stereo datasets with subpixel-accurate ground truth

    D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling, “High-resolution stereo datasets with subpixel-accurate ground truth,” vol. 8753, 09 2014, pp. 31–42

  35. [37]

    Novel tag-based method for measuring tailbeat frequency and variations in amplitude in fish

    F. Warren-Myers, E. Svendsen, M. Føre, O. Folkedal, F. Oppedal, and M. Hvas, “Novel tag-based method for measuring tailbeat frequency and variations in amplitude in fish,” Animal Biotelemetry, vol. 11, no. 1, p. 12, 2023

  36. [38]

    Smoothing and differentiation of data by simplified least squares procedures

    A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.

APPENDIX I: DISTANCE ESTIMATES AND FPS

Average minimum distance [m] by object shape and color; columns: Initial · MO-WT · Initial augmented · MO-WT augmented

  Big Cylinder, White: 1.18 · 1.16 · 1.18 · 1.18
  Big Cylinder, Yellow: 1.66 · 1.53 · 1.74 · 1....
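Figures 12 and 13 and this appendix report average minimum distances between tracked fish and the intrusive objects. The computation is not defined in this summary, so the sketch is one plausible reading under stated assumptions: each object is represented by sampled 3D surface points (hypothetical), each track's minimum distance is taken over all of its positions, and the result is averaged across tracks. The coordinates in the test are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def min_distance(track: Sequence[Vec3], object_points: Sequence[Vec3]) -> float:
    """Smallest Euclidean distance (m) between any track position and any object point."""
    return min(math.dist(p, q) for p in track for q in object_points)


def average_min_distance(tracks: Sequence[Sequence[Vec3]],
                         object_points: Sequence[Vec3]) -> float:
    """Mean over tracks of each track's minimum distance to the object."""
    return statistics.fmean(min_distance(t, object_points) for t in tracks)
```

Averaging per track rather than per frame keeps long tracks from dominating the estimate; if the paper instead pools frames, the numbers would not be directly comparable.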