pith. sign in

arxiv: 2605.07653 · v2 · submitted 2026-05-08 · 💻 cs.CV · eess.IV

Aquatic Neuromorphic Optical Flow

Pith reviewed 2026-05-14 21:21 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords event camerasoptical flowspiking neural networksunderwater visionself-supervised learningneuromorphic computingmotion estimationaquatic perception
0
0 comments X

The pith

A self-supervised spiking neural network estimates per-pixel optical flow from underwater event streams without any labeled data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a self-supervised framework built on spiking neural networks that computes optical flow directly from asynchronous event camera streams in aquatic settings. This method sidesteps the scarcity of labeled underwater datasets by training on the intrinsic structure of event data rather than external supervision. If the approach holds, it delivers competitive accuracy alongside much lower computational cost than standard vision pipelines, making real-time motion sensing feasible on power-limited underwater hardware. Readers focused on practical deployment would note that conventional cameras falter in low-light or turbid water, while event sensors paired with this network could sustain agile perception for robots or vehicles. The work positions neuromorphic sensing as a direct route to efficient aquatic intelligence.

Core claim

A spiking neural network trained in a fully self-supervised manner on raw event streams can recover accurate per-pixel optical flow fields in underwater scenes, matching leading supervised methods in visual and quantitative quality while consuming far less power and computation.

What carries the argument

Self-supervised spiking neural network that learns motion from asynchronous event streams by exploiting temporal consistency in the event data itself.

If this is right

  • Real-time per-pixel motion estimation becomes practical on low-power underwater edge devices.
  • Labeled underwater optical-flow datasets are no longer required for training.
  • Neuromorphic pipelines can operate continuously in turbid or low-light aquatic conditions where frame cameras fail.
  • Computational budgets for underwater vehicles drop enough to allow simultaneous execution of other perception tasks.
  • Lightweight autonomous systems gain a pathway to agile navigation without heavy GPU hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same self-supervised event-to-flow pipeline could transfer to other data-scarce environments such as deep-sea or space-based robotics.
  • Combining this network with spiking depth or segmentation heads would test whether a single neuromorphic stack can handle multiple underwater perception tasks.
  • Long-term deployment tests on actual AUVs would reveal whether the efficiency gains translate into measurable increases in mission duration.
  • If event noise characteristics differ sharply across water types, the method may need only minor recalibration rather than full retraining.

Load-bearing premise

Underwater event streams contain enough inherent structure that a spiking network can learn accurate optical flow without labeled examples or any domain-specific tuning.

What would settle it

On real underwater event sequences with independent ground-truth flow, the self-supervised network produces flow fields whose endpoint error exceeds that of a standard supervised baseline by more than 30 percent.

Figures

Figures reproduced from arXiv: 2605.07653 by Kaiqiang Wang, Pei Zhang, Yunkai Liang.

Figure 1
Figure 1. Figure 1: (a) Underwater and on-land vision differ in focus. (b) Neuromorphic [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: (a) An event stream of a coral scene and its corresponding event-count representations. (b) Dynamics of an LIF neuron in which the firing threshold [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visual results on AquaticVision, DAVIS-NUIUIED, and Aqua-Eye datasets, with a color-coding scheme shown alongside. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Our neuromorphic optical flow facilitates camouflaged fish detection. [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: Ablation study on how our modules affect learning performance. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
read the original abstract

Underwater environments impose severe constraints on conventional imaging systems and demand solutions that balance high-quality sensing with strict resource efficiency. While emerging event cameras offer a promising alternative, their potential in aquatic scenarios remains largely unexplored. Through the lens of neuromorphic vision, this work pioneers the investigation of motion fields that serve as key media for agile underwater perception. Built upon spiking neural networks, we introduce a self-supervised framework to estimate per-pixel optical flow from asynchronous event streams, elegantly bypassing the long-standing bottleneck of underwater data scarcity. Extensive evaluations demonstrate that our method achieves competitive visual and quantitative results against leading techniques while operating with superior computational efficiency. By bridging neuromorphic sensing and aquatic intelligence, this work opens new frontiers for lightweight, real-time, and low-cost perception on resource-constrained underwater edge platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a self-supervised framework based on spiking neural networks (SNNs) to estimate per-pixel optical flow directly from asynchronous event camera streams in underwater environments. It claims this approach bypasses the need for labeled aquatic datasets, achieves competitive visual and quantitative performance against existing methods, and delivers superior computational efficiency suitable for resource-constrained underwater edge platforms.

Significance. If validated, the work would meaningfully extend neuromorphic vision to challenging aquatic domains where conventional cameras fail due to scattering and attenuation. The self-supervised formulation is a notable strength, as it directly addresses data scarcity without requiring domain-specific labeled data or heavy adaptations, potentially enabling lightweight real-time perception on underwater vehicles.

major comments (2)
  1. [Abstract] Abstract: the central claim that the method 'achieves competitive visual and quantitative results against leading techniques' cannot be evaluated because no datasets, metrics (e.g., average endpoint error, F1 score), baselines, or error analysis are presented; this directly undermines the assertion of bypassing underwater data scarcity.
  2. [Framework] Framework / loss definition (inferred from abstract description of self-supervised SNN): the approach relies on an implicit self-supervised loss (likely contrast maximization or time-surface consistency) without explicit terms for underwater scattering or low event density; if event rates drop below the threshold needed for stable gradients, the optimization can converge to trivial or noisy flow fields, violating the assumption that event streams contain sufficient non-degenerate structure.
minor comments (2)
  1. The abstract refers to 'extensive evaluations' yet provides no reference to figures, tables, or supplementary material containing the quantitative results.
  2. Notation for the event stream representation and SNN spiking mechanism should be defined explicitly to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to improve clarity and completeness where appropriate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the method 'achieves competitive visual and quantitative results against leading techniques' cannot be evaluated because no datasets, metrics (e.g., average endpoint error, F1 score), baselines, or error analysis are presented; this directly undermines the assertion of bypassing underwater data scarcity.

    Authors: The abstract is a concise summary; the full manuscript details the evaluations in Section 4, including specific underwater-adapted event datasets, metrics such as average endpoint error, F1 score, baselines from prior event-based methods, and error analysis. To address the concern directly, we have revised the abstract to briefly reference these elements and emphasize the self-supervised approach on unlabeled data. revision: yes

  2. Referee: [Framework] Framework / loss definition (inferred from abstract description of self-supervised SNN): the approach relies on an implicit self-supervised loss (likely contrast maximization or time-surface consistency) without explicit terms for underwater scattering or low event density; if event rates drop below the threshold needed for stable gradients, the optimization can converge to trivial or noisy flow fields, violating the assumption that event streams contain sufficient non-degenerate structure.

    Authors: Section 3 explicitly defines the self-supervised loss as a contrast-maximization objective integrated with the spiking network. No dedicated scattering term is present because the event-driven formulation prioritizes motion-induced changes resilient to attenuation. We acknowledge the low-event-density concern and have added discussion plus ablation results on event-rate thresholds to demonstrate avoidance of trivial solutions via network regularization and sparsity handling. revision: partial

Circularity Check

0 steps flagged

No significant circularity in self-supervised event-based optical flow derivation

full rationale

The paper introduces a self-supervised spiking neural network framework for per-pixel optical flow estimation directly from asynchronous underwater event streams. No equations, loss definitions, or training procedures in the abstract or description reduce the output flow field to a fitted parameter or input by construction. The self-supervision claim relies on standard contrast or consistency objectives applied to the event data itself rather than any self-referential definition or prior self-citation that would force the result. The derivation chain remains independent of the target underwater results and does not import uniqueness theorems or ansatzes from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all details on training, architecture, or assumptions are absent.

pith-pipeline@v0.9.0 · 5421 in / 1023 out tokens · 39576 ms · 2026-05-14T21:21:46.678627+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

  1. [1]

    A survey on underwater computer vision,

    S. P. Gonz ´alez-Sabbagh and A. Robles-Kelly, “A survey on underwater computer vision,”ACM Computing Surveys, vol. 55, no. 13s, pp. 1–39, 2023

  2. [2]

    SpectroGen: a physically informed gener- ative artificial intelligence for accelerated cross-modality spectroscopic materials characterization,

    Y . Zhu and L. F. Tadesse, “SpectroGen: a physically informed gener- ative artificial intelligence for accelerated cross-modality spectroscopic materials characterization,”Matter, vol. 9, no. 1, p. 102434, 2026

  3. [3]

    Lossless compression of event camera frames,

    I. Schiopu and R. C. Bilcu, “Lossless compression of event camera frames,”IEEE Signal Processing Letters, vol. 29, pp. 1779–1783, 2022

  4. [4]

    Neuromorphic imaging with density-based spatiotemporal denoising,

    P. Zhang, Z. Ge, L. Song, and E. Y . Lam, “Neuromorphic imaging with density-based spatiotemporal denoising,”IEEE Transactions on Computational Imaging, vol. 9, pp. 530–541, 2023

  5. [5]

    Event-based shutter unrolling and motion deblurring in dynamic scenes,

    Y . Wang, C. Jiang, X. Jia, Y . Guo, and L. Yu, “Event-based shutter unrolling and motion deblurring in dynamic scenes,”IEEE Signal Processing Letters, vol. 31, pp. 1069–1073, 2024

  6. [6]

    Angle- based neuromorphic wave normal sensing,

    C. Wang, S. Zhu, P. Zhang, K. Wang, J. Huang, and E. Y . Lam, “Angle- based neuromorphic wave normal sensing,”Laser & Photonics Reviews, vol. 19, no. 4, p. 2400647, 2025

  7. [7]

    Neuromorphic imaging with super-resolution,

    P. Zhang, S. Zhu, C. Wang, Y . Zhao, and E. Y . Lam, “Neuromorphic imaging with super-resolution,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 2, pp. 1715–1727, 2025

  8. [8]

    Low-latency automotive vision with event cameras,

    D. Gehrig and D. Scaramuzza, “Low-latency automotive vision with event cameras,”Nature, vol. 629, pp. 1034–1040, 2024

  9. [9]

    Ultrafast dynamic defect inspection with computational neuromorphic imaging,

    S. Zhu, Q. Yin, C. Wang, J. Huang, and E. Y . Lam, “Ultrafast dynamic defect inspection with computational neuromorphic imaging,”Advanced Science, vol. 12, no. 44, p. e10338, 2025

  10. [10]

    Event-based stereo depth es- timation by temporal-spatial context learning,

    W. Chen, Y . Zhang, X. Sun, and F. Wu, “Event-based stereo depth es- timation by temporal-spatial context learning,”IEEE Signal Processing Letters, vol. 31, pp. 1429–1433, 2024

  11. [11]

    Self-calibrated neuromorphic hyperspectral derivative imaging,

    R. Chen, C. Wang, Y . Li, Y . Cao, S. Zhu, and E. Y . Lam, “Self-calibrated neuromorphic hyperspectral derivative imaging,”Optica, vol. 13, no. 4, pp. 587–590, 2026

  12. [12]

    Fast event-based optical flow estimation by triplet matching,

    S. Shiba, Y . Aoki, and G. Gallego, “Fast event-based optical flow estimation by triplet matching,”IEEE Signal Processing Letters, vol. 29, pp. 2712–2716, 2023

  13. [13]

    Dark-EvGS: event camera as an eye for radiance field in the dark,

    J. Wu, P. Duan, Z. Wang, C. Wang, B. Shi, and E. Y . Lam, “Dark-EvGS: event camera as an eye for radiance field in the dark,”IEEE Transactions on Image Processing, vol. 35, pp. 3172–3185, 2026

  14. [14]

    AquaticVision: benchmarking visual SLAM in underwater environment with events and frames,

    Y . Peng, Y . Hong, Z. Hong, A. P.-Y . Chui, and J. Wu, “AquaticVision: benchmarking visual SLAM in underwater environment with events and frames,”arXiv preprint arXiv:2505.03448, 2025

  15. [15]

    Event-dataset for underwater SLAM,

    J. H. Klasson, B. Sorensen, K. Brummenaes, and G. B. Ellingsen, “Event-dataset for underwater SLAM,” https: //github.com/OsloMet-OceanLab/underwater event dataset, 2023

  16. [16]

    Event-based circular detection for AUV docking based on spiking neural network,

    F. Zhang, Y . Zhong, L. Chen, and Z. Wang, “Event-based circular detection for AUV docking based on spiking neural network,”Frontiers in Neurorobotics, vol. 15, p. 815144, 2022

  17. [17]

    Millisecond-scale behaviours of plankton quantified in vitro and in situ using the event-based vision sensor,

    S. Takatsuka, N. Miyamoto, H. Sato, Y . Morino, Y . Kurita, A. Yabuki, C. Chen, and S. Kawagucci, “Millisecond-scale behaviours of plankton quantified in vitro and in situ using the event-based vision sensor,” Ecology and Evolution, vol. 14, no. 8, p. e70150, 2024

  18. [18]

    TransCODNet: underwater transparently camouflaged object detection via RGB and event frames collaboration,

    C. Luo, J. Wu, S. Sun, and P. Ren, “TransCODNet: underwater transparently camouflaged object detection via RGB and event frames collaboration,”IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1444–1451, 2024

  19. [19]

    Non-uniform illumination underwater image enhancement via events and frame fusion,

    X. Bi, P. Wang, T. Wu, F. Zha, and P. Xu, “Non-uniform illumination underwater image enhancement via events and frame fusion,”Applied Optics, vol. 61, no. 29, pp. 8826–8832, 2022

  20. [20]

    RGB/event signal fusion framework for multi-degraded underwater image enhancement,

    X. Bi, P. Wang, W. Guo, F. Zha, and L. Sun, “RGB/event signal fusion framework for multi-degraded underwater image enhancement,” Frontiers in Marine Science, vol. 11, p. 1366815, 2024

  21. [21]

    EV-FlowNet: self-supervised optical flow estimation for event-based cameras,

    A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “EV-FlowNet: self-supervised optical flow estimation for event-based cameras,” in Proceedings of Robotics: Science and Systems, 2018

  22. [22]

    Self-supervised learning of event-based optical flow with spiking neural networks,

    J. Hagenaars, F. Paredes-Vall ´es, and G. de Croon, “Self-supervised learning of event-based optical flow with spiking neural networks,” Advances in Neural Information Processing Systems, vol. 34, pp. 7167– 7179, 2021

  23. [23]

    Taming contrast maximization for learning sequential, low-latency, event-based optical flow,

    F. Paredes-Vall ´es, K. Y . Scheper, C. De Wagter, and G. C. De Croon, “Taming contrast maximization for learning sequential, low-latency, event-based optical flow,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 9661–9671

  24. [24]

    Networks of spiking neurons: the third generation of neural network models,

    W. Maass, “Networks of spiking neurons: the third generation of neural network models,”Neural Networks, vol. 10, no. 9, pp. 1659–1671, 1997

  25. [25]

    Adaptive-SpikeNet: event-based optical flow estimation using spiking neural networks with learnable neuronal dy- namics,

    A. K. Kosta and K. Roy, “Adaptive-SpikeNet: event-based optical flow estimation using spiking neural networks with learnable neuronal dy- namics,” inIEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 6021–6027

  26. [26]

    SU-YOLO: spiking neural network for efficient underwater object detection,

    C. Li, W. Liu, G. Gong, X. Ding, and X. Zhong, “SU-YOLO: spiking neural network for efficient underwater object detection,”Neurocomput- ing, vol. 644, p. 130310, 2025

  27. [27]

    Underwater image enhancement by convolutional spiking neural networks,

    V . Sudevan, F. Zayer, R. Kausar, S. Javed, H. Karki, G. De Masi, and J. Dias, “Underwater image enhancement by convolutional spiking neural networks,”arXiv preprint arXiv:2503.20485, 2025

  28. [28]

    The Bouguer–Beer–Lambert law: shining light on the obscure,

    T. G. Mayerh ¨ofer, S. Pahlow, and J. Popp, “The Bouguer–Beer–Lambert law: shining light on the obscure,”ChemPhysChem, vol. 21, no. 18, pp. 2029–2046, 2020

  29. [29]

    A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation,

    G. Gallego, H. Rebecq, and D. Scaramuzza, “A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3867–3876

  30. [30]

    Focus is all you need: loss functions for event-based vision,

    G. Gallego, M. Gehrig, and D. Scaramuzza, “Focus is all you need: loss functions for event-based vision,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12 272–12 281

  31. [31]

    Neuromorphic imag- ing with joint image deblurring and event denoising,

    P. Zhang, H. Liu, Z. Ge, C. Wang, and E. Y . Lam, “Neuromorphic imag- ing with joint image deblurring and event denoising,”IEEE Transactions on Image Processing, vol. 33, pp. 2318–2333, 2024

  32. [32]

    Unsupervised event- based learning of optical flow, depth, and egomotion,

    A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “Unsupervised event- based learning of optical flow, depth, and egomotion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 989–997

  33. [33]

    Lapicque’s introduction of the integrate-and-fire model neuron (1907),

    L. F. Abbott, “Lapicque’s introduction of the integrate-and-fire model neuron (1907),”Brain Research Bulletin, vol. 50, no. 5–6, pp. 303–304, 1999

  34. [34]

    Fast image reconstruction with an event camera,

    C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, and D. Scaramuzza, “Fast image reconstruction with an event camera,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 156–163

  35. [35]

    Delving Deeper into Convolutional Networks for Learning Video Representations

    N. Ballas, L. Yao, C. Pal, and A. Courville, “Delving deeper into con- volutional networks for learning video representations,”arXiv preprint arXiv:1511.06432, 2015

  36. [36]

    Adam: a method for stochastic optimization,

    D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” inInternational Conference on Learning Representations (ICLR), 2015

  37. [37]

    RAFT: recurrent all-pairs field transforms for optical flow,

    Z. Teed and J. Deng, “RAFT: recurrent all-pairs field transforms for optical flow,” inEuropean Conference on Computer Vision (ECCV), 2020, pp. 402–419

  38. [38]

    Reducing the sim-to-real gap for event cameras,

    T. Stoffregen, C. Scheerlinck, D. Scaramuzza, T. Drummond, N. Barnes, L. Kleeman, and R. Mahony, “Reducing the sim-to-real gap for event cameras,” inEuropean Conference on Computer Vision (ECCV), 2020, pp. 534–549

  39. [39]

    Spikformer: when spiking neural network meets transformer,

    Z. Zhou, Y . Zhu, C. He, Y . Wang, S. Y AN, Y . Tian, and L. Yuan, “Spikformer: when spiking neural network meets transformer,” inInter- national Conference on Learning Representations (ICLR), 2023