Style-Decoupled Adaptive Routing Network for Underwater Image Enhancement

Bing Wang; Chen Long; Hang Xu; Hao Chen; Zhen Dong

arxiv: 2604.12257 · v1 · submitted 2026-04-14 · 💻 cs.CV

Hang Xu, Chen Long, Bing Wang, Hao Chen, Zhen Dong

Pith reviewed 2026-05-10 14:46 UTC · model grok-4.3

classification 💻 cs.CV

keywords: underwater image enhancement, adaptive routing, style decoupling, degradation style embeddings, scene structural representations, image restoration, neural network


The pith

SDAR-Net separates degradation style from scene structure in underwater images and routes enhancement adaptively for each input.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current underwater enhancement methods apply one fixed mapping to every image, which over-corrects mild cases and under-corrects severe ones. The paper argues that degradation mostly alters appearance while leaving scene geometry unchanged, so features can be split into dynamic style embeddings and fixed structural representations. An adaptive router then reads the style, predicts soft weights, and blends the right combination of representations for that specific image. This produces a single network that handles the full range of real-world underwater conditions instead of averaging across them. The result matters for marine robots and sensors that need reliable vision across varying water conditions.

Core claim

SDAR-Net decomposes input features into dynamic degradation style embeddings and static scene structural representations. An adaptive routing mechanism then evaluates the style embeddings and predicts soft weights over several candidate enhancement states. It uses those weights to fuse the corresponding image representations, so each image receives the degree of restoration it needs.

What carries the argument

The adaptive routing mechanism. It reads the degradation style embeddings and predicts soft weights, which are used to fuse the image representations produced at several candidate enhancement states.

If this is right

Reaches 25.72 dB PSNR on a real-world underwater benchmark, exceeding prior methods.
Delivers appropriate enhancement levels instead of over-processing mild degradations or under-recovering severe ones.
Improves accuracy in downstream tasks such as object detection and segmentation on underwater data.
Allows one network to handle the full spectrum of degradation without dataset-specific retraining.
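PSNR is defined as 10·log10(MAX²/MSE). The sketch below shows what the headline number measures. It assumes images scaled to [0, 1] (MAX = 1). This summary does not give the paper's exact evaluation protocol, such as color space or per-image versus pooled averaging.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming pixel values in [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# A uniform error of 0.1 gives an MSE of 0.01, hence a PSNR of about 20 dB.
clean = np.zeros((4, 4, 3))
print(psnr(clean, np.full((4, 4, 3), 0.1)))
```

On this [0, 1] scale, 25.72 dB corresponds to an MSE of about 0.0027. That is an RMSE of roughly 0.052, or about 13 intensity levels out of 255.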

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same style-structure split might apply to other appearance-only degradations such as fog or low light.
Storing reusable structural representations could lower compute when processing image sequences from the same location.
Embedding the router in an underwater robot could enable on-the-fly adjustment without storing multiple enhancement models.

Load-bearing premise

Underwater degradation primarily shifts appearance while leaving the underlying scene structure unchanged, so the two can be cleanly separated.

What would settle it

The claim would be undermined in either of two cases. The first is if, on paired images of the same scene captured under increasing degradation levels, the extracted structural representations changed noticeably. The second is if the adaptive method showed no gain over a uniform baseline.
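The sketch below is one way to run such a check, and it rests on stated assumptions. It degrades a single scene by increasing amounts using the commonly used simplified formation model I = J·t + B·(1 − t), with t = exp(−β·d). It then tracks how much a structure descriptor drifts as degradation grows. The network's learned structural representations cannot be reproduced here, so `structure_map` (a z-scored gradient-magnitude map) is a hypothetical hand-crafted stand-in. The `beta` and `backscatter` values are illustrative and are not taken from the paper.

```python
import numpy as np


def degrade(clean: np.ndarray, depth: float, beta=(0.40, 0.10, 0.05),
            backscatter=(0.05, 0.35, 0.45)) -> np.ndarray:
    """Simplified formation model I = J*t + B*(1 - t), with t = exp(-beta * depth).

    clean: H x W x 3 RGB image in [0, 1]. beta/backscatter are illustrative
    per-channel values (red attenuates fastest), not values from the paper.
    """
    t = np.exp(-np.asarray(beta) * depth)  # per-channel transmission
    return clean * t + np.asarray(backscatter) * (1.0 - t)


def structure_map(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a structural representation: z-scored gradient magnitude."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    return (mag - mag.mean()) / (mag.std() + 1e-8)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two flattened maps (1.0 means unchanged structure)."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


rng = np.random.default_rng(0)
scene = rng.random((64, 64, 3))  # toy "clean" scene
ref = structure_map(scene)
for d in (0.0, 2.0, 5.0, 10.0):
    observed = degrade(scene, d) + rng.normal(0.0, 0.01, scene.shape)  # plus sensor noise
    print(f"depth={d:4.1f}  structure similarity={cosine(ref, structure_map(observed)):.3f}")
```

This toy setup already shows why the premise needs testing. Wavelength-dependent attenuation reweights the color channels, so a grayscale structure proxy can drift whenever the channels carry different content, even though nothing geometric has changed.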

Figures

Figures reproduced from arXiv: 2604.12257 by Bing Wang, Chen Long, Hang Xu, Hao Chen, Zhen Dong.

**Figure 1.** (a) Previous methods applying uniform enhancement strategies yield suboptimal results across various degradation degrees. (b) Our approach adaptively modulates the enhancement process under the guidance of degradation patterns, addressing edge cases and enhancing overall performance. … With the degradation patterns explicitly modeled, we proceed to modulate the enhancement process via an adaptive routing…

**Figure 2.** Overall framework of SDAR-Net. This framework consists of two parts: (a) Representation Decoupling; (b) Adaptive Trajectory Modulation.

**Figure 3.** Design of the lightweight encoder (E) and decoder (D). Conv-in and Conv-out are convolutional layers, Grad denotes the gradient extraction operation, and Down denotes the downsampling operation. … degradation patterns, we draw inspiration from image style transfer and design a transfer loss to guide the modeling of degradation-related style. 3.1.1. Structure-preserving reconstruction: To obtain robust image representat…
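This excerpt only names the transfer loss, and Fig. 5 mentions a Gram matrix. A useful reference point is the classic style loss from neural style transfer, which compares Gram (channel-correlation) matrices of feature maps. The NumPy sketch below implements that standard loss. It is not the paper's own formulation, which is not shown here, and the feature shapes are illustrative.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-correlation (Gram) matrix of a C x H x W feature map, normalized by C*H*W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def style_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Mean squared difference between Gram matrices (classic style-transfer loss)."""
    return float(np.mean((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2))


rng = np.random.default_rng(0)
feat = rng.random((8, 16, 16))
perm = rng.permutation(16 * 16)
shuffled = feat.reshape(8, -1)[:, perm].reshape(8, 16, 16)  # same statistics, scrambled layout
tinted = feat * np.linspace(0.5, 1.5, 8)[:, None, None]     # per-channel gain, same layout
print(style_loss(feat, shuffled))  # approximately 0: Gram statistics ignore spatial arrangement
print(style_loss(feat, tinted))    # clearly nonzero: Gram statistics react to channel-wise changes
```

The usage lines show why Gram statistics suit a "style" signal. Scrambling the spatial layout leaves the Gram matrix unchanged, while a per-channel gain (loosely analogous to a color cast) changes it.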

**Figure 4.** Design of SREU. REB denotes Representation Enhancement Block; SEB denotes Style Evolution Block. … the decoupled degradation pattern and decoupled from scene structure, we proposed a dual strategy combining architecture constraints and loss constraints. In terms of architecture constraints, since the encoder has already anchored the scene structure within the image representation, we employ a decoupled arch…

**Figure 5.** Design of the Ada-Route module. T denotes transpose; Proj-A and Proj-B denote linear projection layers that compress the information from the Gram matrix. … the candidate states, obtaining the final adaptively adjusted image representation $\mathbf{C}_w$: $\mathbf{C}_w = \sum_{k=0}^{K} w_k \mathbf{C}_k$ (11). Finally, we decode the adaptively adjusted image representation $\mathbf{C}_w$ to get the final enhancement result: $\hat{I}_w = D(\mathbf{C}_w)$ (12). 3.2.2. Pseudo-labeling…
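Equation (11) and the Fig. 5 caption can be read as a four-step pipeline:

1. Build a Gram matrix from the style features.
2. Compress it with two linear projections into a routing vector.
3. Turn that vector into K+1 soft weights.
4. Fuse the candidate states with those weights.

This is a minimal sketch. The array layouts, the shapes given to `proj_a` and `proj_b`, and the use of softmax for the soft weights are assumptions, because the excerpt does not specify them. The decoder D of Eq. (12) is omitted.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def ada_route_weights(style: np.ndarray, proj_a: np.ndarray, proj_b: np.ndarray) -> np.ndarray:
    """Predict K+1 soft routing weights from a C x N style feature matrix.

    Assumed (hypothetical) shapes: proj_a is C x 1, proj_b is C x (K+1).
    """
    _, n = style.shape
    gram = style @ style.T / n   # C x C Gram matrix (the "T" is a transpose)
    v = (gram @ proj_a).ravel()  # Proj-A: one scalar per Gram row -> routing vector (C,)
    logits = v @ proj_b          # Proj-B: one logit per candidate state (K+1,)
    return softmax(logits)


def fuse_states(weights: np.ndarray, states: np.ndarray) -> np.ndarray:
    """Eq. (11): C_w = sum_k w_k * C_k, with the K+1 states stacked along axis 0."""
    return np.tensordot(weights, states, axes=1)


rng = np.random.default_rng(0)
C, N, K = 16, 64, 2
style = rng.standard_normal((C, N))             # toy style features
states = rng.standard_normal((K + 1, C, 8, 8))  # toy candidate states C_0..C_K
w = ada_route_weights(style, rng.standard_normal((C, 1)), rng.standard_normal((C, K + 1)))
fused = fuse_states(w, states)
print(w.round(3), fused.shape)
```

Softmax weights are non-negative and sum to one, so under this assumption C_w always lies in the convex hull of the candidate states. That fits Fig. 7(d), where the optimal trajectory falls between two states.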

**Figure 6.** Qualitative comparison on the UIEB dataset. Each output image is marked with its PSNR. … 4.2. Main Performance: Quantitative comparisons with state-of-the-art methods are summarized in …

**Figure 7.** Enhancement trajectories visualized with routing condition vectors. (a) Common cases that can be covered by state 1. (b) Cases that need a deeper enhancement trajectory. (c) Cases that require only a small degree of enhancement. (d) Cases whose optimal trajectory lies between two candidate states. … unified mapping, making it difficult to explicitly model the core cha…

**Figure 8.** Routing effect on mild degradation cases. Each output image is marked with its PSNR, and each weighted result with its routing weights. … $S \in \{S_0, S_1, S_2, S_{gt}\}$, we map its routing vector $v \in \{v_0, v_1, v_2, v_{gt}\}$ into a 2D coordinate $(x, y)$ using PCA. As illustrated in …
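The projection step in Fig. 8 is ordinary PCA. Below is a minimal NumPy sketch that uses an SVD of the mean-centered vectors. The 16-dimensional routing vectors are hypothetical. The excerpt also does not say whether the authors fit the PCA per image or across a whole test set.

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their top-2 principal components (after mean-centering)."""
    x = vectors - vectors.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T


rng = np.random.default_rng(0)
names = ["v0", "v1", "v2", "vgt"]
routing_vectors = rng.standard_normal((4, 16))  # hypothetical 16-d routing vectors
for name, (px, py) in zip(names, pca_2d(routing_vectors)):
    print(f"{name}: ({px:+.3f}, {py:+.3f})")
```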

**Figure 9.** Routing effect on severe degradation cases. Each output image is marked with its PSNR, and each weighted result with its routing weights. … models struggle to achieve adaptive effects. We first train a model $D(\mathrm{SREU}(\mathrm{SREU}(E(\cdot))))$ that recursively calls the SREU module twice to test whether the SREU module itself can adapt to different underwater degradations without explicit adjustment, which i…
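The ablation contrasts a fixed composition with routed fusion. The sketch below makes that contrast explicit using hypothetical stand-ins for E, SREU and D. For illustration only, it assumes that candidate state C_k is the encoder output after k SREU calls. Figs. 7 to 9 suggest depth-indexed states, but the excerpt does not confirm how the states are produced.

```python
from typing import Callable, List, Sequence

import numpy as np

Fn = Callable[[np.ndarray], np.ndarray]


def trajectory(x: np.ndarray, encode: Fn, sreu: Fn, depth: int) -> List[np.ndarray]:
    """Candidate states C_0..C_depth, ASSUMING C_k = SREU applied k times to E(x)."""
    states = [encode(x)]
    for _ in range(depth):
        states.append(sreu(states[-1]))
    return states


def fixed_depth(x: np.ndarray, encode: Fn, sreu: Fn, decode: Fn, depth: int = 2) -> np.ndarray:
    """Ablation-style baseline D(SREU(SREU(E(x)))): the same depth for every input."""
    return decode(trajectory(x, encode, sreu, depth)[-1])


def routed(x: np.ndarray, encode: Fn, sreu: Fn, decode: Fn, weights: Sequence[float]) -> np.ndarray:
    """Per-input weighted blend of the trajectory (Eq. 11), then decode (Eq. 12)."""
    states = trajectory(x, encode, sreu, len(weights) - 1)
    return decode(sum(w * c for w, c in zip(weights, states)))


def identity(a: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the encoder E and decoder D."""
    return a


def brighten(c: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for one SREU call: a 20% gain."""
    return 1.2 * c


x = np.array([0.5])
print(fixed_depth(x, identity, brighten, identity))             # depth 2 for every input
print(routed(x, identity, brighten, identity, [0.0, 1.0, 0.0]))  # stop at state 1
print(routed(x, identity, brighten, identity, [0.0, 0.5, 0.5]))  # between states 1 and 2
```

A fixed depth of two gives every input the same amount of processing. Routing weights, by contrast, can stay at state 0, stop at state 1, or land between states, which is the behavior Figs. 7 to 9 illustrate.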

Original abstract

Underwater Image Enhancement (UIE) is essential for robust visual perception in marine applications. However, existing methods predominantly rely on uniform mapping tailored to average dataset distributions, leading to over-processing of mildly degraded images or insufficient recovery of severely degraded ones. To address this challenge, we propose a novel adaptive enhancement framework, SDAR-Net. Unlike existing uniform paradigms, it first decouples specific degradation styles from the input and subsequently modulates the enhancement process adaptively. Specifically, since underwater degradation primarily shifts the appearance while keeping the scene structure, SDAR-Net formulates image features into dynamic degradation style embeddings and static scene structural representations through a carefully designed training framework. Subsequently, we introduce an adaptive routing mechanism. By evaluating style features and adaptively predicting soft weights at different enhancement states, it guides the weighted fusion of the corresponding image representations, accurately satisfying the adaptive restoration demands of each image. Extensive experiments show that SDAR-Net achieves new state-of-the-art (SOTA) performance with a PSNR of 25.72 dB on a real-world benchmark, and demonstrate its utility in downstream vision tasks. Our code is available at https://github.com/WHU-USI3DV/SDAR-Net.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds explicit style-structure decoupling plus learned adaptive routing to handle varying underwater degradation levels, but the abstract gives almost no experimental backing for whether the separation works.


The main new piece is the combination of extracting a dynamic degradation style embedding while treating scene structure as static, then using a router to predict soft weights and fuse representations accordingly. Most prior UIE networks apply one fixed mapping across the dataset, so conditioning the enhancement on per-image style is a clear step beyond that uniform approach. The motivation is also solid: uniform methods over-process mild cases and under-recover severe ones, and the routing idea directly targets that mismatch. The abstract reports 25.72 dB PSNR on real-world data plus gains on downstream tasks, which would matter for marine vision if the numbers hold up under scrutiny.

The soft spots are straightforward. We have no information on training splits, baseline implementations, ablations for the decoupling or routing modules, or statistical checks. More critically, the claim that degradation primarily shifts appearance while leaving structure intact is asserted without supporting evidence, such as feature invariance tests or controlled multi-degradation comparisons on the same scene. If scattering and absorption also degrade edges and texture in a content-dependent way, the static representations will carry style information and the routing will not function as intended.

This is aimed at researchers working on domain-specific restoration or adaptive networks in computer vision. A reader who wants to test whether per-image routing beats uniform enhancement in underwater settings would get something useful from the architecture description. I would send it to peer review because the core mechanism is described clearly enough for referees to evaluate and improve the experiments, even though the current version needs substantial added detail and validation.

Referee Report

2 major / 1 minor

Summary. The paper proposes SDAR-Net, an adaptive framework for underwater image enhancement. It first decouples input features into dynamic degradation style embeddings and static scene structural representations, based on the premise that degradation primarily affects appearance while preserving structure. It then applies an adaptive routing mechanism to predict soft weights and fuse representations for per-image enhancement. It reports SOTA performance with a PSNR of 25.72 dB on a real-world benchmark and improved results on downstream vision tasks, with code released.

Significance. If the decoupling premise and empirical gains hold under rigorous validation, the work could meaningfully advance UIE by replacing uniform mappings with style-aware adaptive routing, improving robustness across varying degradation severities in marine applications. The provision of code is a positive for reproducibility.

major comments (2)

Abstract and §1: The load-bearing premise that 'underwater degradation primarily shifts the appearance while keeping the scene structure' is asserted without supporting evidence such as feature invariance metrics, controlled same-scene multi-degradation experiments, or visualizations showing that structural representations remain degradation-free; this directly underpins the style embeddings, static representations, and subsequent adaptive routing, so its unverified status risks rendering the fusion ineffective if scattering/absorption also degrades edges/contrast in a content-dependent manner.
Experiments section: The central SOTA claim (PSNR 25.72 dB on real-world benchmark) and downstream gains are presented without reported details on training/validation splits, exact baseline re-implementations, statistical significance tests, or ablation studies isolating the style-decoupling and routing components; this absence makes it impossible to assess whether the performance edge is robust or attributable to the proposed architecture.

minor comments (1)

Notation: The distinction between 'dynamic degradation style embeddings' and 'static scene structural representations' is introduced without a formal definition or diagram clarifying how they are extracted in the training framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments have identified important areas where additional evidence and reporting will strengthen the presentation of our work. We address each major comment below and describe the revisions we will implement.


Referee (major comment 1, Abstract and §1): the load-bearing premise that underwater degradation primarily shifts appearance while keeping scene structure is asserted without supporting evidence.

Authors: We appreciate the referee highlighting the need for explicit validation of this premise. While the assumption aligns with the physical model of underwater imaging (absorption and scattering primarily alter color and contrast rather than geometric structure), we agree that empirical support is currently insufficient. In the revised manuscript, we will add: (1) quantitative feature invariance metrics (e.g., similarity scores of structural embeddings under simulated degradation), (2) controlled experiments using same-scene synthetic pairs with varying degradation levels, and (3) visualizations demonstrating preservation of edges and semantic structure. These will be incorporated into the introduction and experiments sections to better substantiate the decoupling strategy. revision: yes
Referee (major comment 2, Experiments section): the SOTA claim (PSNR 25.72 dB) and downstream gains lack details on splits, baseline re-implementations, statistical tests, and ablations isolating the style-decoupling and routing components.

Authors: We acknowledge that the current experimental reporting lacks sufficient detail for full reproducibility and robustness assessment. In the revised manuscript, we will expand the Experiments section to include: explicit descriptions of all training/validation/test splits, precise re-implementation details for baselines (including hyperparameters and any modifications), statistical significance tests (e.g., p-values from multiple runs), and comprehensive ablation studies isolating the style-decoupling and adaptive routing components. Standard deviations will also be reported for key metrics. These additions will enable clearer evaluation of the claimed performance gains. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture with independent training and evaluation


The paper proposes SDAR-Net as a neural network architecture that decouples degradation styles from scene structure under an explicit premise, then applies adaptive routing for fusion. All performance claims (e.g., 25.72 dB PSNR) are presented as outcomes of training on benchmarks and downstream task evaluation, not as quantities that reduce by the paper's own equations or definitions to fitted inputs. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify the core separation or routing; the framework is self-contained as a proposed model with a described training procedure. The decoupling premise is an assumption, not a derived result that loops back on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about the nature of underwater degradation and on standard neural-network training; no additional free parameters or invented entities are introduced beyond learned network weights.

axioms (1)

Domain assumption: underwater degradation primarily shifts the appearance while keeping the scene structure.
Invoked to justify formulating features into dynamic style embeddings and static structural representations.

pith-pipeline@v0.9.0 · 5508 in / 1132 out tokens · 56732 ms · 2026-05-10T14:46:58.291349+00:00 · methodology

