PRISM: Latent Composition Consistency for Single-Image Reflection Removal

Junseong Shin; Tae Hyun Kim

arxiv: 2606.31513 · v1 · pith:4RUZFHH6new · submitted 2026-06-30 · 💻 cs.CV

PRISM: Latent Composition Consistency for Single-Image Reflection Removal

Junseong Shin , Tae Hyun Kim This is my paper

Pith reviewed 2026-07-01 05:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords single-image reflection removallatent space decompositionflow matchingVAE latent spacecomposition consistencyimage layer separationcontrastive learning

0 comments

The pith

PRISM reframes single-image reflection removal as linear separation in pretrained VAE latent space, recovering transmission and reflection layers via flow matching with consistency losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Single-image reflection removal tries to untangle a main scene from overlaid reflections, but pixel-space methods struggle because the nonlinear image formation process entangles the layers tightly. The paper observes that pretrained VAE latent spaces show much lower coherence between layers, allowing an approximate additive model to work better for decomposition. PRISM builds a flow-matching velocity field on a FLUX backbone to recover both layers in one pass and adds a Latent Composition Consistency strategy that swaps reflection latents across samples to enforce cycle-consistent separation. It also uses a Layer Contrastive Separation loss for semantic disentanglement without needing explicit reflection ground truth. Experiments across six benchmarks show consistent gains and stronger performance on real-world images.

Core claim

By moving the decomposition into the latent space of a pretrained VAE and treating the mixture as approximately additive, a flow-matching model on a FLUX backbone can jointly recover transmission and reflection layers; the Latent Composition Consistency strategy (swapping reflection latents and applying cycle loss) and Layer Contrastive Separation loss together enforce robust disentanglement without explicit reflection targets.

What carries the argument

Pretrained VAE latent space treated as an approximate additive decomposition domain, with a flow-matching velocity field on FLUX backbone, enforced by Latent Composition Consistency (LCC) via latent swapping and cycle loss plus Layer Contrastive Separation (LCS) via patch-level contrastive learning.

If this is right

Both transmission and reflection layers are recovered in a single forward pass without separate networks.
The method requires no explicit reflection targets thanks to the contrastive loss.
Performance gains hold across six standard benchmarks with improved generalization to uncontrolled images.
The latent-space formulation avoids the nonlinear entanglement of sRGB pixel formation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent-space additive assumption and consistency losses could transfer to other single-image layer-separation tasks such as shadow removal or intrinsic image decomposition.
If VAE latents from different backbones show similar layer incoherence, the approach may generalize beyond the FLUX model used here.
The cycle-consistency mechanism via latent swapping offers a template for self-supervised training in other ill-posed inverse problems where paired data are scarce.

Load-bearing premise

Pretrained VAE latent spaces have substantially lower coherence between transmission and reflection layers than pixel space, so an approximate additive formulation works for decomposition.

What would settle it

A controlled test on the same six benchmarks where PRISM fails to exceed the prior state-of-the-art quantitative scores or shows no improvement on in-the-wild images would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.31513 by Junseong Shin, Tae Hyun Kim.

**Figure 1.** Figure 1: (a) Cosine similarity between transmission and reflection in pixel space vs. FLUX VAE [19] latent space on 454 pairs from SIR2 [37]. All points lie below the diagonal, confirming consistently lower coherence in latent space. (b) Latent swap results: composing zˆT with reflections zˆ 1 R, zˆ 2 R separated from different images and decoding via the VAE decoder D yields realistic mixtures that preserve trans… view at source ↗

**Figure 2.** Figure 2: Overview of PRISM. (a) MM-DiT predicts vθ from zI = E(I) (VAE encoder E, decoder D), yielding zˆT = zI + vθ and zˆR = −vθ in a single pass; both are decoded by D to produce Tˆ and Rˆ. (b) Reflection latents are cyclically swapped across batch samples to form synthetic mixtures z˜I ; a second forward pass enforces consistent decomposition (Lcycle). (c) Patch-pooled features of the cycle-recovered transmiss… view at source ↗

**Figure 3.** Figure 3: Visual comparison of different methods on multiple datasets. From top to bottom: Real, Nature, Postcard, Object, and Wild datasets. Zoom in for better visualization of detailed differences. human visual judgment, as well as NIQE [29] as a no-reference image quality metric. 4.2 Comparison with State-of-the-Art Methods We compare PRISM against ten state-of-the-art SIRR approaches: ERRNet [41], IBCLN [20], L… view at source ↗

**Figure 4.** Figure 4: Qualitative comparison of estimated reflection layers. The top row shows a solidobject scene and the bottom row shows a wild scene, both from the SIR2 dataset [37]. Best viewed zoomed in. in Tab. 2, where PRISM achieves substantially better perceptual scores (LPIPS −14.0%, DISTS −14.6% relative to RDNet) despite slightly lower SSIM — metrics known to correlate more strongly with human visual judgment [4,… view at source ↗

**Figure 5.** Figure 5: Visual comparison on the OpenRR 1K test set [46]. All methods are evaluated without training on this dataset. While SSIM is lower than RDNet, PRISM outperforms all baselines on perceptual metrics (LPIPS −14.0%, DISTS −14.6% relative to RDNet), consistent with the perception-distortion tradeoff discussed above. Tab. 2 also reports the number of parameters and inference time. Despite having more parameters,… view at source ↗

**Figure 6.** Figure 6: Progressive ablation of each loss component. Lrecon:=Llatent+Lpixel. Each column adds one loss on top of the previous setting. 4.3 Ablation Study We conduct ablation studies on Real(20), Nature(20), and SIR2 (454) to analyze each proposed component. Effect of Lcycle and LLCS. Tab. 3 isolates the contribution of each disentanglement objective. The base model (row 1) uses only Llatent and Lpixel, confirmin… view at source ↗

**Figure 7.** Figure 7: Failure case: dense and saturated reflection on SIR2 [37]. From left to right: input I, RDNet [49], DAI [11], Ours, and ground truth T. the transmission via a velocity field in a single forward pass, without dual-branch architectures or a dedicated pixel-space reflection loss. To enforce robust disentanglement, PRISM introduces a Latent Composition Consistency strategy — uniquely enabled by the broad deco… view at source ↗

read the original abstract

Single-image reflection removal (SIRR) seeks to recover the transmission layer from a mixture corrupted by reflections -- a severely ill-posed problem. Existing methods operate in pixel space, where the nonlinear sRGB formation model entangles the two layers and limits generalization. We observe that pretrained VAE latent spaces exhibit substantially lower coherence between image layers compared to pixel space, providing a more favorable working space for decomposition. Building on this finding, we propose \textbf{PRISM} (Pretrained-latent Reflection Image Separation Model), which reinterprets SIRR as a latent linear separation problem. Under an approximate additive formulation in latent space, PRISM learns a flow matching velocity field on a pretrained FLUX backbone that recovers both transmission and reflection in a single forward pass. To enforce robust disentanglement, we introduce a Latent Composition Consistency (LCC) strategy that constructs synthetic mixtures by swapping reflection latents across samples and enforces consistent decomposition via a cycle loss. We further propose a Layer Contrastive Separation (LCS) loss that promotes semantic separation between layers through patch-level contrastive learning, without requiring explicit reflection targets. Experiments on six benchmarks demonstrate that PRISM consistently outperforms state-of-the-art methods by significant margins, with strong generalization to in-the-wild images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PRISM moves reflection removal into FLUX VAE latents with flow matching plus LCC and LCS losses, but the approximate additivity premise still needs direct checks.

read the letter

The paper reinterprets single-image reflection removal as latent linear separation on a pretrained FLUX VAE, then trains a flow-matching velocity field to recover both layers in one pass. It adds a Latent Composition Consistency cycle loss that swaps reflection latents across samples and a Layer Contrastive Separation term that pushes semantic separation at patch level without needing reflection ground truth.

What stands out is the observation that VAE latents show lower layer coherence than pixel space, which could make decomposition easier. The claimed gains across six benchmarks and better behavior on in-the-wild images are the concrete results offered.

The stress-test concern holds weight. The VAE encoder is nonlinear, so nothing automatically guarantees that the latent of a mixture sits close to the sum of the separate transmission and reflection latents. The paper notes lower coherence but does not appear to quantify the actual additivity error on real composites. If that error is material, the flow field ends up learning a nonlinear correction and the new losses rest on an unverified premise rather than a clean linear advantage.

Training details, ablations, and error analysis are not visible in the abstract, so it is hard to judge how much of the reported margin comes from the latent space itself versus the backbone or the loss terms. The approach is distinct enough from prior pixel-space methods to be worth checking.

This is for researchers focused on reflection removal and related restoration tasks. A reader in that area could extract useful ideas if the numbers and the additivity checks hold up under review. It deserves a serious referee.

Referee Report

3 major / 2 minor

Summary. The paper proposes PRISM, which reinterprets single-image reflection removal (SIRR) as latent linear separation under an approximate additive model in the pretrained FLUX VAE latent space. It trains a flow-matching velocity field on the FLUX backbone to recover transmission and reflection layers in one pass, augmented by a Latent Composition Consistency (LCC) cycle loss that swaps reflection latents across synthetic mixtures and a Layer Contrastive Separation (LCS) patch-level contrastive loss. Experiments on six benchmarks are reported to show consistent outperformance of prior SOTA methods with improved in-the-wild generalization.

Significance. If the central premise that pretrained VAE latents exhibit sufficiently low layer coherence to support an approximate additive decomposition holds and is validated, the work could meaningfully shift SIRR methods away from pixel-space nonlinear entanglement toward more favorable latent representations, with potential benefits for generalization. The use of a pretrained generative backbone and the introduction of LCC/LCS losses without explicit reflection targets are notable design choices that merit evaluation.

major comments (3)

[§3] §3 (Method), around the latent additive formulation: the claim that 'pretrained VAE latent spaces exhibit substantially lower coherence between image layers' and thereby enable an 'approximate additive formulation' is load-bearing for reinterpreting SIRR as latent linear separation, yet the manuscript provides no direct quantification (e.g., mean ||E(I) − E(T) − E(R)||_2 or cosine similarity statistics) comparing additivity error in VAE latent space versus pixel space on the training or benchmark mixtures. Without this measurement the flow-matching objective may be learning a nonlinear correction rather than exploiting linearity, undermining the stated advantage.
[§4] §4 (Experiments), Table 1 or equivalent benchmark table: the reported 'significant margins' over SOTA are presented without accompanying error bars, statistical significance tests, or ablation isolating the contribution of the latent-space assumption versus the LCC/LCS losses; if the additivity error is large, these margins could be attributable to the flow-matching architecture alone rather than the claimed latent decomposition.
[§3.3] §3.3 (LCC strategy): the cycle loss is defined on synthetic mixtures created by swapping reflection latents, but the manuscript does not report the distribution of the resulting additivity residuals or verify that the swapped latents remain within the support of the VAE decoder; large residuals would make the consistency enforcement operate on an unverified premise.

minor comments (2)

[§3] Notation for the VAE encoder E and the flow-matching velocity field should be introduced with explicit functional forms (e.g., v_θ(z_t, t)) at first use to avoid ambiguity when discussing the approximate linearity.
[§4] The abstract states 'strong generalization to in-the-wild images' but the corresponding qualitative results section would benefit from a failure-case analysis or quantitative metric on a held-out wild set rather than selected visuals.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical grounding of our claims.

read point-by-point responses

Referee: [§3] §3 (Method), around the latent additive formulation: the claim that 'pretrained VAE latent spaces exhibit substantially lower coherence between image layers' and thereby enable an 'approximate additive formulation' is load-bearing for reinterpreting SIRR as latent linear separation, yet the manuscript provides no direct quantification (e.g., mean ||E(I) − E(T) − E(R)||_2 or cosine similarity statistics) comparing additivity error in VAE latent space versus pixel space on the training or benchmark mixtures. Without this measurement the flow-matching objective may be learning a nonlinear correction rather than exploiting linearity, undermining the stated advantage.

Authors: We agree that direct quantification of additivity error is important to support the central premise. In the revised manuscript we will add explicit measurements (mean L2 residuals and cosine similarities) comparing additivity error in FLUX VAE latent space versus pixel space on both the synthetic training mixtures and the benchmark data. This will clarify the degree to which the flow-matching objective exploits approximate linearity. revision: yes
Referee: [§4] §4 (Experiments), Table 1 or equivalent benchmark table: the reported 'significant margins' over SOTA are presented without accompanying error bars, statistical significance tests, or ablation isolating the contribution of the latent-space assumption versus the LCC/LCS losses; if the additivity error is large, these margins could be attributable to the flow-matching architecture alone rather than the claimed latent decomposition.

Authors: We acknowledge that error bars, statistical tests, and component ablations are needed for rigorous attribution. The revision will include standard deviations across multiple training runs, paired statistical significance tests against prior methods, and an ablation table that isolates the latent-space formulation from the LCC and LCS losses. revision: yes
Referee: [§3.3] §3.3 (LCC strategy): the cycle loss is defined on synthetic mixtures created by swapping reflection latents, but the manuscript does not report the distribution of the resulting additivity residuals or verify that the swapped latents remain within the support of the VAE decoder; large residuals would make the consistency enforcement operate on an unverified premise.

Authors: We will add to §3.3 the distribution (mean, std, quantiles) of additivity residuals for the swapped-latent mixtures and will verify decoder support by reporting reconstruction PSNR/SSIM of the swapped latents. These statistics will confirm that the LCC cycle operates on mixtures consistent with the VAE's learned manifold. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical method with observational modeling choice

full rationale

The paper presents PRISM as a data-driven architecture that reinterprets SIRR via an observational modeling choice (lower layer coherence in pretrained VAE latents) and evaluates it empirically on six benchmarks. No derivation chain, equations, or fitted parameters are shown that reduce any claimed result to its own inputs by construction. The approximate additive formulation is stated as an assumption based on observation rather than derived or self-referential. No self-citations are load-bearing in the provided text. This is a standard non-circular empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim rests on the unverified observation that VAE latents have lower layer coherence and admit an approximate additive model; no free parameters, axioms, or invented entities are detailed in the abstract.

axioms (2)

domain assumption Pretrained VAE latent spaces exhibit substantially lower coherence between image layers compared to pixel space
Stated as the key observation enabling the latent linear separation approach.
domain assumption An approximate additive formulation holds in latent space
Required to reinterpret SIRR as a latent linear separation problem.

pith-pipeline@v0.9.1-grok · 5749 in / 1138 out tokens · 22469 ms · 2026-07-01T05:57:26.926170+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 7 canonical work pages · 7 internal anchors

[1]

In: CVPR (2018)

Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: CVPR (2018)

2018
[2]

IEEE Transactions on Cybernetics (2020)

Chang, Y., Jung, C., Sun, J.: Joint reflection removal and depth estimation from a single image. IEEE Transactions on Cybernetics (2020)

2020
[3]

In: ICML (2020)

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: ICML (2020)

2020
[4]

IEEE TPAMI (2020)

Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: Unifying structure and texture similarity. IEEE TPAMI (2020)

2020
[5]

In: ICCV (2021)

Dong, Z., Xu, K., Yang, Y., Bao, H., Xu, W., Lau, R.W.: Location-aware single image reflection removal. In: ICCV (2021)

2021
[6]

In: ICML (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: ICML (2024)

2024
[7]

IJCV88(2), 303–338 (2010)

Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PAS- CAL visual object classes (VOC) challenge. IJCV88(2), 303–338 (2010)

2010
[8]

In: ICCV (2017)

Fan, Q., Yang, J., Hua, G., Chen, B., Wipf, D.: A generic deep architecture for single image reflection removal and image smoothing. In: ICCV (2017)

2017
[9]

In: CVPR (2020)

He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR (2020)

2020
[10]

NeurIPS (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. NeurIPS (2020)

2020
[11]

In: AAAI (2026)

Hu, J., Yang, C., Zhou, Z., Fang, J., Yang, X., Tian, Q., Shen, W.: Dereflection any image with diffusion priors and diversified data. In: AAAI (2026)

2026
[12]

NeurIPS (2021)

Hu, Q., Guo, X.: Trash or treasure? an interactive dual-stream strategy for single image reflection separation. NeurIPS (2021)

2021
[13]

In: CVPR (2023)

Hu, Q., Guo, X.: Single image reflection separation via component synergy. In: CVPR (2023)

2023
[14]

NeurIPS (2024)

Hu, Q., Wang, H., Guo, X.: Single image reflection separation via dual-stream interactive transformers. NeurIPS (2024)

2024
[15]

In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023)

Im, E.W., Shin, J., Baik, S., Kim, T.H.: Deep variational bayesian modeling of haze degradation process. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023)

2023
[16]

In: CVPR (2025)

Kee, E., Pikielny, A., Blackburn-Matzen, K., Levoy, M.: Removing reflections from raw photos. In: CVPR (2025)

2025
[17]

In: CVPR (2020)

Kim, S., Huo, Y., Yoon, S.E.: Single image reflection removal with physically-based training images. In: CVPR (2020)

2020
[18]

Auto-Encoding Variational Bayes

Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[19]

Labs, B.F.: Flux 2 klein 4b.https://blackforestlabs.ai(2025)

2025
[20]

In: CVPR (2020)

Li, C., Yang, Y., He, K., Lin, S., Hopcroft, J.E.: Single image reflection removal through cascaded refinement. In: CVPR (2020)

2020
[21]

In: CVPR (2014)

Li, Y., Brown, M.S.: Single image layer separation using relative smoothness. In: CVPR (2014)

2014
[22]

In: ECCV (2024)

Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: Diffbir: Toward blind image restoration with generative diffusion prior. In: ECCV (2024)

2024
[23]

Flow Matching for Generative Modeling

Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022) Title Suppressed Due to Excessive Length 17

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[25]

Decoupled Weight Decay Regularization

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[26]

In: ICLR (2017)

Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. In: ICLR (2017)

2017
[27]

In: ICML (2023)

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Image restoration with mean-reverting stochastic differential equations. In: ICML (2023)

2023
[28]

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.Y., Ermon, S.: Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[29]

completely blind

Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Sign. Process. Letters (2012)

2012
[30]

IJCV (1997)

Nayar, S.K., Fang, X.S., Boult, T.: Separation of reflection components using color and polarization. IJCV (1997)

1997
[31]

Representation Learning with Contrastive Predictive Coding

Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predic- tive coding. arXiv preprint arXiv:1807.03748 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[32]

In: ECCV (2020)

Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: ECCV (2020)

2020
[33]

In: CVPR (2022)

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

2022
[34]

In: CVPR (2015)

Shih,Y.,Krishnan,D.,Durand,F.,Freeman,W.T.:Reflectionremovalusingghost- ing cues. In: CVPR (2015)

2015
[35]

In: ICCV (2025)

Shin, J., Chung, S., Yang, Y., Kim, T.H.: Hazeflow: Revisit haze physical model as ode and non-homogeneous haze generation for real-world dehazing. In: ICCV (2025)

2025
[36]

In: ICML (2023)

Song, Y., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models. In: ICML (2023)

2023
[37]

In: ICCV (2017)

Wan, R., Shi, B., Duan, L.Y., Tan, A.H., Kot, A.C.: Benchmarking single-image reflection removal algorithms. In: ICCV (2017)

2017
[38]

IJCV (2021)

Wan, R., Shi, B., Li, H., Duan, L.Y., Kot, A.C.: Face image reflection removal. IJCV (2021)

2021
[39]

NeurIPS (2024)

Wang, L., Cheng, K., Lei, S., Wang, S., Yin, W., Lei, C., Long, X., Lu, C.T.: Dc- gaussian: Improving 3d gaussian splatting for reflective dash cam videos. NeurIPS (2024)

2024
[40]

IEEE Trans

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process.13(4), 600–612 (2004)

2004
[41]

In: CVPR (2019)

Wei, K., Yang, J., Fu, Y., Wipf, D., Huang, H.: Single image reflection removal exploiting misaligned training data and network enhancements. In: CVPR (2019)

2019
[42]

Qwen-Image Technical Report

Wu, C., Li, J., Zhou, J., Lin, J., et al.: Qwen-image technical report. arXiv preprint arXiv:2508.02324 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

In: AAAI (2024)

Wu, G., Jiang, J., Jiang, K., Liu, X.: Learning from history: Task-agnostic model contrastive learning for image restoration. In: AAAI (2024)

2024
[44]

In: CVPR (2021)

Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., Xie, Y., Ma, L.: Contrastive learning for compact single image dehazing. In: CVPR (2021)

2021
[45]

In: ICCV (2023)

Xia, B., Zhang, Y., Wang, S., Wang, Y., Wu, X., Tian, Y., Yang, W., Van Gool, L.: Diffir: Efficient diffusion model for image restoration. In: ICCV (2023)

2023
[46]

In: ICIP

Yang, K., Ouyang, L., Sun, H., Cai, J., Fu, L., Ding, J., Ho, C.M., Meng, Z.: Openrr-1k: A scalable dataset for real-world reflection removal. In: ICIP. IEEE (2025) 18 J. Shin and T.H. Kim

2025
[47]

In: CVPR (2018)

Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)

2018
[48]

In: CVPR (2018)

Zhang, X., Ng, R., Chen, Q.: Single image reflection separation with perceptual losses. In: CVPR (2018)

2018
[49]

In: CVPR (2025)

Zhao, H., Li, M., Hu, Q., Guo, X.: Reversible decoupling network for single image reflection removal. In: CVPR (2025)

2025
[50]

In: CVPR (2021)

Zheng, Q., Shi, B., Chen, J., Jiang, X., Duan, L.Y., Kot, A.C.: Single image reflec- tion removal with absorption effect. In: CVPR (2021)

2021
[51]

In: CVPR (2024)

Zhu, Y., Zhao, W., Li, A., Tang, Y., Zhou, J., Lu, J.: Flowie: Efficient image enhancement via rectified flow. In: CVPR (2024)

2024
[52]

Lpixel only

Zhu, Y., Fu, X., Jiang, P.T., Zhang, H., Sun, Q., Chen, J., Zha, Z.J., Li, B.: Revisiting single image reflection removal in the wild. In: CVPR (2024) Title Suppressed Due to Excessive Length 19 PRISM: Latent Composition Consistency for Single-Image Reflection Removal Supplementary Material Overview This supplementary material provides additional experime...

2024

[1] [1]

In: CVPR (2018)

Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: CVPR (2018)

2018

[2] [2]

IEEE Transactions on Cybernetics (2020)

Chang, Y., Jung, C., Sun, J.: Joint reflection removal and depth estimation from a single image. IEEE Transactions on Cybernetics (2020)

2020

[3] [3]

In: ICML (2020)

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: ICML (2020)

2020

[4] [4]

IEEE TPAMI (2020)

Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: Unifying structure and texture similarity. IEEE TPAMI (2020)

2020

[5] [5]

In: ICCV (2021)

Dong, Z., Xu, K., Yang, Y., Bao, H., Xu, W., Lau, R.W.: Location-aware single image reflection removal. In: ICCV (2021)

2021

[6] [6]

In: ICML (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: ICML (2024)

2024

[7] [7]

IJCV88(2), 303–338 (2010)

Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PAS- CAL visual object classes (VOC) challenge. IJCV88(2), 303–338 (2010)

2010

[8] [8]

In: ICCV (2017)

Fan, Q., Yang, J., Hua, G., Chen, B., Wipf, D.: A generic deep architecture for single image reflection removal and image smoothing. In: ICCV (2017)

2017

[9] [9]

In: CVPR (2020)

He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR (2020)

2020

[10] [10]

NeurIPS (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. NeurIPS (2020)

2020

[11] [11]

In: AAAI (2026)

Hu, J., Yang, C., Zhou, Z., Fang, J., Yang, X., Tian, Q., Shen, W.: Dereflection any image with diffusion priors and diversified data. In: AAAI (2026)

2026

[12] [12]

NeurIPS (2021)

Hu, Q., Guo, X.: Trash or treasure? an interactive dual-stream strategy for single image reflection separation. NeurIPS (2021)

2021

[13] [13]

In: CVPR (2023)

Hu, Q., Guo, X.: Single image reflection separation via component synergy. In: CVPR (2023)

2023

[14] [14]

NeurIPS (2024)

Hu, Q., Wang, H., Guo, X.: Single image reflection separation via dual-stream interactive transformers. NeurIPS (2024)

2024

[15] [15]

In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023)

Im, E.W., Shin, J., Baik, S., Kim, T.H.: Deep variational bayesian modeling of haze degradation process. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023)

2023

[16] [16]

In: CVPR (2025)

Kee, E., Pikielny, A., Blackburn-Matzen, K., Levoy, M.: Removing reflections from raw photos. In: CVPR (2025)

2025

[17] [17]

In: CVPR (2020)

Kim, S., Huo, Y., Yoon, S.E.: Single image reflection removal with physically-based training images. In: CVPR (2020)

2020

[18] [18]

Auto-Encoding Variational Bayes

Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[19] [19]

Labs, B.F.: Flux 2 klein 4b.https://blackforestlabs.ai(2025)

2025

[20] [20]

In: CVPR (2020)

Li, C., Yang, Y., He, K., Lin, S., Hopcroft, J.E.: Single image reflection removal through cascaded refinement. In: CVPR (2020)

2020

[21] [21]

In: CVPR (2014)

Li, Y., Brown, M.S.: Single image layer separation using relative smoothness. In: CVPR (2014)

2014

[22] [22]

In: ECCV (2024)

Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: Diffbir: Toward blind image restoration with generative diffusion prior. In: ECCV (2024)

2024

[23] [23]

Flow Matching for Generative Modeling

Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022) Title Suppressed Due to Excessive Length 17

work page internal anchor Pith review Pith/arXiv arXiv 2022

[24] [24]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[25] [25]

Decoupled Weight Decay Regularization

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[26] [26]

In: ICLR (2017)

Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. In: ICLR (2017)

2017

[27] [27]

In: ICML (2023)

Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Image restoration with mean-reverting stochastic differential equations. In: ICML (2023)

2023

[28] [28]

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.Y., Ermon, S.: Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[29] [29]

completely blind

Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Sign. Process. Letters (2012)

2012

[30] [30]

IJCV (1997)

Nayar, S.K., Fang, X.S., Boult, T.: Separation of reflection components using color and polarization. IJCV (1997)

1997

[31] [31]

Representation Learning with Contrastive Predictive Coding

Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predic- tive coding. arXiv preprint arXiv:1807.03748 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[32] [32]

In: ECCV (2020)

Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: ECCV (2020)

2020

[33] [33]

In: CVPR (2022)

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

2022

[34] [34]

In: CVPR (2015)

Shih,Y.,Krishnan,D.,Durand,F.,Freeman,W.T.:Reflectionremovalusingghost- ing cues. In: CVPR (2015)

2015

[35] [35]

In: ICCV (2025)

Shin, J., Chung, S., Yang, Y., Kim, T.H.: Hazeflow: Revisit haze physical model as ode and non-homogeneous haze generation for real-world dehazing. In: ICCV (2025)

2025

[36] [36]

In: ICML (2023)

Song, Y., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models. In: ICML (2023)

2023

[37] [37]

In: ICCV (2017)

Wan, R., Shi, B., Duan, L.Y., Tan, A.H., Kot, A.C.: Benchmarking single-image reflection removal algorithms. In: ICCV (2017)

2017

[38] [38]

IJCV (2021)

Wan, R., Shi, B., Li, H., Duan, L.Y., Kot, A.C.: Face image reflection removal. IJCV (2021)

2021

[39] [39]

NeurIPS (2024)

Wang, L., Cheng, K., Lei, S., Wang, S., Yin, W., Lei, C., Long, X., Lu, C.T.: Dc- gaussian: Improving 3d gaussian splatting for reflective dash cam videos. NeurIPS (2024)

2024

[40] [40]

IEEE Trans

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process.13(4), 600–612 (2004)

2004

[41] [41]

In: CVPR (2019)

Wei, K., Yang, J., Fu, Y., Wipf, D., Huang, H.: Single image reflection removal exploiting misaligned training data and network enhancements. In: CVPR (2019)

2019

[42] [42]

Qwen-Image Technical Report

Wu, C., Li, J., Zhou, J., Lin, J., et al.: Qwen-image technical report. arXiv preprint arXiv:2508.02324 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[43] [43]

In: AAAI (2024)

Wu, G., Jiang, J., Jiang, K., Liu, X.: Learning from history: Task-agnostic model contrastive learning for image restoration. In: AAAI (2024)

2024

[44] [44]

In: CVPR (2021)

Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., Xie, Y., Ma, L.: Contrastive learning for compact single image dehazing. In: CVPR (2021)

2021

[45] [45]

In: ICCV (2023)

Xia, B., Zhang, Y., Wang, S., Wang, Y., Wu, X., Tian, Y., Yang, W., Van Gool, L.: Diffir: Efficient diffusion model for image restoration. In: ICCV (2023)

2023

[46] [46]

In: ICIP

Yang, K., Ouyang, L., Sun, H., Cai, J., Fu, L., Ding, J., Ho, C.M., Meng, Z.: Openrr-1k: A scalable dataset for real-world reflection removal. In: ICIP. IEEE (2025) 18 J. Shin and T.H. Kim

2025

[47] [47]

In: CVPR (2018)

Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)

2018

[48] [48]

In: CVPR (2018)

Zhang, X., Ng, R., Chen, Q.: Single image reflection separation with perceptual losses. In: CVPR (2018)

2018

[49] [49]

In: CVPR (2025)

Zhao, H., Li, M., Hu, Q., Guo, X.: Reversible decoupling network for single image reflection removal. In: CVPR (2025)

2025

[50] [50]

In: CVPR (2021)

Zheng, Q., Shi, B., Chen, J., Jiang, X., Duan, L.Y., Kot, A.C.: Single image reflec- tion removal with absorption effect. In: CVPR (2021)

2021

[51] [51]

In: CVPR (2024)

Zhu, Y., Zhao, W., Li, A., Tang, Y., Zhou, J., Lu, J.: Flowie: Efficient image enhancement via rectified flow. In: CVPR (2024)

2024

[52] [52]

Lpixel only

Zhu, Y., Fu, X., Jiang, P.T., Zhang, H., Sun, Q., Chen, J., Zha, Z.J., Li, B.: Revisiting single image reflection removal in the wild. In: CVPR (2024) Title Suppressed Due to Excessive Length 19 PRISM: Latent Composition Consistency for Single-Image Reflection Removal Supplementary Material Overview This supplementary material provides additional experime...

2024