OSOR: One-Step Diffusion Inpainting for Effect-Aware Object Removal
Pith reviewed 2026-06-29 04:06 UTC · model grok-4.3
The pith
OSOR performs effect-aware object removal in one diffusion step while handling imperfect masks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OSOR introduces an occupancy-guided discriminator for precise boundary supervision that stabilizes single-step diffusion training, an alpha head that uses pretrained diffusion knowledge to predict removal regions and tolerate imperfect masks, and a semantic-anchored verification pipeline that filters noisy triplets to generate effect-aware supervision at scale; using the resulting 280K-pair CORNE dataset, the model surpasses multi-step diffusion baselines in perceptual quality at 4x to 30x faster inference.
What carries the argument
Occupancy-guided discriminator and alpha head that together enable stable single-step training and mask-robust, effect-aware inpainting.
If this is right
- Object removal becomes practical for interactive applications and edge devices.
- Users no longer need accurate masks for good results.
- Effect-aware training data can be produced automatically at large scale.
- Single-step diffusion training generalizes to other removal tasks with non-local effects.
Where Pith is reading between the lines
- The same single-step stabilization tricks could accelerate other diffusion editing operations.
- Verification pipelines like SAVP may help curate paired data for additional generative tasks.
- The method's success on anime and text benchmarks suggests it may transfer to stylized or text-heavy images.
Load-bearing premise
The semantic-anchored verification pipeline can reliably filter noisy instruction-based triplets into high-quality effect-aware supervision at scale.
What would settle it
A side-by-side evaluation of perceptual scores and measured inference time for OSOR versus multi-step baselines on AnimeEraseBench or TextEraseBench would confirm or refute the quality and speed claims.
Figures
read the original abstract
Real-world object removal is challenging due to two key difficulties: the target object's non-local effects, such as shadows and reflections, which are difficult to model, and the fact that user-provided masks are often inaccurate or incomplete. With billions of parameters and tens of denoising steps, diffusion-based models achieve strong removal performance at the expense of substantial computational cost, limiting their use in interactive applications and on edge devices. To address these challenges, we present OSOR (One-Step Object Removal), which simultaneously achieves efficient, effect-aware, and mask-robust object removal. Concretely, OSOR introduces: (1) an occupancy-guided discriminator for precise boundary supervision, enabling stable single-step diffusion training; (2) an alpha head that leverages knowledge from pretrained diffusion models to predict appropriate removal regions with minimal overhead, thereby handling imperfect masks; and (3) a semantic-anchored verification pipeline (SAVP) that filters noisy instruction-based triplets to produce effect-aware supervision at scale. Using SAVP, we curate CORNE, which contains 280K verified removal pairs, and further annotate AnimeEraseBench and TextEraseBench to evaluate performance on more complex removal tasks. Experiments show that OSOR surpasses strong multi-step diffusion baselines in perceptual quality while achieving $4\times$ to $30\times$ faster inference.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents OSOR, a one-step diffusion inpainting model for effect-aware object removal that handles non-local effects (shadows, reflections) and inaccurate user masks. It introduces an occupancy-guided discriminator for stable single-step training, an alpha head leveraging pretrained diffusion knowledge for mask robustness, and a semantic-anchored verification pipeline (SAVP) to filter instruction-based triplets and curate the CORNE dataset of 280K verified pairs, along with new benchmarks AnimeEraseBench and TextEraseBench. The central claim is that OSOR achieves superior perceptual quality to strong multi-step diffusion baselines at 4×–30× faster inference.
Significance. If the performance and stability claims hold with rigorous validation, the work would be significant for enabling interactive, edge-deployable object removal by reducing diffusion steps while addressing real-world effect modeling and mask issues. The dataset curation approach and architectural components for one-step stability could influence efficient generative editing methods, though the current evidence base is limited to high-level descriptions without metrics or ablations.
major comments (2)
- [Abstract] Abstract / SAVP description: The headline claims of better perceptual quality and one-step stability rest on effect-aware supervision from the SAVP-curated CORNE dataset, yet the manuscript provides no quantitative validation of SAVP (e.g., filter precision/recall, human agreement on retained non-local effects, or ablation of OSOR trained on raw vs. filtered triplets). This is load-bearing for both the occupancy-guided discriminator and alpha head contributions.
- [Abstract] Abstract: Performance gains are stated without any reported metrics (FID, LPIPS, user studies), baseline details, or experimental setup, preventing verification of the 4×–30× speedup and quality superiority over multi-step models.
minor comments (1)
- [Abstract] Abstract: The description of the alpha head as adding 'minimal overhead' would benefit from a parameter/FLOPs comparison to the base diffusion model.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract / SAVP description: The headline claims of better perceptual quality and one-step stability rest on effect-aware supervision from the SAVP-curated CORNE dataset, yet the manuscript provides no quantitative validation of SAVP (e.g., filter precision/recall, human agreement on retained non-local effects, or ablation of OSOR trained on raw vs. filtered triplets). This is load-bearing for both the occupancy-guided discriminator and alpha head contributions.
Authors: We acknowledge that the manuscript would benefit from explicit quantitative validation of SAVP. The current description focuses on the pipeline design and its role in curating CORNE, with downstream performance serving as indirect evidence. In the revision we will add an ablation comparing OSOR trained on raw versus SAVP-filtered triplets, along with verification statistics such as inter-annotator agreement on retained non-local effects. This directly supports the load-bearing role of SAVP for the other components. revision: yes
-
Referee: [Abstract] Abstract: Performance gains are stated without any reported metrics (FID, LPIPS, user studies), baseline details, or experimental setup, preventing verification of the 4×–30× speedup and quality superiority over multi-step models.
Authors: The abstract is written as a concise summary and therefore omits specific numbers and setup details. The full manuscript contains a complete experiments section that reports FID, LPIPS, user-study results, the exact multi-step diffusion baselines, and the inference-time measurements supporting the 4×–30× speedup claim. To improve immediate verifiability we will expand the abstract with the key quantitative highlights while retaining brevity. revision: partial
Circularity Check
No circularity: claims rest on new architectural components and empirical validation
full rationale
The paper introduces three new components—an occupancy-guided discriminator, an alpha head leveraging pretrained diffusion knowledge, and the SAVP filtering pipeline used to curate the CORNE dataset—without presenting equations, derivations, or first-principles predictions. Performance claims (perceptual quality and speed) are evaluated via external comparisons to multi-step baselines on annotated benchmarks, not by construction from fitted parameters or self-referential definitions. No load-bearing self-citations or uniqueness theorems are invoked in the provided text; the derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In: CVPR
Avrahami, O., Lischinski, D., Fried, O.: Blended diffusion for text-driven editing of natural images. In: CVPR. pp. 18187–18197 (2022)
2022
-
[2]
Black Forest Labs: Flux.https://github.com/black-forest-labs/flux(2024)
2024
-
[3]
In: CVPR
Brooks, T., Holynski, A., Efros, A.A.: Instructpix2pix: Learning to follow image editing instructions. In: CVPR. pp. 18392–18402 (2023)
2023
-
[4]
Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gordon, C., Schuhmann, C., Schmidt, L., Jitsev, J.: Reproducible scaling laws for contrastive language-image learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. pp. 2818–2829 (2023). https://doi.org/10.1109/CVPR5...
-
[5]
In: CVPR
Dong, Q., Cao, C., Fu, Y .: Incremental transformer structure enhanced image inpainting with masking positional encoding. In: CVPR. pp. 11348–11358 (2022)
2022
-
[6]
In: Globersons, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J.M., Zhang, C
Ekin, Y ., Yildirim, A.B., Caglar, E.E., Erdem, A., Erdem, E., Dundar, A.: Clipaway: Harmonizing focused embeddings for removing objects via diffusion models. In: Globersons, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J.M., Zhang, C. (eds.) Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Proce...
2024
-
[7]
In: NeurIPS
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y .: Generative adversarial nets. In: NeurIPS. pp. 2672–2680 (2014)
2014
-
[8]
In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time- scale update rule converge to a local nash equilibrium. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. pp. 6626–6637 (2017), https://proceed...
2017
-
[9]
In: NeurIPS
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: NeurIPS. pp. 6840–6851 (2020)
2020
-
[10]
In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022),https://openreview.net/forum?id=nZeVKeeFYf9
Hu, E.J., Shen, Y ., Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W.: Lora: Low-rank adaptation of large language models. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022),https://openreview.net/forum?id=nZeVKeeFYf9
2022
-
[11]
Hu, X., Peng, X., Luo, D., Ji, X., Peng, J., Jiang, Z., Zhang, J., Jin, T., Wang, C., Ji, R.: Diffumatting: Synthesizing arbitrary objects with matting-level annotation. In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXVIII. vol. 15126, pp. 396–413 (2024). https://doi.org/10.1007/9...
-
[12]
ACM TOG36(4), 107:1–107:14 (2017) 11 APREPRINT- JUNE29, 2026
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM TOG36(4), 107:1–107:14 (2017) 11 APREPRINT- JUNE29, 2026
2017
-
[13]
Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. pp. 5967–5976 (2017). https://doi.org/10.1109/CVPR.2017.632
-
[14]
Jayasumana, S., Ramalingam, S., Veit, A., Glasner, D., Chakrabarti, A., Kumar, S.: Rethinking FID: towards a better evaluation metric for image generation. In: IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024. pp. 9307–9315 (2024). https://doi.org/10.1109/CVPR52733.2024.00889
-
[15]
Showui: One vision-language- action model for GUI visual agent
Jiang, L., Wang, Z., Bao, J., Zhou, W., Chen, D., Shi, L., Chen, D., Li, H.: Smarteraser: Remove anything from images using masked-region guidance. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. pp. 24452–24462 (2025). https://doi.org/10.1109/CVPR52734.2025.02277
-
[17]
Levin, A., Lischinski, D., Weiss, Y .: A closed-form solution to natural image matting. IEEE Trans. Pattern Anal. Mach. Intell.30(2), 228–242 (2008). https://doi.org/10.1109/TPAMI.2007.1177
-
[18]
In: CVPR
Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y ., Jia, J.: Mat: Mask-aware transformer for large hole image inpainting. In: CVPR. pp. 10748–10758 (2022)
2022
-
[19]
Li, X., Yang, Z., Quan, R., Yang, Y .: DRIP: unleashing diffusion priors for joint foreground and al- pha prediction in image matting. In: Advances in Neural Information Processing Systems 38: An- nual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 (2024), http://papers.nips.cc/paper_f...
2024
-
[20]
Lin, X., Yu, F., Hu, J., You, Z., Shi, W., Ren, J.S., Gu, J., Dong, C.: Harnessing diffusion-yielded score priors for image restoration. ACM Trans. Graph.44(6), 208:1–208:21 (2025). https://doi.org/10.1145/3763346
-
[21]
Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Jiang, Q., Li, C., Yang, J., Su, H., Zhu, J., Zhang, L.: Grounding DINO: marrying DINO with grounded pre-training for open-set object detection. In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XLVII. vol. 15105, pp. 38–55 (202...
-
[22]
Showui: One vision-language- action model for GUI visual agent
Liu, Y ., Zhou, H., Cui, B., Shang, W., Lin, R.: Erase diffusion: Empowering object removal through calibrating diffusion pathways. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. pp. 2418–2427 (2025). https://doi.org/10.1109/CVPR52734.2025.00231
-
[23]
Liu, Z., Mao, H., Wu, C., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. pp. 11966–11976 (2022). https://doi.org/10.1109/CVPR52688.2022.01167
-
[24]
In: CVPR
Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: Repaint: Inpainting using denoising diffusion probabilistic models. In: CVPR. pp. 11451–11461 (2022)
2022
-
[26]
In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022)
Meng, C., He, Y ., Song, Y ., Song, J., Wu, J., Zhu, J., Ermon, S.: Sdedit: Guided image synthesis and editing with stochastic differential equations. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022)
2022
-
[27]
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018
Mescheder, L.M., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80, pp. 3478–3487 (2018), http: //proceedings.mlr.press/v80/mescheder18a.html
2018
-
[28]
In: CVPR
Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: CVPR. pp. 2536–2544 (2016)
2016
-
[29]
In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 (2024)
Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: improving latent diffusion models for high-resolution image synthesis. In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 (2024)
2024
-
[30]
Porter, T.K., Duff, T.: Compositing digital images. In: Christiansen, H. (ed.) Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1984, Minneapolis, Minnesota, USA, July 23-27, 1984. pp. 253–259 (1984). https://doi.org/10.1145/800031.808606 12 APREPRINT- JUNE29, 2026
-
[31]
In: Meila, M., Zhang, T
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual E...
2021
-
[32]
In: The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025 (2025),https://openreview.net/forum?id=Ha6RTeWMd0
Ravi, N., Gabeur, V ., Hu, Y ., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K.V ., Carion, N., Wu, C., Girshick, R.B., Dollár, P., Feichtenhofer, C.: SAM 2: Segment anything in images and videos. In: The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, Apri...
2025
-
[33]
In: CVPR
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR. pp. 10674–10685 (2022)
2022
-
[34]
In: 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022
Sagong, M., Yeo, Y ., Jung, S., Ko, S.: RORD: A real-world object removal dataset. In: 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022. p. 542 (2022)
2022
-
[35]
In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022), https:// openreview.net/forum?id=TIdIXIpzhoI
Salimans, T., Ho, J.: Progressive distillation for fast sampling of diffusion models. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 (2022), https:// openreview.net/forum?id=TIdIXIpzhoI
2022
-
[36]
In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G
Sauer, A., Lorenz, D., Blattmann, A., Rombach, R.: Adversarial diffusion distillation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXVI. vol. 15144, pp. 87–103 (2024). https://doi.org/10.1007/97...
-
[37]
In: International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA
Song, Y ., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models. In: International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. vol. 202, pp. 32211–32252 (2023)
2023
-
[38]
Sun, W., Dong, X., Cui, B., Tang, J.: Attentive eraser: Unleashing diffusion model’s object removal poten- tial via self-attention redirection guidance. In: AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA. pp. 20734–20742 (2025). https://doi.org/10.1609/AAAI.V39I19.34285
-
[39]
Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A., Kong, N., Goka, H., Park, K., Lempitsky, V .: Resolution-robust large mask inpainting with fourier convolutions. In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, January 3-8, 2022. pp. 3172–3182 (2022). https://doi.org/10.1...
work page doi:10.1109/w 2022
-
[40]
Wei, R., Yin, Z., Zhang, S., Zhou, L., Wang, X., Ban, C., Cao, T., Sun, H., He, Z., Liang, K., Ma, Z.: Omnieraser: Remove objects and their effects in images with paired video-frame data (2025), https://arxiv.org/abs/ 2501.07397
arXiv 2025
-
[41]
In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXVII
Winter, D., Cohen, M., Fruchter, S., Pritch, Y ., Rav-Acha, A., Hoshen, Y .: Objectdrop: Bootstrapping counter- factuals for photorealistic object removal and insertion. In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXVII. vol. 15135, pp. 112–129 (2024)
2024
-
[42]
Xu, N., Price, B.L., Cohen, S., Huang, T.S.: Deep image matting. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. pp. 311–320 (2017). https://doi.org/10.1109/CVPR.2017.41
-
[43]
Yin, T., Gharbi, M., Zhang, R., Shechtman, E., Durand, F., Freeman, W.T., Park, T.: One-step diffusion with distri- bution matching distillation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, W A, USA, June 16-22, 2024. pp. 6613–6623 (2024). https://doi.org/10.1109/CVPR52733.2024.00632
-
[44]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Free-form image inpainting with gated convolution. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. pp. 4470–4479 (2019). https://doi.org/10.1109/ICCV .2019.00457
-
[45]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
Yu, Y ., Zeng, Z., Zheng, H., Luo, J.: Omnipaint: Mastering object-oriented editing via disentangled insertion- removal inpainting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 17324–17334 (October 2025)
2025
-
[46]
Zeng, Y ., Fu, J., Chao, H., Guo, B.: Aggregated contextual transformations for high-resolution image inpainting. IEEE Trans. Vis. Comput. Graph.29(7), 3266–3280 (2023). https://doi.org/10.1109/TVCG.2022.3156949
-
[47]
Zhang, L., Agrawala, M.: Transparent image layer diffusion using latent transparency. ACM Trans. Graph.43(4), 100:1–100:15 (2024). https://doi.org/10.1145/3658150 13 APREPRINT- JUNE29, 2026
-
[48]
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. pp. 586–595 (2018). https://doi.org/10.1109/CVPR.2018.00068
-
[49]
Design Initiative for a 10 TeV pCM Wakefield Collider,
Zhao, J., Zhou, S., Wang, Z., Yang, P., Loy, C.C.: Objectclear: Complete object removal via object-effect attention. CoRRabs/2505.22636(2025). https://doi.org/10.48550/ARXIV .2505.22636 14 APREPRINT- JUNE29, 2026 Supplementary Material A Supplementary Details of SA VP and CORNE This section provides additional implementation details and dataset statistics...
work page internal anchor Pith review doi:10.48550/arxiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.