{"paper":{"title":"LiWi: Layering in the Wild","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Agent-driven synthesis creates over 100,000 layered natural images and trains models to decompose them with state-of-the-art accuracy.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Dong Chen, Fang Li, Haoyang Tong, Jingling Fu, Junshi Huang, Lichen Ma, Luohang Liu, Xinyuan Shan, Yan Li, Yu He","submitted_at":"2026-05-14T08:30:34Z","abstract_excerpt":"Recent advances in generative models have empowered impressive layered image generation, yet their success is largely confined to graphic design domains. The layering of in-the-wild images remains an underexplored problem, limiting fine-grained editing and applications of images in real-world scenarios. Specifically, challenges remain in scalable layered data and the modeling of object interaction in natural images, such as illumination effects and structural boundary. To address these bottlenecks, we propose a novel framework for high-fidelity natural image decomposition. 
First, we introduce "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our framework achieves state-of-the-art (SoTA) performance in natural image decomposition, outperforming existing models in RGB L1 and Alpha IoU metrics.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The Agent-driven Data Decomposition (ADD) pipeline produces high-quality, accurate layered ground truth for natural images without manual intervention, and the shadow-guided and degradation-restoration objectives correctly capture real-world illumination and boundary interactions.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LiWi uses an agent-driven data synthesis pipeline to build the LiWi-100k dataset and a model with shadow-guided and degradation-restoration objectives that achieves SoTA performance on RGB L1 and Alpha IoU for natural image layering decomposition.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Agent-driven synthesis creates over 100,000 layered natural images and trains models to decompose them with state-of-the-art accuracy.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"394849c647ec0933cf9d07971150fe3feb031a73d9bc88c558722783548cbb31"},"source":{"id":"2605.14552","kind":"arxiv","version":1},"verdict":{"id":"02280ee4-9496-4f73-ac51-9476d9bd2c00","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:58:02.797406Z","strongest_claim":"our framework achieves state-of-the-art (SoTA) performance in natural image decomposition, outperforming existing models in RGB L1 and Alpha IoU metrics.","one_line_summary":"LiWi uses an agent-driven data synthesis pipeline to build the LiWi-100k dataset 
and a model with shadow-guided and degradation-restoration objectives that achieves SoTA performance on RGB L1 and Alpha IoU for natural image layering decomposition.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The Agent-driven Data Decomposition (ADD) pipeline produces high-quality, accurate layered ground truth for natural images without manual intervention, and the shadow-guided and degradation-restoration objectives correctly capture real-world illumination and boundary interactions.","pith_extraction_headline":"Agent-driven synthesis creates over 100,000 layered natural images and trains models to decompose them with state-of-the-art accuracy."},"references":{"count":41,"sample":[{"doi":"","year":2021,"title":"Layered neural atlases for consistent video editing. ACM Transactions on Graphics, 40(6):1–12, 2021","work_id":"1074bcb1-614f-4461-a703-8d4fb1b61bf5","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Text2live: Text-driven layered image and video editing","work_id":"e88983ba-b42e-4f99-a55e-a27634333a60","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Shape-aware text-driven layered video editing","work_id":"13db528e-e5ac-49de-9b72-e336efa0ba26","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Resolution-robust large mask inpainting with fourier convolutions","work_id":"69b58502-2fe9-4185-8606-bb00dc941478","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Layerd: Decomposing raster graphic designs into 
layers","work_id":"f975a9a9-3ddd-442e-b06b-7c34428fe4ed","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":41,"snapshot_sha256":"c3a1d0ae14be5c8e730cfc518679ed38e7dcef5d9404e49b50cf6bd6201f102f","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"003f0091d120581cb1ca1a33fea5a6e5de59696441c7753b39b22967b36a0a26"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}