pith. machine review for the scientific record. sign in

arxiv: 2511.20211 · v2 · submitted 2025-11-25 · 💻 cs.CV · cs.AI

Recognition: unknown

OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning

Authors on Pith no claims yet
classification 💻 cs.CV cs.AI
keywords generationomnialphatransparency-awareunifiedlayermattingmulti-taskrgba
0
0 comments X
read the original abstract

Transparency-aware generation requires modeling not only RGB appearance but also alpha-based opacity and cross-layer composition, which are essential for tasks such as image matting, object removal, layer decomposition, and multi-layer content creation. However, existing RGBA-related methods remain largely fragmented, with separate pipelines designed for individual tasks. While a unified model is desirable, supervised fine-tuning alone is insufficient, as localized regression objectives cannot directly optimize the compositional fidelity, alpha-boundary precision, and structural consistency required for high-quality RGBA generation. To address this, we propose OmniAlpha, a unified multi-task reinforcement learning framework for transparency-aware generation and manipulation. OmniAlpha combines an end-to-end alpha-aware VAE and a sequence-to-sequence Diffusion Transformer, with a bi-directional layer axis in positional encoding to jointly model multiple RGBA inputs and outputs within a single forward pass. Built on a multi-task SFT cold start, it further performs GRPO-style post-training with layer-aware rewards defined on decoded RGBA outputs, enabling direct optimization of cross-layer coherence and fine transparency details. Experiments across five categories of transparency-aware tasks show that OmniAlpha consistently outperforms its unified SFT baseline and achieves strong performance against specialized expert models, including a 9.07% relative reduction in RGB L1 on layer decomposition and 74%/68% improvements over conventional matting tools on SAD/Grad for automatic matting.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition

    cs.CV 2026-05 unverdicted novelty 7.0

    RevealLayer decomposes natural images into multiple RGBA layers using diffusion models with region-aware attention, occlusion-guided adaptation, and a composite loss, outperforming prior methods on a new benchmark dataset.

  2. UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

    cs.CV 2026-05 unverdicted novelty 6.0

    UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.