Controlling vision-language models for multi-task image restoration
9 Pith papers cite this work.
citing papers explorer
- Degradation-Aware Adaptive Context Gating for Unified Image Restoration
  DACG-IR adds a lightweight degradation-aware module that generates prompts to adaptively gate attention temperature, output features, and spatial-channel fusion in an encoder-decoder network for unified image restoration.
- PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal
  PhySe-RPO enables diffusion-based surgical smoke removal by converting restoration into a stochastic policy optimized with physics consistency and CLIP semantic rewards under limited supervision.
- Toward Generalizable Forgery Detection and Reasoning
  FakeReasoning is an MLLM-based framework for unified forgery detection and reasoning on AI-generated images, supported by the new MMFR-Dataset of 120K images and 378K annotations across 10 generators.
- EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning
  EvoIR-Agent formulates experience components into a hierarchical pool with a self-evolving update mechanism to improve performance and efficiency of training-free MLLM image restoration agents over prior paradigms.
- Degradation Frequency Curve: An Explicit Frequency-Quantified Representation for All-in-One Image Restoration
  The paper proposes the Degradation Frequency Curve (DFC) as an explicit spectral representation for quantifying degradations and develops a DFC-guided multi-scale restorer that achieves state-of-the-art performance on composite and real-world benchmarks.
- Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement
  VLM-IMI adapts VLMs with iterative and manual instructions plus a learnable fusion module to guide diffusion-based generative low-light image enhancement, outperforming prior methods in perceptual quality.
- TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration
  TPGDiff introduces hierarchical triple-prior guidance in a diffusion network, placing degradation priors throughout, structural priors in shallow layers, and semantic priors in deep layers for improved all-in-one image restoration.
- Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model
  Q-Agent uses CoT decomposition on a fine-tuned MLLM for multi-degradation perception plus IQA-driven greedy selection of restoration algorithms to claim better performance than All-in-One IR models.
- SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control