Exploring CLIP for Assessing the Look and Feel of Images
8 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 8
roles: background 1
polarities: background 1
citing papers explorer
- SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models
  SenseBench is the first physics-based benchmark with 10K+ instances and dual protocols to evaluate VLMs on remote sensing low-level perception and diagnostic description, revealing domain bias and specific failure modes.
- Bringing Multimodal Large Language Models to Infrared-Visible Image Fusion Quality Assessment
  FuScore uses MLLMs to output continuous quality scores for IVIF images, constructs per-image soft labels from four sub-dimensions, and applies a tripartite objective with Thurstone fidelity to achieve higher correlation with human preferences than prior metrics.
- Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
  Chain-of-Zoom factorizes extreme super-resolution into an autoregressive sequence of intermediate scales using a reused backbone model plus GRPO-tuned multi-scale VLM prompts.
- SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
  SEGA adaptively scales RoPE attention components using spectral-energy guidance from the latent to improve structural coherence and fine details in high-resolution DiT synthesis.
- DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
  DiRotQ uses PCA-based rotation-aware activation quantization combined with GPTQ to achieve better FID and PSNR in 4-bit diffusion transformers than prior methods like SVDQuant.
- OPERA: An Agent for Image Restoration with End-to-End Joint Planning-Execution Optimization
  OPERA jointly optimizes restoration planning via RL over tool compositions and execution via agent-guided co-training of tools, claiming consistent gains over all-in-one models and prior agent methods on multi-degradation benchmarks.
- Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models
  FusionProxy is a distilled diffusion-based fusion module that adds thermal awareness to RGB vision systems in real time as an independent plug-and-play component.
- EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning