hub Canonical reference

Bovik, Hamid R

Zhou Wang, Alan C · 2004 · arXiv 2003.819861

Canonical reference. 73% of citing Pith papers cite this work as background.

50 Pith papers citing it

Background 73% of classified citations


citation-role summary

background: 8 · method: 2 · dataset: 1

citation-polarity summary

background: 8 · use method: 2 · use dataset: 1

representative citing papers

PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.

GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.

SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.

MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.

GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic

cs.CV · 2026-04-10 · unverdicted · novelty 7.0 · 2 refs

GeRM learns a distribution transfer vector field via a multi-condition ControlNet to convert physically-based renders into photorealistic images using text prompts and a 50K expert-curated dataset.

LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.

SNIC: Synthesized Noisy Images using Calibration

eess.IV · 2025-12-17 · unverdicted · novelty 7.0

A sensor-specific calibration pipeline using dark frames produces synthesized noisy RAW images that close 54-64% of the PSNR gap to real noise versus manufacturer profiles, accompanied by the open SNIC dataset of over 6600 paired images.

Delta Rectified Flow Sampling for Text-to-Image Editing

cs.CV · 2025-09-01 · unverdicted · novelty 7.0

DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.

Task complexity shapes internal representations and robustness in neural networks

cs.LG · 2025-08-07 · unverdicted · novelty 7.0

Harder classification tasks produce neural representations whose accuracy collapses under binarization and shuffling while easier tasks remain robust, defining task complexity via the performance gap between full-precision and perturbed networks.

PhotIQA: A photoacoustic image data set with image quality ratings

eess.IV · 2025-07-04 · conditional · novelty 7.0

PhotIQA is a new public dataset of 1134 expert-rated photoacoustic images for benchmarking image quality assessment in medical imaging.

SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM

cs.RO · 2025-04-18 · unverdicted · novelty 7.0

Presents SLAM&Render, a robot-recorded benchmark dataset with 40 multi-modal sequences for testing SLAM, novel view synthesis, and Gaussian Splatting under controlled variations in lighting, arrangements, and occlusions.

Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET

eess.IV · 2024-06-18 · unverdicted · novelty 7.0

Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

cs.CV · 2023-12-28 · conditional · novelty 7.0

Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.

LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.

MSIQ: Moment-based Scale-Invariant Quality Measure for Single Image Super-Resolution

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

MSIQ is a scale-invariant, model-free quality metric for single image super-resolution using normalized central geometric moments for direct comparison of different-resolution images.

MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery

q-bio.NC · 2026-05-16 · unverdicted · novelty 6.0

MIRAGE achieves state-of-the-art mental image reconstruction from fMRI on the NSD-Imagery benchmark by using a linear backbone with multi-modal text and image features fed to a diffusion model.

Learning Developmental Scaffoldings to Guide Self-Organisation

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Joint training of NCA rules and SIREN pre-patterns improves robustness, encoding capacity, and symmetry breaking compared to purely self-organizing models by offloading information to initial conditions.

LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

LiBrA-Net achieves real-time native 4K video dehazing via Lie-algebraic bilateral affine fields and releases the first 4K paired dehazing video benchmark with per-frame annotations.

FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

AsyncEvGS reconstructs high-fidelity 3D scenes from motion-blurred images by first deblurring via event data then using VGGT-based pose estimation and structure-driven losses inside Gaussian Splatting.

SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On

cs.CV · 2026-05-02 · unverdicted · novelty 6.0

SIFT-VTON adds explicit geometric supervision from SIFT keypoints to diffusion-based virtual try-on to improve spatial alignment and detail preservation.

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.

Defining Robust Ultrasound Quality Metrics via an Ultrasound Foundation Model

eess.IV · 2026-04-21 · unverdicted · novelty 6.0 · 2 refs

Proposes TinyUSFM-uLPIPS and TinyUSFM-NRQ metrics that show better alignment with segmentation task performance and expert preference than PSNR or VGG-LPIPS in ultrasound imaging.

CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

CAHAL introduces a physics-informed mixture-of-experts super-resolution network for clinical MRI that conditions on resolution and anisotropy and uses edge-penalised, Fourier, and segmentation-guided losses to reduce hallucinations compared with prior generative methods.

citing papers explorer

Showing 8 of 8 citing papers after filters.

FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

cs.LG · 2026-05-11 · unverdicted · none · ref 26

Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism

cs.DC · 2026-04-16 · unverdicted · none · ref 41

CoCoDiff achieves 3.6x average and 8.4x peak speedup for distributed DiT inference on up to 96 GPU tiles via tile-aware all-to-all, V-first scheduling, and selective V communication.

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios

cs.CV · 2026-04-15 · unverdicted · none · ref 42

PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

cs.CV · 2026-04-12 · unverdicted · none · ref 57

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.

SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation

cs.CV · 2026-04-09 · unverdicted · none · ref 53

SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.

Seeing enough: non-reference perceptual resolution selection for power-efficient client-side rendering

cs.GR · 2026-04-09 · unverdicted · none · ref 35 · 2 links

A neural network trained on full-reference perceptual quality labels predicts minimal sufficient resolution for rendered video to enable power-efficient client-side rendering.

UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

cs.LG · 2026-05-08 · unverdicted · none · ref 53

UMEDA is a new graph federated learning method that uses low-rank spectral filtering and diffusion over a shared integral operator to fuse multi-modal data privately, outperforming baselines on MM-Fi and RELI11D under high heterogeneity and tight privacy budgets.

MSDS: Deep Structural Similarity with Multiscale Representation

cs.CV · 2026-04-21 · unverdicted · none · ref 4

MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.
