Animation2Code benchmark with 1,069 videos tests VLMs on generating animation code, showing persistent failures in temporal consistency despite good visual matches.
hub Canonical reference
Bovik, Hamid R
Canonical reference. 75% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
AutoMedBench evaluates AI agents on long-horizon medical workflows across five stages and finds validation and submission as dominant failure points based on thousands of runs.
On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.
DirectorBench is a profile-aware diagnostic benchmark that localizes bottlenecks in long-form video generation workflows using structured checkpoints and multi-agent evaluation.
Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.
Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.
PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.
GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.
A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.
MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
GeRM learns a distribution transfer vector field via a multi-condition ControlNet to convert physically-based renders into photorealistic images using text prompts and a 50K expert-curated dataset.
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
A sensor-specific calibration pipeline using dark frames produces synthesized noisy RAW images that close 54-64% of the PSNR gap to real noise versus manufacturer profiles, accompanied by the open SNIC dataset of over 6600 paired images.
DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.
Harder classification tasks produce neural representations whose accuracy collapses under binarization and shuffling while easier tasks remain robust, defining task complexity via the performance gap between full-precision and perturbed networks.
PhotIQA is a new public dataset of 1134 expert-rated photoacoustic images for benchmarking image quality assessment in medical imaging.
Presents SLAM&Render, a robot-recorded benchmark dataset with 40 multi-modal sequences for testing SLAM, novel view synthesis, and Gaussian Splatting under controlled variations in lighting, arrangements, and occlusions.
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
A PINN framework with separate networks for conductivity and potentials, multiscale wavelet excitations, and FFE recovers dominant conductivity structures from finite DtN data with 3-12% relative error on synthetic tests, with FFE aiding sharp features.
Differential Unfolding replaces uniform stacking in deep unfolding networks with a heterogeneous structure of anchoring and differential evolution stages to achieve better accuracy-efficiency trade-offs in video SCI reconstruction.
Introduces Visibility-Aware Densification with Temporally-Adaptive Thresholding and Temporal Offset Warping to improve dynamic region quality in 3D Gaussian Splatting on three benchmarks.
Scene-adaptive nonlinear tone curves (ASE and AP3) with percentile normalisation and offset outperform linear gain for pseudo-GT generation in low-light 3DGS, delivering PSNR gains up to 4.34 dB on LOM and 3.25 dB on RealX3D across 21 scenes.
A plug-and-play perceptual wrapper using common random noise and Wasserstein Distortion supervision improves texture quality and reduces model size in 3D Gaussian Splatting.
citing papers explorer
-
Animation2Code: Evaluating Temporal Visual Reasoning in Video-to-Code Generation
Animation2Code benchmark with 1,069 videos tests VLMs on generating animation code, showing persistent failures in temporal consistency despite good visual matches.
-
A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery
On the public ReMIND dataset, a systematic benchmark of six synthesis models across 48 experiments finds LPIPS correlates with downstream segmentation utility while SSIM does not, with SynDiff-2.5D performing best.
-
Loki: Representation over Architecture for Diffusion-Based Portrait Animation
Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.
-
PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting
PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.
-
GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting
GuardMarkGS unifies watermarking and adversarial edit deterrence into a single optimization framework for protecting 3D Gaussian Splatting assets.
-
SyMTRS: Benchmark Multi-Task Synthetic Dataset for Depth, Domain Adaptation and Super-Resolution in Aerial Imagery
A new large-scale synthetic multi-task benchmark dataset supplying pixel-perfect depth, domain-shifted night imagery, and multi-scale low-resolution pairs for aerial remote sensing.
-
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
MESA restores ancient inscription textures via multi-exemplar style transfer from VGG19 features with per-layer exemplar selection and OCR-derived weights, without any model training.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
GeRM learns a distribution transfer vector field via a multi-condition ControlNet to convert physically-based renders into photorealistic images using text prompts and a 50K expert-curated dataset.
-
LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
-
Differential Unfolding: Efficient Unfolding Reconstruction for Video Snapshot Compressive Imaging
Differential Unfolding replaces uniform stacking in deep unfolding networks with a heterogeneous structure of anchoring and differential evolution stages to achieve better accuracy-efficiency trade-offs in video SCI reconstruction.
-
Temporally Aware Densification for Dynamic 3D Gaussian Splatting
Introduces Visibility-Aware Densification with Temporally-Adaptive Thresholding and Temporal Offset Warping to improve dynamic region quality in 3D Gaussian Splatting on three benchmarks.
-
Scene-Adaptive Nonlinear Tone Curves for Pseudo Ground-Truth Generation in Low-Light 3D Gaussian Splatting
Scene-adaptive nonlinear tone curves (ASE and AP3) with percentile normalisation and offset outperform linear gain for pseudo-GT generation in low-light 3DGS, delivering PSNR gains up to 4.34 dB on LOM and 3.25 dB on RealX3D across 21 scenes.
-
Seeing What Matters: Perceptual Wrapper with Common Randomness for 3D Gaussian Splatting
A plug-and-play perceptual wrapper using common random noise and Wasserstein Distortion supervision improves texture quality and reduces model size in 3D Gaussian Splatting.
-
LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators
LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.
-
MSIQ: Moment-based Scale-Invariant Quality Measure for Single Image Super-Resolution
MSIQ is a scale-invariant, model-free quality metric for single image super-resolution using normalized central geometric moments for direct comparison of different-resolution images.
-
LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing
LiBrA-Net achieves real-time native 4K video dehazing via Lie-algebraic bilateral affine fields and releases the first 4K paired dehazing video benchmark with per-frame annotations.
-
AsyncEvGS: Asynchronous Event-Assisted Gaussian Splatting for Handheld Motion-Blurred Scenes
AsyncEvGS reconstructs high-fidelity 3D scenes from motion-blurred images by first deblurring via event data then using VGGT-based pose estimation and structure-driven losses inside Gaussian Splatting.
-
SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On
SIFT-VTON adds explicit geometric supervision from SIFT keypoints to diffusion-based virtual try-on to improve spatial alignment and detail preservation.
-
CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans
CAHAL introduces a physics-informed mixture-of-experts super-resolution network for clinical MRI that conditions on resolution and anisotropy and uses edge-penalised, Fourier, and segmentation-guided losses to reduce hallucinations compared with prior generative methods.
-
CDSA-Net:Collaborative Decoupling of Vascular Structure and Background for High-Fidelity Coronary Digital Subtraction Angiography
CDSA-Net decouples vascular structure extraction and background restoration in coronary DSA via hierarchical geometric priors and adaptive noise modeling to eliminate artifacts while preserving tissue fidelity.
-
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
GIF: A Conditional Multimodal Generative Framework for IR Drop Imaging in Chip Layouts
GIF fuses geometrical image features and logical graph topology in a conditional diffusion model to generate high-quality IR drop images for chip layouts, outperforming prior ML methods on CircuitNet-N28 with SSIM 0.78, Pearson 0.95, PSNR 21.77, and NMAE 0.026.
-
SyncBreaker:Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
-
Single-Stage Signal Attenuation Diffusion Model for Low-Light Image Enhancement and Denoising
SADM adds a signal attenuation coefficient to the diffusion forward process so that reverse denoising simultaneously recovers brightness and suppresses noise without extra stages or correction modules.
-
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
-
RefGlass-GS: A UAV-Enabled Fusion Framework for Photorealistic, Semantic and Interactive Digitization of Reflective Glass Facades via Gaussian Splatting
RefGlass-GS is a fusion framework using UAV data, MAP-based panel segmentation, viewpoint optimization, and modified Gaussian Splatting with Reflection MLP to achieve improved photorealistic and semantic modeling of reflective glass facades.
-
SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks
The paper releases SignNet-1M, a 1M-scale augmented dataset for ASL, CSL and DGS with 3DGS and diffusion-based variations, plus benchmarks showing improved cross-shift generalization.
-
StereoGenBench: A Synthetic Multi-Camera Benchmark for Stereo Generation under Controlled Baseline Regimes
StereoGenBench is a new synthetic benchmark dataset featuring calibrated multi-baseline stereo pairs with dense metric depth, intrinsics, and poses from Unreal Engine renders for controlled evaluation of stereo generation.
-
Flow matching for Sentinel-2 super-resolution: implementation, application, and implications
Flow matching achieves single-step pixel accuracy and 20-step perceptual quality for Sentinel-2 super-resolution, outperforming diffusion and Real-ESRGAN while enabling large-scale 2.5 m land-cover products.
-
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
-
GeneralVLA-2: Geometry-Aware Reconstruction and Governed Memory for Robot Planning
GeneralVLA-2 introduces GeoFuse-MV3D for improved multi-view 3D reconstruction and a governed memory system, demonstrating modest gains on 3D object and task benchmarks.
-
Contrast-Informed Augmentation and Domain-Adversarial Training for Adult-to-Neonatal MR Reconstruction Generalization
Mixed training with contrast-informed augmentation and domain-adversarial training improves E2E-VarNet performance on neonatal T2-weighted brain MR reconstruction at R=4 and R=8 compared to adult-only training.
-
Low-Magnification SEM May Suffice: Interpretable Deep Learning for Multi-Scale Fracture-Cause Classification in Zirconia-Toughened Alumina
A fine-tuned ViT on 8493 SEM images classifies fracture causes in zirconia-toughened alumina at 0.907 accuracy and 0.888 macro-F1, with comparable performance at 50x versus higher magnifications.
-
MSDS: Deep Structural Similarity with Multiscale Representation
MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.