Canonical reference

In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, January 3-8, 2022

Minesh Mathew, Viraj Bagal, Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny · 2022 · DOI 10.1109/w

Canonical reference. 79% of citing Pith papers cite this work as background.

43 Pith papers citing it

Background: 79% of classified citations

citation-role summary

background: 11 · baseline: 2 · method: 1

citation-polarity summary

background: 11 · baseline: 2 · use method: 1

representative citing papers

How Neural Losses Shape VAE Latents

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Neural reconstruction losses in VAEs reduce latent information content and produce more isotropic latent geometries with even uncertainty distribution.

PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · novelty 7.0

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.

Circular Phase Representation and Geometry-Aware Optimization for Ptychographic Image Reconstruction

eess.IV · 2026-04-29 · unverdicted · novelty 7.0

A deep learning framework represents phase on the unit circle with a geodesic loss for improved ptychographic amplitude and phase reconstruction.

No More Guessing: a Verifiable Gradient Inversion Attack in Federated Learning

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

VGIA certifies exact recovery of individual records from aggregated gradients in federated learning using a subspace verification test on ReLU hyperplanes.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models

eess.IV · 2025-05-28 · unverdicted · novelty 7.0

Diffusion models reconstruct high-resolution 3D cardiac ultrasound volumes from heavily undersampled elevation planes and outperform traditional interpolation and supervised deep learning baselines.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

OSOR: One-Step Diffusion Inpainting for Effect-Aware Object Removal

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

OSOR is a one-step diffusion inpainting method using an occupancy-guided discriminator, alpha head, and semantic-anchored verification pipeline to achieve effect-aware object removal, outperforming multi-step baselines in quality at 4-30x speed.

On the QUEST for Uncertainty Quantification via Highest Density Regions

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

QUEST measures uncertainty via the Lebesgue volume of highest-density regions of a distribution's support, evaluated at robustness parameter alpha, and claims to satisfy UQ axioms while outperforming variance and differential entropy on selective prediction tasks.

DiffPC: Diffusion-Based Projector Photometric Compensation

cs.MM · 2026-06-16 · unverdicted · novelty 6.0

DiffPC reformulates projector photometric compensation as a diffusion-based denoising task guided by photometry and image content to achieve better results in unseen environments.

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

GRASP maps natural language to bounding-box goals via VLM for neuro-symbolic planning and reports 73.3% success in 90 real-robot trials without task-specific training.

MoRE: A Mixture-of-Experts-Based Task-Adaptive End-to-End Network for Multimodal MRI Reconstruction

eess.IV · 2026-06-01 · unverdicted · novelty 6.0

MoRE integrates a sparsely activated MoE module with unsupervised routing into a variational network for stable multimodal MRI reconstruction on fastMRI brain and knee data at 8x undersampling.

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

IPO-Mine releases a toolkit and large multimodal dataset for structured analysis of IPO filings and shows state-of-the-art models diverge from human judgments on chart quality and misleadingness.

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperforms them across multiple models and settings.

LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.

Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.

SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis

cs.MA · 2026-05-10 · unverdicted · novelty 6.0

A new 839K-image plant disease dataset paired with an agentic visual reasoning system that uses source-grounded symptoms raises diagnosis accuracy by 16.2 points on average and generalizes to unseen crops without retraining.

MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

MAG-VLAQ fuses multi-modal ground and aerial data via ODE-conditioned vector-of-locally-aggregated-queries to nearly double recall@1 on aerial-ground place recognition benchmarks.

Communicating Sound Through Natural Language

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Lexical acoustic coding lets LLMs transmit audio waveforms as editable natural-language sentences that another LLM can parse and reconstruct into sound.

DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation

cs.CV · 2026-05-08 · conditional · novelty 6.0

Frozen DINOv3 features with multi-view MLP probes, entropy-weighted fusion, and spatial regularization achieve 0.895 Dice on Kvasir-SEG, 0.897 on ISIC 2018, and 0.908 on BraTS FLAIR, recovering 98.4% of full-data performance with only five annotated patients.

LARGO: Low-Rank Hypernetwork for Handling Missing Modalities

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.

citing papers explorer

Showing 24 of 24 citing papers after filters.

PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · none · ref 50

PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · none · ref 26

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

cs.CV · 2026-05-07 · unverdicted · none · ref 45

CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

cs.CV · 2026-05-04 · unverdicted · none · ref 54

Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · none · ref 23

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · none · ref 140

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

OSOR: One-Step Diffusion Inpainting for Effect-Aware Object Removal

cs.CV · 2026-06-26 · unverdicted · none · ref 39

OSOR is a one-step diffusion inpainting method using an occupancy-guided discriminator, alpha head, and semantic-anchored verification pipeline to achieve effect-aware object removal, outperforming multi-step baselines in quality at 4-30x speed.

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

cs.CV · 2026-05-19 · unverdicted · none · ref 16

Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperforms them across multiple models and settings.

LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

cs.CV · 2026-05-18 · unverdicted · none · ref 22

LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.

Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · none · ref 39

SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.

MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition

cs.CV · 2026-05-10 · unverdicted · none · ref 2

MAG-VLAQ fuses multi-modal ground and aerial data via ODE-conditioned vector-of-locally-aggregated-queries to nearly double recall@1 on aerial-ground place recognition benchmarks.

DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation

cs.CV · 2026-05-08 · conditional · none · ref 4

Frozen DINOv3 features with multi-view MLP probes, entropy-weighted fusion, and spatial regularization achieve 0.895 Dice on Kvasir-SEG, 0.897 on ISIC 2018, and 0.908 on BraTS FLAIR, recovering 98.4% of full-data performance with only five annotated patients.

LARGO: Low-Rank Hypernetwork for Handling Missing Modalities

cs.CV · 2026-05-07 · unverdicted · none · ref 10

LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.

Amodal SAM: A Unified Amodal Segmentation Framework with Generalization

cs.CV · 2026-04-22 · unverdicted · none · ref 31

Amodal SAM extends SAM with a Spatial Completion Adapter, Target-Aware Occlusion Synthesis for data, and consistency losses to reach SOTA amodal segmentation with strong generalization to new objects and scenes.

From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

cs.CV · 2026-04-16 · unverdicted · none · ref 52

Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.

Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection

cs.CV · 2026-06-18 · unverdicted · none · ref 40

RPC is a post-hoc calibration technique that augments flow-based anomaly scores with nearest-prototype deviation in the frozen latent space, gated by keypoint confidence, yielding consistent AUROC gains on video anomaly detection tasks.

Image Quality Assessment of Identity Cards Using Measures from Open Face Image Quality

cs.CV · 2026-06-10 · unverdicted · none · ref 8

OFIQ quality measures applied to preprocessed ID card images show correlation with improved presentation attack detection performance across four datasets containing both real and printed mock cards.

Do Composed Image Retrieval Benchmarks Require Multimodal Composition?

cs.CV · 2026-05-14 · unverdicted · none · ref 19

CIR benchmarks contain many unimodal shortcuts and noisy queries, leading to overestimation of models' multimodal composition capabilities.

Beyond Masks: The Case for Medical Image Parsing

cs.CV · 2026-05-12 · unverdicted · none · ref 16

Medical image parsing is proposed as the central output for the field instead of masks, with an audit showing that none of eleven representative systems produces a well-formed parse containing attributes, relationships, and closure.

Weighted Knowledge Distillation for Semi-Supervised Segmentation of Maxillary Sinus in Panoramic X-ray Images

cs.CV · 2026-04-22 · unverdicted · none · ref 20

A semi-supervised framework using weighted knowledge distillation and SinusCycle-GAN refinement achieves 96.35% Dice score for maxillary sinus segmentation in panoramic X-rays from 2,511 patients.

Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction

cs.CV · 2026-04-03 · unverdicted · none · ref 61

A multilevel perceptual CRF model using Swin Transformer, HPF fusion, HA adapters, and dynamic scaling attention achieves state-of-the-art monocular depth estimation on NYU Depth v2, KITTI, and MatterPort3D with reduced error and fast inference.

3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography

cs.CV · 2025-02-04 · unverdicted · none · ref 26

A 3D self-supervised foundation model trained on over 360k head CT scans improves downstream disease classification on limited-label internal and external datasets versus scratch-trained and prior models.

A Heterogeneous Two-Stream Framework for Video Action Recognition with Comparative Fusion Analysis

cs.CV · 2026-04-25 · unverdicted · none · ref 17

DualStreamHybrid assigns ViT-Tiny to RGB and MobileNetV2 to 20-channel flow, projects features to common space, and finds cross-attention best on UCF11 (98.12%) while weighted fusion is most consistent on UCF50 (96.86%).

Variational Latent Entropy Estimation Disentanglement: Controlled Attribute Leakage for Face Recognition

cs.CV · 2026-04-13 · unverdicted · none · ref 7

VLEED uses variational latent entropy estimation to separate categorical attributes from identity in face embeddings, achieving wider privacy-utility tradeoffs and bias reduction than prior methods on IJB-C, RFW, and VGGFace2.
