Canonical reference

In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, January 3-8, 2022

Minesh Mathew, Viraj Bagal, Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny · 2022 · DOI 10.1109/w

Canonical reference. 79% of citing Pith papers cite this work as background.

43 Pith papers citing it

Background 79% of classified citations


citation-role summary

background 11 · baseline 2 · method 1

citation-polarity summary

background 11 · baseline 2 · use method 1

representative citing papers

How Neural Losses Shape VAE Latents

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Neural reconstruction losses in VAEs reduce latent information content and produce more isotropic latent geometries with even uncertainty distribution.

PanoPlane: Plane-Aware Panoramic Completion for Sparse-View Indoor 3D Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

PanoPlane achieves up to 17.8% PSNR gains in sparse-view indoor novel view synthesis by using training-free plane-aware panoramic completion to supervise 3D Gaussian Splatting.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · novelty 7.0

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

CEA assembles per-token low-rank residual updates via dense affinities over hyper-adapter-generated components to improve all-in-one image restoration on spatially non-uniform degradations.

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.

Circular Phase Representation and Geometry-Aware Optimization for Ptychographic Image Reconstruction

eess.IV · 2026-04-29 · unverdicted · novelty 7.0

A deep learning framework represents phase on the unit circle with a geodesic loss for improved ptychographic amplitude and phase reconstruction.

No More Guessing: a Verifiable Gradient Inversion Attack in Federated Learning

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

VGIA certifies exact recovery of individual records from aggregated gradients in federated learning using a subspace verification test on ReLU hyperplanes.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models

eess.IV · 2025-05-28 · unverdicted · novelty 7.0

Diffusion models reconstruct high-resolution 3D cardiac ultrasound volumes from heavily undersampled elevation planes and outperform traditional interpolation and supervised deep learning baselines.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

OSOR: One-Step Diffusion Inpainting for Effect-Aware Object Removal

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

OSOR is a one-step diffusion inpainting method using an occupancy-guided discriminator, alpha head, and semantic-anchored verification pipeline to achieve effect-aware object removal, outperforming multi-step baselines in quality at 4-30x speed.

On the QUEST for Uncertainty Quantification via Highest Density Regions

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

QUEST measures uncertainty via the Lebesgue volume of highest-density regions of a distribution's support, evaluated at robustness parameter alpha, and claims to satisfy UQ axioms while outperforming variance and differential entropy on selective prediction tasks.

DiffPC: Diffusion-Based Projector Photometric Compensation

cs.MM · 2026-06-16 · unverdicted · novelty 6.0

DiffPC reformulates projector photometric compensation as a diffusion-based denoising task guided by photometry and image content to achieve better results in unseen environments.

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

GRASP maps natural language to bounding-box goals via VLM for neuro-symbolic planning and reports 73.3% success in 90 real-robot trials without task-specific training.

MoRE: A Mixture-of-Experts-Based Task-Adaptive End-to-End Network for Multimodal MRI Reconstruction

eess.IV · 2026-06-01 · unverdicted · novelty 6.0

MoRE integrates a sparsely activated MoE module with unsupervised routing into a variational network for stable multimodal MRI reconstruction on fastMRI brain and knee data at 8x undersampling.

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

IPO-Mine releases a toolkit and large multimodal dataset for structured analysis of IPO filings and shows state-of-the-art models diverge from human judgments on chart quality and misleadingness.

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperforms them across multiple models and settings.

LiFT: Lifted Inter-slice Feature Trajectories for 3D Image Generation from 2D Generators

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

LiFT factorizes 3D medical volume synthesis into per-slice 2D generation and inter-slice trajectory learning, using a tri-planar drifting loss for unconditional coherence and a z-context mixer for paired translation tasks.

Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SCOUP decouples 2D sparse code learning from 3D Gaussian optimization to deliver up to 400x training speedup and 3x better memory efficiency while matching accuracy on open-vocabulary 3D queries.

SAGE: Scalable Agentic Grounded Evaluation for Crop Disease Diagnosis

cs.MA · 2026-05-10 · unverdicted · novelty 6.0

A new 839K-image plant disease dataset paired with an agentic visual reasoning system that uses source-grounded symptoms raises diagnosis accuracy by 16.2 points on average and generalizes to unseen crops without retraining.

MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

MAG-VLAQ fuses multi-modal ground and aerial data via ODE-conditioned vector-of-locally-aggregated-queries to nearly double recall@1 on aerial-ground place recognition benchmarks.

Communicating Sound Through Natural Language

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Lexical acoustic coding lets LLMs transmit audio waveforms as editable natural-language sentences that another LLM can parse and reconstruct into sound.

DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation

cs.CV · 2026-05-08 · conditional · novelty 6.0

Frozen DINOv3 features with multi-view MLP probes, entropy-weighted fusion, and spatial regularization achieve 0.895 Dice on Kvasir-SEG, 0.897 on ISIC 2018, and 0.908 on BraTS FLAIR, recovering 98.4% of full-data performance with only five annotated patients.

LARGO: Low-Rank Hypernetwork for Handling Missing Modalities

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.

