hub Baseline reference

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp

Yang, Z · 2024 · arXiv 3382.2024

Baseline reference. 60% of citing Pith papers use this work as a benchmark or comparison.

14 Pith papers citing it

Baseline 60% of classified citations


citation-role summary

dataset 3 · background 2

citation-polarity summary

use dataset 3 · background 2

representative citing papers

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · novelty 8.0

AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0 · 2 refs

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · novelty 7.0

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

AdaGScale: Viewpoint-Adaptive Gaussian Scaling in 3D Gaussian Splatting to Reduce Gaussian-Tile Pairs

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

AdaGScale uses viewpoint-adaptive scaling of Gaussians in 3D-GS by estimating peripheral color contributions to reduce Gaussian-tile pairs, delivering 13.8x geometric mean speedup with ~0.5 dB PSNR loss on city-scale scenes.

DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

DPC-VQA decouples a frozen MLLM perceptual prior from a lightweight residual calibration branch to adapt video quality assessment to new scenarios with under 2% trainable parameters and 20% of typical MOS labels.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

Causal Attribution via Activation Patching

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.

ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.

Deep Light Pollution Removal in Night Cityscape Photographs

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

A deep learning method with an enhanced physical degradation model incorporating anisotropic light spread and hidden skyglow, trained via generative models and synthetic-real coupling, removes light pollution from night cityscape images more effectively than prior restoration techniques.

Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions

cs.CV · 2025-09-17 · unverdicted · novelty 5.0

STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.

SPECTRA-Net: Scalable Pipeline for Explainable Cross-domain Tensor Representations for AI-generated Images Detection

cs.CV · 2026-05-06 · unverdicted · novelty 4.0

SPECTRA-Net fuses multi-view tensor representations from vision foundation models, spectral analysis, local anomaly detection, and statistical descriptors to achieve state-of-the-art cross-domain AI-generated image detection with explainable artifact localization.

Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges

cs.CV · 2026-04-09 · unverdicted · novelty 3.0

A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.

Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions

cs.CV · 2025-07-06 · unverdicted · novelty 2.0

A literature review that categorizes deep learning approaches for visual hand gesture recognition, summarizes state-of-the-art methods across tasks, reviews datasets and metrics, and identifies challenges and future directions.

citing papers explorer

Showing 14 of 14 citing papers.

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction cs.MM · 2026-04-15 · unverdicted · none · ref 3
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 140 · 2 links
AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation cs.CV · 2026-05-11 · conditional · none · ref 4
AdaGScale: Viewpoint-Adaptive Gaussian Scaling in 3D Gaussian Splatting to Reduce Gaussian-Tile Pairs cs.CV · 2026-04-21 · unverdicted · none · ref 6
DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment cs.CV · 2026-04-14 · unverdicted · none · ref 3
No One Knows the State of the Art in Geospatial Foundation Models cs.CV · 2026-05-12 · accept · none · ref 15
Causal Attribution via Activation Patching cs.CV · 2026-03-13 · unverdicted · none · ref 20
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance cs.CV · 2026-04-29 · unverdicted · none · ref 14
RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery cs.CV · 2026-04-21 · unverdicted · none · ref 23
Deep Light Pollution Removal in Night Cityscape Photographs cs.CV · 2026-04-10 · unverdicted · none · ref 12
Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions cs.CV · 2025-09-17 · unverdicted · none · ref 18
SPECTRA-Net: Scalable Pipeline for Explainable Cross-domain Tensor Representations for AI-generated Images Detection cs.CV · 2026-05-06 · unverdicted · none · ref 16
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges cs.CV · 2026-04-09 · unverdicted · none · ref 49
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions cs.CV · 2025-07-06 · unverdicted · none · ref 51
