Baseline reference

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp

Bugarin, N · 2024 · arXiv 3382.2024

Baseline reference: 60% of the classified citations from citing Pith papers use this work as a benchmark or comparison.

28 Pith papers cite it.

citation-role summary

dataset 3 background 2

citation-polarity summary

use dataset 3 background 2

representative citing papers

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

cs.MM · 2026-04-15 · unverdicted · novelty 8.0

AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

cs.LG · 2026-05-22 · unverdicted · novelty 7.0 · 2 refs

Introduces a unified benchmark for continual anomaly detection with discrete and continuous protocols plus a training-free DINOSaur method that outperforms prior CAD approaches with zero forgetting and sub-100 ms edge inference.

COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

COSY uses independent per-component 3DGS generators plus context tokens to achieve disentangled semantic editing of human heads without masks or classifiers.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0 · 2 refs

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation

cs.CV · 2026-05-11 · conditional · novelty 7.0

AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.

AdaGScale: Viewpoint-Adaptive Gaussian Scaling in 3D Gaussian Splatting to Reduce Gaussian-Tile Pairs

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

AdaGScale uses viewpoint-adaptive scaling of Gaussians in 3D-GS by estimating peripheral color contributions to reduce Gaussian-tile pairs, delivering 13.8x geometric mean speedup with ~0.5 dB PSNR loss on city-scale scenes.

DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

DPC-VQA decouples a frozen MLLM perceptual prior from a lightweight residual calibration branch to adapt video quality assessment to new scenarios with under 2% trainable parameters and 20% of typical MOS labels.

Fleet: Few Shots Lead Effective AI-generated Image Detection

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

Fleet achieves dynamic few-shot adaptation for AIGI detection via avoidance routing in decoupled subspaces, raising accuracy from 20.4% to 73.1% on new generators like Doubao Seedream 4.0 with 10 shots.

RBE-Flow: Recurrent Bayesian Estimation on Feature Manifolds for Cross-Modal Registration

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

RBE-Flow recasts dense cross-modal flow estimation as closed-loop recurrent Bayesian estimation on learned feature manifolds with uncertainty-adaptive updates and achieves SOTA on three registration benchmarks.

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.

Harnessing Streaming Video in the Wild

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

Presents the Streaming-Train-248K dataset, the Streaming Harness system, and the Streaming-Eval benchmark to enable proactive, memory-equipped streaming video understanding in VLMs.

Customer-Agent: Overcoming Context Limitations in Ultra-Long Shopping Trajectories via Tool-Augmented Agents and RLVR

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

Introduces ShopTrajQA long-context benchmark and an RLVR-trained tool-augmented agent that bypasses LLM context limits by external file storage and code-based retrieval for shopping trajectories.

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.

MultiAct: Text-to-Motion Generation from Composite Text via Tailored Attention Guidance

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

MultiAct is an unpaired inference-time method that adaptively amplifies cross-attention for underrepresented components in composite text prompts to improve semantic coverage in motion generation while preserving realism.

Automated Estimation of Impact Time, Impact Location, and Shuttlecock Speed in Badminton Smashes Using Event Cameras

cs.CV · 2026-05-27 · conditional · novelty 6.0 · 2 refs

Two event cameras automatically estimate impact time, racket-face location, and shuttlecock speed in badminton smashes, validated against high-speed cameras on 124 trials with small biases and no proportional error.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

Causal Attribution via Activation Patching

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.

SmartWalkCoach: An AI Companion for End-to-End Walking Guidance, Motivation, and Reflection

cs.HC · 2026-05-14 · conditional · novelty 5.0

A three-agent mobile system for end-to-end walking support shows that motivational companion dialogue improves affect and UX in a 12-person in-the-wild crossover study.

ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.

Deep Light Pollution Removal in Night Cityscape Photographs

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

A deep learning method with an enhanced physical degradation model incorporating anisotropic light spread and hidden skyglow, trained via generative models and synthetic-real coupling, removes light pollution from night cityscape images more effectively than prior restoration techniques.

Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions

cs.CV · 2025-09-17 · unverdicted · novelty 5.0

STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most a 2% accuracy drop and 40% of tokens halted early.

A Theory-grounded Hybrid Neural Network Integrating Complementary Estimation Mechanisms for Stable Visual Object Tracking

cs.NE · 2026-06-21 · unverdicted · novelty 4.0

Hybrid ANN-CANN network for visual object tracking that operationalizes bias-variance complementarity to outperform baselines on nine benchmarks.

citing papers explorer

Showing 28 of 28 citing papers.

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction cs.MM · 2026-04-15 · unverdicted · none · ref 3
AVID is the first large-scale benchmark for audio-visual inconsistency detection, grounding, classification, and reasoning in long videos, constructed via agent-driven methods and showing that state-of-the-art models struggle while a fine-tuned baseline improves performance.
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning cs.CV · 2026-06-08 · unverdicted · none · ref 6
FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.
Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions cs.LG · 2026-05-22 · unverdicted · none · ref 5 · 2 links
Introduces a unified benchmark for continual anomaly detection with discrete and continuous protocols plus a training-free DINOSaur method that outperforms prior CAD approaches with zero forgetting and sub-100 ms edge inference.
COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing cs.CV · 2026-05-22 · unverdicted · none · ref 3
COSY uses independent per-component 3DGS generators plus context tokens to achieve disentangled semantic editing of human heads without masks or classifiers.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 140 · 2 links
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation cs.CV · 2026-05-11 · conditional · none · ref 4
AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.
AdaGScale: Viewpoint-Adaptive Gaussian Scaling in 3D Gaussian Splatting to Reduce Gaussian-Tile Pairs cs.CV · 2026-04-21 · unverdicted · none · ref 6
AdaGScale uses viewpoint-adaptive scaling of Gaussians in 3D-GS by estimating peripheral color contributions to reduce Gaussian-tile pairs, delivering 13.8x geometric mean speedup with ~0.5 dB PSNR loss on city-scale scenes.
DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment cs.CV · 2026-04-14 · unverdicted · none · ref 3
DPC-VQA decouples a frozen MLLM perceptual prior from a lightweight residual calibration branch to adapt video quality assessment to new scenarios with under 2% trainable parameters and 20% of typical MOS labels.
Fleet: Few Shots Lead Effective AI-generated Image Detection cs.CV · 2026-06-30 · unverdicted · none · ref 3
Fleet achieves dynamic few-shot adaptation for AIGI detection via avoidance routing in decoupled subspaces, raising accuracy from 20.4% to 73.1% on new generators like Doubao Seedream 4.0 with 10 shots.
RBE-Flow: Recurrent Bayesian Estimation on Feature Manifolds for Cross-Modal Registration cs.CV · 2026-06-29 · unverdicted · none · ref 47
RBE-Flow recasts dense cross-modal flow estimation as closed-loop recurrent Bayesian estimation on learned feature manifolds with uncertainty-adaptive updates and achieves SOTA on three registration benchmarks.
Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity cs.LG · 2026-06-16 · unverdicted · none · ref 12
Rotary positional encodings reduce the symmetry group of functional equivalence in attention compared to sinusoidal encodings, increasing expressivity and altering linear mode connectivity patterns.
Harnessing Streaming Video in the Wild cs.CV · 2026-06-07 · unverdicted · none · ref 35
Presents the Streaming-Train-248K dataset, the Streaming Harness system, and the Streaming-Eval benchmark to enable proactive, memory-equipped streaming video understanding in VLMs.
Customer-Agent: Overcoming Context Limitations in Ultra-Long Shopping Trajectories via Tool-Augmented Agents and RLVR cs.CL · 2026-06-06 · unverdicted · none · ref 111
Introduces ShopTrajQA long-context benchmark and an RLVR-trained tool-augmented agent that bypasses LLM context limits by external file storage and code-based retrieval for shopping trajectories.
AFUN: Towards an Affordance Foundation Model for Functionality Understanding cs.RO · 2026-06-01 · unverdicted · none · ref 58
AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.
MultiAct: Text-to-Motion Generation from Composite Text via Tailored Attention Guidance cs.CV · 2026-05-29 · unverdicted · none · ref 3
MultiAct is an unpaired inference-time method that adaptively amplifies cross-attention for underrepresented components in composite text prompts to improve semantic coverage in motion generation while preserving realism.
Automated Estimation of Impact Time, Impact Location, and Shuttlecock Speed in Badminton Smashes Using Event Cameras cs.CV · 2026-05-27 · conditional · none · ref 7 · 2 links
Two event cameras automatically estimate impact time, racket-face location, and shuttlecock speed in badminton smashes, validated against high-speed cameras on 124 trials with small biases and no proportional error.
No One Knows the State of the Art in Geospatial Foundation Models cs.CV · 2026-05-12 · accept · none · ref 15
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
Causal Attribution via Activation Patching cs.CV · 2026-03-13 · unverdicted · none · ref 20
CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.
SmartWalkCoach: An AI Companion for End-to-End Walking Guidance, Motivation, and Reflection cs.HC · 2026-05-14 · conditional · none · ref 87
A three-agent mobile system for end-to-end walking support shows that motivational companion dialogue improves affect and UX in a 12-person in-the-wild crossover study.
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance cs.CV · 2026-04-29 · unverdicted · none · ref 14
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery cs.CV · 2026-04-21 · unverdicted · none · ref 23
RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.
Deep Light Pollution Removal in Night Cityscape Photographs cs.CV · 2026-04-10 · unverdicted · none · ref 12
A deep learning method with an enhanced physical degradation model incorporating anisotropic light spread and hidden skyglow, trained via generative models and synthetic-real coupling, removes light pollution from night cityscape images more effectively than prior restoration techniques.
Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions cs.CV · 2025-09-17 · unverdicted · none · ref 18
STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most a 2% accuracy drop and 40% of tokens halted early.
A Theory-grounded Hybrid Neural Network Integrating Complementary Estimation Mechanisms for Stable Visual Object Tracking cs.NE · 2026-06-21 · unverdicted · none · ref 15
Hybrid ANN-CANN network for visual object tracking that operationalizes bias-variance complementarity to outperform baselines on nine benchmarks.
From Full Boards to Tiny Defects: Scale-Aware Tile Inference with Topology-Aware Merging for High-Resolution PCB Defect Detection cs.CV · 2026-05-23 · unverdicted · none · ref 12
Tile-based inference with topology-aware merging improves small PCB defect detection by preserving scale and resolving edge artifacts on two datasets.
SPECTRA-Net: Scalable Pipeline for Explainable Cross-domain Tensor Representations for AI-generated Images Detection cs.CV · 2026-05-06 · unverdicted · none · ref 16
SPECTRA-Net fuses multi-view tensor representations from vision foundation models, spectral analysis, local anomaly detection, and statistical descriptors to achieve state-of-the-art cross-domain AI-generated image detection with explainable artifact localization.
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges cs.CV · 2026-04-09 · unverdicted · none · ref 49
A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.
Visual Hand Gesture Recognition with Deep Learning: A Comprehensive Review of Methods, Datasets, Challenges and Future Research Directions cs.CV · 2025-07-06 · unverdicted · none · ref 51
A literature review that categorizes deep learning approaches for visual hand gesture recognition, summarizes state-of-the-art methods across tasks, reviews datasets and metrics, and identifies challenges and future directions.