In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, HI, USA, January 2-7
15 Pith papers cite this work.
Citation roles: background (4)
Representative citing papers
citing papers explorer
- AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
  AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.
- AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing
  The paper presents AIGaitor, a privacy-preserving on-device monocular motion analysis system that performs end-to-end pose estimation and deep learning gait analysis on consumer smartphones.
- GEODE: Angle-Adaptive OOD Detection with Universal Scorer Compatibility
  GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.
- Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data
  DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
- Towards Symmetry-sensitive Pose Estimation: A Rotation Representation for Symmetric Object Classes
  SARR modifies trigonometric rotation encodings with object symmetry orders to produce unique continuous poses, enabling standard CNNs to outperform existing methods on symmetry-aware 6D pose estimation without custom losses or 3D models.
- Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls
  Deep UCSL uses a contrastive EM loss on patient-control labels to isolate disease-driven subgroups in medical imaging by suppressing shared healthy variability.
- Venus-DeFakerOne: Unified Fake Image Detection & Localization
  DeFakerOne integrates InternVL2 and SAM2 into a single model that achieves state-of-the-art results on 39 detection and 9 localization benchmarks for unified fake image detection and localization.
- Collaborative Trajectory Prediction via Late Fusion
  Late fusion of asynchronous vehicle predictions improves trajectory success rate (TSR_0.5) by 1.22-1.69% on real-world V2V4Real data compared to single-vehicle forecasting.
- MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis
  MaMe is a differentiable matrix-only token merging method that doubles ViT-B throughput with a 2% accuracy drop on pre-trained models and enables faster, higher-quality image synthesis when paired with MaRe.
- RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent
  RealLiFe optimizes multi-plane images with HSGD to deliver real-time light field reconstruction from sparse views, claiming 100x speedup over offline methods and 2 dB PSNR gain over online ones.
- COPRA: Conditional Parameter Adaptation with Reinforcement Learning for Video Anomaly Detection
  COPRA introduces conditional parameter adaptation via RL to dynamically tune frozen VLMs for video anomaly detection, outperforming static methods in in-domain and cross-domain settings while generalizing to other video tasks.
- Protecting and Preserving Protest Dynamics for Responsible Analysis
  A responsible computing framework substitutes real protest imagery with labeled synthetic reproductions from conditional image synthesis to enable privacy-aware analysis of collective action patterns.
- Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions
  STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.
- Foundation Models in Biomedical Imaging: Turning Hype into Reality
  Foundation models excel at pattern recognition in biomedical imaging but lack causal reasoning, robustness, and safety for real-world use, so they should augment rather than replace clinical expertise according to the proposed REAL-FM assessment framework.
- Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization