In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp
9 Pith papers cite this work. Polarity classification is still being indexed.
citing papers explorer
-
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
-
DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment
DPC-VQA decouples a frozen MLLM perceptual prior from a lightweight residual calibration branch to adapt video quality assessment to new scenarios with under 2% trainable parameters and 20% of typical MOS labels.
-
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
EASE closes three residual anchors in federated multimodal unlearning using bilateral displacement, cosine-sine decomposition, and forget lock, achieving near-retrain performance on forget and retain data.
-
Explicit Dropout: Deterministic Regularization for Transformer Architectures
Explicit dropout reformulates stochastic dropout as deterministic loss penalties for Transformers, matching or exceeding standard performance with independent control per component.
-
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.
-
Layer-Guided UAV Tracking: Enhancing Efficiency and Occlusion Robustness
LGTrack achieves 258.7 FPS real-time UAV tracking with 82.8% precision on UAVDT by combining dynamic layer selection, Global-Grouped Coordinate Attention, and Similarity-Guided Layer Adaptation.
-
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges
A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.
-
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
-
LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning
LLaMA-XR fine-tunes LLaMA 3.1 with QLoRA on DenseNet-121 embeddings to generate radiology reports from chest X-rays, reporting ROUGE-L of 0.433 and METEOR of 0.336 on the IU X-ray benchmark.