Canonical reference

Delving deeper: Hierarchical visual perception for robust video-text retrieval

2026 · arXiv 2601.12768

8 Pith papers cite this work. Of the classified citations, 80% cite it as background.

citation-role summary

background: 5

citation-polarity summary

background: 4 · unclear: 1

representative citing papers

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

TEMA is the first framework for multi-modification composed image retrieval, using entity mapping to improve accuracy on both new complex datasets and existing benchmarks while balancing efficiency.

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

INTENT mitigates cross-modal correspondence noise and modality-inherent noise in composed image retrieval via FFT-based visual invariant composition and bi-objective discriminative learning.

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

HABIT improves robustness in composed image retrieval under noisy triplets by quantifying sample cleanliness via mutual information transition rates and applying dual-consistency progressive learning to retain good patterns and correct bad ones.

ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

ReTrack calibrates directional bias in composed video features using semantic disentanglement and bidirectional evidence alignment to improve retrieval performance on CVR and CIR tasks.

Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

cs.CL · 2026-03-28 · unverdicted · novelty 4.0

VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks with a parallelizable design.

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

cs.CV · 2026-04-25

citing papers explorer

Showing 8 of 8 citing papers.

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

cs.CV · 2026-04-23 · unverdicted · none · ref 1

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · none · ref 22

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · none · ref 23

INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · none · ref 70

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · none · ref 44

ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

cs.CV · 2026-04-20 · unverdicted · none · ref 20

Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

cs.CL · 2026-03-28 · unverdicted · none · ref 15

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

cs.CV · 2026-04-25 · unreviewed · ref 26
