hub Canonical reference

Delving deeper: Hierarchi- cal visual perception for robust video-text retrieval

· 2026 · arXiv 2601.12768

Canonical reference. 80% of citing Pith papers cite this work as background.

13 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 13 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5

citation-polarity summary

background 4 unclear 1

representative citing papers

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

TEMA is the first framework for multi-modification composed image retrieval, using entity mapping to improve accuracy on both new complex datasets and existing benchmarks while balancing efficiency.

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.

Organizational Control Layer: Governance Infrastructure at the Execution Boundary of LLM Agent Systems

cs.MA · 2026-06-03 · unverdicted · novelty 6.0

OCL is a governance layer for LLM agents that cuts unsafe executions from 88% to near-zero and raises valid success from 12% to 96% in adversarial buyer-seller negotiations across frontier LLMs.

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

cs.CL · 2026-05-29 · unverdicted · novelty 6.0 · 2 refs

TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

Rock Tokens in on-policy distillation persist at high loss, account for up to 18% of outputs, absorb large gradient norms, but add negligible value to reasoning performance.

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.

INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

INTENT mitigates cross-modal correspondence noise and modality-inherent noise in composed image retrieval via FFT-based visual invariant composition and bi-objective discriminative learning.

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

HABIT improves robustness in composed image retrieval under noisy triplets by quantifying sample cleanliness via mutual information transition rates and applying dual-consistency progressive learning to retain good patterns and correct bad ones.

ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

ReTrack calibrates directional bias in composed video features using semantic disentanglement and bidirectional evidence alignment to improve retrieval performance on CVR and CIR tasks.

ROGLE: Robust Global-Local Alignment with Automated Region Supervision for Text-Based Person Search

cs.CV · 2026-06-01 · unverdicted · novelty 5.0 · 2 refs

ROGLE introduces automated pseudo region-sentence pairs via RSM and multi-granular learning to boost fine-grained alignment in text-based person search, plus the P-VLG benchmark with over 100k annotated regions.

Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

cs.CL · 2026-03-28 · unverdicted · novelty 4.0

VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks with a parallelizable design.

On the Controllability-Fidelity Frontier in Diffusion Editing

cs.GR · 2026-06-05 · unverdicted · novelty 3.0

A study deriving mathematical formulations and bounds for diffusion editing objectives while empirically comparing methods on fidelity and control metrics and discussing ethical issues.

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

cs.CV · 2026-04-25

citing papers explorer

Showing 1 of 1 citing paper after filters.

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search cs.CV · 2026-04-25 · unreviewed · ref 26

Delving deeper: Hierarchi- cal visual perception for robust video-text retrieval

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer