{"total":13,"items":[{"citing_arxiv_id":"2606.09901","ref_index":87,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"On the Controllability-Fidelity Frontier in Diffusion Editing","primary_cat":"cs.GR","submitted_at":"2026-06-05T13:24:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A study deriving mathematical formulations and bounds for diffusion editing objectives while empirically comparing methods on fidelity and control metrics and discussing ethical issues.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04306","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Organizational Control Layer: Governance Infrastructure at the Execution Boundary of LLM Agent Systems","primary_cat":"cs.MA","submitted_at":"2026-06-03T00:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OCL is a governance layer for LLM agents that cuts unsafe executions from 88% to near-zero and raises valid success from 12% to 96% in adversarial buyer-seller negotiations across frontier LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01825","ref_index":58,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ROGLE: Robust Global-Local Alignment with Automated Region Supervision for Text-Based Person Search","primary_cat":"cs.CV","submitted_at":"2026-06-01T07:41:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ROGLE introduces automated pseudo region-sentence pairs via RSM and multi-granular learning to boost fine-grained alignment in text-based person search, plus the P-VLG benchmark with over 100k annotated 
regions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00305","ref_index":98,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance","primary_cat":"cs.CL","submitted_at":"2026-05-29T19:32:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09253","ref_index":79,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation","primary_cat":"cs.CL","submitted_at":"2026-05-10T01:41:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Rock Tokens in on-policy distillation persist at high loss, account for up to 18% of outputs, absorb large gradient norms, but add negligible value to reasoning performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23282","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly 
Search","primary_cat":"cs.CV","submitted_at":"2026-04-25T12:53:15+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21806","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-23T16:03:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TEMA is the first framework for multi-modification composed image retrieval, using entity mapping to improve accuracy on both new complex datasets and existing benchmarks while balancing efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20358","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-22T08:59:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ConeSep tackles noisy triplet correspondences in composed image retrieval by introducing geometric fidelity quantization to locate noise, negative boundary learning for semantic opposites, and targeted unlearning via optimal transport, outperforming prior methods on FashionIQ and CIRR.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[20] Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, and Wei Yang. Sf2t: Self-supervised fragment finetuning of video-llms for fine-grained understanding. arXiv preprint arXiv:2504.07745, 2025. 
[21] Zequn Xie, Boyun Zhang, Yuxiao Lin, and Tao Jin. Delving deeper: Hierarchical visual perception for robust video-text retrieval. arXiv preprint arXiv:2601.12768, 2026. [22] Wenyuan Zhang, Xinghua Zhang, Haiyang Yu, Shuaiyi Nie, Bingli Wu, Juwei Yue, Tingwen Liu, and Yongbin Li. Expseek: Self-triggered experience seeking for web agents, 2026. [23] Xi Xiao, Chenrui Ma, Yunbei Zhang, Chen Liu, Zhuxuanzi Wang, Yanshu Li, Lin Zhao, Guosheng Hu, Tianyang Wang, and Hao Xu. Not all directions matter: Toward structured and task-aware low-rank adaptation."},{"citing_arxiv_id":"2604.19386","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-21T12:10:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Air-Know decouples MLLM-based external arbitration from proxy learning via knowledge internalization and dual-stream training to overcome noisy triplet correspondence in composed image retrieval.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"timodal disaster severity assessment through preference optimization and explainable vision-language reasoning. Reliability Engineering & System Safety, page 112674, 2026. [22] Zhenyu Yu, Mohd Yamani Idna Idris, Pei Wang, and Rizwan Qureshi. Cotextor: Training-free modular multilingual text editing via layered disentanglement and depth-aware fusion. In NeurIPS, 2025. [23] Zequn Xie, Boyun Zhang, Yuxiao Lin, and Tao Jin. Delving deeper: Hierarchical visual perception for robust video-text retrieval. arXiv preprint arXiv:2601.12768, 2026. [24] Yunbo Long, Jiaquan Zhang, Xi Chen, and Alexandra Brintrup. 
Topological federated clustering via gravitational potential fields under local differential privacy. AAAI, 40(28):"},{"citing_arxiv_id":"2604.18051","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-20T10:19:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"INTENT mitigates cross-modal correspondence noise and modality-inherent noise in composed image retrieval via FFT-based visual invariant composition and bi-objective discriminative learning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"; Wang, X.; Lin, Y.; and Zhang, C. 2024. Synergized data efficiency and compression (sec) optimization for large language models. In EIECS, 586-591. IEEE. [69] Zeng, Y.; Yu, W.; Li, Z.; Ren, T.; Ma, Y.; Cao, J.; Chen, X.; and Yu, T. 2025. Bridging the editing gap in LLMs: FineEdit for precise and targeted text modifications. EMNLP Findings, 2193-2206. [70] Cao, J.; Ma, Y.; Li, X.; Ren, Q.; and Chen, X. 2026. Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models. arXiv preprint arXiv:2603.21389. [71] Sun, Y.; Li, Y.; Ren, Z.; Duan, G.; Peng, D.; and Hu, P. 2025. Roll: Robust noisy pseudo-label learning for multi-view clustering with noisy correspondence. 
In CVPR, 30732-"},{"citing_arxiv_id":"2604.18037","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-20T10:02:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HABIT improves robustness in composed image retrieval under noisy triplets by quantifying sample cleanliness via mutual information transition rates and applying dual-consistency progressive learning to retain good patterns and correct bad ones.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"human motion analysis. arXiv preprint arXiv:2502.18180. [42] Jia, S.; and Li, L. 2024. Adaptive masking enhances visual grounding. arXiv preprint arXiv:2410.03161. [43] Liu, L.; Chen, S.; Jia, S.; Shi, J.; Jiang, Z.; Jin, C.; Zongkai, W.; Hwang, J.-N.; and Li, L. 2024. Graph canvas for controllable 3d scene generation. arXiv preprint arXiv:2412.00091. [44] Liu, P.; Yang, J.; Wang, L.; Wang, S.; Hao, Y.; and Bai, H. 2023. Retrieval-Based Unsupervised Noisy Label Detection on Text Data. In CIKM, 4099-4104. [45] Yang, Q.; Chen, Z.; Hu, Y.; Li, Z.; Fu, Z.; and Nie, L. 2026. 
STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude-Uniformity and Cardinality-Robustness. arXiv preprint arXiv:2604.01617."},{"citing_arxiv_id":"2604.17898","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval","primary_cat":"cs.CV","submitted_at":"2026-04-20T07:17:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReTrack calibrates directional bias in composed video features using semantic disentanglement and bidirectional evidence alignment to improve retrieval performance on CVR and CIR tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"arXiv preprint arXiv:2601.16155. [18] Xie, Z.; Zhang, B.; Lin, Y.; and Jin, T. 2026. Delving deeper: Hierarchical visual perception for robust video-text retrieval. arXiv preprint arXiv:2601.12768. [19] Meng, C.; Luo, J.; Yan, Z.; Yu, Z.; Fu, R.; Gan, Z.; and Ouyang, C. 2026. Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis. CVPR. [20] Feng, C.; Tzimiropoulos, G.; and Patras, I. 2024. CLIPCleaner: Cleaning Noisy Labels with CLIP. In ACM MM. [21] Sun, Y.; Qin, Y.; Li, Y.; Peng, D.; Peng, X.; and Hu, P. 2024. Robust multi-view clustering with noisy correspondence. IEEE TKDE, 36(12): 9150-9162. 
[22] Yu, X.; Xu, C.; Zhang, G.; Chen, Z.; Zhang, Y.; He, Y.; Jiang, P.-T.; Zhang, J."},{"citing_arxiv_id":"2603.27253","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mitigating Hallucination on Hallucination in RAG via Ensemble Voting","primary_cat":"cs.CL","submitted_at":"2026-03-28T12:07:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"VOTE-RAG applies retrieval voting across diverse queries and response voting across independent generations to mitigate hallucination-on-hallucination in RAG, matching or exceeding complex baselines on six benchmarks with a parallelizable design.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}