RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval

Jiale Huang; Qinlei Huang; Yupeng Hu; Zhiheng Fu; Zhiwei Chen; Zixu Li

arxiv: 2606.11689 · v1 · pith:HP36UCSNnew · submitted 2026-06-10 · 💻 cs.CV

RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval

Jiale Huang , Zixu Li , Zhiheng Fu , Zhiwei Chen , Qinlei Huang , Yupeng Hu This is my paper

Pith reviewed 2026-06-27 10:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords composed image retrievalnoisy tripletslow-rank structurevalue recalibrationcorrelation matrixfashioniqcirrdenoising

0 comments

The pith

The RankVR framework improves composed image retrieval by detecting noisy triplets through changes in the rank of their correlation matrix and recalibrating their training value.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Composed image retrieval requires models to understand a reference image plus a text modification to find the target image. Large datasets often contain noisy triplets where the image and text do not correspond correctly, which hurts performance. The paper shows that tracking the effective rank of the correlation matrix across samples reveals which ones break the global semantic pattern. It also tracks how much each triplet can still improve during training to decide its value. This allows the model to focus on useful hard examples while ignoring those with logical conflicts.

Core claim

RankVR addresses noisy triplet correspondence in composed image retrieval by introducing the Global Structure Consistency Perception module that uses the effective rank of the correlation matrix to identify samples disrupting macroscopic semantic symmetry, and the Adaptive Semantic Value Calibration module that integrates training potential and reliability to quantify each triplet's semantic value, resulting in more robust models on benchmarks like FashionIQ and CIRR.

What carries the argument

The Effective Rank of the Correlation Matrix in the GSCP module, which identifies samples disrupting macroscopic semantic symmetry by rank difference.

If this is right

RankVR achieves superior performance compared to existing state-of-the-art methods on FashionIQ and CIRR under noisy conditions.
The GSCP module decouples clean samples from those causing global structural inconsistency.
The ASVC module ensures effective use of hard clean samples while suppressing noise with logical conflicts.
The approach handles both global correlations and dynamic value changes during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying similar rank-based detection could help in other multimodal tasks with noisy alignments, such as video-text retrieval.
The dynamic calibration might reduce the need for extensive data cleaning in large-scale training pipelines.
If the correlation matrix is computed efficiently, this could scale to very large datasets without much overhead.

Load-bearing premise

Differences in the effective rank of the correlation matrix reliably flag samples that break semantic symmetry, and combining training potential with reliability gives an accurate value score without adding bias.

What would settle it

Running the same experiments on FashionIQ and CIRR but replacing the rank-based selection with random or scalar-based selection and finding no performance improvement would falsify the claim that the rank difference provides unique value for denoising.

Figures

Figures reproduced from arXiv: 2606.11689 by Jiale Huang, Qinlei Huang, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Zixu Li.

**Figure 4.** Figure 4: Impact of Weights on CIRR and FashionIQ. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 3.** Figure 3: Ablation Study of GSCP Module on CIRR and Fash [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 6.** Figure 6: Sensitivity analysis of hyperparameters on CIRR [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Case Study on (a) FashionIQ and (b) CIRR. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

read the original abstract

Composed Image Retrieval (CIR) constitutes a pivotal paradigm requiring models to perform joint reasoning on reference images and modification texts. However, the prevalence of Noisy Triplet Correspondence (NTC) in large-scale datasets severely constrains model performance. Existing denoising methods either target binary mismatches or rely on scalar-based point-wise estimation, neglecting rich global structural correlations among sample populations and dynamic value variations during training, thereby yielding suboptimal results. This paper identifies two critical unresolved challenges: Global Structural Inconsistency of Semantic Correlations and Hard Sample Discrimination Uncertainty. To address these, we propose RankVR, a framework designed to construct a robust CIR model via global structure consistency and dynamic value perception. Specifically, we introduce the Global Structure Consistency Perception (GSCP) module, which utilizes the Effective Rank of the Correlation Matrix to decouple clean samples from structural noise. By measuring rank difference, GSCP identifies samples disrupting macroscopic semantic symmetry. Furthermore, we develop the Adaptive Semantic Value Calibration (ASVC) module to distinguish high-value hard clean samples. By integrating training potential and reliability, it dynamically quantifies the semantic value of each triplet, ensuring effective utilization of hard samples while suppressing noise characterized by logical conflicts. Extensive experiments on the FashionIQ and CIRR benchmark datasets demonstrate that RankVR significantly outperforms existing state-of-the-art methods, validating its superior robustness in noisy environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RankVR adds GSCP (effective rank on correlation matrix to flag structural noise) and ASVC (dynamic value scoring for hard samples) to noisy composed image retrieval, but the abstract shows no numbers, ablations, or derivations so the gains cannot be checked.

read the letter

The new elements are the specific pairing of a global rank-based detector with a training-time value recalibrator for this task. It names two gaps in prior denoising work—ignoring batch-level structural correlations and treating value as static—and offers modules aimed at each. That framing is clear and stays within the CIR literature.

The soft spots sit right at the center. The abstract asserts outperformance on FashionIQ and CIRR under noisy triplet correspondence yet supplies zero quantitative results, error bars, or ablation numbers. There is also no derivation or sensitivity check showing how the correlation matrix is formed, what embeddings and batch sizes are used, or that rank difference actually tracks injected logical conflicts rather than discarding difficult clean triplets. The same holds for ASVC: integrating training potential and reliability is asserted to quantify value without new bias, but no supporting analysis appears. If those links do not hold, the reported robustness cannot be attributed to the proposed modules.

This is a standard incremental method paper aimed at people already working on robust vision-language retrieval. A reader who needs practical denoising tricks for web-scale CIR data might find the modules worth testing, but only after seeing the full experiments and controls. The thinking is coherent on its own terms and engages the relevant prior work.

Send it to peer review. The core idea is reasonable and the benchmarks are the right ones; referees can verify the missing evidence and derivations.

Referee Report

3 major / 0 minor

Summary. The paper proposes RankVR, a framework for robust Composed Image Retrieval under Noisy Triplet Correspondence (NTC). It introduces the Global Structure Consistency Perception (GSCP) module, which uses the effective rank of the correlation matrix to identify samples disrupting macroscopic semantic symmetry via rank difference, and the Adaptive Semantic Value Calibration (ASVC) module, which integrates training potential and reliability to dynamically quantify triplet semantic value and distinguish high-value hard clean samples from noise. The central claim is that RankVR significantly outperforms existing SOTA methods on the FashionIQ and CIRR benchmarks, demonstrating superior robustness in noisy environments.

Significance. If the GSCP and ASVC modules are shown to reliably separate structural noise from clean hard samples without introducing new biases, the work could provide a useful global-structure-aware alternative to point-wise denoising methods in CIR. The approach leverages population-level correlations and dynamic training signals, which addresses stated limitations of prior binary-mismatch or scalar-estimation techniques.

major comments (3)

[Abstract] Abstract: The central claim of significant outperformance on FashionIQ and CIRR is stated without any quantitative metrics, tables, error bars, ablation results, or statistical details, which is load-bearing for assessing whether the reported robustness gains can be attributed to the proposed modules rather than implementation details.
[Abstract] Abstract (GSCP description): No derivation or construction details are supplied for the correlation matrix (e.g., which embeddings, batch size, centering), nor any controlled experiment showing that rank difference correlates with injected NTC levels or cleanly isolates noise without discarding hard clean samples; this directly underpins the claim that GSCP decouples clean samples from structural noise.
[Abstract] Abstract (ASVC description): The integration of training potential and reliability is asserted to quantify semantic value without new biases, but no sensitivity analysis, formulation, or validation against the weakest assumption (rank-difference heuristic separating clean vs. noisy triplets) is referenced, which is load-bearing for the hard-sample discrimination claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment regarding the abstract below and outline the planned revisions.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim of significant outperformance on FashionIQ and CIRR is stated without any quantitative metrics, tables, error bars, ablation results, or statistical details, which is load-bearing for assessing whether the reported robustness gains can be attributed to the proposed modules rather than implementation details.

Authors: We agree that the abstract would be strengthened by including quantitative support. In the revised version, we will incorporate specific metrics such as the percentage improvements in recall@K on FashionIQ and CIRR under noisy conditions, with references to the primary results tables and error bars from our experiments. revision: yes
Referee: [Abstract] Abstract (GSCP description): No derivation or construction details are supplied for the correlation matrix (e.g., which embeddings, batch size, centering), nor any controlled experiment showing that rank difference correlates with injected NTC levels or cleanly isolates noise without discarding hard clean samples; this directly underpins the claim that GSCP decouples clean samples from structural noise.

Authors: The full derivation and construction of the correlation matrix (using reference image and modification text embeddings within each batch, with centering applied) are detailed in Section 3.2. We will add a concise description of this construction to the abstract. The controlled experiments demonstrating the correlation of rank difference with NTC injection levels and isolation of noise (without discarding hard clean samples) appear in Section 4.3; we will reference these results in the revised abstract. revision: partial
Referee: [Abstract] Abstract (ASVC description): The integration of training potential and reliability is asserted to quantify semantic value without new biases, but no sensitivity analysis, formulation, or validation against the weakest assumption (rank-difference heuristic separating clean vs. noisy triplets) is referenced, which is load-bearing for the hard-sample discrimination claim.

Authors: We agree the abstract description of ASVC is high-level. The formulation combining training potential and reliability, along with sensitivity analysis and validation against the rank-difference heuristic, is presented in Section 3.3 and the supplementary material. We will revise the abstract to briefly describe the formulation and reference the validation experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: modules reference external rank concepts without self-referential reduction

full rationale

The abstract and described framework introduce GSCP (using effective rank of correlation matrix to measure rank difference for identifying structural noise) and ASVC (integrating training potential and reliability for semantic value). No equations, derivations, or self-citations are shown that would reduce any claimed prediction or uniqueness result to a fitted input or prior author ansatz by construction. The central robustness claims are positioned as empirically validated on FashionIQ/CIRR rather than tautological. This matches the default non-circular outcome when no load-bearing step exhibits the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; modules are described conceptually without mathematical definitions or assumptions listed.

pith-pipeline@v0.9.1-grok · 5788 in / 1010 out tokens · 30617 ms · 2026-06-27T10:33:07.753229+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

116 extracted references · 1 canonical work pages

[1]

Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng, et al . 2024. Sentence-level Prompts Bene- fit Composed Image Retrieval. InICLR

2024
[2]

Haokun Wen, Xuemeng Song, Jianhua Yin, Jianlong Wu, Weili Guan, and Liqiang Nie. 2023. Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval.IEEE TPAMI(2023)

2023
[3]

Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan
[4]

InAAAI, Vol

Encoder: Entity mining and modification relation binding for composed image retrieval. InAAAI, Vol. 39. 5101–5109
[5]

Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. 2026. MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network.arXiv preprint arXiv:2603.29291(2026)

arXiv 2026
[6]

Peng Hu, Zhenyu Huang, Dezhong Peng, Xu Wang, and Xi Peng. 2023. Cross- modal retrieval with partially mismatched pairs.IEEE TPAMI45, 8 (2023), 9595– 9610

2023
[7]

Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, and Peng Hu
[8]

Noisy-correspondence learning for text-to-image person re-identification. InCVPR. 27197–27206
[9]

Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. 2026. REFINE: Composed Video Retrieval via Shared and Differential Semantics Enhancement.ACM ToMM(2026)

2026
[10]

Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. 2026. ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval. InAAAI, Vol. 40. 23373– 23381

2026
[11]

Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie
[12]

FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval.https://arxiv.org/abs/2503.21309(2025)

arXiv 2025
[13]

Jiale Huang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Chunxiao Wang, and Yupeng Hu. 2026. IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval.arXiv preprint arXiv:2606.08144(2026)

Pith/arXiv arXiv 2026
[14]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Guozhi Qiu, Weili Guan, and Liqiang Nie. 2026. EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge.arXiv preprint arXiv:2605.24500(2026)

Pith/arXiv arXiv 2026
[15]

Yuxuan Jiang and Francis Ferraro. 2026. Beyond math: Stories as a testbed for memorization-constrained reasoning in llms. InEACL. 5590–5607

2026
[16]

Jinhe Bi, Minglai Yang, Xingcheng Zhou, Wenke Huang, Sikuan Yan, Yujun Wang, Zixuan Cao, Michael Färber, Xun Xiao, Volker Tresp, et al. 2026. EchoRL: Reinforcement Learning via Rollout Echoing.arXiv preprint arXiv:2605.31228 (2026)

Pith/arXiv arXiv 2026
[17]

Jinghan Cao, Yu Ma, Xinjin Li, Qingyang Ren, and Xiangyun Chen. 2026. Task- Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models.arXiv preprint arXiv:2603.21389(2026)

arXiv 2026
[18]

Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, and Tao Jin
[19]

Chat-driven text generation and interaction for person retrieval. InEMNLP. 5259–5270
[20]

Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniBVR: Balancing visual and reasoning abilities in unified 3D scene understanding.Neu- rocomputing671 (2026), 132599

2026
[21]

Yun Song, Wenjia Zheng, Tiedan Chen, Ziyu Wang, Jiazhao Shi, and Yisong Chen
[22]

Deep neural network architectures for electrocardiogram classification: A comprehensive evaluation.arXiv preprint arXiv:2602.17701(2026)

arXiv 2026
[23]

Yang Shi, Yifeng Xie, Minzhe Guo, Liangsi Lu, Mingxuan Huang, Jingchao Wang, Zhihong Zhu, Boyan Xu, and Zhiqi Huang. 2026. MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models.arXiv preprint arXiv:2601.03331 (2026)

Pith/arXiv arXiv 2026
[24]

Yuan Sun, Xu Wang, Dezhong Peng, Zhenwen Ren, and Xiaobo Shen. 2023. Hierarchical hashing learning for image set classification.IEEE TIP32 (2023), 1732–1744

2023
[25]

Zong Ke, Yuqing Cao, Zhenrui Chen, Yuchen Yin, Shouchao He, and Yu Cheng
[26]

Finance Research Letters(2025), 107890

Early warning of cryptocurrency reversal risks via multi-source data. Finance Research Letters(2025), 107890

2025
[27]

Yang Tian, Fan Liu, Jingyuan Zhang, Yupeng Hu, Liqiang Nie, et al. 2025. CoRe- MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG. InACL. 32967–32982

2025
[28]

Qianyun Yang, Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, and Liqiang Nie. 2026. STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude- Uniformity and Cardinality-Robustness.IEEE TKDE(2026)

2026
[29]

Qianyun Yang, Peizhuo Lv, Yingjiu Li, Shengzhi Zhang, Yuxuan Chen, Zhiwei Chen, Zixu Li, and Yupeng Hu. 2026. ERASE: Bypassing Collaborative Detection of AI Counterfeit Via Comprehensive Artifacts Elimination.IEEE TDSC(March 2026), 1–18. doi:10.1109/TDSC.2026.3677794

work page doi:10.1109/tdsc.2026.3677794 2026
[30]

Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Neutral-Reference Prompting for Vision-Language Models.arXiv(2026), 2605.15615

Pith/arXiv arXiv 2026
[31]

Xingfeng Li, Yinghui Sun, Quansen Sun, Zhenwen Ren, and Yuan Sun. 2023. Cross-view graph matching guided anchor alignment for incomplete multi-view clustering.Information Fusion100 (2023), 101941

2023
[32]

Zixu Li, Yupeng Hu, Zhiwei Chen, Haokun Wen, Xuemeng Song, and Liqiang Nie. 2026. COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations.IEEE TIP(2026)

2026
[33]

Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Sampling Control for Imbal- anced Calibration in Semi-Supervised Learning. InAAAI, Vol. 40. 25914–25922

2026
[34]

Liangsi Lu, Xuhang Chen, Minzhe Guo, Shichu Li, Jingchao Wang, and Yang Shi. 2026. Chordedit: One-step low-energy transport for image editing. InCVPR. 14398–14407

2026
[35]

Zichao Li and Zong Ke. 2025. Domain meets typology: Predicting verb-final order from universal dependencies for financial and blockchain nlp. InSIGTYP. 156–164

2025
[36]

Zichao Li and Zong Ke. 2025. Cross-modal augmentation for low-resource language understanding and generation. InMAGMaR. 90–99

2025
[37]

Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. 2024. Mace: Mass concept erasure in diffusion models. InCVPR. 6430–6440

2024
[38]

Jinlai Zhang, Xiaolong Song, Yucheng Li, Diqing Liang, Zhiyong Zhang, and Jinhu Cai. 2026. Adaptive dual cross-attention network for multispectral object detection in autonomous driving.ESW A(2026), 132012

2026
[39]

Zhiheng Fu, Zixu Li, Zhiwei Chen, Fangxu Liu, Yupeng Hu, Weili Guan, and Liqiang Nie. 2026. EgoAction: Egocentric Action Composition with Reliability- Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026.arXiv preprint arXiv:2605.24496(2026)

Pith/arXiv arXiv 2026
[40]

Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space. In AAAI, Vol. 40. 11640–11648

2026
[41]

Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, and Volker Tresp. 2023. SPOT! Revisiting Video-Language Models for Event Understanding.arXiv preprint arXiv:2311.12919(2023)

arXiv 2023
[42]

Xinlei Yu, Chengming Xu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Cheng Yang, Jiangning Zhang, Shuicheng Yan, and Xiaobin Hu. 2025. Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling.arXiv preprint arXiv:2508.03404(2025)

arXiv 2025
[43]

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, and Liqiang Nie. 2026. R3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking. arXiv preprint arXiv:2606.01113(2026)

Pith/arXiv arXiv 2026
[44]

Yuxuan Jiang, Dawei Li, and Frank Ferraro. 2025. Drp: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models.arXiv preprint arXiv:2505.13975(2025)

Pith/arXiv arXiv 2025
[45]

Yuan Sun, Yang Qin, Yongxiang Li, Dezhong Peng, Xi Peng, and Peng Hu. 2024. Robust multi-view clustering with noisy correspondence.IEEE TKDE36, 12 (2024), 9150–9162

2024
[46]

Zixu Li, Yupeng Hu, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Weili Guan, and Liqiang Nie. 2026. TempRet: Temporal Enhancement and Two-Stage Reranking for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge.arXiv preprint arXiv:2605.24470(2026)

Pith/arXiv arXiv 2026
[47]

Jinhe Bi, Danqi Yan, Yifan Wang, Wenke Huang, Haokun Chen, Guancheng Wan, Mang Ye, Xun Xiao, Hinrich Schuetze, Volker Tresp, and Yunpu Ma. 2026. The Geometry of Reasoning: Self-Evaluation via Layerwise Trajectory Evolution. In ICML. https://openreview.net/forum?id=WQyrwQwzmK

2026
[48]

Yuxuan Jiang and Francis Ferraro. 2026. SCRIBE: Structured Mid-Level Supervi- sion for Tool-Using Language Models.arXiv preprint arXiv:2601.03555(2026)

Pith/arXiv arXiv 2026
[49]

Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, et al. 2025. Se-agent: Self- evolution trajectory optimization in multi-step reasoning with llm-based agents. arXiv preprint arXiv:2508.02085(2025)

arXiv 2025
[50]

Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, and Xi Peng. 2021. Learning with noisy correspondence for cross-modal matching. NeurIPS34 (2021), 29406–29419

2021
[51]

Jingjie Zhang, Lingli Tang, Jiachen Li, Jinyu Xu, Yanchun Ma, and Qing Xie
[52]

InACM MM Asia

Robust Dual Embedding Contrastive Learning for Text-to-Image Person Re-identification with Noisy Correspondence. InACM MM Asia. 1–8
[53]

Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. 2026. INTENT: Invariance and Discrimination-aware Noise Mitiga- tion for Robust Composed Image Retrieval. InAAAI, Vol. 40. 20463–20471

2026
[54]

Shuxian Li, Changhao He, Xiting Liu, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2025. Learning with Noisy Triplet Correspondence for Composed Image Retrieval. InCVPR. 19628–19637

2025
[55]

Siyuan Li, Youyuan Zhang, Fangming Liu, and Jing Li. 2026. Modality-Decoupled Online Recursive Editing.arXiv preprint arXiv:2605.20273(2026)

Pith/arXiv arXiv 2026
[56]

Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays
[57]

Composing Text and Image for Image Retrieval - an Empirical Odyssey. In CVPR. IEEE, 6439–6448
[58]

Yanbei Chen, Shaogang Gong, and Loris Bazzani. 2020. Image Search With Text Feedback by Visiolinguistic Attention Learning. InCVPR. IEEE, 2998–3008

2020
[59]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie
[60]

Retrieval

OFFSET: Segmentation-based Focus Shift Revision for Composed Image ICMR ’26, June 16–19, 2026, Amsterdam, Netherlands Huang et al. Retrieval. InACM MM. 6113–6122

2026
[61]

Yida Zhao, Yuqing Song, and Qin Jin. 2022. Progressive learning for image retrieval with hybrid-modality queries. InACM SIGIR. 1012–1021

2022
[62]

Matan Levy, Rami Ben-Ari, Nir Darshan, and Dani Lischinski. 2024. Data roaming and quality assessment for composed image retrieval. InAAAI, Vol. 38. 2991– 2999

2024
[63]

Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, and Yupeng Hu. 2026. Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341 (2026)

arXiv 2026
[64]

Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Median: Adaptive intermediate-grained aggregation network for composed image retrieval. InICASSP. IEEE, 1–5

2025
[65]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InICML. PMLR, 8748–8763

2021
[66]

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. InICML. PMLR, 12888–12900

2022
[67]

Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Pair: Complementarity-guided disentanglement for composed image retrieval. InICASSP. IEEE, 1–5

2025
[68]

Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. 2026. HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval. InAAAI, Vol. 40. 6762–6770

2026
[69]

Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, and Tat-Seng Chua. 2024. Composed image retrieval with text feedback via multi-grained uncertainty regularization.ICLR(2024)

2024
[70]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan
[71]

InACM MM

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Com- posed Video Retrieval. InACM MM. 6143–6152
[72]

Yuxuan Jiang, Runchao Li, Shubhashis Roy Dipta, Dawei Li, and Zhao Yang. 2026. Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation.arXiv preprint arXiv:2605.09253(2026)

Pith/arXiv arXiv 2026
[73]

Jinlai Zhang, Mingchao Xiang, Yongheng Hu, Wei Hao, Linlong Lei, and Kefu Yi. 2026. Multivariate feature learning and associative spatial information en- hancement for snow object detection in autonomous driving.EAAI175 (2026), 114672

2026
[74]

Kailin Jiang, Hongbo Jiang, Ning Jiang, Zhi Gao, Jinhe Bi, Yuchen Ren, Bin Li, Yuntao Du, Lei Liu, and Qing Li. 2025. KORE: Enhancing Knowledge Injec- tion for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints.arXiv preprint arXiv:2510.19316(2025)

Pith/arXiv arXiv 2025
[75]

Guancheng Wan, Xiaoran Shang, Yuxin Wu, Guibin Zhang, Jinhe Bi, et al. 2025. HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning. InNeurIPS

2025
[76]

Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta, and Hengyuan Zhang. 2025. Learning how to use tools, not just when: Pattern-aware tool-integrated reason- ing.arXiv preprint arXiv:2509.23292(2025)

arXiv 2025
[77]

Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2023. Cross-modal active complementary learning with self-refining correspondence. NeurIPS36 (2023), 24829–24840

2023
[78]

Yiming Zeng, Wanhao Yu, Zexin Li, Tao Ren, Yu Ma, Jinghan Cao, Xiyan Chen, and Tingting Yu. 2025. Bridging the editing gap in LLMs: FineEdit for precise and targeted text modifications.EMNLP Findings(2025), 2193–2206

2025
[79]

Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. 2024. Robust watermarking using generative priors against image editing: From bench- marking to advances.arXiv preprint arXiv:2410.18775(2024)

arXiv 2024
[80]

Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, and Weiran Huang. 2024. Diff-erank: A novel rank-based metric for evaluating large language models. NeurIPS37 (2024), 39501–39521

2024

Showing first 80 references.

[1] [1]

Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng, et al . 2024. Sentence-level Prompts Bene- fit Composed Image Retrieval. InICLR

2024

[2] [2]

Haokun Wen, Xuemeng Song, Jianhua Yin, Jianlong Wu, Weili Guan, and Liqiang Nie. 2023. Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval.IEEE TPAMI(2023)

2023

[3] [3]

Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan

[4] [4]

InAAAI, Vol

Encoder: Entity mining and modification relation binding for composed image retrieval. InAAAI, Vol. 39. 5101–5109

[5] [5]

Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. 2026. MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network.arXiv preprint arXiv:2603.29291(2026)

arXiv 2026

[6] [6]

Peng Hu, Zhenyu Huang, Dezhong Peng, Xu Wang, and Xi Peng. 2023. Cross- modal retrieval with partially mismatched pairs.IEEE TPAMI45, 8 (2023), 9595– 9610

2023

[7] [7]

Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, and Peng Hu

[8] [8]

Noisy-correspondence learning for text-to-image person re-identification. InCVPR. 27197–27206

[9] [9]

Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. 2026. REFINE: Composed Video Retrieval via Shared and Differential Semantics Enhancement.ACM ToMM(2026)

2026

[10] [10]

Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. 2026. ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval. InAAAI, Vol. 40. 23373– 23381

2026

[11] [11]

Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie

[12] [12]

FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval.https://arxiv.org/abs/2503.21309(2025)

arXiv 2025

[13] [13]

Jiale Huang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Chunxiao Wang, and Yupeng Hu. 2026. IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval.arXiv preprint arXiv:2606.08144(2026)

Pith/arXiv arXiv 2026

[14] [14]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Guozhi Qiu, Weili Guan, and Liqiang Nie. 2026. EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge.arXiv preprint arXiv:2605.24500(2026)

Pith/arXiv arXiv 2026

[15] [15]

Yuxuan Jiang and Francis Ferraro. 2026. Beyond math: Stories as a testbed for memorization-constrained reasoning in llms. InEACL. 5590–5607

2026

[16] [16]

Jinhe Bi, Minglai Yang, Xingcheng Zhou, Wenke Huang, Sikuan Yan, Yujun Wang, Zixuan Cao, Michael Färber, Xun Xiao, Volker Tresp, et al. 2026. EchoRL: Reinforcement Learning via Rollout Echoing.arXiv preprint arXiv:2605.31228 (2026)

Pith/arXiv arXiv 2026

[17] [17]

Jinghan Cao, Yu Ma, Xinjin Li, Qingyang Ren, and Xiangyun Chen. 2026. Task- Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models.arXiv preprint arXiv:2603.21389(2026)

arXiv 2026

[18] [18]

Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, and Tao Jin

[19] [19]

Chat-driven text generation and interaction for person retrieval. InEMNLP. 5259–5270

[20] [20]

Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniBVR: Balancing visual and reasoning abilities in unified 3D scene understanding.Neu- rocomputing671 (2026), 132599

2026

[21] [21]

Yun Song, Wenjia Zheng, Tiedan Chen, Ziyu Wang, Jiazhao Shi, and Yisong Chen

[22] [22]

Deep neural network architectures for electrocardiogram classification: A comprehensive evaluation.arXiv preprint arXiv:2602.17701(2026)

arXiv 2026

[23] [23]

Yang Shi, Yifeng Xie, Minzhe Guo, Liangsi Lu, Mingxuan Huang, Jingchao Wang, Zhihong Zhu, Boyan Xu, and Zhiqi Huang. 2026. MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models.arXiv preprint arXiv:2601.03331 (2026)

Pith/arXiv arXiv 2026

[24] [24]

Yuan Sun, Xu Wang, Dezhong Peng, Zhenwen Ren, and Xiaobo Shen. 2023. Hierarchical hashing learning for image set classification.IEEE TIP32 (2023), 1732–1744

2023

[25] [25]

Zong Ke, Yuqing Cao, Zhenrui Chen, Yuchen Yin, Shouchao He, and Yu Cheng

[26] [26]

Finance Research Letters(2025), 107890

Early warning of cryptocurrency reversal risks via multi-source data. Finance Research Letters(2025), 107890

2025

[27] [27]

Yang Tian, Fan Liu, Jingyuan Zhang, Yupeng Hu, Liqiang Nie, et al. 2025. CoRe- MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG. InACL. 32967–32982

2025

[28] [28]

Qianyun Yang, Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, and Liqiang Nie. 2026. STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude- Uniformity and Cardinality-Robustness.IEEE TKDE(2026)

2026

[29] [29]

Qianyun Yang, Peizhuo Lv, Yingjiu Li, Shengzhi Zhang, Yuxuan Chen, Zhiwei Chen, Zixu Li, and Yupeng Hu. 2026. ERASE: Bypassing Collaborative Detection of AI Counterfeit Via Comprehensive Artifacts Elimination.IEEE TDSC(March 2026), 1–18. doi:10.1109/TDSC.2026.3677794

work page doi:10.1109/tdsc.2026.3677794 2026

[30] [30]

Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Neutral-Reference Prompting for Vision-Language Models.arXiv(2026), 2605.15615

Pith/arXiv arXiv 2026

[31] [31]

Xingfeng Li, Yinghui Sun, Quansen Sun, Zhenwen Ren, and Yuan Sun. 2023. Cross-view graph matching guided anchor alignment for incomplete multi-view clustering.Information Fusion100 (2023), 101941

2023

[32] [32]

Zixu Li, Yupeng Hu, Zhiwei Chen, Haokun Wen, Xuemeng Song, and Liqiang Nie. 2026. COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations.IEEE TIP(2026)

2026

[33] [33]

Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Sampling Control for Imbal- anced Calibration in Semi-Supervised Learning. InAAAI, Vol. 40. 25914–25922

2026

[34] [34]

Liangsi Lu, Xuhang Chen, Minzhe Guo, Shichu Li, Jingchao Wang, and Yang Shi. 2026. Chordedit: One-step low-energy transport for image editing. InCVPR. 14398–14407

2026

[35] [35]

Zichao Li and Zong Ke. 2025. Domain meets typology: Predicting verb-final order from universal dependencies for financial and blockchain nlp. InSIGTYP. 156–164

2025

[36] [36]

Zichao Li and Zong Ke. 2025. Cross-modal augmentation for low-resource language understanding and generation. InMAGMaR. 90–99

2025

[37] [37]

Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. 2024. Mace: Mass concept erasure in diffusion models. InCVPR. 6430–6440

2024

[38] [38]

Jinlai Zhang, Xiaolong Song, Yucheng Li, Diqing Liang, Zhiyong Zhang, and Jinhu Cai. 2026. Adaptive dual cross-attention network for multispectral object detection in autonomous driving.ESW A(2026), 132012

2026

[39] [39]

Zhiheng Fu, Zixu Li, Zhiwei Chen, Fangxu Liu, Yupeng Hu, Weili Guan, and Liqiang Nie. 2026. EgoAction: Egocentric Action Composition with Reliability- Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026.arXiv preprint arXiv:2605.24496(2026)

Pith/arXiv arXiv 2026

[40] [40]

Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space. In AAAI, Vol. 40. 11640–11648

2026

[41] [41]

Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, and Volker Tresp. 2023. SPOT! Revisiting Video-Language Models for Event Understanding.arXiv preprint arXiv:2311.12919(2023)

arXiv 2023

[42] [42]

Xinlei Yu, Chengming Xu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Cheng Yang, Jiangning Zhang, Shuicheng Yan, and Xiaobin Hu. 2025. Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling.arXiv preprint arXiv:2508.03404(2025)

arXiv 2025

[43] [43]

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, and Liqiang Nie. 2026. R3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking. arXiv preprint arXiv:2606.01113(2026)

Pith/arXiv arXiv 2026

[44] [44]

Yuxuan Jiang, Dawei Li, and Frank Ferraro. 2025. Drp: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models.arXiv preprint arXiv:2505.13975(2025)

Pith/arXiv arXiv 2025

[45] [45]

Yuan Sun, Yang Qin, Yongxiang Li, Dezhong Peng, Xi Peng, and Peng Hu. 2024. Robust multi-view clustering with noisy correspondence.IEEE TKDE36, 12 (2024), 9150–9162

2024

[46] [46]

Zixu Li, Yupeng Hu, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Weili Guan, and Liqiang Nie. 2026. TempRet: Temporal Enhancement and Two-Stage Reranking for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge.arXiv preprint arXiv:2605.24470(2026)

Pith/arXiv arXiv 2026

[47] [47]

Jinhe Bi, Danqi Yan, Yifan Wang, Wenke Huang, Haokun Chen, Guancheng Wan, Mang Ye, Xun Xiao, Hinrich Schuetze, Volker Tresp, and Yunpu Ma. 2026. The Geometry of Reasoning: Self-Evaluation via Layerwise Trajectory Evolution. In ICML. https://openreview.net/forum?id=WQyrwQwzmK

2026

[48] [48]

Yuxuan Jiang and Francis Ferraro. 2026. SCRIBE: Structured Mid-Level Supervi- sion for Tool-Using Language Models.arXiv preprint arXiv:2601.03555(2026)

Pith/arXiv arXiv 2026

[49] [49]

Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, et al. 2025. Se-agent: Self- evolution trajectory optimization in multi-step reasoning with llm-based agents. arXiv preprint arXiv:2508.02085(2025)

arXiv 2025

[50] [50]

Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, and Xi Peng. 2021. Learning with noisy correspondence for cross-modal matching. NeurIPS34 (2021), 29406–29419

2021

[51] [51]

Jingjie Zhang, Lingli Tang, Jiachen Li, Jinyu Xu, Yanchun Ma, and Qing Xie

[52] [52]

InACM MM Asia

Robust Dual Embedding Contrastive Learning for Text-to-Image Person Re-identification with Noisy Correspondence. InACM MM Asia. 1–8

[53] [53]

Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. 2026. INTENT: Invariance and Discrimination-aware Noise Mitiga- tion for Robust Composed Image Retrieval. InAAAI, Vol. 40. 20463–20471

2026

[54] [54]

Shuxian Li, Changhao He, Xiting Liu, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2025. Learning with Noisy Triplet Correspondence for Composed Image Retrieval. InCVPR. 19628–19637

2025

[55] [55]

Siyuan Li, Youyuan Zhang, Fangming Liu, and Jing Li. 2026. Modality-Decoupled Online Recursive Editing.arXiv preprint arXiv:2605.20273(2026)

Pith/arXiv arXiv 2026

[56] [56]

Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays

[57] [57]

Composing Text and Image for Image Retrieval - an Empirical Odyssey. In CVPR. IEEE, 6439–6448

[58] [58]

Yanbei Chen, Shaogang Gong, and Loris Bazzani. 2020. Image Search With Text Feedback by Visiolinguistic Attention Learning. InCVPR. IEEE, 2998–3008

2020

[59] [59]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie

[60] [60]

Retrieval

OFFSET: Segmentation-based Focus Shift Revision for Composed Image ICMR ’26, June 16–19, 2026, Amsterdam, Netherlands Huang et al. Retrieval. InACM MM. 6113–6122

2026

[61] [61]

Yida Zhao, Yuqing Song, and Qin Jin. 2022. Progressive learning for image retrieval with hybrid-modality queries. InACM SIGIR. 1012–1021

2022

[62] [62]

Matan Levy, Rami Ben-Ari, Nir Darshan, and Dani Lischinski. 2024. Data roaming and quality assessment for composed image retrieval. InAAAI, Vol. 38. 2991– 2999

2024

[63] [63]

Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, and Yupeng Hu. 2026. Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341 (2026)

arXiv 2026

[64] [64]

Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Median: Adaptive intermediate-grained aggregation network for composed image retrieval. InICASSP. IEEE, 1–5

2025

[65] [65]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InICML. PMLR, 8748–8763

2021

[66] [66]

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. InICML. PMLR, 12888–12900

2022

[67] [67]

Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Pair: Complementarity-guided disentanglement for composed image retrieval. InICASSP. IEEE, 1–5

2025

[68] [68]

Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. 2026. HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval. InAAAI, Vol. 40. 6762–6770

2026

[69] [69]

Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, and Tat-Seng Chua. 2024. Composed image retrieval with text feedback via multi-grained uncertainty regularization.ICLR(2024)

2024

[70] [70]

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan

[71] [71]

InACM MM

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Com- posed Video Retrieval. InACM MM. 6143–6152

[72] [72]

Yuxuan Jiang, Runchao Li, Shubhashis Roy Dipta, Dawei Li, and Zhao Yang. 2026. Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation.arXiv preprint arXiv:2605.09253(2026)

Pith/arXiv arXiv 2026

[73] [73]

Jinlai Zhang, Mingchao Xiang, Yongheng Hu, Wei Hao, Linlong Lei, and Kefu Yi. 2026. Multivariate feature learning and associative spatial information en- hancement for snow object detection in autonomous driving.EAAI175 (2026), 114672

2026

[74] [74]

Kailin Jiang, Hongbo Jiang, Ning Jiang, Zhi Gao, Jinhe Bi, Yuchen Ren, Bin Li, Yuntao Du, Lei Liu, and Qing Li. 2025. KORE: Enhancing Knowledge Injec- tion for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints.arXiv preprint arXiv:2510.19316(2025)

Pith/arXiv arXiv 2025

[75] [75]

Guancheng Wan, Xiaoran Shang, Yuxin Wu, Guibin Zhang, Jinhe Bi, et al. 2025. HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning. InNeurIPS

2025

[76] [76]

Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta, and Hengyuan Zhang. 2025. Learning how to use tools, not just when: Pattern-aware tool-integrated reason- ing.arXiv preprint arXiv:2509.23292(2025)

arXiv 2025

[77] [77]

Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2023. Cross-modal active complementary learning with self-refining correspondence. NeurIPS36 (2023), 24829–24840

2023

[78] [78]

Yiming Zeng, Wanhao Yu, Zexin Li, Tao Ren, Yu Ma, Jinghan Cao, Xiyan Chen, and Tingting Yu. 2025. Bridging the editing gap in LLMs: FineEdit for precise and targeted text modifications.EMNLP Findings(2025), 2193–2206

2025

[79] [79]

Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. 2024. Robust watermarking using generative priors against image editing: From bench- marking to advances.arXiv preprint arXiv:2410.18775(2024)

arXiv 2024

[80] [80]

Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, and Weiran Huang. 2024. Diff-erank: A novel rank-based metric for evaluating large language models. NeurIPS37 (2024), 39501–39521

2024