RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval
Pith reviewed 2026-06-27 10:33 UTC · model grok-4.3
The pith
The RankVR framework improves composed image retrieval by detecting noisy triplets through changes in the rank of their correlation matrix and recalibrating their training value.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RankVR addresses noisy triplet correspondence in composed image retrieval by introducing the Global Structure Consistency Perception module that uses the effective rank of the correlation matrix to identify samples disrupting macroscopic semantic symmetry, and the Adaptive Semantic Value Calibration module that integrates training potential and reliability to quantify each triplet's semantic value, resulting in more robust models on benchmarks like FashionIQ and CIRR.
What carries the argument
The Effective Rank of the Correlation Matrix in the GSCP module, which identifies samples disrupting macroscopic semantic symmetry by rank difference.
If this is right
- RankVR achieves superior performance compared to existing state-of-the-art methods on FashionIQ and CIRR under noisy conditions.
- The GSCP module decouples clean samples from those causing global structural inconsistency.
- The ASVC module ensures effective use of hard clean samples while suppressing noise with logical conflicts.
- The approach handles both global correlations and dynamic value changes during training.
Where Pith is reading between the lines
- Applying similar rank-based detection could help in other multimodal tasks with noisy alignments, such as video-text retrieval.
- The dynamic calibration might reduce the need for extensive data cleaning in large-scale training pipelines.
- If the correlation matrix is computed efficiently, this could scale to very large datasets without much overhead.
Load-bearing premise
Differences in the effective rank of the correlation matrix reliably flag samples that break semantic symmetry, and combining training potential with reliability gives an accurate value score without adding bias.
What would settle it
Running the same experiments on FashionIQ and CIRR but replacing the rank-based selection with random or scalar-based selection and finding no performance improvement would falsify the claim that the rank difference provides unique value for denoising.
Figures
read the original abstract
Composed Image Retrieval (CIR) constitutes a pivotal paradigm requiring models to perform joint reasoning on reference images and modification texts. However, the prevalence of Noisy Triplet Correspondence (NTC) in large-scale datasets severely constrains model performance. Existing denoising methods either target binary mismatches or rely on scalar-based point-wise estimation, neglecting rich global structural correlations among sample populations and dynamic value variations during training, thereby yielding suboptimal results. This paper identifies two critical unresolved challenges: Global Structural Inconsistency of Semantic Correlations and Hard Sample Discrimination Uncertainty. To address these, we propose RankVR, a framework designed to construct a robust CIR model via global structure consistency and dynamic value perception. Specifically, we introduce the Global Structure Consistency Perception (GSCP) module, which utilizes the Effective Rank of the Correlation Matrix to decouple clean samples from structural noise. By measuring rank difference, GSCP identifies samples disrupting macroscopic semantic symmetry. Furthermore, we develop the Adaptive Semantic Value Calibration (ASVC) module to distinguish high-value hard clean samples. By integrating training potential and reliability, it dynamically quantifies the semantic value of each triplet, ensuring effective utilization of hard samples while suppressing noise characterized by logical conflicts. Extensive experiments on the FashionIQ and CIRR benchmark datasets demonstrate that RankVR significantly outperforms existing state-of-the-art methods, validating its superior robustness in noisy environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RankVR, a framework for robust Composed Image Retrieval under Noisy Triplet Correspondence (NTC). It introduces the Global Structure Consistency Perception (GSCP) module, which uses the effective rank of the correlation matrix to identify samples disrupting macroscopic semantic symmetry via rank difference, and the Adaptive Semantic Value Calibration (ASVC) module, which integrates training potential and reliability to dynamically quantify triplet semantic value and distinguish high-value hard clean samples from noise. The central claim is that RankVR significantly outperforms existing SOTA methods on the FashionIQ and CIRR benchmarks, demonstrating superior robustness in noisy environments.
Significance. If the GSCP and ASVC modules are shown to reliably separate structural noise from clean hard samples without introducing new biases, the work could provide a useful global-structure-aware alternative to point-wise denoising methods in CIR. The approach leverages population-level correlations and dynamic training signals, which addresses stated limitations of prior binary-mismatch or scalar-estimation techniques.
major comments (3)
- [Abstract] Abstract: The central claim of significant outperformance on FashionIQ and CIRR is stated without any quantitative metrics, tables, error bars, ablation results, or statistical details, which is load-bearing for assessing whether the reported robustness gains can be attributed to the proposed modules rather than implementation details.
- [Abstract] Abstract (GSCP description): No derivation or construction details are supplied for the correlation matrix (e.g., which embeddings, batch size, centering), nor any controlled experiment showing that rank difference correlates with injected NTC levels or cleanly isolates noise without discarding hard clean samples; this directly underpins the claim that GSCP decouples clean samples from structural noise.
- [Abstract] Abstract (ASVC description): The integration of training potential and reliability is asserted to quantify semantic value without new biases, but no sensitivity analysis, formulation, or validation against the weakest assumption (rank-difference heuristic separating clean vs. noisy triplets) is referenced, which is load-bearing for the hard-sample discrimination claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment regarding the abstract below and outline the planned revisions.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim of significant outperformance on FashionIQ and CIRR is stated without any quantitative metrics, tables, error bars, ablation results, or statistical details, which is load-bearing for assessing whether the reported robustness gains can be attributed to the proposed modules rather than implementation details.
Authors: We agree that the abstract would be strengthened by including quantitative support. In the revised version, we will incorporate specific metrics such as the percentage improvements in recall@K on FashionIQ and CIRR under noisy conditions, with references to the primary results tables and error bars from our experiments. revision: yes
-
Referee: [Abstract] Abstract (GSCP description): No derivation or construction details are supplied for the correlation matrix (e.g., which embeddings, batch size, centering), nor any controlled experiment showing that rank difference correlates with injected NTC levels or cleanly isolates noise without discarding hard clean samples; this directly underpins the claim that GSCP decouples clean samples from structural noise.
Authors: The full derivation and construction of the correlation matrix (using reference image and modification text embeddings within each batch, with centering applied) are detailed in Section 3.2. We will add a concise description of this construction to the abstract. The controlled experiments demonstrating the correlation of rank difference with NTC injection levels and isolation of noise (without discarding hard clean samples) appear in Section 4.3; we will reference these results in the revised abstract. revision: partial
-
Referee: [Abstract] Abstract (ASVC description): The integration of training potential and reliability is asserted to quantify semantic value without new biases, but no sensitivity analysis, formulation, or validation against the weakest assumption (rank-difference heuristic separating clean vs. noisy triplets) is referenced, which is load-bearing for the hard-sample discrimination claim.
Authors: We agree the abstract description of ASVC is high-level. The formulation combining training potential and reliability, along with sensitivity analysis and validation against the rank-difference heuristic, is presented in Section 3.3 and the supplementary material. We will revise the abstract to briefly describe the formulation and reference the validation experiments. revision: yes
Circularity Check
No circularity: modules reference external rank concepts without self-referential reduction
full rationale
The abstract and described framework introduce GSCP (using effective rank of correlation matrix to measure rank difference for identifying structural noise) and ASVC (integrating training potential and reliability for semantic value). No equations, derivations, or self-citations are shown that would reduce any claimed prediction or uniqueness result to a fitted input or prior author ansatz by construction. The central robustness claims are positioned as empirically validated on FashionIQ/CIRR rather than tautological. This matches the default non-circular outcome when no load-bearing step exhibits the enumerated patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng, et al . 2024. Sentence-level Prompts Bene- fit Composed Image Retrieval. InICLR
2024
-
[2]
Haokun Wen, Xuemeng Song, Jianhua Yin, Jianlong Wu, Weili Guan, and Liqiang Nie. 2023. Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval.IEEE TPAMI(2023)
2023
-
[3]
Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan
-
[4]
InAAAI, Vol
Encoder: Entity mining and modification relation binding for composed image retrieval. InAAAI, Vol. 39. 5101–5109
-
[5]
Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. 2026. MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network.arXiv preprint arXiv:2603.29291(2026)
arXiv 2026
-
[6]
Peng Hu, Zhenyu Huang, Dezhong Peng, Xu Wang, and Xi Peng. 2023. Cross- modal retrieval with partially mismatched pairs.IEEE TPAMI45, 8 (2023), 9595– 9610
2023
-
[7]
Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, and Peng Hu
-
[8]
Noisy-correspondence learning for text-to-image person re-identification. InCVPR. 27197–27206
-
[9]
Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. 2026. REFINE: Composed Video Retrieval via Shared and Differential Semantics Enhancement.ACM ToMM(2026)
2026
-
[10]
Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. 2026. ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval. InAAAI, Vol. 40. 23373– 23381
2026
-
[11]
Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie
-
[12]
FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval.https://arxiv.org/abs/2503.21309(2025)
arXiv 2025
-
[13]
Jiale Huang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Chunxiao Wang, and Yupeng Hu. 2026. IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval.arXiv preprint arXiv:2606.08144(2026)
Pith/arXiv arXiv 2026
-
[14]
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Guozhi Qiu, Weili Guan, and Liqiang Nie. 2026. EgoAdapt: A Multi-Scene Egocentric Adaptation Method for CVPR 2026 HD-EPIC VQA Challenge.arXiv preprint arXiv:2605.24500(2026)
Pith/arXiv arXiv 2026
-
[15]
Yuxuan Jiang and Francis Ferraro. 2026. Beyond math: Stories as a testbed for memorization-constrained reasoning in llms. InEACL. 5590–5607
2026
-
[16]
Jinhe Bi, Minglai Yang, Xingcheng Zhou, Wenke Huang, Sikuan Yan, Yujun Wang, Zixuan Cao, Michael Färber, Xun Xiao, Volker Tresp, et al. 2026. EchoRL: Reinforcement Learning via Rollout Echoing.arXiv preprint arXiv:2605.31228 (2026)
Pith/arXiv arXiv 2026
-
[17]
Jinghan Cao, Yu Ma, Xinjin Li, Qingyang Ren, and Xiangyun Chen. 2026. Task- Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models.arXiv preprint arXiv:2603.21389(2026)
arXiv 2026
-
[18]
Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, and Tao Jin
-
[19]
Chat-driven text generation and interaction for person retrieval. InEMNLP. 5259–5270
-
[20]
Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniBVR: Balancing visual and reasoning abilities in unified 3D scene understanding.Neu- rocomputing671 (2026), 132599
2026
-
[21]
Yun Song, Wenjia Zheng, Tiedan Chen, Ziyu Wang, Jiazhao Shi, and Yisong Chen
-
[22]
Deep neural network architectures for electrocardiogram classification: A comprehensive evaluation.arXiv preprint arXiv:2602.17701(2026)
arXiv 2026
-
[23]
Yang Shi, Yifeng Xie, Minzhe Guo, Liangsi Lu, Mingxuan Huang, Jingchao Wang, Zhihong Zhu, Boyan Xu, and Zhiqi Huang. 2026. MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models.arXiv preprint arXiv:2601.03331 (2026)
Pith/arXiv arXiv 2026
-
[24]
Yuan Sun, Xu Wang, Dezhong Peng, Zhenwen Ren, and Xiaobo Shen. 2023. Hierarchical hashing learning for image set classification.IEEE TIP32 (2023), 1732–1744
2023
-
[25]
Zong Ke, Yuqing Cao, Zhenrui Chen, Yuchen Yin, Shouchao He, and Yu Cheng
-
[26]
Finance Research Letters(2025), 107890
Early warning of cryptocurrency reversal risks via multi-source data. Finance Research Letters(2025), 107890
2025
-
[27]
Yang Tian, Fan Liu, Jingyuan Zhang, Yupeng Hu, Liqiang Nie, et al. 2025. CoRe- MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG. InACL. 32967–32982
2025
-
[28]
Qianyun Yang, Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, and Liqiang Nie. 2026. STABLE: Efficient Hybrid Nearest Neighbor Search via Magnitude- Uniformity and Cardinality-Robustness.IEEE TKDE(2026)
2026
-
[29]
Qianyun Yang, Peizhuo Lv, Yingjiu Li, Shengzhi Zhang, Yuxuan Chen, Zhiwei Chen, Zixu Li, and Yupeng Hu. 2026. ERASE: Bypassing Collaborative Detection of AI Counterfeit Via Comprehensive Artifacts Elimination.IEEE TDSC(March 2026), 1–18. doi:10.1109/TDSC.2026.3677794
-
[30]
Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Neutral-Reference Prompting for Vision-Language Models.arXiv(2026), 2605.15615
Pith/arXiv arXiv 2026
-
[31]
Xingfeng Li, Yinghui Sun, Quansen Sun, Zhenwen Ren, and Yuan Sun. 2023. Cross-view graph matching guided anchor alignment for incomplete multi-view clustering.Information Fusion100 (2023), 101941
2023
-
[32]
Zixu Li, Yupeng Hu, Zhiwei Chen, Haokun Wen, Xuemeng Song, and Liqiang Nie. 2026. COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations.IEEE TIP(2026)
2026
-
[33]
Senmao Tian, Xiang Wei, and Shunli Zhang. 2026. Sampling Control for Imbal- anced Calibration in Semi-Supervised Learning. InAAAI, Vol. 40. 25914–25922
2026
-
[34]
Liangsi Lu, Xuhang Chen, Minzhe Guo, Shichu Li, Jingchao Wang, and Yang Shi. 2026. Chordedit: One-step low-energy transport for image editing. InCVPR. 14398–14407
2026
-
[35]
Zichao Li and Zong Ke. 2025. Domain meets typology: Predicting verb-final order from universal dependencies for financial and blockchain nlp. InSIGTYP. 156–164
2025
-
[36]
Zichao Li and Zong Ke. 2025. Cross-modal augmentation for low-resource language understanding and generation. InMAGMaR. 90–99
2025
-
[37]
Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. 2024. Mace: Mass concept erasure in diffusion models. InCVPR. 6430–6440
2024
-
[38]
Jinlai Zhang, Xiaolong Song, Yucheng Li, Diqing Liang, Zhiyong Zhang, and Jinhu Cai. 2026. Adaptive dual cross-attention network for multispectral object detection in autonomous driving.ESW A(2026), 132012
2026
-
[39]
Zhiheng Fu, Zixu Li, Zhiwei Chen, Fangxu Liu, Yupeng Hu, Weili Guan, and Liqiang Nie. 2026. EgoAction: Egocentric Action Composition with Reliability- Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026.arXiv preprint arXiv:2605.24496(2026)
Pith/arXiv arXiv 2026
-
[40]
Panqi Yang, Haodong Jing, Nanning Zheng, and Yongqiang Ma. 2026. UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space. In AAAI, Vol. 40. 11640–11648
2026
-
[41]
Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, and Volker Tresp. 2023. SPOT! Revisiting Video-Language Models for Event Understanding.arXiv preprint arXiv:2311.12919(2023)
arXiv 2023
-
[42]
Xinlei Yu, Chengming Xu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Cheng Yang, Jiangning Zhang, Shuicheng Yan, and Xiaobin Hu. 2025. Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling.arXiv preprint arXiv:2508.03404(2025)
arXiv 2025
-
[43]
Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Weili Guan, and Liqiang Nie. 2026. R3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking. arXiv preprint arXiv:2606.01113(2026)
Pith/arXiv arXiv 2026
-
[44]
Yuxuan Jiang, Dawei Li, and Frank Ferraro. 2025. Drp: Distilled reasoning pruning with skill-aware step decomposition for efficient large reasoning models.arXiv preprint arXiv:2505.13975(2025)
Pith/arXiv arXiv 2025
-
[45]
Yuan Sun, Yang Qin, Yongxiang Li, Dezhong Peng, Xi Peng, and Peng Hu. 2024. Robust multi-view clustering with noisy correspondence.IEEE TKDE36, 12 (2024), 9150–9162
2024
-
[46]
Zixu Li, Yupeng Hu, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Weili Guan, and Liqiang Nie. 2026. TempRet: Temporal Enhancement and Two-Stage Reranking for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge.arXiv preprint arXiv:2605.24470(2026)
Pith/arXiv arXiv 2026
-
[47]
Jinhe Bi, Danqi Yan, Yifan Wang, Wenke Huang, Haokun Chen, Guancheng Wan, Mang Ye, Xun Xiao, Hinrich Schuetze, Volker Tresp, and Yunpu Ma. 2026. The Geometry of Reasoning: Self-Evaluation via Layerwise Trajectory Evolution. In ICML. https://openreview.net/forum?id=WQyrwQwzmK
2026
-
[48]
Yuxuan Jiang and Francis Ferraro. 2026. SCRIBE: Structured Mid-Level Supervi- sion for Tool-Using Language Models.arXiv preprint arXiv:2601.03555(2026)
Pith/arXiv arXiv 2026
-
[49]
Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, et al. 2025. Se-agent: Self- evolution trajectory optimization in multi-step reasoning with llm-based agents. arXiv preprint arXiv:2508.02085(2025)
arXiv 2025
-
[50]
Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, and Xi Peng. 2021. Learning with noisy correspondence for cross-modal matching. NeurIPS34 (2021), 29406–29419
2021
-
[51]
Jingjie Zhang, Lingli Tang, Jiachen Li, Jinyu Xu, Yanchun Ma, and Qing Xie
-
[52]
InACM MM Asia
Robust Dual Embedding Contrastive Learning for Text-to-Image Person Re-identification with Noisy Correspondence. InACM MM Asia. 1–8
-
[53]
Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. 2026. INTENT: Invariance and Discrimination-aware Noise Mitiga- tion for Robust Composed Image Retrieval. InAAAI, Vol. 40. 20463–20471
2026
-
[54]
Shuxian Li, Changhao He, Xiting Liu, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2025. Learning with Noisy Triplet Correspondence for Composed Image Retrieval. InCVPR. 19628–19637
2025
-
[55]
Siyuan Li, Youyuan Zhang, Fangming Liu, and Jing Li. 2026. Modality-Decoupled Online Recursive Editing.arXiv preprint arXiv:2605.20273(2026)
Pith/arXiv arXiv 2026
-
[56]
Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays
-
[57]
Composing Text and Image for Image Retrieval - an Empirical Odyssey. In CVPR. IEEE, 6439–6448
-
[58]
Yanbei Chen, Shaogang Gong, and Loris Bazzani. 2020. Image Search With Text Feedback by Visiolinguistic Attention Learning. InCVPR. IEEE, 2998–3008
2020
-
[59]
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie
-
[60]
Retrieval
OFFSET: Segmentation-based Focus Shift Revision for Composed Image ICMR ’26, June 16–19, 2026, Amsterdam, Netherlands Huang et al. Retrieval. InACM MM. 6113–6122
2026
-
[61]
Yida Zhao, Yuqing Song, and Qin Jin. 2022. Progressive learning for image retrieval with hybrid-modality queries. InACM SIGIR. 1012–1021
2022
-
[62]
Matan Levy, Rami Ben-Ari, Nir Darshan, and Dani Lischinski. 2024. Data roaming and quality assessment for composed image retrieval. InAAAI, Vol. 38. 2991– 2999
2024
-
[63]
Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, and Yupeng Hu. 2026. Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341 (2026)
arXiv 2026
-
[64]
Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Median: Adaptive intermediate-grained aggregation network for composed image retrieval. InICASSP. IEEE, 1–5
2025
-
[65]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InICML. PMLR, 8748–8763
2021
-
[66]
Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. InICML. PMLR, 12888–12900
2022
-
[67]
Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. 2025. Pair: Complementarity-guided disentanglement for composed image retrieval. InICASSP. IEEE, 1–5
2025
-
[68]
Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. 2026. HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval. InAAAI, Vol. 40. 6762–6770
2026
-
[69]
Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, and Tat-Seng Chua. 2024. Composed image retrieval with text feedback via multi-grained uncertainty regularization.ICLR(2024)
2024
-
[70]
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan
-
[71]
InACM MM
HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Com- posed Video Retrieval. InACM MM. 6143–6152
-
[72]
Yuxuan Jiang, Runchao Li, Shubhashis Roy Dipta, Dawei Li, and Zhao Yang. 2026. Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation.arXiv preprint arXiv:2605.09253(2026)
Pith/arXiv arXiv 2026
-
[73]
Jinlai Zhang, Mingchao Xiang, Yongheng Hu, Wei Hao, Linlong Lei, and Kefu Yi. 2026. Multivariate feature learning and associative spatial information en- hancement for snow object detection in autonomous driving.EAAI175 (2026), 114672
2026
-
[74]
Kailin Jiang, Hongbo Jiang, Ning Jiang, Zhi Gao, Jinhe Bi, Yuchen Ren, Bin Li, Yuntao Du, Lei Liu, and Qing Li. 2025. KORE: Enhancing Knowledge Injec- tion for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints.arXiv preprint arXiv:2510.19316(2025)
Pith/arXiv arXiv 2025
-
[75]
Guancheng Wan, Xiaoran Shang, Yuxin Wu, Guibin Zhang, Jinhe Bi, et al. 2025. HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning. InNeurIPS
2025
-
[76]
Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta, and Hengyuan Zhang. 2025. Learning how to use tools, not just when: Pattern-aware tool-integrated reason- ing.arXiv preprint arXiv:2509.23292(2025)
arXiv 2025
-
[77]
Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, and Peng Hu. 2023. Cross-modal active complementary learning with self-refining correspondence. NeurIPS36 (2023), 24829–24840
2023
-
[78]
Yiming Zeng, Wanhao Yu, Zexin Li, Tao Ren, Yu Ma, Jinghan Cao, Xiyan Chen, and Tingting Yu. 2025. Bridging the editing gap in LLMs: FineEdit for precise and targeted text modifications.EMNLP Findings(2025), 2193–2206
2025
-
[79]
Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. 2024. Robust watermarking using generative priors against image editing: From bench- marking to advances.arXiv preprint arXiv:2410.18775(2024)
arXiv 2024
-
[80]
Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, and Weiran Huang. 2024. Diff-erank: A novel rank-based metric for evaluating large language models. NeurIPS37 (2024), 39501–39521
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.