PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning

Shudong Huang; Wentao Feng; Yang Liu; Zhuoyao Liu

arxiv: 2509.15623 · v2 · submitted 2025-09-19 · 💻 cs.CV

Pith reviewed 2026-05-18 16:30 UTC · model grok-4.3

keywords noisy correspondence learning · cross-modal retrieval · pseudo-label consistency · sample refinement · adaptive pair optimization · image-text retrieval · noisy supervision

The pith

PCSR refines noisy image-text pairs by scoring pseudo-label consistency to separate ambiguous from refinable samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prior methods for noisy correspondences in cross-modal retrieval rely on coarse clean-versus-noisy splits and apply uniform strategies, which wastes the diversity inside noisy data. PCSR instead uses confidence estimation to flag noisy pairs and then applies a Pseudo-label Consistency Score to split those pairs into ambiguous ones and refinable ones. Ambiguous pairs train with robust losses while refinable pairs receive text replacement to supply cleaner signals. A reader would care because most real image-text collections contain misalignments that degrade similarity learning, and this finer division promises to extract more useful training value from imperfect data.

Core claim

We introduce the PCSR framework, which enhances correspondence reliability by explicitly dividing samples based on pseudo-label consistency. Clean and noisy pairs are distinguished via confidence-based estimation. Noisy pairs are further refined through pseudo-label consistency to uncover structurally distinct subsets. The Pseudo-label Consistency Score quantifies prediction stability to isolate ambiguous samples from refinable ones. Adaptive Pair Optimization then applies robust loss functions to the former and text replacement to the latter.

What carries the argument

The Pseudo-label Consistency Score (PCS), which measures prediction stability to partition noisy pairs so that each subset receives a matching optimization strategy inside Adaptive Pair Optimization.
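The summary describes PCS only as a measure of prediction stability and does not give its formula. As a rough illustration, and not the paper's definition, stability can be approximated by how often repeated pseudo-label predictions for the same pair agree with the most frequent one. The function name `consistency_score` and the sample values below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def consistency_score(pseudo_labels: Sequence[int]) -> float:
    """Fraction of repeated predictions that agree with the most common label.

    Assumption: this agreement ratio is a stand-in for PCS,
    not the formula used in the paper.
    """
    if not pseudo_labels:
        raise ValueError("need at least one pseudo-label prediction")
    most_common_count = Counter(pseudo_labels).most_common(1)[0][1]
    return most_common_count / len(pseudo_labels)


# Pseudo-labels predicted for two noisy pairs over four rounds (made-up values).
stable_pair = [2, 2, 2, 2]
unstable_pair = [0, 1, 0, 2]
print(consistency_score(stable_pair))    # 1.0
print(consistency_score(unstable_pair))  # 0.5
```

A score near 1 means the prediction barely moves between rounds; a low score means the model keeps changing its mind about the pair.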

If this is right

Retrieval models trained with PCSR achieve higher performance on noisy versions of CC152K, MS-COCO, and Flickr30K.
Noisy samples receive category-specific treatment instead of uniform handling, increasing overall data utilization.
Misaligned pairs exert less damage on similarity learning because refinable cases are actively corrected.
Cross-modal retrieval systems become more robust when supervision contains realistic noise levels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same consistency-based split could be tested on video-text or audio-text retrieval tasks that also suffer from alignment noise.
Text replacement for refinable pairs might be replaced or augmented by other generative corrections without changing the core separation logic.
Integrating the PCS with self-supervised pretraining could further stabilize the ambiguous-versus-refinable boundary on very large collections.

Load-bearing premise

That a confidence-based estimate followed by the Pseudo-label Consistency Score can reliably identify which noisy pairs will yield useful training signals when their text is replaced.

What would settle it

Train the same base model on a controlled noisy version of MS-COCO or Flickr30K once with standard clean-noisy splitting and once with the full PCSR pipeline; if recall@K and precision@K show no consistent gain for PCSR, the benefit of the finer division collapses.
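For the comparison above, recall@K can be computed from a query-by-candidate similarity matrix. The sketch below is a minimal version that assumes each query has exactly one correct candidate, stored at the same index; benchmarks with several captions per image would need a set of correct indices instead. The function name and the toy matrix are illustrative only.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Share of queries whose matching candidate ranks in the top k.

    Assumption: the only correct candidate for query i is candidate i.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for query_index, row in enumerate(similarity):
        target = row[query_index]
        # Rank = number of candidates scored strictly higher than the true match.
        rank = sum(1 for score in row if score > target)
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 image-to-text similarity matrix (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],  # true match ranked 1st
    [0.8, 0.5, 0.2],  # true match ranked 2nd
    [0.7, 0.6, 0.1],  # true match ranked 3rd
]
for k in (1, 2, 3):
    print(k, recall_at_k(sim, k))
```

Running the same function on the model trained with a plain clean/noisy split and on the model trained with the full pipeline gives the two numbers the test calls for.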

Figures

Figures reproduced from arXiv: 2509.15623 by Shudong Huang, Wentao Feng, Yang Liu, Zhuoyao Liu.

**Figure 2.** Overview of the proposed method. We compute the Pseudo-label Consistency Score (PCS) through repeated pseudo …

**Figure 3.** At epoch 40, we computed PCS values across all …

Original abstract

Cross-modal retrieval aims to align different modalities via semantic similarity. However, existing methods often assume that image-text pairs are perfectly aligned, overlooking Noisy Correspondences in real data. These misaligned pairs misguide similarity learning and degrade retrieval performance. Previous methods often rely on coarse-grained categorizations that simply divide data into clean and noisy samples, overlooking the intrinsic diversity within noisy instances. Moreover, they typically apply uniform training strategies regardless of sample characteristics, resulting in suboptimal sample utilization for model optimization. To address the above challenges, we introduce a novel framework, called Pseudo-label Consistency-Guided Sample Refinement (PCSR), which enhances correspondence reliability by explicitly dividing samples based on pseudo-label consistency. Specifically, we first employ a confidence-based estimation to distinguish clean and noisy pairs, then refine the noisy pairs via pseudo-label consistency to uncover structurally distinct subsets. We further propose a Pseudo-label Consistency Score (PCS) to quantify prediction stability, enabling the separation of ambiguous and refinable samples within noisy pairs. Accordingly, we adopt Adaptive Pair Optimization (APO), where ambiguous samples are optimized with robust loss functions and refinable ones are enhanced via text replacement during training. Extensive experiments on CC152K, MS-COCO and Flickr30K validate the effectiveness of our method in improving retrieval robustness under noisy supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PCSR splits noisy pairs further with a consistency score and applies different fixes, but the text replacement step is too vaguely described to judge if it actually helps.

The core idea is to first flag clean versus noisy pairs with a confidence check, then use a pseudo-label consistency score to split the noisy ones into ambiguous and refinable groups, and finally run adaptive optimization that uses robust loss on the first group and text replacement on the second. This is a modest but reasonable step beyond the usual clean/noisy binary split that most prior noisy correspondence work relies on. It directly targets the claim that noisy pairs are not all the same and deserve differentiated handling during training on cross-modal retrieval tasks. The three-dataset evaluation setup (CC152K, MS-COCO, Flickr30K) is standard and appropriate for the problem. The consistency score itself is a straightforward way to measure prediction stability and gives the method a concrete signal for the split. That part is clearly motivated and easy to implement on top of existing pseudo-label pipelines.

The main weakness is the text replacement procedure. The abstract gives no source for the replacement text, no selection rule, and no argument that the new pair is more aligned than the original noisy one. If the replacement is drawn from the same model’s outputs, the risk of simply moving the noise around is real and undercuts the novelty of handling “intrinsic diversity.” The circularity worry is present but secondary; the bigger issue is whether the replacement step produces a net gain or just adds another hyperparameter.

This work is for people already working on robust multimodal retrieval under noisy supervision. A reader who wants concrete tactics for salvaging imperfect pairs will find the differentiated strategy worth looking at, even if the details need tightening. I would send it to peer review so the replacement mechanism and the actual numbers can be examined properly.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes PCSR, a framework for handling noisy correspondences in cross-modal retrieval. It employs confidence-based estimation to separate clean from noisy image-text pairs, introduces a Pseudo-label Consistency Score (PCS) to further partition noisy pairs into ambiguous and refinable subsets, applies robust losses to ambiguous samples, and performs text replacement on refinable samples via Adaptive Pair Optimization (APO). Effectiveness is claimed on CC152K, MS-COCO, and Flickr30K.

Significance. If the results hold, the work offers a finer-grained alternative to binary clean/noisy splits by exploiting diversity within noisy instances, which could improve sample utilization and robustness in noisy supervision for retrieval tasks. The PCS as a stability metric is a potentially useful technical contribution.

major comments (3)

[Abstract, paragraph describing the refinement step and APO] Abstract and APO description: the text replacement mechanism for refinable samples is load-bearing for the novelty claim of handling 'intrinsic diversity within noisy instances,' yet the source of replacement texts, selection criterion, and any guarantee of improved semantic alignment (versus the original noisy pair) are not defined. If replacements derive from the same model's pseudo-labels, error reinforcement rather than correction is possible.
[Method, PCS definition] Method section on PCS: using pseudo-labels generated by the model under training to compute consistency scores for deciding how to refine the same data creates a circular dependency. The manuscript must demonstrate that PCS provides an independent signal rather than amplifying early errors.
[Experiments] Experiments: the abstract asserts validation on three datasets but supplies no quantitative results, ablation on the two free parameters (confidence threshold and PCS threshold), or direct comparison against uniform strategies. Without these, it is impossible to confirm that the proposed division yields gains that support the central claim.

minor comments (2)

[Notation and abstract] Ensure all acronyms (PCS, APO) are expanded on first use and used consistently.
[Figures] Pipeline figures should explicitly label the clean/noisy split, PCS computation, and the two branches of APO.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. We have carefully reviewed each major comment and provide point-by-point responses below, indicating where revisions will be made to strengthen the manuscript.

Referee: [Abstract, paragraph describing the refinement step and APO] Abstract and APO description: the text replacement mechanism for refinable samples is load-bearing for the novelty claim of handling 'intrinsic diversity within noisy instances,' yet the source of replacement texts, selection criterion, and any guarantee of improved semantic alignment (versus the original noisy pair) are not defined. If replacements derive from the same model's pseudo-labels, error reinforcement rather than correction is possible.

Authors: We agree that the current description of the text replacement mechanism within APO is insufficiently detailed. In the revised manuscript we will expand the APO subsection to explicitly specify the source of replacement texts, the selection criterion used, and empirical evidence (including alignment metrics before and after replacement) that the operation improves semantic correspondence. We will also add a short analysis addressing the risk of error reinforcement, for example by restricting replacements to samples whose PCS exceeds a conservative threshold and by validating against a small set of manually verified pairs. revision: yes
Referee: [Method, PCS definition] Method section on PCS: using pseudo-labels generated by the model under training to compute consistency scores for deciding how to refine the same data creates a circular dependency. The manuscript must demonstrate that PCS provides an independent signal rather than amplifying early errors.

Authors: This concern about circularity is well-taken. We will revise the PCS definition paragraph to clarify that consistency is measured across temporally separated model checkpoints (e.g., every k epochs) rather than solely on the instantaneous prediction, thereby providing a stability signal that is partially decoupled from any single erroneous state. In addition, we will include a new paragraph with both a brief theoretical argument and supporting experiments that track how PCS correlates with ground-truth alignment on a held-out clean subset, demonstrating that early errors do not dominate the score. revision: yes
Referee: [Experiments] Experiments: the abstract asserts validation on three datasets but supplies no quantitative results, ablation on the two free parameters (confidence threshold and PCS threshold), or direct comparison against uniform strategies. Without these, it is impossible to confirm that the proposed division yields gains that support the central claim.

Authors: We thank the referee for highlighting the need for more explicit evidence. The full paper already reports quantitative results on CC152K, MS-COCO, and Flickr30K in Section 4 (Tables 1–3). We will update the abstract to include the main performance deltas. We will also add a dedicated ablation subsection that varies the confidence threshold and the PCS threshold, and we will insert direct comparisons against uniform robust-loss and random-replacement baselines to quantify the benefit of the fine-grained division. revision: yes
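The second response proposes checking how PCS relates to ground-truth alignment on a held-out subset. One simple way to quantify that relation, offered here as an illustration and not as the authors' protocol, is the AUROC of PCS as a predictor of alignment: the probability that a randomly chosen aligned pair scores higher than a randomly chosen misaligned one. The function name and the sample scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_aligned: Sequence[bool]) -> float:
    """Probability that an aligned pair outscores a misaligned pair (ties count half)."""
    positives = [s for s, y in zip(scores, is_aligned) if y]
    negatives = [s for s, y in zip(scores, is_aligned) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one aligned and one misaligned pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up PCS values with manually verified alignment flags.
pcs_values = [0.9, 0.8, 0.4, 0.3]
aligned = [True, False, True, False]
print(auroc(pcs_values, aligned))  # 0.75
```

A value close to 0.5 would suggest that PCS carries little information about alignment, which is the failure mode the referee's circularity comment is worried about.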

Circularity Check

0 steps flagged

No significant circularity; derivation proposes independent framework components validated on external benchmarks.

The paper's chain begins with a confidence-based split of clean vs. noisy pairs, followed by introduction of a new PCS metric to further partition noisy pairs into ambiguous and refinable subsets, then applies APO (robust loss on one subset, text replacement on the other). None of these steps reduces to its own inputs by construction: PCS is defined as a stability quantifier rather than a tautological re-use of the model's output, and text replacement is presented as an enhancement operation without any equation showing it equals the original noisy pair or a fitted parameter. No self-citation is invoked as a uniqueness theorem or load-bearing premise, and the method is tested on CC152K, MS-COCO and Flickr30K, which are independent of the internal pseudo-label generation. This satisfies the self-contained criterion; the approach is a standard iterative refinement proposal rather than a definitional loop.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The framework rests on several modeling choices whose justification is not supplied in the abstract: thresholds for clean/noisy separation, the definition of the consistency score, and the assumption that text replacement creates valid positive pairs.

free parameters (2)

confidence threshold for clean/noisy split: used to initially separate pairs; value not stated in abstract.
PCS threshold separating ambiguous from refinable: determines which noisy samples receive which training strategy.
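The two thresholds together define a three-way split. The sketch below shows the routing logic under stated assumptions: higher confidence means clean, and higher PCS means refinable (the direction suggested by the rebuttal's remark about restricting replacements to samples whose PCS exceeds a threshold). The default threshold values and the name `assign_subset` are placeholders, not values from the paper.

```python
from typing import Literal

Subset = Literal["clean", "refinable", "ambiguous"]


def assign_subset(confidence: float, pcs: float,
                  conf_threshold: float = 0.5,
                  pcs_threshold: float = 0.8) -> Subset:
    """Route one image-text pair to a training strategy.

    Assumptions: both thresholds are hypothetical defaults, and
    high PCS is taken to mean "refinable".
    """
    if confidence >= conf_threshold:
        return "clean"        # trained as an ordinary positive pair
    if pcs >= pcs_threshold:
        return "refinable"    # candidate for text replacement
    return "ambiguous"        # trained with a robust loss


print(assign_subset(confidence=0.9, pcs=0.2))  # clean
print(assign_subset(confidence=0.2, pcs=0.9))  # refinable
print(assign_subset(confidence=0.2, pcs=0.3))  # ambiguous
```

Because PCS is only consulted for pairs that fail the confidence check, an ablation over the two thresholds can vary them independently.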

axioms (1)

domain assumption: pseudo-label consistency reliably indicates whether a noisy pair can be refined by text replacement. Invoked when the method divides noisy samples and applies APO.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We further propose a Pseudo-label Consistency Score (PCS) to quantify prediction stability, enabling the separation of ambiguous and refinable samples within noisy pairs. Accordingly, we adopt Adaptive Pair Optimization (APO), where ambiguous samples are optimized with robust loss functions and refinable ones are enhanced via text replacement during training.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

we first employ a confidence-based estimation to distinguish clean and noisy pairs, then refine the noisy pairs via pseudo-label consistency

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; and Zhang, L. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 6077--6086

work page 2018
[2]

Bai, Y.; Yang, E.; Han, B.; Yang, Y.; Li, J.; Mao, Y.; Niu, G.; and Liu, T. 2021. Understanding and improving early stopping for learning with noisy labels. Advances in Neural Information Processing Systems, 34: 24392--24403

work page 2021
[3]

Chen, J.; Dun, C.; and Kyrillidis, A. 2024. Fast fixmatch: Faster semi-supervised learning with curriculum batch size. In 2024 IEEE International Symposium on Information Theory, 1836--1841. IEEE

work page 2024
[4]

Chen, J.; Hu, H.; Wu, H.; Jiang, Y.; and Wang, C. 2021. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 15789--15798

work page 2021
[5]

Chen, M.; and Wang, C. 2024. Multi-head co-training: An uncertainty-aware and robust semi-supervised learning framework. Knowledge-Based Systems, 302: 112325

work page 2024
[6]

D.; Wang, X.; Vineet, V.; Joshi, N.; Torralba, A.; Jegelka, S.; and Song, Y

Chuang, C.-Y.; Hjelm, R. D.; Wang, X.; Vineet, V.; Joshi, N.; Torralba, A.; Jegelka, S.; and Song, Y. 2022. Robust contrastive learning against noisy views. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 16670--16681

work page 2022
[7]

Diao, H.; Zhang, Y.; Ma, L.; and Lu, H. 2021. Similarity reasoning and filtration for image-text matching. In Proceedings of the AAAI conference on Artificial Intelligence, volume 35, 1218--1226

work page 2021
[8]

Duan, Y.; Gu, Z.; Ying, Z.; Qi, L.; Meng, C.; and Shi, Y. 2024. Pc2: Pseudo-classification based pseudo-captioning for noisy correspondence learning in cross-modal retrieval. In Proceedings of the 32nd ACM International Conference on Multimedia, 9397--9406

work page 2024
[9]

J.; Kiros, J

Faghri, F.; Fleet, D. J.; Kiros, J. R.; and Fidler, S. 2017. Vse++: Improving visual-semantic embeddings with hard negatives. In British Machine Vision Conference

work page 2017
[10]

Feng, Y.; Zhu, H.; Peng, D.; Peng, X.; and Hu, P. 2023. RONO: robust discriminative learning with noisy labels for 2D-3D cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11610--11619

work page 2023
[11]

Fu, Z.; Zhang, L.; Xia, H.; and Mao, Z. 2024. Linguistic-aware patch slimming framework for fine-grained cross-modal alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 26307--26316

work page 2024
[12]

Han, H.; Miao, K.; Zheng, Q.; and Luo, M. 2023. Noisy correspondence learning with meta similarity correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7517--7526

work page 2023
[13]

Han, H.; Zheng, Q.; Dai, G.; Luo, M.; and Wang, J. 2024. Learning to rematch mismatched pairs for robust cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 26679--26688

work page 2024
[14]

Heidari, M.; Zhang, H.; and Guo, Y. 2024. Reinforcement learning guided semi-supervised learning. Advances in Neural Information Processing Systems, 37: 136990--137009

work page 2024
[15]

Hu, P.; Huang, Z.; Peng, D.; Wang, X.; and Peng, X. 2023. Cross-modal retrieval with partially mismatched pairs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8): 9595--9610

work page 2023
[16]

Huang, Z.; Niu, G.; Liu, X.; Ding, W.; Xiao, X.; Wu, H.; and Peng, X. 2021. Learning with noisy correspondence for cross-modal matching. Advances in Neural Information Processing Systems, 34: 29406--29419

work page 2021
[17]

Iscen, A.; Valmadre, J.; Arnab, A.; and Schmid, C. 2022. Learning with neighbor consistency for noisy labels. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 4672--4681

work page 2022
[18]

Jiang, L.; Huang, D.; Liu, M.; and Yang, W. 2020. Beyond synthetic noise: Deep learning on controlled noisy labels. In International conference on Machine Learning, 4804--4815. PMLR

work page 2020
[19]

Lee, D.-H.; et al. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In International Conference on Machine Learning, volume 3, 896. Atlanta

work page 2013
[20]

Lee, K.-H.; Chen, X.; Hua, G.; Hu, H.; and He, X. 2018. Stacked cross attention for image-text matching. In Proceedings of the European conference on Computer Vision, 201--216

work page 2018
[21]

Li, K.; Zhang, Y.; Li, K.; Li, Y.; and Fu, Y. 2019. Visual semantic reasoning for image-text matching. In Proceedings of the IEEE/CVF international conference on Computer Vision, 4654--4662

work page 2019
[22]

Li, Y.; Huang, H.; Xu, J.; and Huang, S.-L. 2024. NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 6815--6819. IEEE

work page 2024
[23]

Liu, C.; Mao, Z.; Zhang, T.; Xie, H.; Wang, B.; and Zhang, Y. 2020. Graph structured network for image-text matching. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 10921--10930

work page 2020
[24]

Liu, Y.; Liu, M.; Huang, S.; and Lv, J. 2025. Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, 5676--5684

work page 2025
[25]

Ma, X.; Yang, M.; Li, Y.; Hu, P.; Lv, J.; and Peng, X. 2024. Cross-modal retrieval with noisy correspondence via consistency refining and mining. IEEE transactions on Image Processing, 33: 2587--2598

work page 2024
[26]

Miyato, T.; Maeda, S.-I.; Koyama, M.; and Ishii, S. 2019. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8): 1979--1993

work page 2019
[27]

Pan, Z.; Wu, F.; and Zhang, B. 2023. Fine-grained image-text matching by cross-modal hard aligning network. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 19275--19284

work page 2023
[28]

Qin, Y.; Peng, D.; Peng, X.; Wang, X.; and Hu, P. 2022. Deep evidential learning with noisy correspondence for cross-modal retrieval. In Proceedings of the 30th ACM International Conference on Multimedia, 4948--4956

work page 2022
[29]

Wang, Q.; Han, B.; Liu, T.; Niu, G.; Yang, J.; and Gong, C. 2021. Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model. Proceedings of the AAAI Conference on Artificial Intelligence, 35(11): 10183--10191

work page 2021
[30]

Wang, S.; Wang, R.; Yao, Z.; Shan, S.; and Chen, X. 2020. Cross-modal scene graph matching for relationship-aware image-text retrieval. In Proceedings of the IEEE/CVF winter conference on Applications of Computer Vision, 1508--1517

work page 2020
[31]

Wen, T.; Lai, S.; and Qian, X. 2021. Preparing lessons: Improve knowledge distillation with better supervision. Neurocomputing, 454: 25--33

work page 2021
[32]

Yan, J.; Luo, L.; Deng, C.; and Huang, H. 2023. Adaptive hierarchical similarity metric learning with noisy labels. IEEE Transactions on Image Processing, 32: 1245--1256

work page 2023
[33]

Yang, S.; Li, Q.; Li, W.; Li, X.; and Liu, A.-A. 2022. Dual-level representation enhancement on characteristic and context for image-text retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 32(11): 8037--8050

work page 2022
[34]

Yang, S.; Xu, Z.; Wang, K.; You, Y.; Yao, H.; Liu, T.; and Xu, M. 2023. Bicro: Noisy correspondence rectification for multi-modality data via bi-directional cross-modal similarity consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19883--19892

work page 2023
[35]

Zhai, X.; Mustafa, B.; Kolesnikov, A.; and Beyer, L. 2023. Sigmoid Loss for Language Image Pre-Training. In 2023 IEEE/CVF International Conference on Computer Vision, 11941--11952

work page 2023
[36]

Zhang, H.; Mao, Z.; Zhang, K.; and Zhang, Y. 2022 a . Show your faith: Cross-modal confidence-aware network for image-text matching. In Proceedings of the AAAI conference on Artificial Intelligence, volume 36, 3262--3270

work page 2022
[37]

Zhang, K.; Mao, Z.; Wang, Q.; and Zhang, Y. 2022 b . Negative-aware attention framework for image-text matching. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 15661--15670

work page 2022
[38]

Zhang, Q.; Lei, Z.; Zhang, Z.; and Li, S. Z. 2020. Context-aware attention network for image-text retrieval. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 3536--3545

work page 2020
[39]

Y.; and Shen, W

Zhang, S.; Li, Y.; Tian, J.; Man, Z.; Chung, C. Y.; and Shen, W. 2024. Improving Battery Life Prediction with Unlabeled Data: Confidence-Weighted Semi-Supervised Learning with Label Propagation. IEEE Transactions on Transportation Electrification

work page 2024
[40]

Zhang, Z.; and Sabuncu, M. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems, 31

work page 2018
[41]

Zhao, X.; Li, D.; Zhong, Y.; Hu, B.; Chen, Y.; Hu, B.; and Zhang, M. 2024. SEER : Self-Aligned Evidence Extraction for Retrieval-Augmented Generation. In Al-Onaizan, Y.; Bansal, M.; and Chen, Y.-N., eds., Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 3027--3041. Association for Computational Linguistics

work page 2024
[1] [1]

Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; and Zhang, L. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 6077--6086

work page 2018

[2] [2]

Bai, Y.; Yang, E.; Han, B.; Yang, Y.; Li, J.; Mao, Y.; Niu, G.; and Liu, T. 2021. Understanding and improving early stopping for learning with noisy labels. Advances in Neural Information Processing Systems, 34: 24392--24403

work page 2021

[3] [3]

Chen, J.; Dun, C.; and Kyrillidis, A. 2024. Fast fixmatch: Faster semi-supervised learning with curriculum batch size. In 2024 IEEE International Symposium on Information Theory, 1836--1841. IEEE

work page 2024

[4] [4]

Chen, J.; Hu, H.; Wu, H.; Jiang, Y.; and Wang, C. 2021. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 15789--15798

work page 2021

[5] [5]

Chen, M.; and Wang, C. 2024. Multi-head co-training: An uncertainty-aware and robust semi-supervised learning framework. Knowledge-Based Systems, 302: 112325

work page 2024

[6] [6]

D.; Wang, X.; Vineet, V.; Joshi, N.; Torralba, A.; Jegelka, S.; and Song, Y

Chuang, C.-Y.; Hjelm, R. D.; Wang, X.; Vineet, V.; Joshi, N.; Torralba, A.; Jegelka, S.; and Song, Y. 2022. Robust contrastive learning against noisy views. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 16670--16681

work page 2022

[7] [7]

Diao, H.; Zhang, Y.; Ma, L.; and Lu, H. 2021. Similarity reasoning and filtration for image-text matching. In Proceedings of the AAAI conference on Artificial Intelligence, volume 35, 1218--1226

work page 2021

[8] [8]

Duan, Y.; Gu, Z.; Ying, Z.; Qi, L.; Meng, C.; and Shi, Y. 2024. Pc2: Pseudo-classification based pseudo-captioning for noisy correspondence learning in cross-modal retrieval. In Proceedings of the 32nd ACM International Conference on Multimedia, 9397--9406

work page 2024

[9] [9]

J.; Kiros, J

Faghri, F.; Fleet, D. J.; Kiros, J. R.; and Fidler, S. 2017. Vse++: Improving visual-semantic embeddings with hard negatives. In British Machine Vision Conference

work page 2017

[10] [10]

Feng, Y.; Zhu, H.; Peng, D.; Peng, X.; and Hu, P. 2023. RONO: robust discriminative learning with noisy labels for 2D-3D cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11610--11619

work page 2023

[11] [11]

Fu, Z.; Zhang, L.; Xia, H.; and Mao, Z. 2024. Linguistic-aware patch slimming framework for fine-grained cross-modal alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 26307--26316

work page 2024

[12] [12]

Han, H.; Miao, K.; Zheng, Q.; and Luo, M. 2023. Noisy correspondence learning with meta similarity correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7517--7526

work page 2023

[13] [13]

Han, H.; Zheng, Q.; Dai, G.; Luo, M.; and Wang, J. 2024. Learning to rematch mismatched pairs for robust cross-modal retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 26679--26688

work page 2024

[14] [14]

Heidari, M.; Zhang, H.; and Guo, Y. 2024. Reinforcement learning guided semi-supervised learning. Advances in Neural Information Processing Systems, 37: 136990--137009

work page 2024

[15] [15]

Hu, P.; Huang, Z.; Peng, D.; Wang, X.; and Peng, X. 2023. Cross-modal retrieval with partially mismatched pairs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8): 9595--9610

work page 2023

[16] [16]

Huang, Z.; Niu, G.; Liu, X.; Ding, W.; Xiao, X.; Wu, H.; and Peng, X. 2021. Learning with noisy correspondence for cross-modal matching. Advances in Neural Information Processing Systems, 34: 29406--29419

[17]

Iscen, A.; Valmadre, J.; Arnab, A.; and Schmid, C. 2022. Learning with neighbor consistency for noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4672--4681

[18]

Jiang, L.; Huang, D.; Liu, M.; and Yang, W. 2020. Beyond synthetic noise: Deep learning on controlled noisy labels. In International Conference on Machine Learning, 4804--4815. PMLR

[19]

Lee, D.-H.; et al. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In International Conference on Machine Learning, volume 3, 896. Atlanta

[20]

Lee, K.-H.; Chen, X.; Hua, G.; Hu, H.; and He, X. 2018. Stacked cross attention for image-text matching. In Proceedings of the European Conference on Computer Vision, 201--216

[21]

Li, K.; Zhang, Y.; Li, K.; Li, Y.; and Fu, Y. 2019. Visual semantic reasoning for image-text matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4654--4662

[22]

Li, Y.; Huang, H.; Xu, J.; and Huang, S.-L. 2024. NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 6815--6819. IEEE

[23]

Liu, C.; Mao, Z.; Zhang, T.; Xie, H.; Wang, B.; and Zhang, Y. 2020. Graph structured network for image-text matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10921--10930

[24]

Liu, Y.; Liu, M.; Huang, S.; and Lv, J. 2025. Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, 5676--5684

[25]

Ma, X.; Yang, M.; Li, Y.; Hu, P.; Lv, J.; and Peng, X. 2024. Cross-modal retrieval with noisy correspondence via consistency refining and mining. IEEE Transactions on Image Processing, 33: 2587--2598

[26]

Miyato, T.; Maeda, S.-I.; Koyama, M.; and Ishii, S. 2019. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8): 1979--1993

[27]

Pan, Z.; Wu, F.; and Zhang, B. 2023. Fine-grained image-text matching by cross-modal hard aligning network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19275--19284

[28]

Qin, Y.; Peng, D.; Peng, X.; Wang, X.; and Hu, P. 2022. Deep evidential learning with noisy correspondence for cross-modal retrieval. In Proceedings of the 30th ACM International Conference on Multimedia, 4948--4956

[29]

Wang, Q.; Han, B.; Liu, T.; Niu, G.; Yang, J.; and Gong, C. 2021. Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model. Proceedings of the AAAI Conference on Artificial Intelligence, 35(11): 10183--10191

[30]

Wang, S.; Wang, R.; Yao, Z.; Shan, S.; and Chen, X. 2020. Cross-modal scene graph matching for relationship-aware image-text retrieval. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 1508--1517

[31]

Wen, T.; Lai, S.; and Qian, X. 2021. Preparing lessons: Improve knowledge distillation with better supervision. Neurocomputing, 454: 25--33

[32]

Yan, J.; Luo, L.; Deng, C.; and Huang, H. 2023. Adaptive hierarchical similarity metric learning with noisy labels. IEEE Transactions on Image Processing, 32: 1245--1256

[33]

Yang, S.; Li, Q.; Li, W.; Li, X.; and Liu, A.-A. 2022. Dual-level representation enhancement on characteristic and context for image-text retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 32(11): 8037--8050

[34]

Yang, S.; Xu, Z.; Wang, K.; You, Y.; Yao, H.; Liu, T.; and Xu, M. 2023. BiCro: Noisy correspondence rectification for multi-modality data via bi-directional cross-modal similarity consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19883--19892

[35]

Zhai, X.; Mustafa, B.; Kolesnikov, A.; and Beyer, L. 2023. Sigmoid Loss for Language Image Pre-Training. In 2023 IEEE/CVF International Conference on Computer Vision, 11941--11952

[36]

Zhang, H.; Mao, Z.; Zhang, K.; and Zhang, Y. 2022a. Show your faith: Cross-modal confidence-aware network for image-text matching. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 3262--3270

[37]

Zhang, K.; Mao, Z.; Wang, Q.; and Zhang, Y. 2022b. Negative-aware attention framework for image-text matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15661--15670

[38]

Zhang, Q.; Lei, Z.; Zhang, Z.; and Li, S. Z. 2020. Context-aware attention network for image-text retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3536--3545

[39]

Zhang, S.; Li, Y.; Tian, J.; Man, Z.; Chung, C. Y.; and Shen, W. 2024. Improving Battery Life Prediction with Unlabeled Data: Confidence-Weighted Semi-Supervised Learning with Label Propagation. IEEE Transactions on Transportation Electrification

[40]

Zhang, Z.; and Sabuncu, M. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in Neural Information Processing Systems, 31

[41]

Zhao, X.; Li, D.; Zhong, Y.; Hu, B.; Chen, Y.; Hu, B.; and Zhang, M. 2024. SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation. In Al-Onaizan, Y.; Bansal, M.; and Chen, Y.-N., eds., Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 3027--3041. Association for Computational Linguistics
