Structured Exploration and Exploitation of Label Functions for Automated Data Annotation

Ha-Linh Nguyen; Hieu Dinh Vo; Phong Lam; Son Nguyen; Thu-Trang Nguyen

arXiv: 2604.08578 · v1 · submitted 2026-03-28 · cs.LG · cs.AI

Pith reviewed 2026-05-14 23:20 UTC · model grok-4.3

keywords: label functions · programmatic labeling · weak supervision · automated annotation · machine learning · heuristic rules · data labeling

The pith

EXPONA generates label functions by exploring surface, structural, and semantic levels while applying reliability-aware filtering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EXPONA, a framework that treats label function generation as a structured process to automate data annotation for machine learning. It explores heuristics from surface patterns, structural relations, and semantic meanings to increase the number of covered examples. Reliability mechanisms then suppress noisy or duplicate functions to keep the signals useful. Tests on eleven classification datasets show higher coverage, better label quality, and stronger final model results than prior automated methods. This approach matters if it can lower the cost and error rates of building training data without heavy manual effort.

Core claim

EXPONA formulates LF generation as a principled process that balances diversity and reliability. It systematically explores multi-level LFs spanning surface, structural, and semantic perspectives, and applies reliability-aware mechanisms to suppress noisy or redundant heuristics while preserving complementary signals. Across eleven datasets, the paper reports nearly complete label coverage (up to 98.9 percent), weak label quality improved by up to 87 percent, and downstream gains of up to 46 percent in weighted F1.
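Since the headline downstream number is stated in weighted F1, a short sketch of that metric may help when reading the claim. This is the standard support-weighted average of per-class F1, written from scratch for illustration; the function name `weighted_f1` and the toy labels are hypothetical, and nothing here reproduces the paper's evaluation code.

```python
def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of true instances."""
    classes = sorted(set(y_true))
    total = len(y_true)
    score = 0.0
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true instances of class c
        score += f1 * support / total
    return score


# Toy example: three instances of class 0, one of class 1.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
result = weighted_f1(y_true, y_pred)
```

Because each class is weighted by its support, a model can score well on this metric while still doing poorly on rare classes, which is worth keeping in mind for imbalanced datasets such as ChemProt.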

What carries the argument

The EXPONA framework itself, which explores label functions at the surface, structural, and semantic levels and combines them with reliability-aware filtering to suppress noisy heuristics.

Load-bearing premise

Exploring label functions at surface, structural, and semantic levels together with reliability-aware filtering will produce complementary signals without introducing new biases or missing important domain-specific patterns.

What would settle it

A controlled experiment on a held-out dataset where EXPONA produces lower coverage or weaker downstream models than the best existing automated label function method would settle whether the central claim holds.

Figures

Figures reproduced from arXiv: 2604.08578 by Ha-Linh Nguyen, Hieu Dinh Vo, Phong Lam, Son Nguyen, Thu-Trang Nguyen.

**Figure 1.** Expona: approach overview. The figure shows two example tasks: classifying news sections from descriptions (available labels: World, Sport, Business, Sci/Tech) and classifying movie sentiment from reviews (available labels: Negative, Positive).

**Figure 2.** Prompt templates of Expona for surface label function exploration. Excerpt from the surrounding text: Surface LFs typically achieve high precision when their cues are present but often suffer from limited coverage and poor domain transferability. These limitations are effectively mitigated by the complementary structural and semantic LFs, whose generation processes are discussed in the following sections. Section 3.1.2 (Structural Label Function): Structural LFs aim…

**Figure 3.** Per-class proportion of instances (orange bars) as well as per-class coverage and F1-score of Expona for the ChemProt dataset. Excerpt from the surrounding text: … supervision can lead to under-discriminative LFs and thus suboptimal aggregation. We also performed a class-wise analysis on a representative multi-class dataset, i.e., ChemProt, reporting both coverage and per-class weighted F1-scores to examine EXPONA’s behavior under severe class …

**Figure 5.** Expona’s run time and E2E performance as a function of the filtering parameter $\alpha$. Dataset: Massive. Excerpt from the surrounding text: Recall that $\alpha$ controls the strictness of intra-type filtering, determining the minimum accuracy threshold $\theta_c^{\text{intra}} = \alpha \cdot \max_{\lambda_j \in \Lambda_c} \widehat{acc}(\lambda_j)$ within each LF category $c$. A smaller $\alpha$ permits a broader set of label functions, prioritizing diversity, while a larger $\alpha$ enforces stricter selection, retaining only th…

**Figure 6.** Coverage and performance of Expona as a function of the number of LFs per category $K_c$. Dataset: Massive. Excerpt from the surrounding text: However, the improvement in label quality and downstream E2E performance did not scale proportionally. Label quality improved slightly as $K_c$ increased from 5 to 20, peaking at 0.746 before declining when $K_c$ reached 40. The initial improvement reflects greater representational diversity among LFs, while…

**Figure 7.** Coverage and performance of Expona as a function of the proportion of labeled instances, $|D_l| / (|D| + |D_l|)$. Dataset: ChemProt. Excerpt from the surrounding text: … its ability to deliver competitive performance with minimal labeled data, while continuing to benefit, albeit with diminishing returns, from further supervision. Section 5.5 (Efficiency Analysis): All experiments were conducted on a Linux 5.15.154 server equipped with two NVIDIA T4 GPUs.…
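The Figure 5 excerpt defines the intra-type threshold as α times the best estimated accuracy within an LF category. A minimal Python sketch of that rule follows, under stated assumptions: the function name `intra_type_filter`, the LF names, and the accuracy values are hypothetical; α is assumed to lie in [0, 1]; and an LF is assumed to be kept when its estimated accuracy is at least the threshold, since the excerpt does not say whether the comparison is strict. It is an illustration of the formula, not the authors' implementation.

```python
def intra_type_filter(acc_by_category, alpha):
    """Per category, keep LFs whose estimated accuracy is >= alpha * best accuracy.

    acc_by_category maps a category name to {lf_name: estimated_accuracy}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    kept = {}
    for category, accs in acc_by_category.items():
        if not accs:
            kept[category] = []
            continue
        # theta_c = alpha * max accuracy among the LFs of this category
        threshold = alpha * max(accs.values())
        kept[category] = sorted(n for n, a in accs.items() if a >= threshold)
    return kept


# Hypothetical accuracy estimates for three LF categories.
estimated_acc = {
    "surface": {"lf_kw_goal": 0.90, "lf_kw_stock": 0.80, "lf_kw_the": 0.40},
    "structural": {"lf_len_short": 0.60, "lf_has_number": 0.50},
    "semantic": {"lf_centroid_sport": 0.85},
}
loose = intra_type_filter(estimated_acc, alpha=0.5)   # favors diversity
strict = intra_type_filter(estimated_acc, alpha=0.9)  # favors reliability
```

Because the threshold is relative to the best LF in each category, a weak category still keeps its strongest members; an absolute cutoff would behave differently and could empty a category entirely.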

Abstract

High-quality labeled data is critical for training reliable machine learning and deep learning models, yet manual annotation remains costly and error-prone. Programmatic labeling addresses this challenge by using label functions (LFs), i.e., heuristic rules that automatically generate weak labels for training datasets. However, existing automated LF generation methods either rely on large language models (LLMs) to synthesize surface-level heuristics or employ model-based synthesis over hand-crafted primitives. These approaches often result in limited coverage and unreliable label quality. In this paper, we introduce EXPONA, an automated framework for programmatic labeling that formulates LF generation as a principled process balancing diversity and reliability. EXPONA systematically explores multi-level LFs, spanning surface, structural, and semantic perspectives. EXPONA further applies reliability-aware mechanisms to suppress noisy or redundant heuristics while preserving complementary signals. To evaluate EXPONA, we conducted extensive experiments on eleven classification datasets across diverse domains. Experimental results show that EXPONA consistently outperformed state-of-the-art automated LF generation methods. Specifically, EXPONA achieved nearly complete label coverage (up to 98.9%), improved weak label quality by up to 87%, and yielded downstream performance gains of up to 46% in weighted F1. These results indicate that EXPONA's combination of multi-level LF exploration and reliability-aware filtering enabled more consistent label quality and downstream performance across diverse tasks by balancing coverage and precision in the generated LF set.
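For readers new to programmatic labeling, the sketch below shows what a label function and label coverage can look like in practice. Everything in it is an assumption made for illustration: the keyword rules are hypothetical surface-level LFs for the movie-sentiment task, majority vote is a simple stand-in aggregator rather than the paper's method, and coverage is taken to mean the fraction of instances on which at least one LF does not abstain (the referee report notes that the paper's table captions do not define the coverage metric).

```python
from collections import Counter

ABSTAIN = -1  # conventional "no vote" value in weak supervision
NEGATIVE, POSITIVE = 0, 1


# Hypothetical surface-level LFs: each votes for a class or abstains.
def lf_contains_great(text):
    return POSITIVE if "great" in text.lower() else ABSTAIN


def lf_contains_boring(text):
    return NEGATIVE if "boring" in text.lower() else ABSTAIN


def lf_exclamation(text):
    return POSITIVE if text.strip().endswith("!") else ABSTAIN


LFS = [lf_contains_great, lf_contains_boring, lf_exclamation]


def apply_lfs(texts):
    """Return one row of LF votes per input text."""
    return [[lf(t) for lf in LFS] for t in texts]


def majority_vote(votes):
    """Aggregate non-abstaining votes; abstain if no LF fires or on a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def coverage(vote_matrix):
    """Fraction of instances on which at least one LF does not abstain."""
    if not vote_matrix:
        return 0.0
    covered = sum(any(v != ABSTAIN for v in row) for row in vote_matrix)
    return covered / len(vote_matrix)


texts = [
    "A great film with a great cast",
    "Slow and boring",
    "It was fine",
    "Great start, boring ending",
]
matrix = apply_lfs(texts)
weak_labels = [majority_vote(row) for row in matrix]
```

The third review is uncovered and the fourth is covered but receives no label because two LFs conflict. That gap between coverage and usable weak labels is the trade-off the abstract describes as balancing diversity and reliability.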

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

EXPONA's multi-level label function exploration delivers practical coverage gains on the reported datasets, but the reliability filter's handling of potential LLM biases at the semantic level needs clearer validation.

The core contribution is a framework that generates label functions across surface, structural, and semantic levels, then applies reliability-aware filtering to reduce noise and redundancy. On eleven classification datasets it reports coverage up to 98.9 percent, weak-label quality lifts up to 87 percent, and downstream weighted F1 gains up to 46 percent over the baselines it compares against. That combination of breadth and reported effect size is the main takeaway for someone scanning the abstract and results tables.

Referee Report

2 major / 2 minor

Summary. The paper introduces EXPONA, a framework for automated label function (LF) generation in programmatic labeling. It formulates LF creation as a process that systematically explores multi-level heuristics (surface, structural, and semantic) and applies reliability-aware filtering to suppress noisy or redundant signals while retaining complementary ones. Experiments on eleven classification datasets across domains report up to 98.9% label coverage, 87% improvement in weak-label quality, and 46% gains in downstream weighted F1 over prior automated LF methods.

Significance. If the experimental claims hold under rigorous controls, EXPONA would advance automated data annotation by demonstrating that structured multi-level exploration plus targeted filtering can simultaneously raise coverage and precision without introducing unmeasured bias. The approach directly targets the coverage-quality trade-off that limits both LLM-synthesis and primitive-based baselines.

major comments (2)

[Experimental Evaluation] Experimental section: the abstract and results claim peak gains of 98.9% coverage, 87% quality lift, and 46% F1 improvement, yet supply no description of baseline LF implementations, number of random seeds, statistical significance tests, or the precise reliability metric and threshold used in filtering; without these controls the reported superiority cannot be assessed.
[Method] LF generation and filtering subsection: semantic LFs are produced by LLM prompting over hand-crafted primitives, but the manuscript provides no explicit bias-detection metric, cross-domain validation procedure, or ablation that isolates whether the reliability-aware filter removes LLM-induced domain skews; this leaves the central complementarity claim vulnerable on the eleven datasets.

minor comments (2)

[Method] Notation for the reliability score and the diversity objective is introduced without an accompanying equation or pseudocode block, making the filtering step difficult to re-implement.
[Results] Table captions do not list the exact number of LFs generated per method or the coverage metric definition, complicating direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications and commitments to strengthen the manuscript.

Referee: [Experimental Evaluation] Experimental section: the abstract and results claim peak gains of 98.9% coverage, 87% quality lift, and 46% F1 improvement, yet supply no description of baseline LF implementations, number of random seeds, statistical significance tests, or the precise reliability metric and threshold used in filtering; without these controls the reported superiority cannot be assessed.

Authors: We agree the submitted version omitted key experimental controls. In revision we will add: (1) explicit re-implementation details for all baselines drawn from their source papers, (2) all metrics reported as mean ± std over 5 random seeds, (3) paired t-test p-values for significance, and (4) the reliability metric as LF accuracy estimated on a 5% held-out validation set with threshold 0.65. These additions will allow full assessment of the reported gains. revision: yes
Referee: [Method] LF generation and filtering subsection: semantic LFs are produced by LLM prompting over hand-crafted primitives, but the manuscript provides no explicit bias-detection metric, cross-domain validation procedure, or ablation that isolates whether the reliability-aware filter removes LLM-induced domain skews; this leaves the central complementarity claim vulnerable on the eleven datasets.

Authors: The reliability filter already prunes LFs using estimated accuracy and agreement scores, which reduces noisy LLM outputs. We acknowledge the absence of an explicit bias metric. In revision we will insert: (i) a KL-divergence bias metric between LLM LF label distributions and validation ground truth, (ii) expanded cross-domain results across all eleven datasets, and (iii) an ablation isolating the filter's effect on domain skew. This will directly support the complementarity claim. revision: partial
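The simulated rebuttal proposes a KL-divergence bias metric between LF label distributions and validation ground truth. One way such a metric could be computed is sketched below. The direction of the divergence (LF distribution relative to the gold distribution), the additive smoothing constant, and the function names are all assumptions, since the rebuttal does not specify them; the class names are borrowed from the news task in Figure 1 and the label lists are invented for illustration.

```python
import math
from collections import Counter


def label_distribution(labels, classes, smoothing=1e-9):
    """Empirical class distribution with a small additive smoothing term."""
    counts = Counter(labels)
    total = len(labels) + smoothing * len(classes)
    return [(counts.get(c, 0) + smoothing) / total for c in classes]


def kl_divergence(p, q):
    """KL(p || q) in nats; q is assumed strictly positive (ensured by smoothing)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


classes = ["World", "Sport", "Business", "Sci/Tech"]

# Hypothetical validation labels: perfectly balanced across the four classes.
gold = classes * 5

# Hypothetical LF outputs on the same validation set.
balanced_lf = list(gold)
skewed_lf = ["Sport"] * 14 + ["World"] * 2 + ["Business"] * 2 + ["Sci/Tech"] * 2

gold_dist = label_distribution(gold, classes)
bias_balanced = kl_divergence(label_distribution(balanced_lf, classes), gold_dist)
bias_skewed = kl_divergence(label_distribution(skewed_lf, classes), gold_dist)
```

A value near zero means the LF's label distribution matches the validation distribution, while larger values flag skew toward particular classes. A metric of this kind only compares marginal distributions, so it would not by itself detect an LF that assigns the right proportions to the wrong instances.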

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents EXPONA as an empirical framework that explores label functions across surface, structural, and semantic levels then applies reliability-aware filtering, with all performance claims (coverage up to 98.9%, quality gains up to 87%, F1 gains up to 46%) resting on direct experimental comparisons against baselines across eleven datasets. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or described derivation. The multi-level exploration and filtering steps are implemented as procedural heuristics whose outputs are measured externally rather than defined in terms of the target metrics, rendering the reported results independent of internal circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that label functions generated from multiple perspectives will contain complementary reliable signals that can be isolated by filtering; no free parameters or new entities are quantified in the abstract.

axioms (1)

Domain assumption: Label functions generated across surface, structural, and semantic levels provide complementary signals that reliability-aware filtering can separate from noise and redundancy.
This premise underpins the entire EXPONA design and the reported gains in coverage and quality.

invented entities (1)

EXPONA framework · no independent evidence
Purpose: Automated generation and filtering of multi-level label functions for weak supervision.
New named method introduced to perform the structured exploration and reliability filtering.


[1] [1]

D. Zha, Z. P. Bhat, K.-H. Lai, F. Yang, Z. Jiang, S. Zhong, X. Hu, Data-centric artificial intelligence: A survey, ACM Computing Sur- veys 57 (5) (2025) 1–42

work page 2025

[2] [2]

A. Jain, H. Patel, L. Nagalapatti, N. Gupta, S. Mehta, S. Guttula, S.Mujumdar,S.Afzal,R.SharmaMittal,V.Munigala,Overviewand importanceofdataqualityformachinelearningtasks,in:Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 3561–3562

work page 2020

[3] [3]

S. Wang, Y. Liu, Y. Xu, C. Zhu, M. Zeng, Want to reduce labeling cost? gpt-3 can help, arXiv preprint arXiv:2108.13487

work page arXiv

[4] [4]

X. He, Z. Lin, Y. Gong, A. Jin, H. Zhang, C. Lin, J. Jiao, S. M. Yiu, N.Duan,W.Chen,etal.,Annollm:Makinglargelanguagemodelsto bebettercrowdsourcedannotators,arXivpreprintarXiv:2303.16854

work page arXiv

[5] [5]

Settles, Active learning literature survey

B. Settles, Active learning literature survey

work page

[6] [6]

P. Liu, L. Wang, R. Ranjan, G. He, L. Zhao, A survey on active deep learning:Frommodeldriventodatadriven,ACMComputingSurveys (CSUR) 54 (10s) (2022) 1–34

work page 2022

[7] [7]

P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, B. B. Gupta, X. Chen, X. Wang, A survey of deep active learning, ACM computing surveys (CSUR) 54 (9) (2021) 1–40

work page 2021

[8] [8]

N. Guan, N. Koudas, Activedp: Bridging active learning and data programming, arXiv preprint arXiv:2402.06056

work page arXiv

[9] [9]

Nashaat, A

M. Nashaat, A. Ghosh, J. Miller, S. Quader, C. Marston, J.-F. Puget, Hybridization of active learning and data programming for labeling large industrial datasets, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 46–55

work page 2018

[10] [10]

Vishwakarma, H

H. Vishwakarma, H. Lin, F. Sala, R. Korlakai Vinayak, Promises and pitfalls of threshold-based auto-labeling, Advances in Neural Information Processing Systems 36 (2023) 51955–51990

work page 2023

[11] [11]

F.Wang,C.Zhang,Labelpropagationthroughlinearneighborhoods, in: Proceedings of the 23rd international conference on Machine learning, 2006, pp. 985–992

work page 2006

[12] [12]

Chapelle, B

O. Chapelle, B. Scholkopf, A. Zien, Semi-supervised learning, IEEE Transactions on Neural Networks 20 (3) (2009) 542–542

work page 2009

[13] [13]

D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, Advances in neural information processing systems 16

work page

[14] [14]

A. J. Ratner, C. M. De Sa, S. Wu, D. Selsam, C. Ré, Data pro- gramming: Creating large training sets, quickly, Advances in neural information processing systems 29

work page

[15] [15]

D. Fu, M. Chen, F. Sala, S. Hooper, K. Fatahalian, C. Ré, Fast and three-rious: Speeding up weak supervision with triplet methods, in: International conference on machine learning, PMLR, 2020, pp. 3280–3291

work page 2020

[16] [16]

N. Das, S. Chaba, R. Wu, S. Gandhi, D. H. Chau, X. Chu, Goggles: Automaticimagelabelingwithaffinitycoding,in:Proceedingsofthe 2020 ACM SIGMOD International Conference on Management of Data, 2020, pp. 1717–1732

work page 2020

[17] [18]

Ratner, S

A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, C. Ré, Snorkel: Rapid training data creation with weak supervision, in: Proceedings oftheVLDBendowment.Internationalconferenceonverylargedata bases, Vol. 11, 2017, p. 269

work page 2017

[18] [19]

S. Ruan, H. Liu, Z. Chen, B. Feng, K. Zhang, C. C. Cao, E. Chen, L. Chen, Cpws: Confident programmatic weak supervision for high- quality data labeling, ACM Transactions on Information Systems 43 (4) (2025) 1–26

work page 2025

[19] [20]

T.Zhang,L.Cai,J.Li,N.Roberts,N.Guha,F.Sala,Strongerthanyou think:Benchmarkingweaksupervisiononrealistictasks,Advancesin Neural Information Processing Systems 37 (2024) 122292–122315

work page 2024

[20] [21]

T.Brown,B.Mann,N.Ryder,M.Subbiah,J.D.Kaplan,P.Dhariwal, A.Neelakantan,P.Shyam,G.Sastry,A.Askell,etal.,Languagemod- els are few-shot learners, Advances in neural information processing systems 33 (2020) 1877–1901

work page 2020

[21] [22]

J. Ye, J. Gao, Q. Li, H. Xu, J. Feng, Z. Wu, T. Yu, L. Kong, Zerogen: Efficient zero-shot learning via dataset generation, arXiv preprint arXiv:2202.07922

work page arXiv

[22] [23]

Y. Meng, M. Michalski, J. Huang, Y. Zhang, T. Abdelzaher, J. Han, Tuninglanguagemodelsastrainingdatageneratorsforaugmentation- enhancedfew-shotlearning,in:InternationalConferenceonMachine Learning, PMLR, 2023, pp. 24457–24477

work page 2023

[23] [24]

arXiv preprint arXiv:2310.19596 , year=

R. Zhang, Y. Li, Y. Ma, M. Zhou, L. Zou, Llmaaa: Making large lan- guagemodelsasactiveannotators,arXivpreprintarXiv:2310.19596

work page arXiv

[24] [25]

Schroeder, D

H. Schroeder, D. Roy, J. Kabbara, Just put a human in the loop? investigatingllm-assistedannotationforsubjectivetasks,in:Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 25771–25795

work page 2025

[25] [26]

Mazuelas, S

S. Mazuelas, S. An, S. Dasgupta, et al., Reliable programmatic weak supervision with confidence intervals for label probabilities, IEEE Transactions on Pattern Analysis and Machine Intelligence

work page

[26] [27]

P. Varma, C. Ré, Snuba: Automating weak supervision to label training data, in: Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, Vol. 12, 2018, p. 223.

[27] [28]

X. Zhao, H. Ding, Z. Feng, Glara: Graph-based labeling rule augmentation for weakly supervised named entity recognition, arXiv preprint arXiv:2104.06230

[28] [29]

T.-H. Huang, C. Cao, V. Bhargava, F. Sala, The alchemist: Automated labeling 500x cheaper than llm data annotators, Advances in Neural Information Processing Systems 37 (2024) 62648–62672

[29] [30]

N. Guan, K. Chen, N. Koudas, Datasculpt: Cost-efficient label function design via prompting large language models, in: Proceedings 28th International Conference on Extending Database Technology, EDBT, 2025, pp. 25–28

[30] [31]

C. Li, A. Gilad, B. Glavic, Z. Miao, S. Roy, Refining labeling functions with limited labeled data, in: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 1318–1329


[31] [32]

R. Smith, J. A. Fries, B. Hancock, S. H. Bach, Language models in the loop: Incorporating prompting into weak supervision, ACM/IMS Journal of Data Science 1 (2) (2024) 1–30

[32] [33]

A. A. Alvarez, N. X. Fincham, Automated l2 proficiency scoring: Weak supervision, large language models, and statistical guarantees, in: Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), Association for Computational Linguistics, Vienna, Austria, 2025, pp. 384–397


[33] [34]

T. C. Alberto, J. V. Lochter, T. A. Almeida, Tubespam: Comment spam filtering on youtube, in: 2015 IEEE 14th international conference on machine learning and applications (ICMLA), IEEE, 2015, pp. 138–143

[34] [35]

T. A. Almeida, J. M. G. Hidalgo, A. Yamakami, Contributions to the study of sms spam filtering: new collection and results, in: Proceedings of the 11th ACM symposium on Document engineering, 2011, pp. 259–262

[35] [36]

W. Ren, Y. Li, H. Su, D. Kartchner, C. Mitchell, C. Zhang, Denoising multi-source weak supervision for neural text classification, arXiv preprint arXiv:2010.04582

[36] [37]

P. Malo, A. Sinha, P. Korhonen, J. Wallenius, P. Takala, Good debt or bad debt: Detecting semantic orientations in economic texts, Journal of the Association for Information Science and Technology 65 (4) (2014) 782–796

[37] [38]

M. Krallinger, O. Rabal, S. A. Akhondi, M. P. Pérez, J. Santamaría, G. P. Rodríguez, G. Tsatsaronis, A. Intxaurrondo, J. A. López, U. Nandal, et al., Overview of the biocreative vi chemical-protein interaction track, in: Proceedings of the sixth BioCreative challenge evaluation workshop, Vol. 1, 2017, pp. 141–146

[38] [39]

L. Phong, N. Ha-Linh, N. Thu-Trang, N. Son, D. V. Hieu, Structured exploration and exploitation of label functions for automated data annotation. URL https://github.com/iSE-UET-VNU/EXPONA

[39] [40]

D. Zhu, X. Shen, M. Mosbach, A. Stephan, D. Klakow, Weaker than you think: A critical look at weakly supervised learning, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 14229–14253

[40] [41]

A. P. Dawid, A. M. Skene, Maximum likelihood estimation of observer error-rates using the em algorithm, Journal of the Royal Statistical Society: Series C (Applied Statistics) 28 (1) (1979) 20–28

[41] [42]

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

[42] [43]

J. Li, H. Ding, J. Shang, J. McAuley, Z. Feng, Weakly supervised named entity tagging with learnable logical rules, arXiv preprint arXiv:2107.02282


[43] [44]

B. Boecking, W. Neiswanger, E. Xing, A. Dubrawski, Interactive weak supervision: Learning useful heuristics for data labeling, arXiv preprint arXiv:2012.06046

[44] [45]

S. Galhotra, B. Golshan, W.-C. Tan, Adaptive rule discovery for labeling text data, in: Proceedings of the 2021 International conference on management of data, 2021, pp. 2217–2225

[45] [46]

V. Oliveira, G. Nogueira, T. Faleiros, R. Marcacini, Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents, Artificial Intelligence and Law 33 (2) (2025) 361–381

[46] [47]

H. Vishwakarma, Y. Chen, S. J. Tay, S. S. S. Namburi, F. Sala, R. Korlakai Vinayak, Pearls from pebbles: Improved confidence functions for auto-labeling, Advances in Neural Information Processing Systems 37 (2024) 15983–16015

[47] [48]

P. Lam, H.-L. Nguyen, X.-T. D. Dang, V.-S. Tran, M.-D. Le, T.-T. Nguyen, S. Nguyen, H. D. Vo, Leveraging local and global relationships for corrupted label detection, Future Generation Computer Systems 166 (2025) 107729

[48] [49]

Z. Zhu, Z. Dong, Y. Liu, Detecting corrupted labels without training a model to predict, in: International conference on machine learning, PMLR, 2022, pp. 27412–27427


[49] [50]

Y. Yin, Y. Feng, S. Weng, Z. Liu, Y. Yao, Y. Zhang, Z. Zhao, Z. Chen, Dynamic data fault localization for deep neural networks, in: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 1345–1357

[50] [51]

S. Kim, S. Kang, D. Kim, J. Ok, H. Yu, Delving into instance-dependent label noise in graph data: A comprehensive study and benchmark, in: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 5539–5550

[51] [52]

A. Maharana, P. Yadav, M. Bansal, D2 pruning: Message passing for balancing diversity & difficulty in data pruning, in: The Twelfth International Conference on Learning Representations

[52] [53]

M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, Synthetic data augmentation using gan for improved liver lesion classification, in: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), IEEE, 2018, pp. 289–293

[53] [54]

A. Boukerche, L. Zheng, O. Alfandi, Outlier detection: Methods, models, and classification, ACM Computing Surveys (CSUR) 53 (3) (2020) 1–37

[54] [55]

W.-C. Lin, C.-F. Tsai, Missing value imputation: a review and analysis of the literature (2006–2017), Artificial Intelligence Review 53 (2) (2020) 1487–1509.

Lam et al.: Preprint submitted to Elsevier