PrismaDV: Automated Task-Aware Data Unit Test Generation
Pith reviewed 2026-05-09 22:10 UTC · model grok-4.3
The pith
PrismaDV generates executable data unit tests by jointly analyzing downstream task code and dataset profiles to capture implicit assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PrismaDV shows that a compound AI system can identify data access patterns and infer implicit data assumptions from downstream task code together with dataset profiles, then produce executable unit tests whose failures correspond to actual impacts on task correctness.
What carries the argument
The central mechanism is the compound pipeline that extracts data access patterns from task code, infers implicit assumptions by combining those patterns with dataset profiles, generates executable unit tests, and uses SIFTA to adapt its module prompts from the scarce execution outcomes of the tests and downstream tasks.
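As a rough illustration of what such a task-aware test might look like, a minimal pandas sketch follows; the task, column names, and assertions are hypothetical and not drawn from the paper.

```python
import pandas as pd

# Hypothetical downstream task; the column names and logic are illustrative,
# not taken from the paper.
def revenue_per_unit_by_region(df: pd.DataFrame) -> pd.Series:
    # Implicit assumptions: 'region' has no nulls (groupby silently drops them),
    # 'units_sold' is numeric and never zero, 'revenue' is numeric.
    return (df["revenue"] / df["units_sold"]).groupby(df["region"]).mean()

# The kind of task-aware data unit test PrismaDV is described as generating:
# it checks only the assumptions this particular task actually relies on.
def test_revenue_per_unit_assumptions(df: pd.DataFrame) -> None:
    assert df["region"].notna().all(), "nulls in 'region' are silently dropped by groupby"
    assert (df["units_sold"] != 0).all(), "zero 'units_sold' yields inf in the task output"
    assert pd.api.types.is_numeric_dtype(df["revenue"]), "'revenue' must be numeric"
```

A task-agnostic validator would instead check generic column properties regardless of whether the task reads them; the sketch ties every assertion to an operation the task performs.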
If this is right
- The generated tests more accurately reflect the end-to-end impact of data errors than task-agnostic or prior task-aware baselines.
- SIFTA learns prompts that outperform both hand-written prompts and those produced by generic prompt optimizers.
- Tests adapt automatically to specific datasets and tasks as execution feedback accumulates.
- The approach holds up across the two benchmarks, spanning 60 tasks over five different datasets.
Where Pith is reading between the lines
- Task-aware validation of this kind could be inserted into data pipelines to catch relevant errors before models or reports are produced.
- Self-improving test generators that refine themselves from their own run results become feasible.
- The same joint code-and-data analysis pattern may apply to other consumers such as database queries or API endpoints.
Load-bearing premise
That joint analysis of task code and dataset profiles is sufficient to accurately infer the implicit data assumptions that matter for end-to-end task correctness.
What would settle it
A dataset-task pair in which data errors that break the task go unflagged by the generated tests, or in which the tests flag data variations that leave the task outcome unchanged.
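A minimal sketch of the second failure mode, with a hypothetical task and test not taken from the paper: the test flags unsorted rows even though the task sorts its input itself, so its failures carry no end-to-end signal.

```python
import pandas as pd

# Hypothetical task: it sorts by date itself, so input row order is irrelevant.
def latest_price(df: pd.DataFrame) -> float:
    return df.sort_values("date")["price"].iloc[-1]

# An over-eager generated test: shuffled rows fail this check
# but leave the task outcome unchanged.
def test_rows_sorted_by_date(df: pd.DataFrame) -> None:
    assert df["date"].is_monotonic_increasing, "rows must arrive sorted by date"
```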
Original abstract
Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data. We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests over time to specific datasets and downstream tasks, we propose "Selective Informative Feedback for Task Adaptation" (SIFTA), a prompt-optimization framework that leverages the scarce outcomes from the execution of data unit tests and downstream tasks. We evaluate PrismaDV on two new benchmarks spanning 60 tasks across five datasets, where it consistently outperforms both task-agnostic and task-aware baselines in generating unit tests that reflect the end-to-end impact of data errors. Furthermore, we show that with SIFTA, we can automatically learn prompts for PrismaDV's modules that outperform prompts written by hand or generated from a generic prompt optimizer. We publicly release our benchmarks and prototype implementation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PrismaDV, a compound AI system that jointly analyzes downstream task code and dataset profiles to identify data access patterns, infer implicit assumptions, and generate executable task-aware data unit tests. It introduces SIFTA, a prompt-optimization method that uses scarce execution outcomes from unit tests and tasks to adapt prompts automatically. The system is evaluated on two newly introduced benchmarks spanning 60 tasks across five datasets, with claims of consistent outperformance over both task-agnostic and task-aware baselines in capturing the end-to-end effects of data errors. SIFTA is further shown to yield prompts superior to hand-written ones or those from generic optimizers. The benchmarks and prototype implementation are released publicly.
Significance. If the empirical claims hold under detailed scrutiny, the work would represent a meaningful step toward context-sensitive data validation that accounts for downstream task semantics, which is relevant for reliable data pipelines and ML systems. The release of new benchmarks and the SIFTA adaptation framework are positive contributions that could enable further research; the public code release supports reproducibility.
Major comments (2)
- [Evaluation / Results] Evaluation section (results on the two benchmarks): The abstract and summary claim consistent outperformance on 60 tasks, but no quantitative metrics (e.g., precision/recall of error detection, F1 scores, or end-to-end task accuracy deltas), statistical significance tests, or detailed baseline implementations are referenced. This makes it impossible to assess the magnitude or robustness of the reported gains and is load-bearing for the central empirical claim.
- [§3] §3 (PrismaDV architecture and inference of implicit assumptions): The description of how the compound AI system infers implicit data assumptions from static code analysis plus profiles does not address cases where critical assumptions (e.g., statistical correlations, business-rule invariants, or runtime error-propagation paths) are not explicitly visible in the code. Without concrete mechanisms, examples, or ablation showing recovery of such properties, the task-awareness advantage over baselines risks being an artifact of benchmark construction rather than a general property.
Minor comments (2)
- [Abstract] The abstract refers to 'two new benchmarks spanning 60 tasks across five datasets' but does not name the datasets or tasks; adding this information would improve clarity.
- [§4] Notation for SIFTA components (e.g., how 'scarce outcomes' are formalized as feedback signals) could be made more precise to aid reproducibility.
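One possible formalization, offered only to illustrate what more precise notation might look like (the signal definition below is an assumption, not taken from the paper): score each prompt candidate by agreement between test verdicts and observed task outcomes over the few executions available.

```python
from dataclasses import dataclass

# Sketch of one possible "scarce outcome" feedback record; the field names are
# assumptions, not the paper's notation. Each record pairs a generated test's
# verdict with whether the downstream task actually degraded on that data.
@dataclass
class Outcome:
    test_failed: bool
    task_degraded: bool

def prompt_score(outcomes: list[Outcome]) -> float:
    # Reward prompt candidates whose tests fail exactly when the task is affected.
    if not outcomes:
        return 0.0
    agreement = sum(o.test_failed == o.task_degraded for o in outcomes)
    return agreement / len(outcomes)
```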
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. We believe these revisions will enhance the clarity and strength of our work.
Point-by-point responses
Referee: [Evaluation / Results] Evaluation section (results on the two benchmarks): The abstract and summary claim consistent outperformance on 60 tasks, but no quantitative metrics (e.g., precision/recall of error detection, F1 scores, or end-to-end task accuracy deltas), statistical significance tests, or detailed baseline implementations are referenced. This makes it impossible to assess the magnitude or robustness of the reported gains and is load-bearing for the central empirical claim.
Authors: We appreciate the referee's observation regarding the presentation of our evaluation results. While the evaluation section includes comparative results across the 60 tasks on the two benchmarks, we agree that the manuscript would benefit from more explicit quantitative metrics and details. In the revised manuscript, we will expand the evaluation section to include specific values for precision, recall, and F1 scores of the generated unit tests in detecting data errors, report deltas in end-to-end task accuracy, include statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values), and provide detailed descriptions of the baseline implementations, including how the task-aware baselines were adapted. This will allow readers to better assess the magnitude and robustness of the improvements. revision: yes
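For illustration, one way the promised precision, recall, and F1 could be scored against end-to-end impact (a sketch under assumed definitions, not the paper's evaluation protocol): treat each injected data error as a ground-truth positive only if it changes the downstream task outcome, and count whether any generated test flags it.

```python
# Sketch: scoring generated tests against ground-truth end-to-end impact.
# 'flagged' and 'impacts_task' are hypothetical boolean lists, one entry per injected error.
def precision_recall_f1(flagged: list[bool], impacts_task: list[bool]) -> tuple[float, float, float]:
    tp = sum(f and i for f, i in zip(flagged, impacts_task))
    fp = sum(f and not i for f, i in zip(flagged, impacts_task))
    fn = sum(i and not f for f, i in zip(flagged, impacts_task))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```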
Referee: [§3] §3 (PrismaDV architecture and inference of implicit assumptions): The description of how the compound AI system infers implicit data assumptions from static code analysis plus profiles does not address cases where critical assumptions (e.g., statistical correlations, business-rule invariants, or runtime error-propagation paths) are not explicitly visible in the code. Without concrete mechanisms, examples, or ablation showing recovery of such properties, the task-awareness advantage over baselines risks being an artifact of benchmark construction rather than a general property.
Authors: We thank the referee for this insightful comment on the architectural description. The PrismaDV system combines static analysis of downstream code with dataset profiles to identify access patterns and infer assumptions, with the LLM components aiding in synthesizing these into test cases. However, we acknowledge that the current §3 does not sufficiently detail the handling of implicit assumptions not directly visible in the code. In the revision, we will augment §3 with additional examples demonstrating inference of statistical correlations (via profile analysis), business-rule invariants (through contextual LLM reasoning), and error-propagation paths (leveraging task execution feedback). We will also add an ablation study that isolates the impact of these inference capabilities, to demonstrate that the advantages are general and not specific to the benchmark design. revision: yes
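To make the kind of mechanism promised here concrete, a hedged sketch of how a profile-derived statistical correlation could be compiled into an executable check; the profiling step, threshold, and function names are assumptions, not the paper's implementation.

```python
import pandas as pd

# Hypothetical mechanism: derive correlation-style assumptions from a profiled
# sample of the dataset and compile them into executable checks on new data.
# The 0.9 threshold and the drift tolerance are illustrative.
def correlation_tests(profile_df: pd.DataFrame, threshold: float = 0.9):
    corr = profile_df.select_dtypes("number").corr()
    tests = []
    for a in corr.columns:
        for b in corr.columns:
            if a < b and abs(corr.loc[a, b]) >= threshold:
                def check(new_df: pd.DataFrame, a=a, b=b) -> None:
                    # Re-check that the historically strong correlation still holds.
                    observed = new_df[[a, b]].corr().loc[a, b]
                    assert abs(observed) >= threshold - 0.1, (
                        f"correlation between {a!r} and {b!r} has drifted")
                tests.append(check)
    return tests
```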
Circularity Check
No circularity in empirical system evaluation
Rationale
The paper describes an empirical compound AI system (PrismaDV) and prompt optimizer (SIFTA) whose central claims rest on benchmark comparisons against task-agnostic and task-aware baselines across 60 tasks. No mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or description that reduce any result to its inputs by construction. The evaluation uses newly introduced benchmarks and reports outperformance on end-to-end impact metrics, which are independent of internal redefinitions. This is a standard self-contained empirical contribution without load-bearing circular steps.