The authors create and release a large real-world dirty postal address dataset with ground truth to benchmark data cleaning methods and highlight limitations of existing approaches.
Arocena, Boris Glavic, Giansalvatore Mecca, Renée J
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DB (2). Years: 2026 (2). Verdicts: UNVERDICTED (2).
citing papers explorer
- Clean Me If You Can: A Large Collection of Real-World Addresses for Data Cleaning Benchmarking
  The authors create and release a large real-world dirty postal address dataset with ground truth to benchmark data cleaning methods and highlight limitations of existing approaches.
- Collaborative Large and Small Language Models for Accurate and Scalable Data Repair
  LasRepair++ pairs an LLM instructor with an SLM corrector, refines context via EM, and down-weights uncertain repairs using column-calibrated confidence, reporting 18.1% average F1 gain over baselines on data repair tasks.