MaDI-Bench: An End-to-End Data Integration Benchmark

Aaron Steiner; Christian Bizer; Ralph Peeters

arxiv: 2606.30371 · v1 · pith:MU57QG24new · submitted 2026-06-29 · 💻 cs.DB · cs.CL

MaDI-Bench: An End-to-End Data Integration Benchmark

Aaron Steiner , Ralph Peeters , Christian Bizer This is my paper

Pith reviewed 2026-06-30 03:25 UTC · model grok-4.3

classification 💻 cs.DB cs.CL

keywords data integrationbenchmarkschema matchingentity matchingdata fusionend-to-end evaluationrelational tables

0 comments

The pith

MaDI-Bench supplies the first public benchmark for complete end-to-end relational data integration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Data integration requires a sequence of interdependent steps including schema matching, value normalization, entity blocking and matching, and data fusion. Existing benchmarks test these steps separately or omit parts of the pipeline, leaving no standard way to evaluate full systems. The paper introduces MaDI-Bench with base tasks across several domains that each demand the entire pipeline plus a generic method for generating task variants. Validation with human-engineered, best-of-breed, and LLM-based pipelines shows the benchmark can track both step-wise and overall performance.

Core claim

MaDI-Bench contributes a set of base end-to-end data integration tasks spanning several application domains, each requiring the full schema matching, value normalization, entity matching, and conflict resolution pipeline, together with a generic method for deriving task variants that mitigates rapid benchmark saturation as data integration systems advance.

What carries the argument

The Mannheim Data Integration Benchmark consisting of base end-to-end tasks across domains and a generic variant-derivation method that together support evaluation of the full integration process.

Load-bearing premise

The chosen base tasks across domains plus the variant-derivation method are sufficient to represent realistic integration challenges and to keep the benchmark from saturating quickly as systems advance.

What would settle it

New integration systems reach near-perfect end-to-end accuracy across all current variants without needing additional tasks, or rankings on the benchmark diverge sharply from performance on real-world integration projects.

Figures

Figures reproduced from arXiv: 2606.30371 by Aaron Steiner, Christian Bizer, Ralph Peeters.

read the original abstract

Data integration combines heterogeneous data sets into a single, coherent representation. Data integration involves a sequence of interdependent tasks including schema matching, value normalization, entity blocking, entity matching, and data fusion. Existing benchmarks either evaluate these steps in isolation or cover only incomplete versions of the data integration pipeline, omitting specific steps. The lack of public end-to-end data integration benchmarks hinders research on data integration methods that address the integration process as a whole. This paper fills this gap by introducing the Mannheim Data Integration Benchmark (MaDI-Bench), the first benchmark for the end-to-end integration of relational tables covering all steps of the integration process. MaDI-Bench contributes (i) a set of base end-to-end data integration tasks spanning several application domains, each requiring the full schema matching, value normalization, entity matching, and conflict resolution pipeline; and (ii) a generic method for deriving task variants that mitigates rapid benchmark saturation as data integration systems advance. We validate the benchmark using human-engineered pipelines, a best-of-breed pipeline, and an LLM-based pipeline. The validation demonstrates the utility of the benchmark for measuring the step-wise as well as the end-to-end performance of data integration pipelines. All benchmark artifacts are available for public download.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MaDI-Bench adds a full-pipeline benchmark and variant method for data integration, which is useful if the tasks hold up under scrutiny, but the abstract leaves the realism and longevity questions open.

read the letter

The paper's main contribution is releasing MaDI-Bench as the first public set of end-to-end relational table integration tasks that run schema matching, normalization, blocking, matching, and fusion in one go, plus a generic way to spin up variants so the benchmark does not get solved overnight. That fills a clear gap left by prior work that stopped at single steps or partial chains.

They back it with base tasks drawn from several domains and show results from human pipelines, a best-of-breed combination, and an LLM pipeline. The public release of artifacts is straightforward and helpful for anyone who wants to run the same tests.

The soft spot is exactly the one the stress-test flags: whether the chosen base tasks and the variant procedure actually force non-trivial use of every step and keep producing hard instances as systems get better. If the domains stay narrow or the variants mostly permute surface features, the end-to-end numbers will not generalize and the benchmark will saturate. The abstract does not give the selection criteria or quantitative difficulty numbers, so it is impossible to judge from this text alone whether that risk is managed.

This is for database researchers and practitioners who need to compare complete integration pipelines rather than isolated components. Anyone building or evaluating LLM-assisted integration tools would find the validation section relevant.

I would send it to peer review. A usable end-to-end benchmark is worth referee time even if the task construction needs more documentation and stress-testing in revision.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce MaDI-Bench as the first benchmark for end-to-end data integration of relational tables, covering all steps of the integration process including schema matching, value normalization, entity blocking, entity matching, and data fusion. It provides a set of base tasks across domains and a generic method for deriving variants to prevent rapid saturation, validated using human-engineered, best-of-breed, and LLM-based pipelines to show utility for measuring step-wise and end-to-end performance.

Significance. If the benchmark tasks are representative of real-world challenges and the variant method effectively maintains difficulty, this work could provide a valuable standardized resource for advancing research on complete data integration pipelines, addressing the current lack of such benchmarks.

major comments (2)

[Abstract] Abstract: The central claim that MaDI-Bench fills the gap as the first true end-to-end benchmark rests on the assumption that the chosen base tasks across domains plus the generic variant-derivation method produce instances requiring all pipeline steps in a non-trivial way and remain challenging; the provided text gives no details on task construction, domain selection criteria, or how variants introduce structural/semantic difficulties beyond surface features, leaving the realism and anti-saturation properties unverified.
[Validation] Validation description: The validation with three pipeline types is asserted to demonstrate utility for step-wise and end-to-end measurement, but without reported quantitative results, task statistics, or exclusion criteria, the support for these claims cannot be assessed from the manuscript.

minor comments (1)

[Abstract] Abstract: The enumerated steps include 'entity blocking' and 'conflict resolution' but the opening list omits blocking and uses 'data fusion'; ensure consistent terminology for the covered pipeline.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive suggestions. We address each of the major comments below, outlining our planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that MaDI-Bench fills the gap as the first true end-to-end benchmark rests on the assumption that the chosen base tasks across domains plus the generic variant-derivation method produce instances requiring all pipeline steps in a non-trivial way and remain challenging; the provided text gives no details on task construction, domain selection criteria, or how variants introduce structural/semantic difficulties beyond surface features, leaving the realism and anti-saturation properties unverified.

Authors: The referee correctly identifies that the manuscript would benefit from more explicit details on task construction and the variant method to substantiate the central claims. We will add a new subsection in the benchmark description that outlines the domain selection criteria, emphasizing diversity in schema complexity, data volume, and heterogeneity across domains. Additionally, we will elaborate on how the generic variant-derivation method introduces structural and semantic difficulties beyond surface-level changes, including examples of non-trivial variations. This will also include preliminary analysis supporting the realism and resistance to saturation. revision: yes
Referee: [Validation] Validation description: The validation with three pipeline types is asserted to demonstrate utility for step-wise and end-to-end measurement, but without reported quantitative results, task statistics, or exclusion criteria, the support for these claims cannot be assessed from the manuscript.

Authors: We agree that the validation claims require stronger quantitative backing. We will revise the validation section to include detailed quantitative results for the human-engineered, best-of-breed, and LLM-based pipelines, such as precision, recall, and F1 scores for each integration step and end-to-end. We will also report task statistics (e.g., number of source tables, attributes, and records) and specify exclusion criteria for any data points or pipeline runs. These additions will allow readers to better assess the utility for step-wise and end-to-end measurement. revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark definition is the explicit contribution

full rationale

The paper defines and releases MaDI-Bench as a set of base end-to-end integration tasks plus a variant-generation procedure. No derivation, equation, or prediction is claimed; the central claim is simply that these artifacts constitute the first public end-to-end benchmark. No self-citations are load-bearing for any result, no parameters are fitted and then relabeled as predictions, and no uniqueness theorem or ansatz is smuggled in. The work is therefore self-contained by construction as a benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the paper defines concrete benchmark tasks using standard data-integration concepts already present in the literature.

pith-pipeline@v0.9.1-grok · 5747 in / 1121 out tokens · 61290 ms · 2026-06-30T03:25:18.899783+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 5 canonical work pages · 1 internal anchor

[1]

A. Doan, A. Halevy, and Z. Ives,Principles of Data Integration. Morgan Kaufmann, 2012

2012
[2]

X. L. Dong and D. Srivastava,Big data integration. Morgan & Claypool Publishers, 2015

2015
[3]

Generic Schema Matching, Ten Years Later,

P. A. Bernstein, J. Madhavan, and E. Rahm, “Generic Schema Matching, Ten Years Later,”Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 695–701, 2011

2011
[4]

An Overview of End-to-End Entity Resolution for Big Data,

V . Christophides, V . Efthymiou, T. Palpanaset al., “An Overview of End-to-End Entity Resolution for Big Data,”ACM Computing Surveys, vol. 53, no. 6, pp. 1–42, 2020

2020
[5]

Data Fusion,

J. Bleiholder and F. Naumann, “Data Fusion,”ACM Computing Surveys, vol. 41, no. 1, pp. 1–41, 2009

2009
[6]

Automatic End-to-End Data Integration using Large Language Models,

A. Steiner and C. Bizer, “Automatic End-to-End Data Integration using Large Language Models,” 2026

2026
[7]

https://api.semanticscholar.org/CorpusID:282389107

Y . Zhu, L. Wang, C. Yanget al., “A Survey of Data Agents: Emerging Paradigm or Overstated Hype?”arXiv preprint arXiv:2510.23587, 2025

work page arXiv 2025
[8]

A Survey of LLM×DATA,

X. Zhou, J. He, W. Zhouet al., “A Survey of LLM×DATA,”arXiv preprint arXiv:2505.18458, 2025

work page arXiv 2025
[9]

Large Language Model-based Data Science Agent: A Survey,

K. Chen, P. Wang, Y . Yuet al., “Large Language Model-based Data Science Agent: A Survey,”arXiv preprint arXiv:2508.02744, 2025

work page arXiv 2025
[10]

Valentine: Evaluating matching techniques for dataset discovery,

C. Koutras, G. Siachamis, A. Ionescu, K. Psarakis, J. Brons, M. Fragk- oulis, C. Lofi, A. Bonifati, and A. Katsifodimos, “Valentine: Evaluating matching techniques for dataset discovery,” in2021 IEEE 37th Inter- national Conference on Data Engineering (ICDE). IEEE, 2021, pp. 468–479

2021
[11]

Deep learning for entity matching: A design space exploration,

S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y . Park, G. Krishnan, R. Deep, E. Arcaute, and V . Raghavendra, “Deep learning for entity matching: A design space exploration,” inProceedings of the 2018 international conference on management of data, 2018, pp. 19–34

2018
[12]

Truth discovery with multiple conflicting information providers on the web,

X. Yin, J. Han, and P. S. Yu, “Truth discovery with multiple conflicting information providers on the web,” inProceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007, pp. 1048–1052

2007
[13]

TPC-DI: The First Industry Benchmark for Data Integration,

M. Poess, T. Rabl, H.-A. Jacobsenet al., “TPC-DI: The First Industry Benchmark for Data Integration,”Proceedings of the VLDB Endowment, vol. 7, no. 13, pp. 1367–1378, 2014

2014
[14]

ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines,

T. Jin, Y . Zhu, and D. Kang, “ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines,”Proceedings of the VLDB Endowment, vol. 19, no. 2, pp. 84–98, 2025

2025
[15]

WDC Products: A Multi- Dimensional Entity Matching Benchmark,

R. Peeters, R. C. Der, and C. Bizer, “WDC Products: A Multi- Dimensional Entity Matching Benchmark,” inProceedings of the 27th International Conference on Extending Database Technology, 2024, pp. 22–33

2024
[16]

COMA: A System for Flexible Combination of Schema Matching Approaches,

H.-H. Do and E. Rahm, “COMA: A System for Flexible Combination of Schema Matching Approaches,” inProceedings of the 28th International Conference on Very Large Data Bases, 2002, pp. 610–621

2002
[17]

Magneto: Combining Small and Large Language Models for Schema Matching,

Y . Liu, E. H. M. Pena, A. Santoset al., “Magneto: Combining Small and Large Language Models for Schema Matching,”Proceedings of the VLDB Endowment, vol. 18, no. 8, pp. 2681–2694, 2025

2025
[18]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,

N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” inProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019, pp. 3982–3992

2019
[19]

The Probabilistic Relevance Framework: BM25 and Beyond,

S. Robertson and H. Zaragoza, “The Probabilistic Relevance Framework: BM25 and Beyond,”Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009

2009
[20]

SC-Block: Supervised Con- trastive Blocking Within Entity Resolution Pipelines,

A. Brinkmann, R. Shraga, and C. Bizer, “SC-Block: Supervised Con- trastive Blocking Within Entity Resolution Pipelines,” inThe Semantic Web: 21st International Conference, ESWC, 2024, pp. 121–142

2024
[21]

Deep Entity Matching with Pre-Trained Language Models,

Y . Li, J. Li, Y . Suharaet al., “Deep Entity Matching with Pre-Trained Language Models,”Proceedings of the VLDB Endowment, vol. 14, no. 1, pp. 50–60, 2020

2020
[22]

Magellan: Toward Building Entity Matching Management Systems,

P. Konda, S. Das, P. Suganthan G. C.et al., “Magellan: Toward Building Entity Matching Management Systems,”Proceedings of the VLDB Endowment, vol. 9, no. 12, pp. 1197–1208, 2016

2016
[23]

Truth Discovery with Multiple Conflicting Information Providers on the Web,

X. Yin, J. Han, and P. S. Yu, “Truth Discovery with Multiple Conflicting Information Providers on the Web,” inProceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 1048–1052

2007
[24]

A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration,

B. Zhao, B. I. P. Rubinstein, J. Gemmellet al., “A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration,” Proceedings of the VLDB Endowment, vol. 5, no. 6, pp. 550–561, 2012

2012
[25]

Integrating Conflicting Data: The Role of Source Dependence,

X. L. Dong, L. Berti-Equille, and D. Srivastava, “Integrating Conflicting Data: The Role of Source Dependence,”Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 550–561, 2009

2009
[26]

Truth Discovery by Claim and Source Embedding,

S. Lyu, W. Ouyang, H. Shenet al., “Truth Discovery by Claim and Source Embedding,” inProceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 2183–2186

2017
[27]

FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data,

J. Zhu, Y . Mao, L. Chenet al., “FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data,”Proceedings of the VLDB Endowment, vol. 17, no. 6, pp. 1337–1349, 2024

2024
[28]

Divergence Measures Based on the Shannon Entropy,

J. Lin, “Divergence Measures Based on the Shannon Entropy,”IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145–151, 1991

1991
[29]

On Wasserstein Two- Sample Testing and Related Families of Nonparametric Tests,

A. Ramdas, N. Garc ´ıa Trillos, and M. Cuturi, “On Wasserstein Two- Sample Testing and Related Families of Nonparametric Tests,”Entropy, vol. 19, no. 2, p. 47, 2017

2017
[30]

FEBRL: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface,

P. Christen, “FEBRL: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 1065–1068

2008
[31]

Entity Matching using Large Language Models,

R. Peeters, A. Steiner, and C. Bizer, “Entity Matching using Large Language Models,” inProceedings of the 28th International Conference on Extending Database Technology, 2025, pp. 529–541

2025
[32]

Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching,

T. Wang, X. Chen, H. Linet al., “Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching,” in Proceedings of the 31st International Conference on Computational Linguistics, 2025, pp. 96–109

2025
[33]

DIPBench Toolsuite: A Framework for Benchmarking Integration Systems,

M. B ¨ohm, D. Habich, W. Lehneret al., “DIPBench Toolsuite: A Framework for Benchmarking Integration Systems,” inProceedings of the 2008 IEEE 24th International Conference on Data Engineering, 2008, pp. 1596–1599

2008
[34]

DIBS: A Data Integration Benchmark Suite,

A. M. Cabrera, C. J. Faber, K. Cepedaet al., “DIBS: A Data Integration Benchmark Suite,” inCompanion of the 2018 ACM/SPEC International Conference on Performance Engineering, 2018, pp. 25–28

2018
[35]

Alaska: A Flexible Benchmark for Data Integration Tasks,

V . Crescenzi, A. De Angelis, D. Firmaniet al., “Alaska: A Flexible Benchmark for Data Integration Tasks,”arXiv preprint arXiv:2101.11259, 2021

work page arXiv 2021
[36]

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes,

Y . Deng, C. Chai, L. Caoet al., “LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes,”Proceedings of the VLDB Endowment, vol. 17, no. 8, pp. 1925–1938, 2024

1925
[37]

Evaluation of Pipelines for Data Integration into Knowledge Graphs

M. Hofer and E. Rahm, “Evaluation of Pipelines for Data Integration into Knowledge Graphs,”arXiv preprint arXiv:2605.22304, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[1] [1]

A. Doan, A. Halevy, and Z. Ives,Principles of Data Integration. Morgan Kaufmann, 2012

2012

[2] [2]

X. L. Dong and D. Srivastava,Big data integration. Morgan & Claypool Publishers, 2015

2015

[3] [3]

Generic Schema Matching, Ten Years Later,

P. A. Bernstein, J. Madhavan, and E. Rahm, “Generic Schema Matching, Ten Years Later,”Proceedings of the VLDB Endowment, vol. 4, no. 11, pp. 695–701, 2011

2011

[4] [4]

An Overview of End-to-End Entity Resolution for Big Data,

V . Christophides, V . Efthymiou, T. Palpanaset al., “An Overview of End-to-End Entity Resolution for Big Data,”ACM Computing Surveys, vol. 53, no. 6, pp. 1–42, 2020

2020

[5] [5]

Data Fusion,

J. Bleiholder and F. Naumann, “Data Fusion,”ACM Computing Surveys, vol. 41, no. 1, pp. 1–41, 2009

2009

[6] [6]

Automatic End-to-End Data Integration using Large Language Models,

A. Steiner and C. Bizer, “Automatic End-to-End Data Integration using Large Language Models,” 2026

2026

[7] [7]

https://api.semanticscholar.org/CorpusID:282389107

Y . Zhu, L. Wang, C. Yanget al., “A Survey of Data Agents: Emerging Paradigm or Overstated Hype?”arXiv preprint arXiv:2510.23587, 2025

work page arXiv 2025

[8] [8]

A Survey of LLM×DATA,

X. Zhou, J. He, W. Zhouet al., “A Survey of LLM×DATA,”arXiv preprint arXiv:2505.18458, 2025

work page arXiv 2025

[9] [9]

Large Language Model-based Data Science Agent: A Survey,

K. Chen, P. Wang, Y . Yuet al., “Large Language Model-based Data Science Agent: A Survey,”arXiv preprint arXiv:2508.02744, 2025

work page arXiv 2025

[10] [10]

Valentine: Evaluating matching techniques for dataset discovery,

C. Koutras, G. Siachamis, A. Ionescu, K. Psarakis, J. Brons, M. Fragk- oulis, C. Lofi, A. Bonifati, and A. Katsifodimos, “Valentine: Evaluating matching techniques for dataset discovery,” in2021 IEEE 37th Inter- national Conference on Data Engineering (ICDE). IEEE, 2021, pp. 468–479

2021

[11] [11]

Deep learning for entity matching: A design space exploration,

S. Mudgal, H. Li, T. Rekatsinas, A. Doan, Y . Park, G. Krishnan, R. Deep, E. Arcaute, and V . Raghavendra, “Deep learning for entity matching: A design space exploration,” inProceedings of the 2018 international conference on management of data, 2018, pp. 19–34

2018

[12] [12]

Truth discovery with multiple conflicting information providers on the web,

X. Yin, J. Han, and P. S. Yu, “Truth discovery with multiple conflicting information providers on the web,” inProceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007, pp. 1048–1052

2007

[13] [13]

TPC-DI: The First Industry Benchmark for Data Integration,

M. Poess, T. Rabl, H.-A. Jacobsenet al., “TPC-DI: The First Industry Benchmark for Data Integration,”Proceedings of the VLDB Endowment, vol. 7, no. 13, pp. 1367–1378, 2014

2014

[14] [14]

ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines,

T. Jin, Y . Zhu, and D. Kang, “ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines,”Proceedings of the VLDB Endowment, vol. 19, no. 2, pp. 84–98, 2025

2025

[15] [15]

WDC Products: A Multi- Dimensional Entity Matching Benchmark,

R. Peeters, R. C. Der, and C. Bizer, “WDC Products: A Multi- Dimensional Entity Matching Benchmark,” inProceedings of the 27th International Conference on Extending Database Technology, 2024, pp. 22–33

2024

[16] [16]

COMA: A System for Flexible Combination of Schema Matching Approaches,

H.-H. Do and E. Rahm, “COMA: A System for Flexible Combination of Schema Matching Approaches,” inProceedings of the 28th International Conference on Very Large Data Bases, 2002, pp. 610–621

2002

[17] [17]

Magneto: Combining Small and Large Language Models for Schema Matching,

Y . Liu, E. H. M. Pena, A. Santoset al., “Magneto: Combining Small and Large Language Models for Schema Matching,”Proceedings of the VLDB Endowment, vol. 18, no. 8, pp. 2681–2694, 2025

2025

[18] [18]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,

N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” inProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019, pp. 3982–3992

2019

[19] [19]

The Probabilistic Relevance Framework: BM25 and Beyond,

S. Robertson and H. Zaragoza, “The Probabilistic Relevance Framework: BM25 and Beyond,”Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009

2009

[20] [20]

SC-Block: Supervised Con- trastive Blocking Within Entity Resolution Pipelines,

A. Brinkmann, R. Shraga, and C. Bizer, “SC-Block: Supervised Con- trastive Blocking Within Entity Resolution Pipelines,” inThe Semantic Web: 21st International Conference, ESWC, 2024, pp. 121–142

2024

[21] [21]

Deep Entity Matching with Pre-Trained Language Models,

Y . Li, J. Li, Y . Suharaet al., “Deep Entity Matching with Pre-Trained Language Models,”Proceedings of the VLDB Endowment, vol. 14, no. 1, pp. 50–60, 2020

2020

[22] [22]

Magellan: Toward Building Entity Matching Management Systems,

P. Konda, S. Das, P. Suganthan G. C.et al., “Magellan: Toward Building Entity Matching Management Systems,”Proceedings of the VLDB Endowment, vol. 9, no. 12, pp. 1197–1208, 2016

2016

[23] [23]

Truth Discovery with Multiple Conflicting Information Providers on the Web,

X. Yin, J. Han, and P. S. Yu, “Truth Discovery with Multiple Conflicting Information Providers on the Web,” inProceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 1048–1052

2007

[24] [24]

A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration,

B. Zhao, B. I. P. Rubinstein, J. Gemmellet al., “A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration,” Proceedings of the VLDB Endowment, vol. 5, no. 6, pp. 550–561, 2012

2012

[25] [25]

Integrating Conflicting Data: The Role of Source Dependence,

X. L. Dong, L. Berti-Equille, and D. Srivastava, “Integrating Conflicting Data: The Role of Source Dependence,”Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 550–561, 2009

2009

[26] [26]

Truth Discovery by Claim and Source Embedding,

S. Lyu, W. Ouyang, H. Shenet al., “Truth Discovery by Claim and Source Embedding,” inProceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 2183–2186

2017

[27] [27]

FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data,

J. Zhu, Y . Mao, L. Chenet al., “FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data,”Proceedings of the VLDB Endowment, vol. 17, no. 6, pp. 1337–1349, 2024

2024

[28] [28]

Divergence Measures Based on the Shannon Entropy,

J. Lin, “Divergence Measures Based on the Shannon Entropy,”IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145–151, 1991

1991

[29] [29]

On Wasserstein Two- Sample Testing and Related Families of Nonparametric Tests,

A. Ramdas, N. Garc ´ıa Trillos, and M. Cuturi, “On Wasserstein Two- Sample Testing and Related Families of Nonparametric Tests,”Entropy, vol. 19, no. 2, p. 47, 2017

2017

[30] [30]

FEBRL: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface,

P. Christen, “FEBRL: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 1065–1068

2008

[31] [31]

Entity Matching using Large Language Models,

R. Peeters, A. Steiner, and C. Bizer, “Entity Matching using Large Language Models,” inProceedings of the 28th International Conference on Extending Database Technology, 2025, pp. 529–541

2025

[32] [32]

Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching,

T. Wang, X. Chen, H. Linet al., “Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching,” in Proceedings of the 31st International Conference on Computational Linguistics, 2025, pp. 96–109

2025

[33] [33]

DIPBench Toolsuite: A Framework for Benchmarking Integration Systems,

M. B ¨ohm, D. Habich, W. Lehneret al., “DIPBench Toolsuite: A Framework for Benchmarking Integration Systems,” inProceedings of the 2008 IEEE 24th International Conference on Data Engineering, 2008, pp. 1596–1599

2008

[34] [34]

DIBS: A Data Integration Benchmark Suite,

A. M. Cabrera, C. J. Faber, K. Cepedaet al., “DIBS: A Data Integration Benchmark Suite,” inCompanion of the 2018 ACM/SPEC International Conference on Performance Engineering, 2018, pp. 25–28

2018

[35] [35]

Alaska: A Flexible Benchmark for Data Integration Tasks,

V . Crescenzi, A. De Angelis, D. Firmaniet al., “Alaska: A Flexible Benchmark for Data Integration Tasks,”arXiv preprint arXiv:2101.11259, 2021

work page arXiv 2021

[36] [36]

LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes,

Y . Deng, C. Chai, L. Caoet al., “LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes,”Proceedings of the VLDB Endowment, vol. 17, no. 8, pp. 1925–1938, 2024

1925

[37] [37]

Evaluation of Pipelines for Data Integration into Knowledge Graphs

M. Hofer and E. Rahm, “Evaluation of Pipelines for Data Integration into Knowledge Graphs,”arXiv preprint arXiv:2605.22304, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026