Skyblocking for Entity Resolution
In this paper, for the first time, we introduce the concept of skyblocking, which aims to efficiently identify the "most preferred" blocking schemes with respect to a given set of selection criteria for entity resolution blocking. To capture all possible preferred blocking schemes, we study scheme skylines (i.e., the blocking schemes on the skyline) in a multi-dimensional scheme space whose dimensions correspond to selection criteria for blocking (e.g., pair completeness (PC) and pair quality (PQ)). However, applying traditional skyline techniques to learn scheme skylines is a non-trivial task. Due to the unique characteristics of blocking schemes, we face several challenges, such as how to find a balanced number of match and non-match labels to effectively approximate a blocking scheme in a scheme space, and how to design efficient skyline algorithms that explore a scheme space to find scheme skylines. To overcome these challenges, we propose a scheme skyline learning approach, which incorporates skyline techniques into an active learning process for scheme skylines. We have conducted experiments over four real-world datasets. The experimental results show that our approach is able to efficiently identify scheme skylines in a large scheme space using only a limited number of labels. Our approach also outperforms the state-of-the-art approaches for learning blocking schemes in several aspects, including label efficiency, blocking quality, and learning efficiency.
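To make the notion of a scheme skyline concrete, the following is a minimal sketch of a plain skyline (Pareto front) computation over a two-dimensional scheme space with PC and PQ as dimensions, both treated as "higher is better". It is not the learning approach proposed in the paper: it assumes the PC and PQ of every scheme are already known, whereas the paper estimates them from a limited number of labels through active learning. The scheme names, the PC/PQ values, and the helper names `dominates` and `scheme_skyline` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A point in the scheme space: (PC, PQ), both to be maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is at least as good as b on every dimension
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def scheme_skyline(schemes: Dict[str, Point]) -> List[str]:
    """Return the (sorted) names of schemes not dominated by any other scheme.

    This is a naive O(n^2) pairwise comparison, used here for clarity only.
    """
    return sorted(
        name
        for name, point in schemes.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in schemes.items()
            if other_name != name
        )
    )


# Hypothetical blocking schemes with made-up (PC, PQ) values.
schemes = {
    "title": (0.95, 0.10),
    "title_and_author": (0.80, 0.40),
    "author": (0.70, 0.30),         # dominated by "title_and_author"
    "title_or_year": (0.99, 0.02),
    "year": (0.90, 0.05),           # dominated by "title"
}

print(scheme_skyline(schemes))
# ['title', 'title_and_author', 'title_or_year']
```

Each scheme left on the skyline represents a different trade-off between PC and PQ. None of them is better than another on both criteria, which is why the skyline, rather than a single scheme, captures all possibly preferred choices.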