
arxiv: 1710.00204 · v2 · submitted 2017-09-30 · 💻 cs.DB


Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

classification 💻 cs.DB
keywords: human, quality, humo, machine, control, entity, resolution, approaches

Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, which is to minimize human cost given a quality requirement, and then present three optimization approaches: a conservative baseline based purely on the monotonicity assumption of precision, a more aggressive approach based on sampling, and a hybrid approach that takes advantage of the strengths of both. Finally, we demonstrate through extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with a reasonable return on investment (ROI) in terms of human cost, and that it performs considerably better than the state-of-the-art alternatives in quality control.
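
To make the optimization problem in the abstract concrete, here is a minimal sketch of one plausible reading of it. It is not the paper's algorithm. It assumes that candidate pairs are sorted by ascending similarity, that the workload is split by two cut points into a machine-rejected zone, a human-reviewed zone and a machine-accepted zone, that human labels are always correct, and that ground-truth labels are available to score each split (in practice they would have to be estimated, for example by sampling). The function name `min_human_zone` and its parameters are hypothetical.

```python
from typing import List, Tuple


def min_human_zone(labels: List[bool], alpha: float, beta: float) -> Tuple[int, int]:
    """Brute-force the smallest human-reviewed zone meeting the quality targets.

    labels: ground-truth match flags for pairs sorted by ascending similarity
            (assumed known here purely for illustration).
    alpha:  required precision; beta: required recall.
    Returns (lo, hi): labels[:lo] are rejected by the machine,
    labels[lo:hi] go to the human, labels[hi:] are accepted by the machine.
    """
    n = len(labels)
    total = sum(labels)

    # prefix[i] = number of true matches among labels[:i]
    prefix = [0]
    for flag in labels:
        prefix.append(prefix[-1] + int(flag))

    # Sending everything to the human is always feasible under the
    # perfect-human assumption, so start from that split.
    best = (0, n)
    for lo in range(n + 1):
        for hi in range(lo, n + 1):
            human_tp = prefix[hi] - prefix[lo]      # matches found by the human
            machine_tp = total - prefix[hi]         # true matches accepted by the machine
            machine_fp = (n - hi) - machine_tp      # non-matches accepted by the machine
            tp = human_tp + machine_tp
            predicted = tp + machine_fp
            precision = tp / predicted if predicted else 1.0
            recall = tp / total if total else 1.0
            if precision >= alpha and recall >= beta and hi - lo < best[1] - best[0]:
                best = (lo, hi)
    return best


if __name__ == "__main__":
    toy = [False, False, False, True, False, True, True, True]
    print(min_human_zone(toy, alpha=1.0, beta=1.0))
    print(min_human_zone(toy, alpha=0.9, beta=0.75))
```

On the toy input, demanding perfect precision and recall forces the human to review the two pairs where matches and non-matches interleave, while relaxing the targets lets the human zone shrink. The exhaustive search is quadratic in the number of pairs and is meant only to illustrate the trade-off between human cost and the quality requirement.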
