Aggregate Estimation Over Dynamic Hidden Web Databases

arxiv: 1403.2763 · v2 · pith:G6XFVBVS · submitted 2014-03-11 · cs.DB

Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

classification: cs.DB

keywords: databases, hidden, search, aggregate, algorithms, designed, dynamic, problem


Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating and tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).
