
arxiv: 1805.01083 · v1 · pith:GAGEYDI2new · submitted 2018-05-03 · 💻 cs.DB · cs.CL

Scalable Semantic Querying of Text

keywords koko extraction extractions language text conditions scale supports

We present KOKO, a system that takes declarative information extraction to a new level by incorporating advances in natural language processing into its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface of the text and on the structure of the dependency parse tree of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that tolerate linguistic variation in how concepts are expressed, and it can aggregate evidence from the entire document to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extraction. We extensively evaluate KOKO over publicly available text corpora. We show that KOKO's indices take up the least space and are notably faster and more effective than a number of prior indexing schemes. Finally, we demonstrate KOKO's scalability on a corpus of 5 million Wikipedia articles.
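
The idea of combining surface conditions, dependency-tree conditions, and document-level evidence can be illustrated with a small Python sketch. This is not KOKO's extraction language or implementation, which the abstract does not show. The `Token` class, the hand-annotated toy parses, the `VERBS` and `KEYWORDS` sets, and the `min_support` threshold are all assumptions made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int    # position of the token in its sentence
    text: str   # surface form
    head: int   # index of the syntactic head, -1 for the root
    dep: str    # dependency label (hypothetical, hand-annotated)


# Hypothetical vocabulary for the two kinds of conditions.
VERBS = {"serves", "sells"}        # used by the dependency condition
KEYWORDS = {"coffee", "espresso"}  # used by the surface condition


def candidates(sentence):
    """Yield subjects that satisfy both conditions within one sentence."""
    for tok in sentence:
        # Dependency condition: the token is the subject of a target verb.
        is_subject = (
            tok.dep == "nsubj"
            and tok.head >= 0
            and sentence[tok.head].text in VERBS
        )
        # Surface condition: a keyword occurs later in the token sequence.
        keyword_follows = any(
            t.text.lower() in KEYWORDS for t in sentence[tok.idx + 1:]
        )
        if is_subject and keyword_follows:
            yield tok.text


def extract(document, min_support=2):
    """Keep candidates supported by at least min_support sentences."""
    support = defaultdict(int)
    for sentence in document:
        for name in set(candidates(sentence)):
            support[name] += 1
    return {name: n for name, n in support.items() if n >= min_support}


# Toy document with hand-written dependency annotations.
DOC = [
    [Token(0, "Velvet", 1, "nsubj"), Token(1, "serves", -1, "root"),
     Token(2, "espresso", 1, "dobj")],
    [Token(0, "Velvet", 1, "nsubj"), Token(1, "sells", -1, "root"),
     Token(2, "coffee", 1, "dobj"), Token(3, "daily", 1, "advmod")],
    [Token(0, "Anna", 1, "nsubj"), Token(1, "serves", -1, "root"),
     Token(2, "tea", 1, "dobj")],
    [Token(0, "Orbit", 1, "nsubj"), Token(1, "sells", -1, "root"),
     Token(2, "espresso", 1, "dobj")],
]

print(extract(DOC))
```

In this toy document only "Velvet" meets both conditions in two sentences, so it is the only candidate that survives the document-level filter at the default threshold. A real system would obtain the parses from a dependency parser rather than from hand annotation.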
