arxiv: 1408.1011 · v3 · pith:2JU7SAGSnew · submitted 2014-08-05 · 💻 cs.DB · cs.DS · cs.IR

Non-hierarchical Structures: How to Model and Index Overlaps?

classification 💻 cs.DB cs.DS cs.IR
keywords data non-hierarchical structures model structural algorithm components extension

Overlap is a common phenomenon that arises when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, so tree-based indexing and query processing techniques cannot be applied to them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm processes non-hierarchical structures efficiently and is accompanied by formal proofs that the transformation of a document into the data model is valid. To enable high-performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships.
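
To make the two notions in the abstract concrete, the following is a minimal Python sketch of the background they build on. It shows the classical pre/post-order labelling of an ordinary tree, where an ancestor test reduces to two integer comparisons, and a span test matching the abstract's definition of overlap (neither disjoint nor nested). It is not the TGSA construction algorithm or the paper's extended index, neither of which the abstract specifies. The names `Node`, `assign_pre_post`, `is_ancestor`, and `spans_overlap`, and the use of half-open character offsets for spans, are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; pre/post ranks are filled in by assign_pre_post."""
    name: str
    children: list[Node] = field(default_factory=list)
    pre: int = -1
    post: int = -1


def assign_pre_post(root: Node) -> None:
    """Label every node with its preorder and postorder rank (classical scheme)."""
    pre_counter = 0
    post_counter = 0

    def visit(node: Node) -> None:
        nonlocal pre_counter, post_counter
        node.pre = pre_counter          # rank on first visit
        pre_counter += 1
        for child in node.children:
            visit(child)
        node.post = post_counter        # rank after all descendants are done
        post_counter += 1

    visit(root)


def is_ancestor(a: Node, d: Node) -> bool:
    """In a tree, a is a proper ancestor of d iff a starts earlier and ends later."""
    return a.pre < d.pre and a.post > d.post


def spans_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when two half-open spans intersect but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersect = a_start < b_end and b_start < a_end
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    return intersect and not a_contains_b and not b_contains_a
```

For example, a chapter annotation covering offsets 0 to 10 and a page annotation covering offsets 5 to 15 overlap in this sense, so no single tree can hold both as properly nested elements. This is the case in which the plain pre/post ancestor test above stops being sufficient.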
