arxiv: 1904.13164 · v1 · pith:ZEANKFFUnew · submitted 2019-04-30 · 💻 cs.DB · cs.AI

Learning Restricted Regular Expressions with Interleaving

keywords: schema, expressions, interleaving, regular, algorithm, data, documents, inference

Having an XML schema for XML documents brings numerous advantages. However, many XML documents in practice are not accompanied by a schema, or the schema they come with is not valid. Relax NG is a popular and powerful schema language that supports the unconstrained interleaving operator. Focusing on the inference of Relax NG, we propose a new subclass of regular expressions with interleaving and design a polynomial-time inference algorithm. We then conduct a series of experiments on large-scale real data and on three XML data corpora. The experimental results show that our subclass is more practical than previous ones and that the regular expressions inferred by our algorithm are more precise.
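
For readers unfamiliar with the interleaving operator mentioned in the abstract, the following is a minimal sketch of its semantics on two fixed words: a word belongs to the interleaving of `left` and `right` if it merges their symbols while preserving the order within each. This is only an illustration of the operator, not the paper's inference algorithm or its restricted subclass, and the function name `is_interleaving` is a hypothetical helper chosen here.

```python
from functools import lru_cache


def is_interleaving(word, left, right):
    """Return True if `word` is an interleaving (shuffle) of `left` and `right`.

    Hypothetical helper for illustration only: it checks membership in the
    interleaving of two fixed words, not of two regular expressions.
    """
    if len(word) != len(left) + len(right):
        return False

    @lru_cache(maxsize=None)
    def match(i, j):
        # i symbols of `left` and j symbols of `right` have been consumed,
        # so the first i + j symbols of `word` are already matched.
        if i == len(left) and j == len(right):
            return True
        k = i + j
        if i < len(left) and left[i] == word[k] and match(i + 1, j):
            return True
        if j < len(right) and right[j] == word[k] and match(i, j + 1):
            return True
        return False

    return match(0, 0)


if __name__ == "__main__":
    # Interleaving "ab" with "c" allows "c" at any position,
    # but "a" must still precede "b".
    for candidate in ("abc", "acb", "cab", "bac"):
        print(candidate, is_interleaving(candidate, "ab", "c"))
```

In schema terms, each symbol would stand for a child element name, so an interleaved content model accepts the children of its operands in any relative order while keeping the order inside each operand.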
