Deterministic Regular Expressions in Linear TimeReport as inadecuate

Deterministic Regular Expressions in Linear Time - Download this document for free, or read online. Document in PDF available to download.

1 MOSTRARE - Modeling Tree Structures, Machine Learning, and Information Extraction LIFL - Laboratoire d-Informatique Fondamentale de Lille, Inria Lille - Nord Europe 2 NICTA - National ICT Australia Sydney 3 LIFL - Laboratoire d-Informatique Fondamentale de Lille

Abstract : Deterministic regular expressions are widely used in XML processing. For instance, all regular expressions in DTDs and XML Schemas are required to be deterministic. In this paper we show that determinism of a regular expression e can be tested in linear time. The best known algorithms, based on the Glushkov automaton, require Oσ|e| time, where σ is the number of distinct symbols in e. We fur- ther show that matching a word w against an expression e can be achieved in combined linear time O|e| + |w|, for a wide range of deterministic regular expressions: i star-free for multiple input words, ii bounded-occurrence, i.e., ex- pressions in which each symbol appears a bounded number of times, and iii bounded plus-depth, i.e., expressions in which the nesting depth of alternating plus union and con- catenation symbols is bounded. Our algorithms use a new structural decomposition of the parse tree of e. For match- ing arbitrary deterministic regular expressions we present an O|e| + |w| log log |e| time algorithm.

keyword : DTD XML SChema Deterministic Regular Expression Glushkov Automaton Linear time

Author: Benoit Groz - Sebastian Maneth - Slawomir Staworko -



Related documents