1,721,137 research outputs found
Combinatorial pattern matching: 15th annual symposium, CPM 2004 Istanbul, Turkey, july 5-7, 2004 proceedings
Multi-Method Dispatching: A Geometric Approach With Applications to String Matching Problems
Structuring labeled trees for optimal succinctness, and beyond
Consider an ordered, static tree T on t nodes where each node has a label from alphabet set ~. Tree T may be of arbitrary degree and of arbitrary shape. Say, we wish to support basic navigational operations such as find the parent of node u, the ith child of u, and any child of u with label ?. In a seminal work over fifteen years ago, Jacobson [15] observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t+o(t) bits supporting navigational operations in O(1) time. The space used is asymptotically optimal with the information-theoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings and multisets. Still, for the fundamental problem of structuring labeled trees succinctly, few results, if any, exist even though labeled trees arise frequently in practice, e.g. in the data as in markup text (XML) or in augmented data structures. We present a novel approach to the problem of succinct manipulation of labeled trees by designing what we call the xbw transform of the tree, in the spirit of the well-known Burrows-Wheeler transform for strings, xbw transform uses path-sorting and grouping to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. Using the properties of the xbw transform, we (i) derive the first-known (near-)optimal results for succinct representation of labeled trees with O(1) time for navigation operations, (ii) optimally support the powerful subpath search operation for the first time, and (iii) introduce a notion of tree entropy and present linear time algorithms for compressing a given labeled tree up to its entropy beyond the information-theoretic lower bound averaged over all tree inputs. Our xbw transform is simple and likely to spur new results in the theory of tree compression and indexing, and may have some practical impact in XML data processing
Compressing and Indexing Labeled Trees, with Applications
Consider an ordered, static tree T where each node has a label from alphabet σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, ith child, or any child with some label, . . .) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-knownBurrows-Wheeler transform for strings [1994]. TheXBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and pathsearch operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is signifi-cantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster
- …
