
Abstract
XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
Item Type: | Paper |
---|---|
Faculties: | Mathematics, Computer Science and Statistics > Computer Science |
Subjects: | 000 Computer science, information and general works > 004 Data processing computer science |
URN: | urn:nbn:de:bvb:19-epub-14875-4 |
Language: | English |
Item ID: | 14875 |
Date Deposited: | 18. Apr 2013, 06:19 |
Last Modified: | 13. Aug 2024, 12:51 |