Abstract
XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
Dokumententyp: | Paper |
---|---|
Fakultät: | Mathematik, Informatik und Statistik > Informatik |
Themengebiete: | 000 Informatik, Informationswissenschaft, allgemeine Werke > 004 Informatik |
URN: | urn:nbn:de:bvb:19-epub-14875-4 |
Sprache: | Englisch |
Dokumenten ID: | 14875 |
Datum der Veröffentlichung auf Open Access LMU: | 18. Apr. 2013, 06:19 |
Letzte Änderungen: | 13. Aug. 2024, 12:51 |