Content-Aware DataGuides for Indexing Large Collections of XML Documents

Weigel, Felix; Meuss, Holger; Bry, François and Schulz, Klaus U. (2003): Content-Aware DataGuides for Indexing Large Collections of XML Documents. [PDF, 835 kB]

DOI: 10.5282/ubm/epub.14875

Abstract

XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments show the CADG to be 50-90% faster than the DataGuide for various kinds of queries and documents, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
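
The contrast drawn in the abstract can be illustrated with a toy sketch. This is not the CADG's actual data structure, which the abstract does not describe; the sample data, the dictionary layout and all function names below are assumptions made purely for illustration. The first variant matches paths and keywords independently and joins the results in a third step; the second stores the content/structure association at indexing time, so a query needs a single lookup.

```python
from collections import defaultdict

# Toy collection (hypothetical data): (document id, label path, text) per text node.
NODES = [
    (1, "/book/title", "xml indexing"),
    (1, "/book/chapter/para", "dataguide structure"),
    (2, "/article/title", "xml retrieval"),
    (2, "/article/sec/para", "keyword indexing"),
]


def build_separate_indexes(nodes):
    """Build one index for structure and one for content, kept independent."""
    path_index = defaultdict(set)     # label path -> node ids
    keyword_index = defaultdict(set)  # keyword -> node ids
    for node_id, (_, path, text) in enumerate(nodes):
        path_index[path].add(node_id)
        for word in text.split():
            keyword_index[word].add(node_id)
    return path_index, keyword_index


def query_separate(path_index, keyword_index, path, keyword):
    """Three steps: match the path, match the keyword, then join the results."""
    structural = path_index.get(path, set())
    textual = keyword_index.get(keyword, set())
    return structural & textual  # join performed at query time


def build_joined_index(nodes):
    """Precompute the content/structure join while indexing."""
    joined = defaultdict(set)  # (label path, keyword) -> node ids
    for node_id, (_, path, text) in enumerate(nodes):
        for word in text.split():
            joined[(path, word)].add(node_id)
    return joined


def query_joined(joined, path, keyword):
    """One lookup: path and keyword are matched together."""
    return joined.get((path, keyword), set())


if __name__ == "__main__":
    path_index, keyword_index = build_separate_indexes(NODES)
    joined = build_joined_index(NODES)
    print(query_separate(path_index, keyword_index, "/book/title", "xml"))
    print(query_joined(joined, "/book/title", "xml"))
```

Both variants return the same node ids; the difference lies in when the intersection is computed. The sketch ignores everything that makes the real problem hard, such as path patterns, recursive paths and index size.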

Document type:	Paper
Faculty:	Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information and general works > 004 Computer science
URN:	urn:nbn:de:bvb:19-epub-14875-4
Language:	English
Document ID:	14875
Date published on Open Access LMU:	18 Apr 2013 06:19
Last modified:	13 Aug 2024 12:51
