DSG: An End-to-End Document Structure Generator

Rausch, Johannes; Rashiti, Gentiana; Gusev, Maxim; Zhang, Ce and Feuerriegel, Stefan ORCID: https://orcid.org/0000-0001-7856-8729 (2023): DSG: An End-to-End Document Structure Generator. 23rd IEEE International Conference on Data Mining (IEEE ICDM), Shanghai, China, Dec 01-04, 2023. Chen, Guihai; Khan, Latifur; Gao, Xiaofeng; Qiu, Meikang; Pedrycz, Witold and Wu, Xindong (eds.): In: 2023 IEEE International Conference on Data Mining (ICDM), Piscataway, NJ: IEEE. pp. 518-527

Full text not available on 'Open Access LMU'.

DOI: 10.1109/ICDM58522.2023.00061

Abstract

Information in industry, research, and the public sector is widely stored as rendered documents (e.g., PDF files, scans). Hence, to enable downstream tasks, systems are needed that map rendered documents onto a structured hierarchical format. However, existing systems for this task are limited by heuristics and are not end-to-end trainable. In this work, we introduce the Document Structure Generator (DSG), a novel system for document parsing that is fully end-to-end trainable. DSG combines a deep neural network for parsing (i) entities in documents (e.g., figures, text blocks, headers, etc.) and (ii) relations that capture the sequence and nested structure between entities. Unlike existing systems that rely on heuristics, our DSG is trained end-to-end, making it effective and flexible for real-world applications. We further contribute a new, large-scale dataset called E-Periodica comprising real-world magazines with complex document structures for evaluation. Our results demonstrate that our DSG outperforms commercial OCR tools and, on top of that, achieves state-of-the-art performance. To the best of our knowledge, our DSG system is the first end-to-end trainable system for hierarchical document parsing.

Document type:	Conference contribution (Paper)
Faculty:	Business Administration > Institute of Artificial Intelligence (AI) in Management ; Mathematics, Computer Science and Statistics > Computer Science
Subjects:	000 Computer science, information and general works > 004 Computer science ; 300 Social sciences > 330 Economics
ISBN:	979-8-3503-0788-7 ; 979-8-3503-0789-4
Place:	Piscataway, NJ
Language:	English
Document ID:	123434
Date deposited on Open Access LMU:	30 Jan 2025 08:29
Last modified:	30 Jan 2025 08:29
