The challenges of German archival document categorization on insufficient labeled data

Fabian Hoppe; Tabea Tietz; Danilo Dessì; Nils Meyer; Mirjam Sprau; Mehwish Alam; Harald Sack

2020-01-01

Abstract

Document exploration in archives is often challenging because records are not organized into topic-based categories. Moreover, archival records provide only short texts, which are often insufficient to capture their semantics. This paper proposes and explores a dataless categorization approach that uses word embeddings and TF-IDF to categorize archival documents. Additionally, it introduces a visual approach built on top of the word embeddings to enhance data exploration. Preliminary results suggest that current vector representations alone do not provide enough external knowledge to solve this task.
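
As a rough illustration of what dataless categorization with word embeddings and TF-IDF can look like, the sketch below represents each document as a TF-IDF-weighted average of word vectors and assigns it to the category whose label vector is closest by cosine similarity. This is not the authors' pipeline: the three-dimensional embeddings, the example documents, the category labels, and all function names are hypothetical, and a real system would presumably load pretrained German embeddings instead.

```python
import math
from collections import Counter

# Toy 3-dimensional "embeddings", invented purely for illustration.
EMBEDDINGS = {
    "tax": [0.9, 0.1, 0.0],
    "budget": [0.8, 0.2, 0.1],
    "finance": [1.0, 0.0, 0.0],
    "school": [0.1, 0.9, 0.0],
    "teacher": [0.0, 1.0, 0.1],
    "education": [0.0, 1.0, 0.0],
    "report": [0.3, 0.3, 0.3],
}


def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def embed(tokens, idf):
    """TF-IDF-weighted average of the word vectors of the known tokens."""
    tf = Counter(tok for tok in tokens if tok in EMBEDDINGS)
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    total = 0.0
    for tok, count in tf.items():
        weight = count * idf.get(tok, 1.0)  # unseen tokens get weight 1.0
        total += weight
        for i, x in enumerate(EMBEDDINGS[tok]):
            vec[i] += weight * x
    return [v / total for v in vec] if total else vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(tokens, category_labels, idf):
    """Pick the category whose label words are closest to the document."""
    doc_vec = embed(tokens, idf)
    scores = {
        name: cosine(doc_vec, embed(words, {}))
        for name, words in category_labels.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical tokenized records and category label words.
docs = [["tax", "report", "budget"], ["school", "teacher", "report"]]
categories = {"finance": ["finance"], "education": ["education"]}
idf = idf_weights(docs)
predictions = [categorize(doc, categories, idf)[0] for doc in docs]
```

No labeled training documents are used here; the only supervision is the category label words themselves, which is why the quality of the embeddings limits what such an approach can achieve on short archival texts.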

Year: 2020
Language(s): English
Volume title: WHiSe 2020 Workshop on Humanities in the Semantic Web 2020
Publisher: CEUR-WS
Series title: CEUR WORKSHOP PROCEEDINGS
Volume: 2695
First page: 15
Last page: 20
Number of pages: 6
Scopus ID: 2-s2.0-85095974032
Conference title: 3rd Workshop on Humanities in the Semantic Web, WHiSe 2020
Referee: Anonymous experts
Conference date: 2 June 2020
Conference location: Heraklion, Greece (Virtual)
Primary characterization: scientific
Keywords: Cultural Heritage; Dataless Categorization; Document Exploration; Text Categorization
International co-authors: yes
Type: 4 Contribution in conference proceedings (Proceeding)::4.1 Contribution in conference proceedings
All authors: Hoppe, Fabian; Tietz, Tabea; Dessì, Danilo; Meyer, Nils; Sprau, Mirjam; Alam, Mehwish; Sack, Harald
Faculty site type: 273
Number of authors: 7
Fulltext: open
Type: info:eu-repo/semantics/conferencePaper

Files in This Item:

File	Size	Format
2020 - The Challenges of German Archival Document Categorization on Insufficient Labeled Data.pdf (open access; type: publisher's version)	259.07 kB	Adobe PDF

University of Cagliari
