Word-level and higher level annotation of the Sardinian Medieval Corpus

Nicoletta Puddu
2018-01-01

Abstract

This paper presents the Sardinian Medieval Corpus (SMC), the first linguistically annotated digital resource of Medieval Sardinian. The first part presents the textual and linguistic characteristics and discusses them in the light of the problems they pose for both manual and automatic annotation. The second part describes the development of the first computational tools for the analysis of Medieval Sardinian, at the word level (lemmatization and part-of-speech tagging) and at the syntactic level (dependency parsing). The paper shows how manual and automatic approaches can be combined to build an annotated database efficiently, even for medieval texts.
2018
English
Proceedings of the Second Workshop on Corpus-Based Research in the Humanities. CRH-2
9783901716430
Gerastree, Dept. of Geoinformation, TU
Vienna
AUSTRIA
Andrew U. Frank, Christine Ivanovic, Francesco Mambrini, Marco Passarotti, Caroline Sporleder
1
161
170
10
Corpus-Based Research in the Humanities, CRH-2
Contribution
Anonymous expert reviewers
25-26 January 2018
Vienna, Austria
international
scientific
Historical corpora; Sardinian; Digital humanities
4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in conference proceedings
Puddu, Nicoletta; Stein, Achim
273
2
4.1 Contribution in conference proceedings
reserved
info:eu-repo/semantics/conferencePaper
