Word-level and higher level annotation of the Sardinian Medieval Corpus
Nicoletta Puddu
2018-01-01
Abstract
This paper presents the Sardinian Medieval Corpus (SMC), the first linguistically annotated digital resource for Medieval Sardinian. The first part describes its textual and linguistic characteristics and discusses the problems they pose for both manual and automatic annotation. The second part describes the development of the first computational tools for the analysis of Medieval Sardinian, at the word level (lemmatization and part-of-speech tagging) and at the syntactic level (dependency parsing). It shows how manual and automatic approaches can be combined to build an annotated database efficiently, even for medieval texts.