An efficient algorithm for the automatic building of a lexicon from textual corpora

FEDERICI, STEFANO
1998-01-01

Abstract

The LE-2111 SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering) project aims at the automatic extraction of lexical and semantic information from textual corpora in order to improve the performance of NLP systems. In this paper we describe an algorithm for the extraction of subcategorization patterns for Italian verbs. The extraction procedure relies on an efficient and accurate analogy-based engine, together with pre- and post-filters based on simple linguistic constraints. Despite the simplicity of the analogy-based algorithm, the amount of lost information is negligible, and precision and recall against a set of hand-crafted subcategorization patterns (namely those produced within the LE PAROLE project) are fairly high.
1998
English
Proceedings of Euralex 98
1
129
139
11
Euralex 98
contribution
4-8 August 1998
Liège
international
4 Contribution in Conference Proceedings (Proceeding)::4.1 Contribution in conference proceedings
Federici, Stefano
273
1
4.1 Contribution in conference proceedings
none
info:eu-repo/semantics/conferenceObject
