CISUC

LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese

Authors

Abstract

Although lemmatization is a common subtask in natural language processing, there is a lack of available, truly cross-platform lemmatization tools that specifically target Portuguese, particularly ones that can be integrated into projects developed in Java. To address this, we developed a lemmatizer, initially for our own use only, which we have since decided to make publicly available. The lemmatizer, presented in this document, achieves an overall accuracy above 98% when evaluated against a manually revised corpus.

Keywords

Lemmatization, Normalization, Rules, Lexicon

Subject

Lemmatizer

Related Project

iCIS - Intelligent Computing in the Internet of Services

Conference

3rd Symposium on Languages, Applications and Technologies (SLATE’14), June 2014

Cited by

Year 2016: 2 citations

 Wijaya, D. T. and Mitchell, T. (2016). Mapping verbs in different languages to knowledge base relations using web text as interlingua. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA.

 Wijaya, D. T. (2016). VerbKB: A Knowledge Base of Verbs for Natural Language Understanding. PhD thesis, Carnegie Mellon University.