LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese



Although lemmatization is a common subtask in many natural language processing tasks, there is a lack of truly cross-platform lemmatization tools targeted specifically at Portuguese, particularly ones that can be integrated into projects developed in Java. To address this, we developed a lemmatizer, initially just for our own use, which we have since decided to make publicly available. The lemmatizer presented in this document achieves an overall accuracy above 98% when evaluated against a manually revised corpus.


Lemmatization, Normalization, Rules, Lexicon



Related Project

iCIS - Intelligent Computing in the Internet of Services


3rd Symposium on Languages, Applications and Technologies (SLATE’14), June 2014


Cited by

Year 2016: 2 citations

Wijaya, D. T. and Mitchell, T. (2016). Mapping verbs in different languages to knowledge base relations using web text as interlingua. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA.

Wijaya, D. T. (2016). VerbKB: A Knowledge Base of Verbs for Natural Language Understanding. PhD thesis, Carnegie Mellon University.