LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese



Although lemmatization is a common subtask in many natural language processing tasks, few truly cross-platform lemmatization tools specifically target Portuguese, particularly ones suited for integration into projects developed in Java. To address this issue, we have developed a lemmatizer, initially just for our own use, which we have since decided to make publicly available. The lemmatizer, presented in this document, achieves an overall accuracy above 98% when evaluated against a manually revised corpus.


Lemmatization, Normalization, Rules, Lexicon



Related Project

iCIS - Intelligent Computing in the Internet of Services


3rd Symposium on Languages, Applications and Technologies (SLATE’14), June 2014


Cited by

Year 2018 : 1 citation

 Sousa, L., de Mello, R., Cedrim, D., Garcia, A., Missier, P., Uchoa, A., Oliveira, A., and Romanovsky, A. (2018). Vazadengue: An information system for preventing and combating mosquito-borne diseases with social networks. Information Systems, page (online since February 2018).

Year 2017 : 1 citation

 Devezas, J. and Nunes, S. (2017). Information Extraction for Event Ranking. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017), volume 56 of OASIcs, pages 18:1–18:14, Dagstuhl, Germany. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

Year 2016 : 4 citations

 Wijaya, D. T. and Mitchell, T. (2016). Mapping verbs in different languages to knowledge base relations using web text as interlingua. In Proceedings of 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA.

 Wijaya, D. T. (2016). VerbKB: A Knowledge Base of Verbs for Natural Language Understanding. PhD thesis, Carnegie Mellon University.

 de Almeida, H. M. C. (2016). Suffix identification in Portuguese using transducers. Master’s thesis, Instituto Superior Técnico.

 Hachaj, T. and Ogiela, M. R. (2016). Clusters of trends detection in microblogging: Simple natural language processing vs hashtags–which is more informative? In Proceedings of 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), pages 119–121. IEEE.