LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese
Although lemmatization is a common subtask of many natural language processing tasks, truly cross-platform lemmatization tools targeted specifically at Portuguese are lacking, particularly tools that can be integrated into projects developed in Java. To address this, we developed a lemmatizer, initially only for our own use, which we have since decided to make publicly available. The lemmatizer, presented in this document, achieves an overall accuracy above 98% when compared against a manually revised corpus.
Lemmatization, Normalization, Rules, Lexicon
iCIS - Intelligent Computing in the Internet of Services
3rd Symposium on Languages, Applications and Technologies (SLATE’14), June 2014