CISUC

ECO and Onto.PT: A flexible approach for creating a Portuguese wordnet automatically

Authors

Abstract

A wordnet is an important tool for developing natural language processing applications for a language. However, most wordnets are handcrafted by experts, which limits their growth. In this article, we propose an automatic approach to create wordnets by exploiting textual resources, dubbed ECO. After extracting semantic relation instances, identified by discriminating textual patterns, ECO discovers synonymy clusters, used as synsets, and attaches the remaining relations to suitable synsets. Besides introducing each step of ECO, we report on how it was implemented to create Onto.PT, a public lexical ontology for Portuguese. Onto.PT is the result of the automatic exploitation of Portuguese dictionaries and thesauri, and it aims to minimise the main limitations of existing Portuguese lexical knowledge bases.
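
The three steps named in the abstract (extracting relation instances with textual patterns, clustering synonyms into synsets, attaching the remaining relations to synsets) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names, the English definition patterns, and the use of plain connected components as the clustering step are not taken from ECO or Onto.PT, whose actual patterns and clustering method are described in the article itself.

```python
import re
from collections import defaultdict

# Hypothetical patterns over dictionary glosses (English, illustration only).
PATTERNS = [
    (re.compile(r"^(?:the )?same as (\w+)$"), "synonym-of"),
    (re.compile(r"^(?:a|an) (?:type|kind) of (\w+)$"), "hyponym-of"),
    (re.compile(r"^(?:a )?part of (?:a|an|the) (\w+)$"), "part-of"),
]


def extract_relations(definitions):
    """Step 1: return (word, relation, word) triples matched by the patterns."""
    triples = []
    for headword, gloss in definitions:
        for pattern, relation in PATTERNS:
            match = pattern.match(gloss.lower())
            if match:
                triples.append((headword, relation, match.group(1)))
    return triples


def cluster_synonyms(triples):
    """Step 2: group words linked by synonymy (simplified to connected components)."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for left, relation, right in triples:
        find(left)
        find(right)  # every word gets at least a singleton cluster
        if relation == "synonym-of":
            parent[find(left)] = find(right)

    clusters = defaultdict(set)
    for word in parent:
        clusters[find(word)].add(word)
    return [frozenset(members) for members in clusters.values()]


def attach_relations(triples, synsets):
    """Step 3: rewrite the non-synonymy triples so they hold between synsets."""
    index = {word: synset for synset in synsets for word in synset}
    return {
        (index[left], relation, index[right])
        for left, relation, right in triples
        if relation != "synonym-of"
    }


definitions = [
    ("automobile", "same as car"),
    ("car", "a type of vehicle"),
    ("wheel", "part of a car"),
]
triples = extract_relations(definitions)
synsets = cluster_synonyms(triples)
links = attach_relations(triples, synsets)
```

In this toy input, "automobile" and "car" end up in one synset, and the hyponymy and part-of relations are then stated between synsets rather than between individual words. A real resource has to deal with ambiguous words that belong to several synsets, which this sketch ignores.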

Keywords

wordnet, automatic, information extraction, Portuguese

Subject

Natural Language Processing

Related Project

Onto.PT

Journal

Language Resources and Evaluation Journal, Vol. 48, #2, pp. 373-393, June 2014

DOI


Cited by

Year 2016 : 5 citations

 Simões, A., Guinovart, X. G., and Almeida, J. J. (2016). Enriching a Portuguese wordnet using synonyms from a monolingual dictionary. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. ELRA.

 Fonseca, E., Vieira, R., and Vanin, A. (2016). Adapting an entity centric model for Portuguese coreference resolution. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. ELRA.

 Fonseca, E., Vieira, R., and Vanin, A. (2016). Improving coreference resolution with semantic knowledge. In Proceedings of the 12th International Conference on Computational Processing of the Portuguese Language (PROPOR 2016), volume 9727 of LNAI, pages 213–224, Tomar, Portugal. Springer.

 Reis, S. and Baptista, J. (2016). Let's play with proverbs? NLP tools and resources for ICALL applications around proverbs for pf. In Proceedings of the International Congress on Interdisciplinarity in Social and Human Sciences, pages 435–454. Research Centre for Spatial and Organizational Dynamics, University of Algarve.

 Vieira, R., do Amaral, D., Collovini, S., Fonseca, E., Freitas, A., Freitas, L., Granada, R., Hilgert, L., Lopes, L., Schmidt, D., Severo, B., Souza, M., and Trojahn, C. (2016). Language resources for information extraction and semantic computing - NLP at PUCRS. In Proceedings Corpora and Tools for Processing Corpora at PROPOR 2016, pages 17–25.

Year 2015 : 7 citations

 Wilkens, R., Zilio, L., Ferreira, E., Gonçalves, G., and Villavicencio, A. (2015). Tesauros distribucionais para o português: avaliação de metodologias. In Proceedings of Symposium in Information and Human Language Technology, STIL 2015, pages 131–140, Natal, RN, Brazil.

 Mendonça, V., Coheur, L., and Sardinha, A. (2015). Vithea-kids: A platform for improving language skills of children with autism spectrum disorder. In Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, ASSETS ’15, pages 345–346, New York, NY, USA. ACM Press.

 Gonçalves, G. C. (2015). Construção e avaliação de modelos semânticos distribucionais. BSc thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brasil.

 Simões, A. and Almeida, J. J. (2015). Experiments on enlarging a lexical ontology. In Languages, Applications and Technologies – Revised Selected Papers of 4th International Symposium SLATE 2015, Madrid, Spain, June, CCIS, pages 49–56. Springer.

 Evandro B. Fonseca, Renata Vieira, Aline A. Vanin. Dealing With Imbalanced Datasets For Coreference Resolution. Proceedings of the 28th International Florida Artificial Intelligence Research Society Conference (FLAIRS), pp. 169-174. AAAI, 2015.

 Lagutina, N., Paramonov, I., Vorontsova, I., and Kasatkina, N. (2015). An approach to automated thesaurus construction using clusterization-based dictionary analysis. In Proceedings of the 17th Conference of FRUCT Association, pages 104–109, Yaroslavl, Russia. ITMO University.

 Amita Jain, Devendra K. Tayal, Sunny Rai. Shrinking digital gap through automatic generation of WordNet for Indian languages. AI & SOCIETY 30(2):215–222. Springer, May 2015.

Year 2014 : 2 citations

 Brett Drury, Paula C.F. Cardoso, Janie M. Thomas, Alneu de Andrade Lopes. Lexical Resources for the Identification of Causative Relations in Portuguese Texts. Proceedings of the 1st Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish (ToRPorEsp), pp. 56-63. São Carlos, SP, Brasil. 2014.

 Xavier Gómez Guinovart and Miguel Anxo Solla Portela (2014). O dicionario de sinónimos como recurso para a expansión de wordnet. Linguamática, 6(2):69–74.