Towards the Improvement of a Topic Model with Semantic Knowledge



Classic topic models typically operate on surface words, which cannot represent meaning on their own. Consequently, the resulting topics are often redundant and may, for instance, include several synonyms.
To address this problem, we present SemLDA, an extended topic model that incorporates semantics from an external lexical-semantic knowledge base.
SemLDA is introduced and explained in detail, pointing out where semantics is included, both in the pre-processing phase and in the generative phase of the topic distributions. As a result, instead of topics as distributions over words, we obtain distributions over concepts, each represented by a set of synonymous words.
To evaluate SemLDA, we ran preliminary automatic qualitative tests comparing it against a state-of-the-art classical topic model. The results were promising and support our intuition about the benefits of incorporating general semantics in a topic model.
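
The move from words to concepts described in the abstract can be illustrated with a minimal sketch. It is not the SemLDA implementation: the synonym sets in `TOY_SYNSETS`, the concept identifiers, and the helper names are all hypothetical, and the sketch ignores word-sense ambiguity, which a real knowledge base such as WordNet would introduce. It only shows how counting over concepts, rather than surface words, merges synonyms that would otherwise appear as separate entries in a topic.

```python
from collections import Counter

# Hypothetical toy synonym sets (assumption for illustration only).
# A real system would draw these from a lexical-semantic knowledge base.
TOY_SYNSETS = {
    "car.n": {"car", "automobile", "auto"},
    "road.n": {"road", "route"},
}


def word_to_concept(word, synsets=TOY_SYNSETS):
    """Return the concept id whose synonym set contains `word`.

    Words not covered by any synonym set are kept as-is, so they act
    as their own single-word concept.
    """
    for concept_id, words in synsets.items():
        if word in words:
            return concept_id
    return word


def concept_counts(tokens):
    """Count occurrences per concept instead of per surface word."""
    return Counter(word_to_concept(t.lower()) for t in tokens)


tokens = ["car", "automobile", "road", "auto", "route", "traffic"]
counts = concept_counts(tokens)
print(counts.most_common())
```

With plain word counts, "car", "automobile" and "auto" would be three separate entries; here they collapse into the single concept `car.n`, which is the kind of redundancy the abstract refers to.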


Topic model, semantics, WordNet



Related Project

InfoCrowds - Social Web Information Retrieval for crowds mobility management


14th Portuguese Conference on Artificial Intelligence (EPIA 2015), September 2015


Cited by

Year 2017: 1 citation

Mireles, V., & Revenko, A. (2017). Evolution of Semantically Identified Topics. In HybridSemStats@ISWC.