OHDOCLUS – Online and Hierarchical Document Clustering



Clustering algorithms usually assume that a document collection is static and process it as a whole. However, in contexts where data is constantly being produced (e.g. the Web), systems that receive and process documents incrementally are becoming increasingly important. We propose OHDOCLUS, an online and hierarchical algorithm for document clustering. OHDOCLUS builds a tree of clusters in which each document is classified as soon as it is received. It is based on COBWEB and CLASSIT, two well-known data clustering algorithms that create hierarchies of probabilistic concepts and have seldom been applied to text data. An experimental evaluation was conducted on categorized corpora, and the preliminary results confirm the validity of the proposed method.


Clustering, document clustering, online clustering, hierarchical clustering, dimensionality reduction


Document Clustering


Proceedings of the Eighth European Starting AI Researcher Symposium (STAIRS 2016), September 2016


