Classification of Social Media Posts according to their Relevance



Given the overwhelming quantity of messages posted on social networks, filtering out irrelevant information is imperative to make their use more productive.
This work focuses on the automatic classification of public social data according to its potential relevance to a general audience, judged by journalistic criteria. This means filtering out information that is private, personal, unimportant, or simply irrelevant to the public, thereby improving the overall quality of social media information.
A range of natural language processing toolkits was first assessed on a set of standard tasks over popular datasets covering newspaper and social network text. Different learning models were then tested, using linguistic features extracted by some of those toolkits. The prediction of journalistic criteria, key to the assessment of relevance, was also explored using the same features. A new classifier uses the journalistic-criteria predictions, made by an ensemble of linguistic classifiers, as features to detect relevance. The resulting model achieved an F1 score of 0.82 with an area under the curve (AUC) of 0.78.
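For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1 score and an AUC can be computed for a binary relevance classifier. The labels, scores, and the 0.5 decision threshold are made-up toy values, and the function names are illustrative; none of this is taken from the thesis or reproduces its evaluation.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration): 1 = relevant post, 0 = irrelevant post.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]          # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(f1_score(y_true, y_pred))
print(roc_auc(y_true, scores))
```

F1 depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures can differ for the same model.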


Relevance Assessment, Social Mining, Information Extraction, Natural Language Processing, Automatic Text Classification

MSc Thesis

Classification of Social Media Posts according to their Relevance, September 2016
