Classification and Regression of Music Lyrics: Emotionally-Significant Features



This research addresses the role of lyrics in the music emotion recognition process. Our approach is based on several state of the art features complemented by novel stylistic, structural and semantic features. To evaluate our approach, we created a ground truth dataset containing 180 song lyrics, according to Russell’s emotion model. We conduct four types of experiments: regression and classification by quadrant, arousal and valence categories. Comparing to the state of the art features (ngrams - baseline), adding other features, including novel features, improved the F-measure from 68.2%, 79.6% and 84.2% to 77.1%, 86.3% and 89.2%, respectively for the three classification experiments. To study the relation between features and emotions (quadrants) we performed experiments to identify the best features that allow to describe and discriminate between arousal hemispheres and valence meridians. To further validate these experiments, we built a validation set comprising 771 lyrics extracted from the AllMusic platform, having achieved 73.6% F-measure in the classification by quadrants. Regarding regression, results show that, comparing to similar studies for audio, we achieve a similar performance for arousal and a much better performance for valence.


Music Emotion Recognition, Music Information Retrieval, Natural Language Processing

Related Project

MOODetector: A System for Mood-based Classification and Retrieval of Audio Music


8th International Conference on Knowledge Discovery and Information Retrieval – KDIR’2016, October 2016

PDF File

Cited by

No citations found