Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis



We propose a multi-modal approach to the music emotion recognition (MER) problem that combines information from three distinct sources: audio, MIDI and lyrics. We introduce a methodology for automatically creating a multi-modal music emotion dataset from the AllMusic database, based on the emotion tags used in the MIREX Mood Classification Task. MIDI files and lyrics corresponding to a subset of the obtained audio samples were then gathered. The dataset was organized into the same 5 emotion clusters defined in MIREX. From the audio data, 177 standard features and 98 melodic features were extracted; 320 features were collected from MIDI and 26 from the lyrics. We experimented with several supervised learning and feature selection strategies to evaluate the proposed multi-modal approach. Employing only standard audio features, the best attained performance was 44.3% (F-measure). With the multi-modal approach, results improved to 61.1%, using only 19 multi-modal features. Melodic audio features were particularly important to this improvement.
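
As a rough illustration of the quantities mentioned in the abstract, the following sketch concatenates the per-modality feature vectors into one vector and computes an F-measure over the 5 emotion clusters. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the modalities were combined or how the F-measure was averaged, so the simple concatenation, the macro averaging, the names `fuse` and `macro_f_measure`, and the toy labels are all hypothetical.

```python
# Feature counts per modality, as reported in the abstract.
FEATURE_COUNTS = {
    "audio_standard": 177,
    "audio_melodic": 98,
    "midi": 320,
    "lyrics": 26,
}


def fuse(features_by_modality):
    """Concatenate per-modality feature vectors in a fixed order.

    Assumption: plain concatenation; the abstract does not state
    how the modalities were actually combined.
    """
    fused = []
    for name, expected in FEATURE_COUNTS.items():
        vector = features_by_modality[name]
        if len(vector) != expected:
            raise ValueError(
                f"{name}: expected {expected} features, got {len(vector)}"
            )
        fused.extend(vector)
    return fused


def macro_f_measure(y_true, y_pred, labels):
    """Unweighted mean of the per-class F-measure.

    Assumption: macro averaging; the abstract only says "F-measure".
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage with placeholder values (not data from the paper).
song = {name: [0.0] * n for name, n in FEATURE_COUNTS.items()}
fused = fuse(song)
print(len(fused))  # 177 + 98 + 320 + 26 features in total

clusters = [1, 2, 3, 4, 5]
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4, 5, 1]
print(round(macro_f_measure(y_true, y_pred, clusters), 3))
```

In the reported experiments a feature selection step reduced the multi-modal set to 19 features before classification; that step is not reproduced here because the abstract does not name the selection method.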


music emotion recognition

Related Project

MOODetector: A System for Mood-based Classification and Retrieval of Audio Music


10th International Symposium on Computer Music Multidisciplinary Research – CMMR’2013, Marseille, France, October 2013

Cited by

Year 2016 : 3 citations

 Wang, Cheng-I., Jennifer Hsu, and Shlomo Dubnov. "Machine Improvisation with Variable Markov Oracle: Toward Guided and Structured Improvisation." Computers in Entertainment (CIE) 14.3 (2016): 4.

 Scholz, Ricardo, Geber Ramalho, and Giordano Cabral. "Cross Task Study on MIREX Recent Results: An Index for Evolution Measurement and Some Stagnation Hypotheses." ISMIR (2016): 372-378.

 Weihs, Claus, et al. "Music Data Analysis: Foundations and Applications." Taylor & Francis (2016). ISBN: 978-1-4987-1956-8 / 978-1-4987-1957-5

Year 2015 : 4 citations

 Wang, Ju-Chiang, et al. "Modeling the affective content of music with a Gaussian mixture model." IEEE Transactions on Affective Computing 6.1 (2015): 56-68.

 Ren, Jia-Min, Ming-Ju Wu, and Jyh-Shing Roger Jang. "Automatic music mood classification based on timbre and modulation features." IEEE Transactions on Affective Computing 6.3 (2015): 236-246.

 Baniya, Babu Kaji, and Choong Seon Hong. "Music Mood Classification using Reduced Audio Features." (2015): 915-917.

 Wong, C. M. "User Customization for Music Emotion Classification using Online Sequential Extreme Learning Machine." Outstanding Academic Papers by Students (OAPS), University of Macau (2015).

Year 2014 : 2 citations

 Sturm, Bob L. "A simple method to determine if a music information retrieval system is a “horse”." IEEE Transactions on Multimedia 16.6 (2014): 1636-1644.

 Ramamurthy, Karthikeyan Natesan, et al. "Consensus inference with multilayer graphs for multi-modal data." Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE, 2014.