Melody Detection in Polyphonic Audio



Note: Publication: September 25, 2006; Defense: February 16, 2007

In this research work, we address the problem of melody detection in polyphonic audio. Our system comprises three main modules, each of which uses a number of rule-based procedures to attain its specific goal: i) pitch detection; ii) determination of musical notes (with precise temporal boundaries and pitches); and iii) identification of melodic notes. We follow a multi-stage approach inspired by principles from perceptual theory and musical practice. Physiological models and perceptual cues of sound organization are incorporated into our method, mimicking the behavior of the human auditory system to some extent. Moreover, musicological principles are applied to support the identification of the musical notes that convey the main melodic line.
Our algorithm starts with an auditory-model-based pitch detector, where multiple pitches are extracted in each analysis frame. These correspond to a few of the most intense fundamental frequencies, since one of our basic assumptions is that the main melody is usually salient in musical ensembles.
Unlike most other melody extraction approaches, we aim to explicitly distinguish individual musical notes, characterized by specific temporal boundaries and MIDI note numbers. In addition, we store their exact frequency sequences and intensity-related values, which might be necessary for the study of performance dynamics, timbre, etc. We start this task with the construction of pitch trajectories, which are formed by connecting pitch candidates with similar frequency values in consecutive frames. The objective is to find regions of stable pitches, which indicate the presence of musical notes.
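The trajectory construction described above can be illustrated with a minimal sketch. It is not the thesis implementation: the greedy nearest-candidate linking, the function names and the `max_jump_semitones` tolerance are assumptions introduced here for illustration only. The Hz-to-MIDI conversion is the standard equal-temperament mapping with A4 = 440 Hz = MIDI 69.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def build_trajectories(frames, max_jump_semitones=1.0):
    """Greedily link pitch candidates with similar frequencies in consecutive frames.

    frames: list of frames; each frame is a list of (freq_hz, salience) candidates.
    Returns a list of tracks; each track is a list of (frame_index, freq_hz, salience).
    The tolerance max_jump_semitones is a hypothetical parameter.
    """
    finished, active = [], []
    for t, candidates in enumerate(frames):
        unused = list(candidates)
        still_active = []
        for track in active:
            last_midi = hz_to_midi(track[-1][1])
            # Nearest unused candidate in pitch (None if the frame is exhausted).
            best = min(unused,
                       key=lambda c: abs(hz_to_midi(c[0]) - last_midi),
                       default=None)
            if best is not None and abs(hz_to_midi(best[0]) - last_midi) <= max_jump_semitones:
                track.append((t, best[0], best[1]))
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # no continuation: the track ends here
        # Every candidate left over starts a new track.
        for freq_hz, salience in unused:
            still_active.append([(t, freq_hz, salience)])
        active = still_active
    return finished + active

frames = [
    [(440.0, 1.0), (220.0, 0.5)],
    [(442.0, 0.9), (221.0, 0.4)],
    [(445.0, 0.8)],
]
tracks = build_trajectories(frames)
```

In this toy input, the candidates near 440 Hz form one three-frame track and those near 220 Hz form a two-frame track that ends when no continuation is found. A real system would likely also tolerate short gaps and discard very short tracks.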
Since the created tracks may contain more than one note, temporal segmentation must be carried out. This is accomplished in two steps, making use of the pitch and intensity contours of each track, i.e., frequency-based and salience-based segmentation. In frequency-based track segmentation, the goal is to separate all notes of different pitches that are included in the same trajectory, coping with glissando, legato, vibrato and other sorts of frequency modulation. As for salience-based segmentation, the objective is to separate consecutive notes at the same pitch, which may have been incorrectly interpreted as forming one single note.
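The two segmentation steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the thesis: pitch is quantized to the nearest MIDI note and short excursions (for example vibrato peaks) are absorbed into the preceding note, while same-pitch notes are split at deep valleys of the salience contour. The names `split_by_pitch` and `split_by_salience` and the parameters `min_frames` and `drop_ratio` are hypothetical.

```python
import math

def hz_to_midi(freq_hz):
    """Standard equal-temperament mapping (A4 = 440 Hz = MIDI 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def split_by_pitch(freqs_hz, min_frames=3):
    """Frequency-based segmentation of one track.

    Returns (start, end, midi_note) segments with end exclusive.
    Runs shorter than min_frames are merged into the previous segment.
    """
    notes = [round(hz_to_midi(f)) for f in freqs_hz]
    # Run-length encode the quantized pitch contour.
    runs = []
    for i, note in enumerate(notes):
        if runs and runs[-1][2] == note:
            runs[-1][1] = i + 1
        else:
            runs.append([i, i + 1, note])
    # Absorb short runs and re-join neighbours that end up with equal pitch.
    segments = []
    for start, end, note in runs:
        if segments and (end - start < min_frames or segments[-1][2] == note):
            segments[-1][1] = end
        else:
            segments.append([start, end, note])
    return [tuple(s) for s in segments]

def split_by_salience(salience, drop_ratio=0.5):
    """Salience-based segmentation of one same-pitch segment.

    Returns the frame indices of valleys that fall below drop_ratio times
    the smaller of the surrounding peaks (a likely note re-articulation).
    """
    boundaries = []
    for i in range(1, len(salience) - 1):
        is_valley = salience[i] < salience[i - 1] and salience[i] <= salience[i + 1]
        if not is_valley:
            continue
        start = boundaries[-1] if boundaries else 0
        left_peak = max(salience[start:i])
        right_peak = max(salience[i + 1:])
        if salience[i] < drop_ratio * min(left_peak, right_peak):
            boundaries.append(i)
    return boundaries

contour = [440.0] * 4 + [466.16] + [440.0] * 3 + [493.88] * 4
pitch_segments = split_by_pitch(contour)
salience_cuts = split_by_salience([0.9, 1.0, 0.8, 0.3, 0.7, 1.0, 0.9])
```

In the example contour, the single-frame excursion to 466.16 Hz is treated as modulation inside the A4 note instead of a separate note, and the salience contour is cut once at its deep valley.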
Regarding the identification of the notes bearing the melody, we base our strategy on two core assumptions that we designate as the salience principle and the melodic smoothness principle. By the salience principle, we assume that the melodic notes have, in general, a higher intensity in the mixture (although this is not always the case). As for the melodic smoothness principle, we exploit the fact that melodic intervals normally tend to be small. Finally, we aim to eliminate false positives, i.e., erroneous notes present in the obtained melody. This is carried out by removing the notes that correspond to abrupt salience or duration reductions and by implementing note clustering to further discriminate the melody from the accompaniment.
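A toy sketch of how the two principles might be combined is given below. It is an illustrative simplification and not the thesis algorithm: the `Note` structure, the per-frame winner selection and the `max_interval` threshold (in semitones) are all assumptions made for this example, and the false-positive elimination and note clustering stages are not covered.

```python
from typing import NamedTuple

class Note(NamedTuple):
    start: int       # first frame of the note
    end: int         # one past the last frame (exclusive)
    midi: int        # MIDI note number
    salience: float  # average intensity-related value

def pick_melody(notes, max_interval=9):
    """Select melody notes using salience, then smooth out isolated leaps."""
    if not notes:
        return []
    # Salience principle: in each frame, the most salient sounding note wins.
    winners = set()
    for t in range(max(n.end for n in notes)):
        sounding = [n for n in notes if n.start <= t < n.end]
        if sounding:
            winners.add(max(sounding, key=lambda n: n.salience))
    candidates = sorted(winners, key=lambda n: n.start)
    # Melodic smoothness principle: drop a note that is reached by a large
    # leap and also left by a large leap (an isolated outlier).
    melody = []
    for i, note in enumerate(candidates):
        prev_note = melody[-1] if melody else None
        next_note = candidates[i + 1] if i + 1 < len(candidates) else None
        jump_in = prev_note is not None and abs(note.midi - prev_note.midi) > max_interval
        jump_out = next_note is not None and abs(next_note.midi - note.midi) > max_interval
        if jump_in and jump_out:
            continue
        melody.append(note)
    return melody

notes = [
    Note(0, 4, 60, 1.0),
    Note(0, 12, 48, 0.6),   # sustained accompaniment note, less salient
    Note(4, 6, 84, 1.2),    # salient but melodically implausible outlier
    Note(6, 10, 62, 0.9),
    Note(10, 12, 64, 0.9),
]
melody = pick_melody(notes)
```

Here the accompaniment note is never the most salient one in any frame, and the two-octave outlier is removed by the smoothness check, so the retained notes are C4, D4 and E4. Simply dropping the outlier leaves a gap in the melody; a fuller system would presumably look for a less salient replacement note in that region.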
Experiments were conducted, showing that our method performs satisfactorily under the specified assumptions. However, additional difficulties are encountered in song excerpts where the intensity of the melody in comparison to the surrounding accompaniment is not so favorable.
To conclude, despite the broad range of applicability of melody detection, most of the research problems involved are complex and still open. Most likely, sufficiently robust, general, accurate and efficient algorithms will only become available after several years of intensive research.


melody detection in polyphonic audio, music information retrieval, melody perception, musicology, pitch detection, conversion of pitch sequences into musical notes, pitch tracking and temporal segmentation, onset detection, identification of melodic notes


Music Information Retrieval

PhD Thesis

Melody Detection in Polyphonic Audio, February 2007


Cited by

Year 2012 : 1 citation

 Salamon J. and Urbano J. (2012), “Current Challenges in the Evaluation of Predominant Melody Extraction Algorithms”, 13th International Society for Music Information Retrieval Conference (ISMIR 2012).

Year 2011 : 2 citations

 Martens, Ga. "Extraction and representation of semantic information in digital media." (2011).

 Muller, M.; Ellis, D.P.W.; Klapuri, A.; Richard, G. (2011), "Signal Processing for Music Analysis," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1088-1110, Oct. 2011.

Year 2010 : 2 citations

 JL Durrieu (2010). “Transcription et séparation automatique de la mélodie principale dans les signaux de musique polyphoniques”, PhD Thesis, Telecom Paris Tech

 J-L Durrieu, G. Richard, B. David, C. Févotte, Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals, IEEE Transactions on Audio, Speech and Language Processing, Accepted, To appear in March 2010.

Year 2009 : 2 citations

 Behringer R., Stansbie A., Stavropoulos N., Ward M. (2009). Information Computing Technology (ICT) for Music Composition and Seamless Performance Interfaces, Electronic Music Review, Vol.12, March 2009 (online magazine).

 Wendelboe M. (2009). “Bestemmelse af melodien i polyfon musik”, MSc Thesis, University of Copenhagen, Denmark.

Year 2008 : 7 citations

 Casey M. A., Veltkamp R., Goto M., Leman M., Rhodes C. and Slaney M. (2008). “Content-Based Music Information Retrieval: Current Directions and Future Challenges”, Proceedings of the IEEE, Vol. 96, No. 4. (April 2008), pp. 668-696.

 Leman M. (2008). “Systematic musicology at the crossroads of modern music research”, in Systematic and Comparative Musicology: Concepts, Methods, Findings, ed. Schneider A., Musikwissenschaftliches Institut der Universität Hamburg.

 Martins L. G. (2008). A Computational Framework for Sound Segregation in Music Signals. PhD Thesis, University of Porto, Portugal.

 Morten Wendelboe (2008). “Udtræk af melodien fra et musikalsk nummer”. Technical Report. University of Copenhagen.

 Ryynänen M. (2008). “Automatic Transcription of Pitch Content in Music and Selected Applications”. PhD Thesis, Tampere University of Technology, Finland.

 Styns F. and Leman M. (2008). “Sound, sense and music mediation: a historical-philosophical perspective”. In Polotti P. & Rocchesso D. (Eds.) Sound to Sense - Sense to sound - A state of the art in Sound and Music Computing (pp. 15-46). Berlin: Logos.

 Wendelboe M. (2008). “Udtræk af melodien fra et musikalsk nummer (Extracting the melody from a musical number)”. Technical Report, Copenhagen University.

Year 2007 : 1 citation

 Moelants D., Cornelis O., Leman M., Gansemans J., De Caluwe R., De Tré G., Matthé T., Hallez A. (2007). "The Problems and Opportunities of Content-based Analysis and Description of Ethnic Music". International Journal of Intangible Heritage, Vol. 2, pp. 57-68.

Year 2006 : 1 citation

 Moelants, Dirk, et al. "Problems and opportunities of applying data-& audio-mining techniques to ethnic music." (2006): 334-336.