Polyglot Playlists: The Global Pop Music Canon as a Language Learning Tool

Song has long held a role in the language acquisition process, from tribal call-and-response rituals to parents singing lullabies to children, but adult language learners often face two key challenges when seeking music to incorporate into their learning: (1) recommendation algorithms tend to keep recommending content in languages the learner is already familiar with, and (2) identifying songs at a learner’s level is time-consuming and requires detailed knowledge of what is important in a curriculum. To address these challenges, this project first builds a Global Pop Music Corpus (GPMC) of 1,000 songs by popular artists in each of the 10 most spoken languages, pulling from a range of public sources and APIs, including Spotify, YouTube, MusicBrainz, Wikidata and more. The GPMC is then enhanced with genre and style metadata for songs, as well as word frequency data from Twitter and OpenSubtitles. By conducting Natural Language Processing (NLP) analysis on lyrics from the GPMC, this paper explores several key questions. First, which languages have pop music that lends itself best to this kind of learning approach across language learning levels? Second, which genres and styles of music are most conducive to this learning approach across language learning levels? And third, how often do lyrics provide vocabulary in sequences compatible with proven language acquisition techniques like “cloze learning”, “spaced repetition”, or “comprehensible input”? We find examples of songs with High Educational Potential (HEP) across genres and languages, and provide recommendations on how this work could be useful to independent learners as well as educators.
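As a rough illustration of how word frequency data might be combined with lyrics to estimate a song's level, the following Python sketch scores a lyric by the share of its tokens that fall within the N most frequent words of a language. This is not the paper's method: the function names, the regex tokenizer, and the toy frequency counts are all assumptions made for illustration, and the tokenizer would not segment languages written without spaces, such as Mandarin.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of Unicode letters (hypothetical tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_n_vocab(freq_counts, n):
    """Return the n most frequent words from a word -> count mapping."""
    return {word for word, _ in Counter(freq_counts).most_common(n)}


def coverage(lyrics, known_vocab):
    """Fraction of lyric tokens that appear in known_vocab (0.0 for empty lyrics)."""
    tokens = tokenize(lyrics)
    if not tokens:
        return 0.0
    return sum(token in known_vocab for token in tokens) / len(tokens)


# Toy frequency counts standing in for data such as OpenSubtitles word lists.
toy_freq = {"la": 100, "de": 90, "amor": 50, "corazón": 10}
beginner_vocab = top_n_vocab(toy_freq, 3)
score = coverage("La la la, amor de mi corazón", beginner_vocab)
print(f"coverage at top-3 vocabulary: {score:.2f}")
```

A higher score suggests that a learner who knows the top-N words would recognize more of the lyric; a fuller analysis would presumably also account for repetition and word order, which this sketch ignores.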
