• Image via mikrofon-vergleich.de

The algorithm behind music

Move over, American Idol.

The next big success story in the music industry won’t be discovered in high profile talent competitions. Instead, it will be identified in data sets by complex algorithms designed to uncover usage and business trends.

On the surface, this method sounds dry and more devoid of emotion than Simon Cowell’s critiques, but it’s actually the ultimate way the public selects “the next big thing.” Every time the public clicks on YouTube links, posts concert photos on Twitter, or chats about bands on Facebook, they contribute to a body of information called big data. The term refers to a collection of data sets that are large and contain complex interrelationships. Think about the structure of social media networks. They contain millions of individual user profiles that are linked together by friendships, ‘likes’, group memberships, and so on. Essentially, big data mirrors the structure of these platforms.

In the music industry, big data is generated by activities like online sales, downloads, and communication conducted through apps or social media environments. Metrics measured include “the amount of times songs are played or skipped, as well as the level of traction they receive on social media based on actions such as Facebook likes and tweets.” Analytic tools determine the overall popularity of fan pages and register positive or negative comments about artists. Together, this information identifies current trends, assesses the digital pulse of artists, and leads to sales through singles, merchandise, concert tickets, and even subscriptions to music streaming services.

In terms of discovering new talent, big data plays an important role in generating interest at major record labels. In many cases, companies tally an artist’s page views, ‘likes’, and followers. These numbers can then be compared easily against those of other artists in the same genre. Once an act has a hundred thousand or more Facebook or Twitter followers, talent managers take notice and start drumming up interest within the music industry itself.
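The tally-and-compare step described here can be pictured with a short sketch. It is only an illustration: the artist names, genres, and follower counts are invented, and the function names are hypothetical. Only the hundred-thousand-follower mark comes from the paragraph above.

```python
from collections import defaultdict

# Follower mark mentioned in the article; everything else here is illustrative.
FOLLOWER_THRESHOLD = 100_000


def rank_by_genre(artists):
    """Group artists by genre and sort each group by follower count, highest first."""
    by_genre = defaultdict(list)
    for artist in artists:
        by_genre[artist["genre"]].append(artist)
    return {
        genre: sorted(group, key=lambda a: a["followers"], reverse=True)
        for genre, group in by_genre.items()
    }


def acts_to_watch(artists, threshold=FOLLOWER_THRESHOLD):
    """Return the names of artists at or above the follower threshold, in input order."""
    return [a["name"] for a in artists if a["followers"] >= threshold]


# Invented sample data, not real artists or real counts.
SAMPLE_ARTISTS = [
    {"name": "Act A", "genre": "indie", "followers": 42_000},
    {"name": "Act B", "genre": "indie", "followers": 150_000},
    {"name": "Act C", "genre": "pop", "followers": 100_000},
    {"name": "Act D", "genre": "pop", "followers": 9_000},
]

if __name__ == "__main__":
    for genre, group in rank_by_genre(SAMPLE_ARTISTS).items():
        print(genre, [a["name"] for a in group])
    print("Over the threshold:", acts_to_watch(SAMPLE_ARTISTS))
```

A real label would presumably weigh page views and ‘likes’ alongside followers; the sketch uses a single number to keep the comparison easy to follow.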

Big data selecting the next big Top 40 hit

The ability to identify current trends and predict the next megastar comes with large financial rewards for everyone involved. For instance, data scientists studied the impact of social media on iTunes album and track sales by comparing social media metrics with sales revenue. They concluded that social media activity correlates with an increase in album and track sales. More specifically, YouTube views have the largest impact on sales, a finding that prompted many record labels to upload large-budget music videos onto the platform to promote singles. Before spending millions on video production, labels use analysis to identify which songs are likely to become hits based on the online activities of targeted audiences. The accuracy of these predictions depends on the quality of the big data analysis.
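To make the idea of comparing social media metrics with sales concrete, here is a minimal sketch of one common measure, the Pearson correlation coefficient. The article does not say which statistic the data scientists used, so this choice is an assumption, and the weekly figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Invented weekly figures for one single: video views (thousands) and track sales.
WEEKLY_VIEWS = [10, 20, 30, 40, 50]
WEEKLY_SALES = [12, 25, 31, 48, 55]

if __name__ == "__main__":
    print(f"views vs. sales: r = {pearson(WEEKLY_VIEWS, WEEKLY_SALES):.2f}")
```

A value close to 1 means the two series rise and fall together. On its own it does not show which one drives the other.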

Entrepreneurs within the music industry are now experimenting with new methods to develop algorithms that harvest information with greater efficiency and accuracy. One of the most notable examples is a joint venture between EMI Music and Data Science London called The EMI Million Interview Dataset. It is described as “one of the richest and largest music appreciation datasets ever made available – a massive, unique, rich, high-quality dataset compiled from global research that contains interests, attitudes, behaviours, familiarity, and appreciation of music as expressed by music fans.”

David Boyle, Senior Vice President for Insight at EMI Music, explains, “(It is) comprised of a million interviews broaching topics like level of passion for a particular music genre and sub-genre, preferred methods for music discovery, favorite music artists, thoughts on music piracy, music streaming, music formats, and fan demographics.”

The goal of the project is to release this collection of information to the public and improve the quality of business within the music industry.

“We’ve had great success using data to help us and our artists understand consumers, and we’re excited to share some of our data to help others do the same,” says Boyle.

In 2012, EMI Music and Data Science London took the project one step further by hosting the Music Data Science Hackathon. EMC, a world leader in data science and big data solutions, joined the venture and provided IT infrastructure. Over a 24-hour period, 175 data scientists developed 1,300 formulas and algorithms to answer the question: “Can you predict if a listener will love a new song?” The results hinted at the power of collective intelligence, and participants developed formulas that were described as world class.

“The insights revealed in this hackathon hint at the power and potential that Big Data holds – both for intellectual discovery and for incremental business value for organizations of every kind,” says Chris Roche, Regional Director for EMC Greenplum.

But how do you pay the artists?

After the industry has determined a song has hit potential and releases it as a single, how does it calculate royalties when the song is played on social media platforms or streaming sites? Right now, “record labels of all sizes face a growing problem of having to reconcile reams of data from streaming companies like Spotify, Deezer, and YouTube, but have fewer people than ever to do so.”

One of the central challenges from an information management perspective is that most database management systems were not developed to handle data sets that are as large and complex as big data. For instance, the digital data files generated by music distributors are far larger than programs like Excel can handle. This creates problems, including missing data and file labels that are not compatible with accounting software.
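The two problems named here, incompatible labels and missing data, can be illustrated with a small sketch. The column names, the alias table, and the sample report are all hypothetical; real reports from streaming services will look different.

```python
import csv
import io

# Hypothetical mapping from service-specific column labels to one canonical name.
COLUMN_ALIASES = {
    "track": "track_title",
    "track name": "track_title",
    "title": "track_title",
    "streams": "play_count",
    "plays": "play_count",
    "revenue": "revenue",
    "net revenue": "revenue",
}
REQUIRED_FIELDS = ("track_title", "play_count", "revenue")


def normalize_report(csv_text):
    """Rename known columns and split rows into (clean, rejected).

    `rejected` holds (line_number, missing_fields) pairs for rows that lack
    a required value after renaming.
    """
    clean, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        row = {}
        for label, value in raw.items():
            if label is None:  # extra cells beyond the header row
                continue
            key = COLUMN_ALIASES.get(label.strip().lower())
            if key:
                row[key] = (value or "").strip()
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            rejected.append((reader.line_num, missing))
        else:
            clean.append(row)
    return clean, rejected


# Invented report: the second data row has no stream count.
SAMPLE_REPORT = "Track Name,Streams,Net Revenue\nSong A,1000,4.20\nSong B,,1.10\n"

if __name__ == "__main__":
    good, bad = normalize_report(SAMPLE_REPORT)
    print("clean rows:", good)
    print("rows needing attention:", bad)
```

Reading the file row by row, rather than loading it into a spreadsheet, also sidesteps the size limit the paragraph mentions.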

In most cases, accountants sort out all of these issues, which adds time and labor to an already heavy workload. Often, a large percentage of a label’s overhead is tied up in the accounting department.

To combat these issues, entrepreneurs have developed business intelligence platforms with the capacity to organize and analyze big data. One of the best examples is the Austrian company Rebeat, which describes its services as “royalty accounting with three clicks.” Founded in 2006, it has quickly grown into Europe’s leading digital distributor and provides access to 300 digital services worldwide. Essentially, Rebeat streamlines accounting practices and handles backend work, like matching data fields in accounting software, so the accounting department is free to manage budgets. It also provides an infrastructure to manage royalty payments in accordance with contractual agreements, maintain direct agreements with digital music stores, generate graphs to track sales, and, most importantly, export data into CSV files.

Of course, the service comes at a price. Forbes reported that record labels must use Rebeat as a distributor in order to access company data, which costs a 15% sales commission and a fixed fee of $649 each year. Estimates suggest, however, that a label’s accounting overhead often costs far more, which means that signing with Rebeat could turn out to be a money saver.
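The trade-off can be worked through with simple arithmetic. The sketch below uses the two figures quoted above, the 15% commission and the $649 annual fee; the sales and in-house accounting amounts are invented for illustration, and the comparison ignores whatever a label would otherwise pay for distribution.

```python
# Figures reported in the article.
COMMISSION_RATE = 0.15
FIXED_FEE_USD = 649.0


def annual_cost(annual_sales_usd):
    """Yearly cost of the service: commission on sales plus the fixed fee."""
    return COMMISSION_RATE * annual_sales_usd + FIXED_FEE_USD


def break_even_sales(in_house_cost_usd):
    """Sales level at which the service costs the same as in-house accounting.

    Below this level the service is cheaper; above it, in-house is cheaper.
    """
    return max(0.0, (in_house_cost_usd - FIXED_FEE_USD) / COMMISSION_RATE)


if __name__ == "__main__":
    # Hypothetical label: $100,000 in yearly sales, $40,000 spent on accounting.
    print(f"service cost: ${annual_cost(100_000):,.2f}")
    print(f"break-even sales: ${break_even_sales(40_000):,.2f}")
```

Because the commission grows with sales while in-house costs are treated as fixed here, the saving shrinks as a label sells more.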


The full impact of big data on the music industry is still uncharted territory. There is still much to be learned about the effect of big data on sales and the way it will influence the creation and distribution of music in any genre.

Venkat Viswanathan, management consultant and CEO of LatentView, predicts that in the future there will be fewer record labels and a continued dominance of digital sales. “With inexpensive music-streaming services like Spotify, which claims to have 20 million users and five million paying subscribers, the line between owning and listening to music will continue to blur. I also expect to see a growing number of tools to help analyze data faster and better, in such areas as predictive analytics and data conversion,” says Viswanathan.

Some worry that big data may dehumanize the production of music by treating it as a mathematical equation as opposed to an art form.

However, others say that big data is based entirely on a human element because it isn’t generated by machines but by consumers themselves; it reflects the true opinions of music lovers and fans. At the end of the day, music discovery has always been linked to social behaviors. As the world becomes increasingly digital in nature, many social interactions happen online.

As long as analysts continue to integrate personal taste and recommendation into their projects in a seamless manner, algorithms will continue to harvest the voices of fans around the globe.  

Forecasted start year: 
2012 to 2015