June 1, 2024
I have a friend who has built up an amazing collection of songs on Spotify. Over the years, they simply used the “Like” feature whenever they liked a song. As a result, they have a collection of over a thousand songs representing their diverse taste in music.
When I learned about this, I became jealous. I too wanted such a collection, but I didn’t want to wait years for it to build up into something formidable.
For a couple of years now I’ve been using Last.fm. It’s overall pretty nice: it tracks everything you listen to automatically, meaning you can set it up once and never touch it again and it’ll still continue collecting the data for you.
Since I was already using Last.fm, I thought: why not get my data from the site and put all the songs I can determine I “like” into my Liked Songs?
First I needed to get my data from Last.fm. I stumbled upon a project called lastfmstats.com. It’s a really cool website: I highly recommend checking it out for its own merits as the graphs and the analyses it provides are quite fascinating. For the sake of this post, however, we will be using it simply to get our data in a machine-readable format.
Simply type in your Last.fm username and let it do its thing. Depending on how many scrobbles you have, it may take a while to collect your data, but once it finishes you have the option to download your scrobbles as a JSON file.
Now that we have our data, let’s get to processing it!
I used Python for my scripting, so I will use Python in this post as well, but really you can use anything.
To note ahead of time, “dicts” in Python are simply key-value stores. Since the scripts are written in Python, I will be using the word “dict” throughout the post, but just know it can be any key-value store in your respective language.
First, let’s load our JSON file into a variable called contents:
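A minimal sketch of this step. The file name scrobbles.json is a placeholder, so substitute whatever name your download has:

```python
import json
from pathlib import Path


def load_export(path):
    """Read the downloaded JSON export and return the parsed object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# "scrobbles.json" is a hypothetical file name, not one taken from the site.
# contents = load_export("scrobbles.json")
```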
Now let’s get how many times a track has been scrobbled:
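One way to do the counting is sketched below. It assumes each scrobble is a dict with "artist" and "track" keys and that the scrobbles live in a list such as contents["scrobbles"]; check your own export, since that layout is an assumption here and not something confirmed by the site.

```python
from collections import Counter


def count_scrobbles(scrobbles):
    """Map each (artist, track) pair to the number of times it was scrobbled."""
    counts = Counter()
    for scrobble in scrobbles:
        # The "artist" and "track" key names are assumed; adjust to your export.
        counts[(scrobble["artist"], scrobble["track"])] += 1
    return counts


# track_counts = count_scrobbles(contents["scrobbles"])
```

Counter is a dict subclass, so the result can be used anywhere a plain dict is expected.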
Now we have a dict that maps each song we have ever listened to to how many times in total we’ve listened to it.
We don’t like every song we’ve ever scrobbled. Therefore, we need to narrow down our list to the tracks we actually like. We could manually review every song, but I decided to try to automate as much of this as possible.
My methodology was as follows:
Of course, everyone is different, so feel free to pick different numbers. I settled on these numbers by going through my Last.fm library and noting roughly at what scrobble count I was confident I liked a song and at what scrobble count things became fuzzier.
There is one major flaw with this: just because you don’t listen to a song 10 or more times doesn’t necessarily mean you don’t like it. However, personally I was fine with this compromise; after all it would be unrealistic to expect a perfect collection from automation.
Here is an implementation of the filtering:
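A rough sketch of what such a filter could look like, assuming the scrobble counts are a dict keyed by (artist, track) tuples. The cutoff is left as a parameter, and the shape of each entry (a dict with "artist", "track", and "count") is an illustrative choice, not necessarily the one used in the original script.

```python
def filter_tracks(track_counts, min_count):
    """Return one entry per track that was scrobbled at least min_count times."""
    return [
        {"artist": artist, "track": track, "count": count}
        for (artist, track), count in track_counts.items()
        if count >= min_count
    ]


# The cutoff below is a placeholder; pick the number that fits your library.
# filtered_tracks = filter_tracks(track_counts, min_count=10)
```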
I then sorted filtered_tracks by the total number of times I listened to a song purely for organization purposes, split the filtered_tracks list into two lists, questionable_tracks and for_sure_tracks, and then dumped the contents of these two lists into two JSON files:
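These three steps might be sketched as follows, assuming each entry in filtered_tracks is a dict with a "count" key. The for_sure_min cutoff is a hypothetical parameter; the value in the usage comment is a placeholder.

```python
import json


def split_tracks(filtered_tracks, for_sure_min):
    """Sort by play count (highest first) and split into two tiers."""
    ordered = sorted(filtered_tracks, key=lambda t: t["count"], reverse=True)
    for_sure = [t for t in ordered if t["count"] >= for_sure_min]
    questionable = [t for t in ordered if t["count"] < for_sure_min]
    return for_sure, questionable


def dump_tracks(tracks, path):
    """Write a list of track entries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tracks, f, ensure_ascii=False, indent=2)


# for_sure_tracks, questionable_tracks = split_tracks(filtered_tracks, for_sure_min=25)
# dump_tracks(for_sure_tracks, "for_sure_tracks.json")
# dump_tracks(questionable_tracks, "questionable_tracks.json")
```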
Now we end up with two files: for_sure_tracks.json and questionable_tracks.json.
I left the “for sure” tracks file as is. As for the file with the “questionable” tracks, I just went ahead and manually removed songs that I did not deem fit to be added to my Liked Songs collection.
And with that, the data is filtered!
We now have two files containing the songs we want to dump into our Liked Songs collection. Let’s now actually import our songs into Spotify!
First, get a token you can use to authenticate yourself with the API. I simply went to the Spotify for Developers website and scrolled down to the “See it in action” section, where there is an auth token of mine I could use to authenticate.
Next, I created a new playlist and found its ID by opening the playlist in the Spotify web app and investigating its URL:
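If you would rather not pick the ID out by eye, a small helper can do it. This sketch assumes the URL has the usual open.spotify.com/playlist/ID shape; the URL in the usage comment is a made-up example.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Return the path segment that follows "playlist" in a Spotify URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "playlist" in parts:
        index = parts.index("playlist")
        if index + 1 < len(parts):
            return parts[index + 1]
    raise ValueError(f"No playlist ID found in {url!r}")


# playlist_id = playlist_id_from_url("https://open.spotify.com/playlist/EXAMPLE_ID?si=abc")
```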
Let’s now load back the two JSON files we created:
The next step is to iterate through each song and add it to our playlist. Seems easy, right?
Things get slightly weird here.
If you wish to add a song to a playlist via the Spotify API, the API expects that you provide the Spotify URI of that song.
For example, the Spotify URI of “Never Gonna Give You Up” by Rick Astley is spotify:track:4PTG3Z6ehGkBFwjybzWkR8
As one can see, the Spotify URI for a track is composed of the song’s ID with a prefix of spotify:track:
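In code, converting between the two forms is a one-liner each way. The helper names below are illustrative only:

```python
TRACK_PREFIX = "spotify:track:"


def track_uri_from_id(track_id):
    """Build a Spotify track URI from a bare track ID."""
    return f"{TRACK_PREFIX}{track_id}"


def track_id_from_uri(uri):
    """Strip the prefix from a Spotify track URI to recover the ID."""
    if not uri.startswith(TRACK_PREFIX):
        raise ValueError(f"Not a track URI: {uri!r}")
    return uri[len(TRACK_PREFIX):]
```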
But our data does not have the Spotify URIs of each track: we only know the track’s title and the artist.
This means that before we can add songs from our data to a playlist, we must figure out their Spotify URIs.
The Spotify API has a /search endpoint which you can use to search songs by a query, similar to how you search normally. We can use this API endpoint to search for a song and then get its Spotify URI:
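A sketch of such a lookup using only the standard library. It assumes the token is sent as a Bearer token and simply takes the first search result, which may not always be the right recording; the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.spotify.com/v1/search"


def build_search_url(track, artist):
    """Build a /search URL that asks for the single best track match."""
    query = f"track:{track} artist:{artist}"
    params = urllib.parse.urlencode({"q": query, "type": "track", "limit": 1})
    return f"{SEARCH_URL}?{params}"


def first_track_uri(search_response):
    """Pull the URI of the first track out of a parsed /search response."""
    items = search_response.get("tracks", {}).get("items", [])
    return items[0]["uri"] if items else None


def search_track_uri(track, artist, token):
    """Query Spotify for a track and return its URI, or None if nothing matched."""
    request = urllib.request.Request(
        build_search_url(track, artist),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return first_track_uri(json.load(response))
```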
Then getting the Spotify URIs for each song should be a piece of cake:
Although this approach works well in cases where there are not many tracks, in cases where you are dealing with hundreds or possibly thousands of unique tracks, this method crumbles thanks to everyone’s favorite friend: rate limiting.
Spotify doesn’t have a fixed rate limit: it is calculated automatically based on the number of requests made in a 30 second time frame. This isn’t fun, since it means we can’t write a specific backoff plan to avoid getting rate limited.
Additionally, the search API does not provide a way to batch requests, so we have to send one HTTP request for every song. This scales terribly with large song counts, as you will inevitably run into rate limits.
When I first ran my script querying the Spotify API for every track, it worked for approximately the first 400 or 500 tracks, but then I got rate limited for 24 hours. This is not ideal whatsoever, so this option isn’t great.
If you’ve ever used the Last.fm website, you may have noticed that the webpages for tracks contain the URLs of various streaming services where you can listen to the track.
This is wonderful since this means that we can get our Spotify URIs from Last.fm and avoid dealing with the rate limits of Spotify.
Unfortunately, the Last.fm API doesn’t seem to have a method to retrieve these URLs, so we will resort to web scraping.
As of writing, the a tag which links to the Spotify URL of the song has a class of play-this-track-playlink--spotify. If this ever changes you can always use the DevTools of your browser to see what HTML attributes and classes can be used to identify the Spotify link.
Here is a method that retrieves the Spotify URI for a track via scraping Last.fm:
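The sketch below uses only the standard library. It makes two assumptions worth checking against the live site: that a track page lives at last.fm/music/ARTIST/_/TRACK with spaces encoded as plus signs, and that the link’s href is an open.spotify.com/track/ID URL. The class and function names are illustrative.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

SPOTIFY_LINK_CLASS = "play-this-track-playlink--spotify"


class SpotifyLinkFinder(HTMLParser):
    """Remember the href of the first <a> tag carrying the Spotify class."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        if SPOTIFY_LINK_CLASS in (attributes.get("class") or "").split():
            self.href = attributes.get("href")


def spotify_uri_from_html(html):
    """Return a Spotify track URI found in a Last.fm page, or None."""
    finder = SpotifyLinkFinder()
    finder.feed(html)
    if not finder.href:
        return None
    parts = [p for p in urllib.parse.urlparse(finder.href).path.split("/") if p]
    # Assumed link shape: https://open.spotify.com/track/<id>
    if len(parts) >= 2 and parts[-2] == "track":
        return f"spotify:track:{parts[-1]}"
    return None


def lastfm_track_url(artist, track):
    """Build the (assumed) Last.fm page URL for a track."""
    quote = urllib.parse.quote_plus
    return f"https://www.last.fm/music/{quote(artist)}/_/{quote(track)}"


def spotify_uri_from_lastfm(artist, track):
    """Fetch the Last.fm track page and scrape the Spotify URI from it."""
    request = urllib.request.Request(
        lastfm_track_url(artist, track),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return spotify_uri_from_html(html)
```

Note that urlopen raises an HTTPError for pages that do not exist, so callers may want to catch it.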
Although we’re still making one HTTP request for every song, Last.fm seems to be far more tolerant than Spotify. In my case I never ran into rate limits, but if you do end up running into rate limits from Last.fm, proxies could potentially be used as a workaround since we are simply making unauthenticated GET requests.
We now have two options to determine the Spotify URI of a track. Retrieving the URI via Last.fm is overall much better as it avoids the rate limits of Spotify. However, that doesn’t mean Option 1 won’t be used: not all Last.fm pages have the respective Spotify links, so for the tracks where Last.fm doesn’t provide a Spotify link, we will use Spotify’s API to search for the URI.
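The fallback logic can be sketched as a small wrapper. Here primary and fallback are hypothetical stand-ins for whatever Last.fm and Spotify lookup functions you wrote; each takes an artist and a track and returns a URI or None.

```python
def resolve_uri(artist, track, primary, fallback):
    """Try the primary lookup first; use the fallback only when it finds nothing."""
    try:
        uri = primary(artist, track)
    except OSError:
        # Network-level failures (urllib errors subclass OSError) count as "not found".
        uri = None
    return uri or fallback(artist, track)
```

Because the fallback runs only for tracks the primary lookup misses, the number of Spotify search requests stays small.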
Now that we have a list of Spotify URIs, let’s actually add them to our playlist:
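A sketch of this step with the standard library. Spotify’s add-items endpoint accepts at most 100 URIs per request, so the list is sent in batches; the function names are illustrative, and the token is assumed to carry permission to modify the playlist.

```python
import json
import urllib.request

API = "https://api.spotify.com/v1"


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def add_tracks_to_playlist(playlist_id, uris, token):
    """POST the URIs to the playlist, 100 at a time."""
    for batch in chunked(uris, 100):
        request = urllib.request.Request(
            f"{API}/playlists/{playlist_id}/tracks",
            data=json.dumps({"uris": batch}).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        # urlopen raises HTTPError on a non-2xx response, so failures are not silent.
        with urllib.request.urlopen(request):
            pass
```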
You should end up with a playlist with all of the tracks that will compose your “Liked Songs” collection! At this point I recommend making sure that the right songs were added and fixing the songs that Spotify or Last.fm provided an incorrect Spotify URI for.
We have a nice playlist, but it’s a playlist: our goal is to add these songs to our Liked Songs collection.
This is just a matter of writing a script that fetches all the songs in our current playlist, then likes them so they get saved in our Liked Songs collection:
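One possible shape for such a script, using only the standard library. It pages through the playlist by following the next link in each response, then saves the tracks in batches of 50, the maximum the save-tracks endpoint accepts per request. The function names are illustrative, and the token is assumed to have permission to modify your library.

```python
import json
import urllib.request

API = "https://api.spotify.com/v1"


def track_ids_from_page(page):
    """Collect track IDs from one page of playlist items, skipping empty entries."""
    ids = []
    for item in page.get("items", []):
        track = item.get("track")
        if track and track.get("id"):
            ids.append(track["id"])
    return ids


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def api_request(url, token, method="GET", payload=None):
    """Send an authenticated request and return the parsed JSON body, if any."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


def playlist_track_ids(playlist_id, token):
    """Follow the paginated playlist listing and return every track ID."""
    ids = []
    url = f"{API}/playlists/{playlist_id}/tracks?limit=100"
    while url:
        page = api_request(url, token)
        ids.extend(track_ids_from_page(page))
        url = page.get("next")
    return ids


def like_tracks(track_ids, token):
    """Save tracks to Liked Songs, 50 IDs per request."""
    for batch in chunked(track_ids, 50):
        api_request(f"{API}/me/tracks", token, method="PUT", payload={"ids": batch})


# like_tracks(playlist_track_ids(playlist_id, token), token)
```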
Once you run this, it will dump the contents of your playlist into your Liked Songs collection. And now, you finally have the Liked Songs collection of your dreams!
It took a bit of work, but at last we were able to create the Liked Songs collection based on what we listened to throughout the years.
One major downside to the approach I’ve outlined throughout this post is the heavy reliance on Last.fm. Not many people use the site, and even if you start now, it would take some time for it to build up the scrobbles (the same dilemma we faced when we wanted a Liked Songs collection).
One potential solution that could make something like this more accessible to all Spotify users is to request your entire listening history from Spotify and then process that raw data in a fashion similar to how we processed our raw Last.fm scrobbles. Interestingly, the same person who created lastfmstats.com also created spotifystats.app, which could be a potential source to use to make the raw Spotify data more friendly to process. Additionally, we probably wouldn’t have to deal with the issue of finding Spotify URIs, as Spotify should provide them within the data. This is all speculation since I didn’t go this route, but it could be an interesting path to pursue.
I have open-sourced the three scripts I used in this post. You can check them out at Armster15/liked-songs-shenanigans-scripts