In this data engineering project, I build an end-to-end solution for analysing my listening history: a pipeline extracts my scrobbles from the last.fm API, processes and models the data, and the results are visualised in a Tableau dashboard.
This is a batch pipeline scheduled to run daily: it extracts the previous day's scrobbles and loads them into Google Cloud Storage for further processing. The pipeline can also be configured to capture scrobbles between specified dates, for example to retrieve the last few years' worth of data. The processed data then feeds into a Tableau dashboard, which aims to provide some insight into my listening habits. Docs for the API can be found on the last.fm website.
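The daily and date-range modes described above both come down to turning calendar days into timestamp windows. Below is a minimal sketch of that step, not the pipeline's actual code. The function names are hypothetical, and it assumes day boundaries are taken in UTC and expressed as UNIX timestamps.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_window(day: date) -> tuple[int, int]:
    """Return (start, end) UNIX timestamps covering one calendar day.

    Assumption: day boundaries are taken in UTC.
    """
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1) - timedelta(seconds=1)
    return int(start.timestamp()), int(end.timestamp())


def previous_day_window(run_date: date) -> tuple[int, int]:
    """Window for the scheduled daily run: the day before run_date."""
    return day_window(run_date - timedelta(days=1))


def backfill_windows(first: date, last: date) -> list[tuple[int, int]]:
    """One window per day from first to last, inclusive (historic loads)."""
    days = (last - first).days + 1
    return [day_window(first + timedelta(days=i)) for i in range(days)]
```

Splitting a backfill into one window per day keeps each extract the same shape as a regular daily run, so the same downstream steps can handle both.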
last.fm is a website that records your listening history, with integrations for Spotify, Deezer and other streaming services. I've had an account for a very long time, so it was great to use that historic data in a project. Please support them by making an account and syncing it with your streaming app of choice. A scrobble is a record of what you listened to at the individual track level, so each scrobble equals one track played. In addition to the name of the track, each scrobble also contains the time you listened to it, the name of the artist and album, the genre and so on.
- What sort of genres do I listen to the most?
- At what time of the day do I listen to the most music?
- Have my listening preferences changed over the years?
- How many tracks do I listen to on average per day?
The data is extracted from the last.fm API, which exposes essentially the same listening data that is held on my last.fm profile. I use the user.getrecenttracks and artist.gettoptags methods to retrieve my scrobbling history, including timestamps, artist, album, track, the associated MusicBrainz IDs and artist genre tags.
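As a rough illustration of the extraction step, the sketch below builds a user.getrecenttracks request URL and flattens one track from the JSON response into a flat record. It is not the project's actual code. The helper names are hypothetical, and the response field names (`#text`, `mbid`, `uts`, `@attr`) reflect my understanding of the API's JSON format, so check them against the API docs before relying on them.

```python
from typing import Any, Optional
from urllib.parse import urlencode

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def recent_tracks_url(user: str, api_key: str, from_ts: int, to_ts: int,
                      page: int = 1) -> str:
    """Build the request URL for one page of scrobbles in a time window."""
    params = {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "from": from_ts,   # UNIX timestamp, start of window
        "to": to_ts,       # UNIX timestamp, end of window
        "page": page,
        "limit": 200,      # page size; 200 is the documented maximum
        "format": "json",
    }
    return f"{API_ROOT}?{urlencode(params)}"


def flatten_track(track: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten one track object into a row; skip the 'now playing' entry.

    Assumption: a track that is currently playing has no 'date' field,
    so it is not yet a completed scrobble and is dropped here.
    """
    if "date" not in track:
        return None
    return {
        "scrobbled_at": int(track["date"]["uts"]),
        "track": track.get("name"),
        "track_mbid": track.get("mbid"),
        "artist": track.get("artist", {}).get("#text"),
        "artist_mbid": track.get("artist", {}).get("mbid"),
        "album": track.get("album", {}).get("#text"),
        "album_mbid": track.get("album", {}).get("mbid"),
    }
```

Genre is not part of this record: in this project it comes from a separate artist.gettoptags call per artist, which would be joined on afterwards.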
To follow along and run the pipeline in your own environment, please see the sections below. You may need to adjust some of the instructions if you're not running it on a Linux machine.
Section 1 - Environment Set-up
Section 2 - Infrastructure deployment
Section 4 - Mage Orchestration
Section 5 - Modelling with dbt