Backups are important for a number of reasons. Lately, however, much of our data has moved to the cloud, which means we also need to back up our data from those online services.
With this in mind, I’ve been building some solutions to keep my most relevant data periodically backed up from the cloud to a system I control.
One of the services that I use and consider worth backing up is YouTube. My main concern is the playlists. I have many of them, but I don’t care much about the actual video files, so this isn’t about downloading them all. The point is to have some peace of mind, knowing that all the titles and URLs are permanently stored somewhere.
Getting the JSON for each playlist
youtube-dl is best known for downloading videos, but it has many more capabilities. One of them is exactly what we want for this project. With the right options, you can easily get a JSON file listing all the videos in a playlist, including their title, channel, URL, description and more. You can use something like this:
youtube-dl --flat-playlist --dump-json PLAYLIST_URL > output.json
Just remember that you may need to provide cookies for your YouTube account if your playlists are private.
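As a rough sketch of the command above with optional cookies, something like the following could work. The function name `dump_playlist` and the `YTDL` variable are my own naming for illustration, not part of youtube-dl. The `--cookies` option is a real youtube-dl flag that expects a cookies file in Netscape format. Note that `--dump-json` prints one JSON object per line, so the output file holds one line per video.

```shell
#!/usr/bin/env bash
# Sketch only: YTDL and dump_playlist are illustrative names.
# YTDL lets you point at a different binary if needed.
YTDL="${YTDL:-youtube-dl}"

# dump_playlist URL OUTPUT_FILE [COOKIES_FILE]
# Writes the playlist metadata to OUTPUT_FILE, one JSON object per line.
dump_playlist() {
    if [ -n "${3:-}" ]; then
        # A cookies file was given: needed for private playlists.
        "$YTDL" --cookies "$3" --flat-playlist --dump-json "$1" > "$2"
    else
        "$YTDL" --flat-playlist --dump-json "$1" > "$2"
    fi
}
```

For a private playlist you would then call `dump_playlist PLAYLIST_URL output.json cookies.txt`, where `cookies.txt` is a file exported from your browser.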
Then, combine it with a command to list all your playlists and some Bash scripting, and in a moment we’ll have all our playlists stored as JSON.
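The scripting part might look roughly like the sketch below. It is not the actual script. To keep it simple, it assumes the playlist URLs are already listed in a text file, one per line, instead of being fetched with a command. The file layout, the function name `backup_playlists` and the `YTDL` variable are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: YTDL and backup_playlists are illustrative names.
YTDL="${YTDL:-youtube-dl}"

# backup_playlists LIST_FILE OUT_DIR
# LIST_FILE holds one playlist URL per line (blank lines and # comments allowed).
backup_playlists() {
    mkdir -p "$2"
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            ''|'#'*) continue ;;   # skip blank lines and comments
            *list=*) ;;            # looks like a playlist URL
            *) echo "Skipping (no list= parameter): $url" >&2; continue ;;
        esac
        # Use the value of the list= parameter as the file name.
        id="${url##*list=}"
        id="${id%%&*}"
        # stdin comes from /dev/null so the command cannot eat the URL list.
        "$YTDL" --flat-playlist --dump-json "$url" > "$2/$id.json" < /dev/null
    done < "$1"
}
```

Calling `backup_playlists playlists.txt backup` would leave one `.json` file per playlist in the `backup` directory, named after the playlist ID.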
So now we have an initial script. But what about subsequent runs?
At first, I thought of storing each backup in a separate directory. That worked well, but I quickly thought of a better way: version control.
These are simple plain-text files, which are perfect for a git repository. Storing them this way gives us some nice things, such as automatic deduplication and easy diffs.
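A minimal sketch of the version control step, assuming the JSON files live in a single directory, could look like this. The function name `commit_backup` and the commit message format are my own choices for illustration. It only creates a commit when something actually changed, so runs with no changes leave the history untouched.

```shell
#!/usr/bin/env bash
# Sketch only: commit_backup is an illustrative name.

# commit_backup REPO_DIR
# Commits whatever changed in REPO_DIR since the previous backup.
commit_backup() {
    # Initialise the repository on the first run only.
    [ -d "$1/.git" ] || git -C "$1" init -q
    git -C "$1" add -A
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git -C "$1" diff --cached --quiet; then
        echo "No changes since the last backup"
    else
        git -C "$1" commit -q -m "Backup $(date +%Y-%m-%d)"
    fi
}
```

After a few runs, `git log -p` inside the directory shows which videos were added to or removed from each playlist over time.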
After all that, we’re left to schedule the backup, so we can forget about it and let it run automatically. For that, I created a user systemd service that runs weekly.
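For reference, a weekly schedule with user-level systemd units could be set up roughly as follows. This is a sketch under assumptions, not my exact units: the unit names, the script path `%h/bin/youtube-playlists-backup.sh` (`%h` expands to your home directory) and the use of a separate timer unit are all illustrative. The helper just writes the two unit files into the user unit directory.

```shell
#!/usr/bin/env bash
# Sketch only: unit names and the script path are illustrative assumptions.

# install_units [UNIT_DIR]
# Writes a oneshot service and a weekly timer into the user unit directory.
install_units() {
    unit_dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}"
    mkdir -p "$unit_dir"

    cat > "$unit_dir/youtube-playlists-backup.service" <<'EOF'
[Unit]
Description=Back up YouTube playlists

[Service]
Type=oneshot
ExecStart=%h/bin/youtube-playlists-backup.sh
EOF

    cat > "$unit_dir/youtube-playlists-backup.timer" <<'EOF'
[Unit]
Description=Run the YouTube playlists backup weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After running `install_units`, the timer can be activated with `systemctl --user daemon-reload` followed by `systemctl --user enable --now youtube-playlists-backup.timer`. `Persistent=true` makes systemd catch up on a run that was missed while the machine was off.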
If you are interested in the code, feel free to take a look.
Note: A bug in youtube-dl causes the listing to include only the first 100 videos of each playlist. Check that bug report for possible solutions until it is fixed.
Why not just use Google Takeout?
Instead of writing this script, we could use Google Takeout. Among other data, it provides a CSV file with the URLs from your playlists. However:
- I’d like to also have some info about each video, such as the title. This makes the data easier to read and manipulate. Also, this info could be invaluable if any of those videos disappear.
- More importantly, Google Takeout is, at best, a semi-automatic process. It can send you an e-mail every 2 months with a link to a page where you download the data. I’d like to automate the backup process completely.
- Lastly, I’d like to schedule the backup to run more often than every two months.