Guitar Karaoke

Do what you like to do -
not what others like
- Me
Mii Chat

It's like I can reverse engineer music now?
creepy... (⁀⊙﹏☉⁀)

I recently got a guitar and found out that most guitar learning tools cost money, and many of them are subscriptions. It took me about 2 months to learn to play by ear through trial and error, so I made this program for myself. It is a shell script that uses Demucs, Whisper, ffmpeg and some tinkering to produce a single mkv file containing the original version, a vocal karaoke version, a guitar karaoke version and a backing track, along with automatically generated lyrics. The mkv file can be played on any device that has VLC or a similar media player installed.

Ever since I got my guitar, I've been paying more attention to the individual instruments than to the song as a whole. After a long time (I didn't like the song earlier), I listened to this song → Blur - Song 2. I felt like I could try playing along and get better at guitar. That's when I remembered what demucs could do and started working on this project.

Table of contents →

  1. Files
  2. Splitting the song into individual instruments
  3. Auto-Generate Lyrics and Mixing
  4. Custom Player
  5. Demo video on YouTube

Files

  • yt-dlp is the tool I used to download the song from YouTube. YouTube's best audio format comes as .webm, so I downloaded the song in that format.
  • I used ranger, a terminal-based file manager, as the UI to choose the song.
  • Since demucs uses ffmpeg in the background and converts any input file to .wav, I did no conversion after selecting the file.
  • Finally, I initialised the necessary variables and directories.
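
The steps above can be sketched roughly as follows. This is not the actual script: the function names `download_song`, `choose_song` and `init_workdir`, the variables `SONG_PATH`, `SONG_NAME` and `WORK_DIR`, and the `stems`/`mix` directory layout are all made up for illustration. Only `yt-dlp -f bestaudio` and ranger's `--choosefile` option are taken from the tools themselves.

```shell
# Sketch only: every function and variable name here is hypothetical.

# Download the best available audio-only stream into a target directory.
download_song() {
    local url="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    # -f bestaudio selects the best audio-only format;
    # -o sets the output filename template.
    yt-dlp -f bestaudio -o "$dest_dir/%(title)s.%(ext)s" "$url"
}

# Open ranger as a file picker. ranger writes the chosen path to the
# file given to --choosefile; the result is stored in SONG_PATH.
choose_song() {
    local pick_file
    pick_file="$(mktemp)"
    ranger --choosefile="$pick_file" "${1:-.}"
    SONG_PATH="$(cat "$pick_file")"
    rm -f "$pick_file"
}

# Derive the song name (no directory, no extension) and create the
# working directories under an optional base directory.
init_workdir() {
    local song_path="$1" base
    base="$(basename "$song_path")"
    SONG_NAME="${base%.*}"
    WORK_DIR="${2:-.}/$SONG_NAME"
    mkdir -p "$WORK_DIR/stems" "$WORK_DIR/mix"
}
```

A run would then look something like `download_song "<url>" ~/Music`, followed by `choose_song ~/Music` and `init_workdir "$SONG_PATH"`.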

Splitting the song into individual instruments

  • I used demucs, by Meta, to split the song into various stems, as seen here. I was mainly interested in two of its models:
    • htdemucs_6s → [6s - 6 stems] This is the most versatile demucs model. It can split a song into six stems: vocals, drums, bass, guitar, piano and other.
    • htdemucs_ft → [ft - Fine Tuned] This one is arguably the better model in terms of raw accuracy, as it is a bag of 4 models, but it misses the one crucial thing I am interested in: the guitar stem...
  • I was in a fix about which model to use. That's when I got an idea: use both. I tested it out with multiple songs and came up with these equations. As seen in the image below, I used bass.wav, drums.wav and vocals.wav from htdemucs_ft and other.wav, piano.wav from htdemucs_6s.
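
As a rough sketch of the idea, assuming the five stems listed above are simply summed to get a track with everything except the guitar: the helper names (`run`, `separate_song`, `stem_dir`, `mix_without_guitar`) and the recipe itself are my illustration, not the exact equations from the image. The `-n` and `-o` flags and the `<out>/<model>/<track>/<stem>.wav` layout are how the demucs CLI behaves, and `amix` is a standard ffmpeg filter.

```shell
# Sketch only: helper names and the mix recipe are assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run both models on the same input file.
separate_song() {
    local song_path="$1" out_dir="$2"
    run demucs -n htdemucs_ft -o "$out_dir" "$song_path"
    run demucs -n htdemucs_6s -o "$out_dir" "$song_path"
}

# demucs writes its stems to <out_dir>/<model>/<track name>/<stem>.wav
stem_dir() {
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# Sum the five non-guitar stems into one AAC (.m4a) file.
# normalize=0 (needs a reasonably recent ffmpeg) stops amix from
# scaling each input down by the number of inputs.
mix_without_guitar() {
    local ft_dir="$1" six_dir="$2" out_file="$3"
    run ffmpeg -y \
        -i "$ft_dir/bass.wav" -i "$ft_dir/drums.wav" -i "$ft_dir/vocals.wav" \
        -i "$six_dir/other.wav" -i "$six_dir/piano.wav" \
        -filter_complex "amix=inputs=5:normalize=0" \
        -c:a aac "$out_file"
}
```

Setting `DRY_RUN=1` before calling any of these prints the commands without running them, which is handy for checking paths before a long demucs run.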

Mii Chat

I was so excited when I figured this out that I got up to get a glass of water (rare event)

Mii Chat

AAC (.m4a) is the only thing popularized by Apple I can accept
Oh, and maybe iPods

Auto-Generate Lyrics

  • Since we now have vocals.wav, we can feed it directly to whisper, by OpenAI, to auto-generate the lyrics. The results are not very reliable, but they are okay for now.
  • Once the venv setup was done, I used the whisper CLI to get the lyrics for a song.
  • It produced a .srt file, which is just a plain text file with timestamped subtitles, in our case the lyrics.
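
A minimal sketch of the lyrics step, plus one way the four audio versions and the lyrics could be packed into a single mkv. The function names, the `small` model choice, the track titles and the input file names are assumptions for illustration. The whisper flags (`--model`, `--output_format`, `--output_dir`) and the ffmpeg options (`-map`, `-c:a copy`, `-metadata:s:a:N`) are real, but the actual script may use them differently. This version has no video stream, only audio tracks and a subtitle track.

```shell
# Sketch only: function names, model size and titles are assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Transcribe the isolated vocals. whisper names the output after the
# input file, so vocals.wav becomes <out_dir>/vocals.srt.
make_lyrics() {
    local vocals="$1" out_dir="$2"
    run whisper "$vocals" --model small --output_format srt --output_dir "$out_dir"
}

# Mux four audio files and one .srt file into a single mkv.
# Audio is copied as-is; each track gets a title so the player's
# audio-track menu is readable.
mux_karaoke() {
    local original="$1" vocal_karaoke="$2" guitar_karaoke="$3"
    local backing="$4" lyrics="$5" out_file="$6"
    run ffmpeg -y \
        -i "$original" -i "$vocal_karaoke" -i "$guitar_karaoke" \
        -i "$backing" -i "$lyrics" \
        -map 0:a -map 1:a -map 2:a -map 3:a -map 4:s \
        -c:a copy -c:s srt \
        -metadata:s:a:0 title="Original" \
        -metadata:s:a:1 title="Vocal karaoke" \
        -metadata:s:a:2 title="Guitar karaoke" \
        -metadata:s:a:3 title="Backing track" \
        "$out_file"
}
```

In VLC the versions would then show up under the audio track menu, and the lyrics under the subtitle menu.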

Mii Chat

Tch (ᗒᗣᗕ)՞
What a bummer

Mii Chat

I should load these onto my iPod once it finishes charging...
It's been charging for a month now (⋟﹏⋞)

Custom Player

  • Now that the file can be played anywhere, I thought of making a custom Karaoke Player using PyGame.
  • It took a while, but it is ready: a custom player where every instrument reacts in its own way and can be enabled or disabled. Here is the video.

Mii Chat

Damn, time to play that song by
Mazzy Star