Subtitle Edit and whisper.cpp STT on AMD (and other non-NVIDIA?) GPUs with Vulkan

AMD GPUs don't have to be second-class citizens.

Subtitle Edit (version 4.0.15 as of this writing) has quite good support for speech-to-text (STT) transcription with various AI models, including whisper.cpp and, particularly on modern NVIDIA GPUs, Faster-Whisper. As someone who probably has a bit of hearing impairment and loves subtitles in any case, I appreciate being able to generate subtitles for content that otherwise doesn't have any.

I recently upgraded from an NVIDIA RTX 3070 to an AMD Radeon RX 9070 XT. Unfortunately, Subtitle Edit's support for Whisper-based transcription on other GPU platforms (particularly AMD) is not as strong. It does offer the Const-me Whisper engine, which describes itself as "a Windows port of the whisper.cpp implementation". However, that project appears to be largely inactive on GitHub, with the last commits in 2023. Meanwhile, the whisper.cpp project sees constant development. The latest release as of this writing, 1.8.3, made some headlines for claiming a "12x performance boost with integrated graphics."

whisper.cpp supports a wide variety of platforms (Linux, Windows, Android, even WebAssembly) with both CPU and GPU backends. Most notably, it supports the Vulkan API, which AMD has historically supported well on its GPUs.

It turns out that putting together a Vulkan-enabled whisper.cpp build on Windows is relatively easy if you follow the project's build directions, especially with Visual Studio (the Community release works just fine; make sure you install the C++ Build Tools). You'll also need the free and open-source Vulkan SDK installed (I used the latest version as of this writing, 1.4.341.1). When configuring the build, make sure the -DGGML_VULKAN=1 flag is set. If you don't feel like compiling it yourself, I've provided my Vulkan-enabled build of whisper.cpp version 1.8.3 for download below.
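The configure-and-build steps can be sketched as a small Python script. This is only a rough sketch, assuming `cmake` is on the PATH (for example, in a Visual Studio developer command prompt), the Vulkan SDK is installed, and the whisper.cpp source has been checked out locally. The helper names `vulkan_build_commands` and `build`, and the `build` output directory, are my own illustrative choices. Only `-DGGML_VULKAN=1` comes from the directions above; `-S`, `-B`, `--build`, and `--config Release` are standard CMake options.

```python
import subprocess
from pathlib import Path


def vulkan_build_commands(source_dir, build_dir="build"):
    """Return the CMake configure and build commands for a Vulkan-enabled build.

    source_dir is assumed to be a local checkout of whisper.cpp; build_dir is
    a hypothetical output directory created inside it.
    """
    source = Path(source_dir)
    out = source / build_dir
    # Configure step: enable the Vulkan backend.
    configure = ["cmake", "-S", str(source), "-B", str(out), "-DGGML_VULKAN=1"]
    # Build step: Visual Studio generators need the configuration named here.
    compile_cmd = ["cmake", "--build", str(out), "--config", "Release"]
    return [configure, compile_cmd]


def build(source_dir):
    """Run the configure and build commands in order, stopping on the first failure."""
    for cmd in vulkan_build_commands(source_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `build("whisper.cpp")` from the directory that contains the checkout would run both steps. Where the resulting binaries end up depends on the CMake generator, so check the build output rather than relying on a fixed path.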

Once you have the build, simply copy all the files to Subtitle Edit's whisper.cpp directory (typically C:\Users\<username>\AppData\Roaming\Subtitle Edit\Whisper\Cpp), overwriting as necessary. You should now be able to use Subtitle Edit's audio-to-text feature with whisper.cpp and the same models as usual. On my 9070 XT, I am achieving roughly 7.5-8x realtime transcription speed with the large models on real-world content, including 40-45 minute episodes of TV shows, which works out to about five to six minutes of processing per episode.
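The copy step can also be scripted. The following is a minimal sketch, not part of Subtitle Edit or whisper.cpp: `install_build` is a hypothetical helper that copies every file from a build output folder into a target folder. Unlike a plain overwrite, it first saves any file it is about to replace with a `.bak` suffix, which is an optional precaution I added so the previous binaries can be restored.

```python
import shutil
from pathlib import Path


def install_build(build_dir, target_dir, backup_suffix=".bak"):
    """Copy all files from build_dir into target_dir and return their names.

    Files that already exist in target_dir are backed up next to the
    original (name + backup_suffix) before being overwritten.
    """
    src = Path(build_dir)
    dst = Path(target_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue  # skip subdirectories; only top-level files are copied
        dest = dst / item.name
        if dest.exists():
            # Keep the file being replaced so it can be restored later.
            shutil.copy2(dest, dest.with_name(dest.name + backup_suffix))
        shutil.copy2(item, dest)
        copied.append(item.name)
    return copied
```

On Windows, the target shown above corresponds to `Path(os.environ["APPDATA"]) / "Subtitle Edit" / "Whisper" / "Cpp"`, since `APPDATA` points at the user's `AppData\Roaming` folder. The source folder is whichever directory your build placed its executables and DLLs in. Close Subtitle Edit before copying so that none of the files are in use.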