Tool for automatically downloading and transcribing podcasts

Podkastomat

🧰 A tool for automatically downloading and transcribing podcasts, with the option to translate them into English.

📚 Built for language learners.

Usage

To add a podcast:

./update-config add

This will ask you for the name, language code, and the RSS URL of the podcast, as well as whether you want the main process to automatically download the latest episode of this podcast when it runs.

By default, podcasts are transcribed but not translated. To enable translation into English for a given language (German in this example):

./update-config translate de

To process all configured podcasts:

./process

To fetch and process the earliest 3 episodes of a particular podcast:

./process 'some podcast' old 3

Built-in help for additional options is available via ./process --help

Downloaded episodes, along with generated transcripts and translations, are stored in podcasts/{language}/{podcast_name}, for example podcasts/de/mission_klima_-_lösungen_für_die_krise

Manual configuration

Feel free to edit the config.json file directly.
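
Hand edits can easily leave the file malformed (a stray trailing comma is enough). Below is a minimal sketch of a sanity check to run after editing, assuming config.json is in the current directory. check_config is a hypothetical helper, not part of Podkastomat. It relies only on Python's standard json.tool module, and it checks JSON syntax only, not whether the keys are ones Podkastomat understands.

```shell
# Hypothetical helper: succeeds only if the given file parses as JSON.
# Uses the json.tool module bundled with Python 3, so no extra tools are needed.
check_config() {
    python3 -m json.tool "$1" > /dev/null
}

# Assumes config.json sits in the current directory.
if check_config config.json; then
    echo "config.json is valid JSON"
else
    echo "config.json is missing or malformed" >&2
fi
```

When the check fails, the error printed by json.tool should point at the line and column where parsing stopped.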

System Requirements

  • Linux (but it probably works on other platforms 🤷)
  • Python 3 (tested on 3.8.10)
  • One or both of the following:
    • Whisper (supports transcripts and translations)
    • Vosk (supports transcripts)
  • Mutagen

Notes

Whisper (used by default for transcriptions and translations) is quite slow and resource-intensive.
It may be worth running the process script overnight, e.g. as a cron job.
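
One way to set up the cron job is a small wrapper script. This is only a sketch: the script name, the paths in the example crontab entry, the process.log file and the 02:00 schedule are all illustrative assumptions, and it assumes ./process should be run from inside the checkout, as in the usage examples above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Podkastomat) for running ./process from cron.

# Run ./process from inside the given checkout, appending all output to
# process.log there so that overnight failures can be inspected later.
run_podkastomat() {
    (
        cd "$1" || exit 1
        ./process >> process.log 2>&1
    )
}

# Only run when a checkout path is passed as the first argument.
if [ "$#" -gt 0 ]; then
    run_podkastomat "$1"
fi

# Example crontab entry (edit with `crontab -e`) to run at 02:00 every night,
# assuming this script is saved as /home/you/bin/run-podkastomat.sh:
#   0 2 * * * /home/you/bin/run-podkastomat.sh /home/you/podkastomat
```

The function runs in a subshell, so the cd does not affect the caller. Cron starts jobs with a minimal environment, so if Whisper or Vosk is installed somewhere unusual, PATH may need to be set in the script or in the crontab.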