Whisper fine-tuning resources I keep coming back to
A short curated list of the best Whisper fine-tuning resources: tutorials, notebooks, and managed compute examples.
Evaluating whether fine-tuning Whisper improves transcription accuracy. Spoiler: it depends on model size and use case.
A script for fine-tuning OpenAI's Whisper speech recognition models using Modal's serverless GPU infrastructure.
A voice-controlled Linux virtual keyboard using Deepgram's Flux turn-taking STT API, built in Rust.
A GUI tool for collecting audio training data for ASR fine-tuning, with LLM-generated prompts and Hugging Face integration.
An MCP server for audio transcription using multimodal LLMs like Gemini, GPT-4o Audio, and Voxtral — not traditional ASR.
An MCP server that brings Gemini-powered audio transcription directly into Claude Code and Claude Desktop.
A desktop transcription app that sends audio directly to multimodal AI models for single-pass transcription and formatting.
A local voice typing app for Linux/Wayland using NVIDIA's Parakeet model. No cloud, no GPU, built-in punctuation.
A snapshot comparing Hebrew TTS quality across six providers, including voice cloning experiments via Replicate.
A concept for capturing end-of-day work progress via voice memos, processing them with Gemini AI, and delivering morning briefings.
An experiment using AI agents to simulate geopolitical dialogue between state actors, non-state actors, and civil society in the Middle East.
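The fine-tuning and transcription posts above all share one preprocessing step: Whisper consumes 80-bin log-mel spectrograms of 16 kHz mono audio (25 ms windows, 10 ms hop). A minimal numpy sketch of that frontend, using the HTK mel formula for simplicity; Whisper's real filterbank differs slightly in mel scale and normalization, so treat this as illustrative rather than byte-identical:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
N_FFT = 400            # 25 ms analysis window
HOP_LENGTH = 160       # 10 ms hop -> 100 frames per second
N_MELS = 80            # original Whisper models use 80 mel bins

def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular mel filterbank (HTK mel scale; a simplification)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(audio):
    """Frame, window, FFT, mel-project, and log-compress a waveform."""
    window = np.hanning(N_FFT)
    frames = [audio[s:s + N_FFT] * window
              for s in range(0, len(audio) - N_FFT + 1, HOP_LENGTH)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    mel = power @ mel_filterbank().T
    return np.log10(np.maximum(mel, 1e-10)).T  # (n_mels, n_frames)

# one second of a 440 Hz tone as a stand-in for real speech
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
features = log_mel_spectrogram(0.5 * np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # → (80, 98)
```

For actual fine-tuning work, the Hugging Face `WhisperFeatureExtractor` handles this step, padding or trimming every clip to 30 seconds before the model sees it.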