Gemini Transcription MCP: audio transcription as an MCP tool

I dictate a huge amount of my work — blog posts, documentation, even code comments — and the quality of my transcription pipeline directly affects my productivity. I'd been using various Whisper-based tools, but when Gemini started showing impressive multimodal audio capabilities, I wanted to bring that power directly into my Claude Code workflow without context-switching to a separate app. So I built an MCP server that makes Gemini-powered transcription available as a tool call from any MCP-compatible client. The result is Gemini Transcription MCP, and it's become one of my most-used MCP tools.

danielrosehill/Gemini-Transcription-MCP

MCP for Gemini multimodal audio transcription with built-in post-processing

TypeScript · Updated Apr 2026

audio-multimodal, dictation, gemini, gemini-mcp, mcp

Seven tools for different transcription needs

Rather than shipping a single "transcribe this" tool, I built seven specialised variants, because different situations genuinely call for different transcription approaches. The lightly edited transcript tool removes filler words and cleans spoken language up into readable text; it's my default for most work. The raw verbatim tool preserves every "um" and "ah", which is useful for research or accessibility work. The VAD-preprocessed tool uses voice activity detection to strip silence before the audio is sent to Gemini, which helps with recordings that have long pauses. The format tool transforms the transcript into a specific output format, such as an email or a to-do list, in a single pass. The large file tool first compresses oversized audio to Opus so that it stays within upload size limits. The custom prompt tool gives you full control over the transcription instructions. Finally, the devspec tool formats transcriptions as development specifications for AI coding agents, because I frequently dictate project requirements and want them structured for Claude Code to consume.
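One way to picture the seven variants is as a single pipeline that differs only in an optional preprocessing step and in the instruction sent to Gemini alongside the audio. The sketch below is hypothetical: the tool names, the `preprocess` values, and the prompt wording are illustrative assumptions rather than the server's actual code, and the Gemini call itself is left out.

```typescript
// Hypothetical sketch: the seven transcription variants as table entries that share
// one pipeline. Tool names, preprocess values, and prompt wording are assumptions.

type Preprocess = "none" | "vad" | "compress-opus";

interface TranscriptionVariant {
  preprocess: Preprocess; // audio step run before the Gemini call
  instruction: string; // instruction sent with the audio; {key} marks a caller argument
}

const LIGHT_EDIT =
  "Transcribe the audio. Remove filler words and lightly edit for readability.";

// Keyed by a hypothetical MCP tool name.
const VARIANTS: Record<string, TranscriptionVariant> = {
  transcribe_lightly_edited: { preprocess: "none", instruction: LIGHT_EDIT },
  transcribe_verbatim: {
    preprocess: "none",
    instruction: "Transcribe the audio verbatim, keeping every filler word such as 'um' and 'ah'.",
  },
  // Assumption: the VAD and large-file variants reuse the light-edit instruction.
  transcribe_vad: { preprocess: "vad", instruction: LIGHT_EDIT },
  transcribe_formatted: {
    preprocess: "none",
    instruction: "Transcribe the audio and rewrite the result as: {format}.",
  },
  transcribe_large_file: { preprocess: "compress-opus", instruction: LIGHT_EDIT },
  transcribe_custom: { preprocess: "none", instruction: "{prompt}" },
  transcribe_devspec: {
    preprocess: "none",
    instruction:
      "Transcribe the audio and structure it as a development specification for an AI coding agent.",
  },
};

// Look up a variant and substitute each {key} placeholder with the matching argument.
function buildInstruction(toolName: string, args: Record<string, string> = {}): string {
  // Own-property check so inherited names like "toString" are rejected as unknown tools.
  if (!Object.prototype.hasOwnProperty.call(VARIANTS, toolName)) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  const variant = VARIANTS[toolName];
  return variant.instruction.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = args[key];
    if (value === undefined) {
      throw new Error(`Missing argument "${key}" for ${toolName}`);
    }
    return value;
  });
}
```

With this shape, an eighth variant is one more table entry. For example, `buildInstruction("transcribe_formatted", { format: "an email" })` returns `"Transcribe the audio and rewrite the result as: an email."`.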

Flexible input and deployment

All tools accept audio as base64-encoded content, as an HTTP/HTTPS URL, or, for local deployments, via SCP from a remote host. Format support is broad: MP3, WAV, OGG, FLAC, AAC, and more, with automatic conversion for formats such as Opus and WebM that Gemini doesn't handle natively.

For Claude Code, setup is a single command. The server is also available as an npm package and as a Docker image for remote deployments with MCP aggregators like MetaMCP. The Docker image bundles ffmpeg for format conversion, and the HTTP transport mode exposes a standard MCP endpoint plus a health check.

The practical upshot: I can be working in Claude Code, point it at an audio file or a recording's URL, and get a clean transcript without leaving my workflow. I later built Cloud ASR MCP as a multi-backend evolution of the idea, but this Gemini-specific server remains my go-to for its reliability and the quality of Gemini's audio understanding. It's available on npm and GitHub.
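The three input routes and the format handling described above can be sketched as a small planning step that runs before any audio reaches Gemini. This is a hypothetical illustration, not the server's implementation: the `AudioSource` field names, the extension-based format detection, and the format lists (limited to the formats named in this post) are assumptions, and the actual download, SCP copy, and ffmpeg conversion are left out.

```typescript
// Hypothetical sketch: decide how to handle an incoming audio source before calling Gemini.
// Field names, extension-based detection, and the format lists are assumptions.

type AudioSource =
  | { kind: "base64"; data: string; filename: string } // inline content; filename field assumed
  | { kind: "url"; url: string } // HTTP/HTTPS download
  | { kind: "scp"; host: string; path: string }; // copy from a remote host

type Plan =
  | { action: "send"; format: string } // Gemini accepts the format as-is
  | { action: "convert"; format: string }; // transcode with ffmpeg first

// Only the formats named in this post; the real lists are longer.
const NATIVE_FORMATS = ["mp3", "wav", "ogg", "flac", "aac"];
const CONVERT_FORMATS = ["opus", "webm"];

// Lower-cased extension of the last path segment, ignoring any query string or fragment.
function extensionOf(name: string): string {
  const path = name.split(/[?#]/)[0];
  const segment = path.slice(path.lastIndexOf("/") + 1);
  const dot = segment.lastIndexOf(".");
  return dot === -1 ? "" : segment.slice(dot + 1).toLowerCase();
}

// The name that carries the format hint for each source kind.
function sourceName(source: AudioSource): string {
  if (source.kind === "url") {
    if (!/^https?:\/\//i.test(source.url)) {
      throw new Error(`Only HTTP/HTTPS URLs are accepted: ${source.url}`);
    }
    return source.url;
  }
  if (source.kind === "scp") {
    return source.path;
  }
  return source.filename;
}

// Route natively supported formats straight through and flag the rest for conversion.
function planFor(source: AudioSource): Plan {
  const format = extensionOf(sourceName(source));
  if (NATIVE_FORMATS.indexOf(format) !== -1) {
    return { action: "send", format };
  }
  if (CONVERT_FORMATS.indexOf(format) !== -1) {
    return { action: "convert", format };
  }
  throw new Error(`Unsupported audio format: "${format}"`);
}
```

For example, `planFor({ kind: "scp", host: "nas", path: "/recordings/memo.webm" })` returns `{ action: "convert", format: "webm" }`, signalling that the file should go through ffmpeg first. Detecting the format from the extension is the simplest choice; a real server might inspect the MIME type or the file header instead.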

danielrosehill/Cloud-ASR-MCP
