AI Typer V2 — v0.5.5
Voice dictation with single-pass multimodal cleanup. Gemini 3 Flash is the new default; Debian package available.
Tag
8 posts
Voice dictation with single-pass multimodal cleanup. Gemini 3 Flash is the new default; Debian package available.
An eval across 12 audio-capable models on OpenRouter, 5 MP3 bitrates, 4 dictation samples. The default upload quality in most dictation apps is 2–4× overprovisioned — and the more interesting finding had nothing to do with bitrate.
I built a 49-prompt test suite to evaluate Gemini 3.1 Flash Lite's audio understanding capabilities across 13 categories — from accent detection to deception analysis. Here's what worked, what didn't, and why it matters.
Testing whether Gemini 2.5's 65K token output limit can produce a full book from one prompt. Spoiler: Anthropic did it better.
I built a body language analysis app using Google AI Studio's vibe coding interface and Gemini's multimodal vision. Upload a photo, get expert-level analysis.
An AI-powered React app that analyzes how different countries approach policy challenges, with interactive clustering visualizations powered by Gemini.
An automated pipeline that converts raw voice recordings into polished blog posts using audio preprocessing, Gemini transcription, and AI-powered formatting.
A voice analysis application built with Google AI Studio and the Gemini API, exploring multimodal AI capabilities for audio processing.