Tag

gemini

8 posts

Apr 17, 2026

AI Typer V2 — v0.5.5

Voice dictation with single-pass multimodal cleanup. Gemini 3 Flash is the new default; Debian package available.

dictation voice-to-text gemini openrouter

Apr 17, 2026

How much bitrate does an audio-multimodal LLM actually need?

An eval across 12 audio-capable models on OpenRouter, 5 MP3 bitrates, 4 dictation samples. The default upload quality in most dictation apps is 2–4× overprovisioned — and the more interesting finding had nothing to do with bitrate.

audio llm transcription eval

Mar 26, 2026

Testing Gemini 3.1 Flash Lite's Audio Understanding With 49 Structured Prompts

I built a 49-prompt test suite to evaluate Gemini 3.1 Flash Lite's audio understanding capabilities across 13 categories — from accent detection to deception analysis. Here's what worked, what didn't, and why it matters.

AI Gemini audio multimodal

Mar 25, 2026

One Prompt AI Book: can Gemini 2.5 write a book in a single prompt?

Testing whether Gemini 2.5's 65K token output limit can produce a full book from one prompt. Spoiler: Anthropic did it better.

Projects AI Open Source Experiments

Mar 25, 2026

Using Gemini's vision capabilities for body language analysis

I built a body language analysis app using Google AI Studio's vibe coding interface and Gemini's multimodal vision. Upload a photo, get expert-level analysis.

Projects AI Gemini Vision AI

Mar 25, 2026

Policy Visualiser: exploring global policy approaches with AI-powered clustering

An AI-powered React app that analyzes how different countries approach policy challenges, with interactive clustering visualizations powered by Gemini.

Projects AI Open Source Gemini

Mar 25, 2026

Voice Blog Creator: turning voice recordings into polished blog posts with Gemini

An automated pipeline that converts raw voice recordings into polished blog posts using audio preprocessing, Gemini transcription, and AI-powered formatting.

Projects AI Open Source Gemini

Mar 25, 2026

Voice Analyzer: an AI-powered voice analysis tool built with Gemini

A voice analysis application built with Google AI Studio and the Gemini API, exploring multimodal AI capabilities for audio processing.

Projects AI Open Source Gemini