ASR Training Data Collector: a GUI for gathering speech recognition training data
A GUI tool for collecting audio training data for ASR fine-tuning, with LLM-generated prompts and Hugging Face integration.
An MCP server for audio transcription using multimodal LLMs like Gemini, GPT-4o Audio, and Voxtral — not traditional ASR.
An MCP server that brings Gemini-powered audio transcription directly into Claude Code and Claude Desktop.
A desktop transcription app that sends audio directly to multimodal AI models for single-pass transcription and formatting.
A local voice typing app for Linux/Wayland using NVIDIA's Parakeet model. No cloud or GPU required, with built-in punctuation.
A snapshot comparing Hebrew TTS quality across six providers, including voice cloning experiments via Replicate.
A concept for capturing end-of-day work progress via voice memos, processing them with Gemini AI, and delivering morning briefings.
An experiment using AI agents to simulate geopolitical dialogue between state actors, non-state actors, and civil society in the Middle East.
A multi-agent system template for conducting comprehensive software and hardware technology evaluations using Claude Code.
An experimental Model United Nations simulation where AI agents embody country positions, vote on resolutions, and analyze bilateral impacts.
An experimental multi-agent system that simulates expert panel discussions, analyzing complex topics through multiple analytical lenses.
An experiment in perspective synthesis — using AI agents to simulate a conference where diverse personas deliver speeches on AI's impact.