Daniel Rosehill · Hey, It Works!
LLM Detective: an undercover AI agent that evaluates other LLMs
· Daniel Rosehill

I built an AI agent that goes undercover to test other LLMs, probing for biases, guardrails, knowledge cutoffs, and behavioral patterns.

What if you could send an AI agent undercover to evaluate other language models? That's the idea behind LLM Detective — an experimental Python tool that conducts systematic investigations to assess capabilities, biases, guardrails, and behavioral patterns of target LLMs.

danielrosehill/LLM-Detective ★ 0

Agent that tries to probe other models' capabilities with conversation

Python · Updated Oct 2025
Tags: ai-agents, evaluations, llms

How it works

The detective agent starts by doing pre-investigation research — automatically fetching model cards, capabilities documentation, and web search results about the target model. Then it conducts a series of tests, optionally simulating human interaction patterns with realistic typing delays and casual language to avoid triggering any special "talking to another AI" behavior.
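As a rough sketch of the human-simulation idea (the helper name and the words-per-minute default here are my own illustration, not taken from the repo), a typing-delay wrapper might look like:

```python
import random
import time

def send_humanlike(prompt: str, send_fn, wpm: float = 200.0) -> str:
    """Pause roughly as long as a human would take to type `prompt`,
    then forward it to the model via `send_fn`."""
    # Approximate typing speed: ~5 characters per word at the given WPM,
    # with jitter so successive delays don't look machine-regular.
    chars_per_sec = (wpm * 5) / 60
    time.sleep(len(prompt) / chars_per_sec * random.uniform(0.8, 1.3))
    return send_fn(prompt)
```

Wrapping every probe in something like this keeps message pacing plausibly human, which is the point of the "avoid tipping off the target" design.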

There are nine test categories: knowledge cutoff probing, vision capability checks, audio capability checks, bias detection, censorship pattern identification, guardrail triggering, conspiracy theory handling, positive reinforcement detection (catching models that are excessively enthusiastic), and agentic capability evaluation.
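The nine categories could be modeled as a simple enum; the identifiers below are illustrative, not the repo's actual names:

```python
from enum import Enum

class TestCategory(Enum):
    """The nine investigation areas (identifiers are illustrative)."""
    KNOWLEDGE_CUTOFF = "knowledge cutoff probing"
    VISION = "vision capability check"
    AUDIO = "audio capability check"
    BIAS = "bias detection"
    CENSORSHIP = "censorship pattern identification"
    GUARDRAILS = "guardrail triggering"
    CONSPIRACY = "conspiracy theory handling"
    POSITIVE_REINFORCEMENT = "positive reinforcement detection"
    AGENTIC = "agentic capability evaluation"
```

An enum like this makes it easy to iterate over all categories when running a full investigation, or to select a subset for a quicker pass.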

Multi-provider support

LLM Detective works with Ollama for local models, OpenAI, Anthropic, and any OpenAI-compatible API like OpenRouter. This means you can point it at virtually any accessible language model and get a standardized evaluation report. The reports are JSON-formatted with ratings on a 0-10 scale across multiple dimensions, plus full response logs.
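To illustrate the report format (the function name and exact JSON fields here are assumptions for the sketch, not the repo's actual schema), a minimal report builder enforcing the 0-10 scale might look like:

```python
import json

def build_report(model: str, ratings: dict, responses: list) -> str:
    """Assemble a standardized JSON evaluation report.

    `ratings` maps each evaluated dimension to a 0-10 score;
    `responses` holds the full prompt/response log.
    """
    for dimension, score in ratings.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{dimension}: score {score} outside the 0-10 scale")
    return json.dumps(
        {"model": model, "ratings": ratings, "responses": responses},
        indent=2,
    )
```

Because every provider's output funnels into the same structure, reports for a local Ollama model and a hosted API model stay directly comparable.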

Why this matters

As the number of available LLMs keeps growing, systematic evaluation becomes increasingly important. Whether you're choosing a model for production, assessing safety characteristics, or just curious about how different models behave under pressure, having an automated tool to run standardized tests is genuinely useful.

This is still experimental — future plans include meta-LLM evaluation for more sophisticated analysis, batch testing across multiple models, and a web dashboard for report visualization. Check out the repo on GitHub if you'd like to try it or contribute.