Deep Research & Query Classification

Ask an agent "what's 2+2" and it should answer instantly. Ask it "what are the latest treatment options for drug-resistant tuberculosis" and it should go dig through medical literature, cross-reference sources, identify gaps, and come back with citations.

The problem: most agents treat both queries the same way. Either they web-search everything (slow, wasteful) or they never search at all (hallucinate freely).

Wunderland solves this with an LLM-as-judge classifier that runs before the main LLM turn. A cheap, fast model (gpt-4o-mini, claude-haiku, or qwen2.5:3b) inspects the query and assigns a research depth tier. The agent then behaves accordingly.

Research Depth Tiers

| Tier | When it triggers | What happens | Budget |
|---|---|---|---|
| `none` | Greetings, simple facts, code help, creative writing | LLM answers from training data. No tools called. | 0s, 0 searches |
| `quick` | Weather, stock price, "what is X", latest news | 1-2 web searches, cite sources | 30s, 10 searches |
| `moderate` | Product comparisons, travel recs, "best X for Y" | Multi-source research with `researchAggregate` or `researchInvestigate` | 2min, 20 searches |
| `deep` | Medical, legal, scientific, financial planning, learning plans | Full `deep_research` pipeline: decompose, search, extract, gap-analyze, synthesize | 9min, 50 searches |

The classifier defaults to none on failure. A broken classifier never blocks the main conversation.
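The tier resolution with its fail-safe default can be sketched as follows. This is a minimal TypeScript sketch; the type and function names are illustrative, not Wunderland's actual API:

```typescript
type ResearchDepth = "none" | "quick" | "moderate" | "deep";

// Hypothetical wrapper around the judge call. Any failure — network error,
// malformed JSON, unknown tier — falls back to "none" so the main turn
// is never blocked by a broken classifier.
async function classifyDepth(
  query: string,
  judge: (q: string) => Promise<string>,
): Promise<ResearchDepth> {
  const valid: ResearchDepth[] = ["none", "quick", "moderate", "deep"];
  try {
    const raw = await judge(query);
    const parsed = JSON.parse(raw) as { depth?: string };
    return valid.includes(parsed.depth as ResearchDepth)
      ? (parsed.depth as ResearchDepth)
      : "none";
  } catch {
    return "none";
  }
}
```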

The Research Tools

Six tools form the research stack, from lightweight to heavy:

| Tool | Purpose | Side effects |
|---|---|---|
| `web_search` | Single web search query | None |
| `news_search` | Search recent news articles | None |
| `researchInvestigate` | Targeted investigation of a specific topic | None |
| `researchAcademic` | Search academic papers and scholarly sources | None |
| `researchAggregate` | Aggregate findings across multiple search results | None |
| `deep_research` | Full 3-phase pipeline with recursive decomposition | None |

The classifier decides which tools to inject into the prompt based on the depth tier. quick gets web_search and news_search. moderate gets researchAggregate and researchInvestigate. deep triggers the full deep_research tool.
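That tier-to-tool mapping is a simple lookup. A sketch, using the tool names from the table above (the helper itself is hypothetical):

```typescript
// Tools injected into the prompt for each research depth tier.
const TOOLS_BY_DEPTH: Record<string, string[]> = {
  none: [],
  quick: ["web_search", "news_search"],
  moderate: ["researchAggregate", "researchInvestigate"],
  deep: ["deep_research"],
};

function toolsFor(depth: string): string[] {
  // Unknown tiers get no tools, matching the fail-safe "none" behavior.
  return TOOLS_BY_DEPTH[depth] ?? [];
}
```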

The 3-Phase Deep Research Pipeline

When deep_research runs, it follows a structured process:

  User query
      │
      ▼
┌─────────────────────┐
│ Phase 1: DECOMPOSE  │  LLM breaks query into sub-questions
│ (small model)       │  Creates a research tree
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Phase 2: SEARCH     │  For each sub-question:
│  - web_search       │  1. Search multiple source types
│  - Extract pages    │  2. Extract content from top results
│  - Analyze gaps     │  3. Identify what's still missing
│  - Recurse          │  4. Generate new sub-questions for gaps
└──────────┬──────────┘  (repeat until budget exhausted)
           │
           ▼
┌─────────────────────┐
│ Phase 3: SYNTHESIZE │  Mid-tier LLM merges all findings
│ (primary model)     │  into a structured report with citations
└─────────────────────┘

Phase 2 iterates. Each iteration searches, extracts, analyzes gaps, and optionally spawns child queries to fill those gaps. The number of iterations depends on depth:

| Depth | Default iterations | Max searches | Max extractions | Max LLM calls | Time limit |
|---|---|---|---|---|---|
| quick | 1 | 10 | 5 | 3 | 30s |
| moderate | 3 | 20 | 10 | 8 | 2 min |
| deep | 6 | 50 | 25 | 20 | 9 min |

A ResearchBudgetTracker enforces hard caps on all dimensions. When any budget is exhausted, the engine moves to synthesis with whatever findings it has.
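The tracker's core logic is straightforward: count each operation and report exhaustion as soon as any one dimension hits its cap. A minimal sketch, with illustrative field names rather than the real `ResearchBudgetTracker` interface:

```typescript
interface ResearchBudget {
  maxSearches: number;
  maxExtractions: number;
  maxLlmCalls: number;
  timeLimitMs: number;
}

class BudgetTracker {
  private searches = 0;
  private extractions = 0;
  private llmCalls = 0;
  private readonly startedAt = Date.now();

  constructor(private readonly budget: ResearchBudget) {}

  recordSearch(): void { this.searches += 1; }
  recordExtraction(): void { this.extractions += 1; }
  recordLlmCall(): void { this.llmCalls += 1; }

  // True once ANY dimension is exhausted; the engine then stops
  // iterating and moves to synthesis with whatever findings it has.
  exhausted(): boolean {
    return (
      this.searches >= this.budget.maxSearches ||
      this.extractions >= this.budget.maxExtractions ||
      this.llmCalls >= this.budget.maxLlmCalls ||
      Date.now() - this.startedAt >= this.budget.timeLimitMs
    );
  }
}
```

The "any dimension" rule is why a deep run can end early: 20 LLM calls spent on gap analysis ends the run even if searches remain.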

LLM-as-Judge Auto-Classifier

Enabled by default. Before every chat turn, the classifier runs a fast LLM call with a structured prompt:

You are a query complexity classifier. Given a user query, classify it
into ONE of these research depth tiers: none, quick, moderate, deep.

Respond with ONLY a JSON object:
{"depth": "none|quick|moderate|deep", "reasoning": "one sentence why"}

The classifier uses the cheapest available model:

  • OpenAI: gpt-4o-mini
  • Gemini: gemini-2.0-flash-lite
  • Ollama: qwen2.5:3b

Classification typically adds 200-400ms to the turn. The result is cached per query.
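The per-query cache means repeated identical queries skip the judge call entirely. A hypothetical sketch of that memoization (the cache key normalization is an assumption):

```typescript
// Cache of query → classified depth, keyed on the normalized query text.
const depthCache = new Map<string, string>();

async function classifyCached(
  query: string,
  judge: (q: string) => Promise<string>,
): Promise<string> {
  const key = query.trim().toLowerCase();
  const hit = depthCache.get(key);
  if (hit !== undefined) return hit; // cache hit: no LLM call, no added latency
  const depth = await judge(query);  // depth assumed already validated
  depthCache.set(key, depth);
  return depth;
}
```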

Override Patterns

Explicit prefixes bypass the classifier entirely:

| Input | Resolved depth |
|---|---|
| `/research what are the best hiking trails in Patagonia` | moderate |
| `/deep explain the neurochemistry of psilocybin` | deep |
| Regular message (no prefix) | Auto-classified |
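Prefix resolution amounts to stripping the command and forcing the matching tier. A sketch under the mappings above (the helper name is illustrative):

```typescript
// Returns a forced depth plus the query with its prefix stripped,
// or depth: null when the message should fall through to the classifier.
function resolveOverride(message: string): { depth: string | null; query: string } {
  if (message.startsWith("/deep ")) {
    return { depth: "deep", query: message.slice("/deep ".length) };
  }
  if (message.startsWith("/research ")) {
    return { depth: "moderate", query: message.slice("/research ".length) };
  }
  return { depth: null, query: message };
}
```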

HTTP API

Body Fields

{
  "message": "What are the latest advances in mRNA vaccine technology?",
  "research": true
}

| `research` value | Effect |
|---|---|
| `true` | Forces moderate depth |
| `"deep"` | Forces deep depth |
| `"quick"` | Forces quick depth |
| omitted | Auto-classified (default) |
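Mapping the `research` body field to a depth follows directly from that table. A sketch (the function name is illustrative):

```typescript
// true → moderate; an explicit tier string is taken as-is;
// anything else (omitted or invalid) → null, meaning auto-classify.
function depthFromBody(research: unknown): string | null {
  if (research === true) return "moderate";
  if (research === "quick" || research === "moderate" || research === "deep") {
    return research;
  }
  return null;
}
```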

You can also disable auto-classification per request:

{
  "message": "Hello",
  "autoClassify": false
}

Streaming Research Progress

When "stream": true is set, research progress events are pushed as SSE:

curl -N -X POST http://localhost:3777/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain CRISPR gene editing safety concerns", "research": "deep", "stream": true}'

Progress events arrive as event: progress with a SYSTEM_PROGRESS payload:

event: progress
data: {"type":"SYSTEM_PROGRESS","toolName":"deep_research","phase":"decomposing","message":"Decomposing query into sub-questions","progress":0.1}

event: progress
data: {"type":"SYSTEM_PROGRESS","toolName":"deep_research","phase":"searching","message":"Searching sources \"CRISPR off-target effects\" (iter 1/6, 3 findings)","progress":0.3}

event: reply
data: {"type":"REPLY","reply":"## CRISPR Gene Editing Safety Concerns\n\n..."}

See the HTTP Streaming API guide for the full SSE protocol.
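A client only needs a small frame parser to consume this stream: SSE frames are separated by blank lines, with `event:` and `data:` fields per frame. A minimal sketch (not the full SSE protocol — no `id:`, `retry:`, or comment handling):

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parse a buffered chunk of SSE text into events. Frames are split on
// blank lines; multiple data: lines in one frame are joined with "\n".
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split("\n\n")) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice("event: ".length);
      else if (line.startsWith("data: ")) dataLines.push(line.slice("data: ".length));
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

A client would dispatch on `event`: render `progress` payloads as status updates and `reply` as the final answer.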

Configuration

agent.config.json

{
  "research": {
    "autoClassify": true,
    "minDepthToInject": "quick"
  }
}

| Field | Type | Default | Description |
|---|---|---|---|
| `research.autoClassify` | boolean | `true` | Enable the LLM-as-judge classifier |
| `research.minDepthToInject` | `"none" \| "quick" \| "moderate" \| "deep"` | `"quick"` | Minimum classified depth before research tools are injected. Set to `"moderate"` to skip quick searches. |

Setting autoClassify: false disables all automatic research — the agent only researches when you explicitly use /research, /deep, or pass the research body field.

Setting minDepthToInject: "moderate" means queries classified as quick are answered from training data. Only moderate and deep queries trigger tool injection.
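The gate compares the classified tier against the configured minimum on the ordered scale none < quick < moderate < deep. A sketch of that check (the helper name is illustrative):

```typescript
const DEPTH_ORDER = ["none", "quick", "moderate", "deep"];

// Inject research tools only when the classified depth is at or above
// the configured minimum; "none" never injects tools.
function shouldInjectTools(classified: string, minDepth: string): boolean {
  return (
    classified !== "none" &&
    DEPTH_ORDER.indexOf(classified) >= DEPTH_ORDER.indexOf(minDepth)
  );
}
```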

Example Queries

| Query | Expected classification | Behavior |
|---|---|---|
| "Hello, how are you?" | none | Direct LLM response |
| "What's the weather in Tokyo?" | quick | Single web search |
| "Best laptop for machine learning under $2000" | moderate | Multi-source comparison |
| "What are the treatment options for stage 3 non-small cell lung cancer?" | deep | Full research pipeline with medical literature |
| "/research top AI startups in 2026" | moderate (forced) | Multi-source research |
| "/deep history of the Byzantine Empire" | deep (forced) | Full pipeline |

Key Files

| File | Purpose |
|---|---|
| `packages/wunderland/src/runtime/research-classifier.ts` | LLM-as-judge classifier |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/engine/DeepResearchTool.ts` | ITool wrapper for the engine |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/engine/DeepResearchEngine.ts` | Core research pipeline |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/engine/ResearchBudgetTracker.ts` | Budget enforcement |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/engine/types.ts` | Type definitions |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/tools/investigate.ts` | researchInvestigate tool |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/tools/academic.ts` | researchAcademic tool |
| `packages/agentos-extensions/registry/curated/research/deep-research/src/tools/aggregate.ts` | researchAggregate tool |