# HyDE Retrieval
HyDE (Hypothetical Document Embeddings) improves RAG and memory retrieval by generating a hypothetical answer before embedding. Instead of embedding the raw user query, HyDE first asks an LLM to produce a plausible answer, then embeds that answer for vector search. The hypothesis is semantically closer to the stored documents than the question is, yielding better recall.
Based on:
- Gao et al. 2023 "Precise Zero-Shot Dense Retrieval without Relevance Labels"
- Lei et al. 2025 "Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support"
## How It Works

```
Standard: Query --> Embed(query) --> Vector Search --> Results

HyDE:     Query --> LLM(hypothesis) --> Embed(hypothesis) --> Vector Search --> Results
                    ^                   ^
                    extra LLM call      better semantic match
```
The key insight: questions and answers live in different regions of embedding space. A question like "What causes memory leaks in Node?" is far from the answer text "Memory leaks in Node.js are caused by...". But a hypothetical answer generated from the question is much closer to the stored answer, producing higher cosine similarity scores.
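To make this concrete, here is a minimal, self-contained sketch of that insight, reusing the OpenAI SDK that the examples below already assume. The model name, texts, and the `embed` and `cosine` helpers are illustrative assumptions, not part of the AgentOS API.

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative helper (not an AgentOS API): embed a string with OpenAI.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const question = 'What causes memory leaks in Node?';
const hypothesis = 'Memory leaks in Node.js are caused by retained closures, unbounded caches, ...'; // LLM-generated
const storedDoc = 'Memory leaks in Node.js are caused by...'; // document in the vector store

const [q, h, d] = await Promise.all([question, hypothesis, storedDoc].map(embed));
console.log('question   vs doc:', cosine(q, d)); // typically lower
console.log('hypothesis vs doc:', cosine(h, d)); // typically higher
```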
## When to Use HyDE

**Good candidates:**
- Knowledge base queries where the question phrasing differs from document style
- Vague or exploratory queries ("that thing about deployment")
- Memory recall where stored traces are statement-form, not question-form
- Background/batch processing where latency is less critical
**Avoid when:**
- Real-time chat with tight latency budgets (adds one LLM call per query)
- Simple keyword-style lookups where direct embedding already works well
- The query is already in statement/answer form (a simple gate encoding these rules is sketched below)
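One way to apply these guidelines is a small per-request gate. The heuristics below (latency budget, word count, question detection) are illustrative assumptions, not library behavior; tune them to your workload.

```ts
// Illustrative heuristic gate for the per-request `hyde.enabled` flag.
function shouldUseHyde(query: string, latencyBudgetMs: number): boolean {
  if (latencyBudgetMs < 500) return false; // tight real-time budget: skip the extra LLM call
  const words = query.trim().split(/\s+/);
  if (words.length <= 3) return false; // keyword-style lookup: direct embedding usually suffices
  const looksLikeQuestion =
    query.trimEnd().endsWith('?') ||
    /^(what|why|how|when|where|who|which|can|does|is|are)\b/i.test(query);
  if (!looksLikeQuestion) return false; // already statement/answer form
  return true;
}

// Used with the per-request flag shown in the Programmatic API below:
// const result = await augmentor.retrieveContext(query, {
//   hyde: { enabled: shouldUseHyde(query, budgetMs) },
// });
```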
## Configuration

HyDE is enabled per-request; `agent.config.json` supplies the defaults that requests inherit and can override (see the Programmatic API below). The `HydeRetriever` class and its config types are exported from `@framers/agentos/rag`.
```json
{
  "rag": {
    "hyde": {
      "enabled": true,
      "initialThreshold": 0.7,
      "minThreshold": 0.3,
      "thresholdStep": 0.1,
      "adaptiveThreshold": true,
      "maxHypothesisTokens": 200,
      "fullAnswerGranularity": true
    }
  }
}
```
### Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Master switch for HyDE |
| `initialThreshold` | number | `0.7` | Starting similarity threshold |
| `minThreshold` | number | `0.3` | Lowest threshold before giving up |
| `thresholdStep` | number | `0.1` | How much the threshold is reduced per step |
| `adaptiveThreshold` | boolean | `true` | Enable step-down when no results are found |
| `maxHypothesisTokens` | number | `200` | Max tokens for hypothesis generation |
| `fullAnswerGranularity` | boolean | `true` | Generate full prose answers instead of keywords |
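
With these defaults and `adaptiveThreshold` on, retrieval tries thresholds 0.7, 0.6, 0.5, 0.4, and finally 0.3, i.e. at most (0.7 - 0.3) / 0.1 = 4 step-downs, before returning empty results.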
## Programmatic API

### 1. RetrievalAugmentor (main RAG pipeline)
```ts
import { RetrievalAugmentor } from '@framers/agentos/rag';

const augmentor = new RetrievalAugmentor();
await augmentor.initialize(config, embeddingManager, vectorStoreManager);

// Register an LLM caller for hypothesis generation
augmentor.setHydeLlmCaller(async (systemPrompt, userPrompt) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userPrompt },
    ],
    max_tokens: 200,
  });
  return response.choices[0].message.content ?? '';
});

// Enable HyDE per-request
const result = await augmentor.retrieveContext('What causes memory leaks?', {
  hyde: {
    enabled: true,
    // Optional: pre-supply a hypothesis to skip the LLM call
    // hypothesis: 'Memory leaks are caused by...',
    // Optional: tune thresholds for this request
    // initialThreshold: 0.8,
    // minThreshold: 0.4,
  },
});

// HyDE diagnostics are in the result
console.log(result.diagnostics?.hyde);
// {
//   hypothesis: 'Memory leaks in Node.js are typically caused by...',
//   hypothesisLatencyMs: 342,
//   effectiveThreshold: 0.7,
//   thresholdSteps: 0,
// }
```
### 2. MultimodalIndexer (cross-modal search)
```ts
import { MultimodalIndexer, HydeRetriever } from '@framers/agentos/rag';

const indexer = new MultimodalIndexer({
  embeddingManager,
  vectorStore,
  visionProvider,
});

// Attach a HyDE retriever
indexer.setHydeRetriever(new HydeRetriever({
  llmCaller: myLlmCaller,
  embeddingManager,
  config: { enabled: true },
}));

// Search with HyDE
const results = await indexer.search('architecture diagram', {
  modalities: ['image'],
  hyde: { enabled: true },
});
```
### 3. CognitiveMemoryManager (memory recall)
```ts
import { CognitiveMemoryManager, HydeRetriever } from '@framers/agentos';

const memoryManager = new CognitiveMemoryManager();
await memoryManager.initialize(config);

// Attach a HyDE retriever
memoryManager.setHydeRetriever(new HydeRetriever({
  llmCaller: myLlmCaller,
  embeddingManager,
  config: { enabled: true },
}));

// Retrieve memories with HyDE
const result = await memoryManager.retrieve(
  'that deployment discussion',
  currentMood,
  { hyde: true },
);
```
### 4. Standalone HydeRetriever
```ts
import { HydeRetriever } from '@framers/agentos/rag';

const retriever = new HydeRetriever({
  llmCaller: async (system, user) => {
    // Your LLM call here; return the generated hypothesis text
    return hypotheticalAnswer;
  },
  embeddingManager,
  config: {
    enabled: true,
    adaptiveThreshold: true,
    initialThreshold: 0.7,
    minThreshold: 0.3,
  },
});

// Generate a hypothesis only
const { hypothesis, latencyMs } = await retriever.generateHypothesis(
  'What is retrieval augmented generation?',
);

// Full retrieval cycle with adaptive thresholding
const result = await retriever.retrieve({
  query: 'What is RAG?',
  vectorStore: myVectorStore,
  collectionName: 'knowledge-base',
});
```
## Adaptive Thresholding

HyDE supports adaptive threshold stepping: if no results are found at the initial similarity threshold, it steps the threshold down until content is found or the minimum threshold is reached. This ensures HyDE never "comes up empty."

```
Initial threshold: 0.7 --> no results
Step down to:      0.6 --> no results
Step down to:      0.5 --> found 3 results (stop here)
```

The `thresholdSteps` diagnostic tells you how many step-downs were needed.
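In code, the step-down amounts to a simple loop. The sketch below is a minimal reimplementation of that behavior under the config defaults, with an assumed `(embedding, minScore) => results` search signature; it is not the library's internal code.

```ts
// Minimal sketch of adaptive threshold stepping (assumed search signature).
async function searchWithStepDown<T>(
  embedding: number[],
  search: (embedding: number[], minScore: number) => Promise<T[]>,
  cfg = { initialThreshold: 0.7, minThreshold: 0.3, thresholdStep: 0.1 },
): Promise<{ results: T[]; effectiveThreshold: number; thresholdSteps: number }> {
  let threshold = cfg.initialThreshold;
  let steps = 0;
  for (;;) {
    const results = await search(embedding, threshold);
    // Stop when something was found, or when the threshold has bottomed out.
    // The epsilon guards against floating-point drift from repeated subtraction.
    if (results.length > 0 || threshold <= cfg.minThreshold + 1e-9) {
      return { results, effectiveThreshold: threshold, thresholdSteps: steps };
    }
    threshold = Math.max(cfg.minThreshold, threshold - cfg.thresholdStep);
    steps += 1;
  }
}
```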
## Audit Trail

When `includeAudit: true` is passed to `retrieveContext()`, HyDE operations appear in the audit trail with operation type `'hyde'`:
```ts
const result = await augmentor.retrieveContext(query, {
  hyde: { enabled: true },
  includeAudit: true,
});

const hydeOp = result.auditTrail?.operations.find(
  (op) => op.operationType === 'hyde',
);
// hydeOp.hydeDetails.hypothesis
// hydeOp.hydeDetails.effectiveThreshold
// hydeOp.hydeDetails.thresholdSteps
// hydeOp.tokenUsage (embedding + LLM tokens)
```
## Performance Implications
| Metric | Without HyDE | With HyDE |
|---|---|---|
| LLM calls per query | 0 | 1 |
| Embedding calls | 1 | 1 (hypothesis instead of query) |
| Vector searches | 1 | 1-N (N = adaptive steps) |
| Typical added latency | 0 | 200-500ms (LLM generation) |
| Recall improvement | baseline | +10-30% on vague queries |
The LLM call uses whichever model your registered caller invokes; a small, fast model such as `gpt-4o-mini` keeps the added latency under roughly 300 ms for most queries.
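To verify this overhead in your own deployment, you can log the documented diagnostics per request; a minimal sketch using the `retrieveContext` call from above:

```ts
const result = await augmentor.retrieveContext(query, { hyde: { enabled: true } });

const hyde = result.diagnostics?.hyde;
if (hyde) {
  console.log(
    `HyDE hypothesis latency: ${hyde.hypothesisLatencyMs} ms, ` +
    `threshold steps: ${hyde.thresholdSteps}`,
  );
}
```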
## Graceful Degradation
HyDE degrades gracefully in all failure scenarios:
- **No LLM caller registered:** Falls back to direct query embedding with a diagnostic message.
- **LLM call fails:** Falls back to direct query embedding.
- **Hypothesis embedding fails:** Falls back to direct query embedding.
- **No results at any threshold:** Returns empty results (same as without HyDE).
The system never throws due to HyDE failures; it always falls back to the standard retrieval path.
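For observability, the fallback can be detected from the diagnostics. A small sketch, assuming the `hyde` diagnostics block is only populated when a hypothesis was actually generated and embedded (check this against your version):

```ts
const result = await augmentor.retrieveContext(query, { hyde: { enabled: true } });

// Assumption: diagnostics.hyde carries the hypothesis only when HyDE ran;
// its absence after an enabled request suggests the fallback path was taken.
if (!result.diagnostics?.hyde?.hypothesis) {
  console.warn('HyDE fell back to direct query embedding for this request');
}
```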