Grounding Guard

RAG-grounded hallucination detection using a hybrid NLI cross-encoder and LLM-as-judge pipeline. Verifies that claims in LLM output are faithful to retrieved source documents.

Package: @framers/agentos-ext-grounding-guard

Overview

The Grounding Guard extension provides two modes of operation:

Passive protection via a built-in guardrail that automatically verifies response faithfulness against RAG source documents during streaming and on final response
Active capability via an agent-callable tool (check_grounding) for on-demand grounding verification

It checks each claim in the agent's response against the retrieved source chunks:

Supported -- claim is entailed by at least one source document
Contradicted -- claim directly contradicts a source document
Unverifiable -- claim cannot be found in any source (potential hallucination)

The verification pipeline uses two tiers:

Tier 1: NLI Cross-Encoder (cross-encoder/nli-deberta-v3-small, ~40MB INT8) -- fast entailment/contradiction detection at ~30ms per claim-source pair
Tier 2: LLM-as-Judge -- chain-of-thought verification for ambiguous claims (~150--500ms, only when configured)

Only runs when RAG sources are present. No sources means no verification (no-op).

Prerequisites

The grounding guard requires RAG source plumbing -- specifically, ragSources must be present on the GuardrailOutputPayload. This is populated automatically by AgentOS when RAG retrieval is performed for a request.

The ragSources field contains RagRetrievedChunk[] from the RetrievalAugmentor, threaded from the GMI through the response stream to the guardrail layer. When no RAG retrieval was performed, the field is undefined and the guardrail is a no-op.

Installation

npm install @framers/agentos-ext-grounding-guard

The NLI model requires @huggingface/transformers (already an AgentOS dependency):

npm install @huggingface/transformers

Usage

Direct factory usage

import { AgentOS } from '@framers/agentos';
import { createGroundingGuardPack } from '@framers/agentos-ext-grounding-guard';

const groundingPack = createGroundingGuardPack({
  entailmentThreshold: 0.7,
  contradictionThreshold: 0.7,
  maxUnverifiableRatio: 0.5,
  contradictionAction: 'flag',
  llm: {
    provider: 'anthropic',
    model: 'claude-haiku-4-5-20251001',
    apiKey: process.env.ANTHROPIC_API_KEY,
  },
});

const agent = new AgentOS();
await agent.initialize({
  ...config,
  manifest: { packs: [{ factory: () => groundingPack }] },
});

Manifest-based loading

await agent.initialize({
  manifest: {
    packs: [
      {
        package: '@framers/agentos-ext-grounding-guard',
        options: {
          contradictionAction: 'block',
          maxUnverifiableRatio: 0.3,
        },
      },
    ],
  },
});

ClaimExtractor

Decomposes response text into atomic factual claims for grounding verification using a two-tier extraction approach.

Tier 1: Heuristic Sentence Splitting

For simple sentences (20 words or fewer, single clause):

Split on sentence boundaries (. , ? , ! , \n)
Filter non-factual content (questions, hedges, meta-commentary, greetings, code blocks)
Simple sentences pass through as-is

Heuristic filters (sentences that are NOT factual claims):

Questions: ends with ?
Hedges: starts with "I think", "maybe", "perhaps", "it seems", "I believe"
Meta: "I hope this helps", "let me know if", "feel free to", "here's"
Greetings: "hello", "hi there", "sure!", "great question", "of course"
Code blocks: anything inside triple-backtick fences

Tier 2: LLM Decomposition

For complex sentences (>20 words or multiple clauses):

Detected by: word count > 20, independent-clause conjunctions (", and ", "; ", " while ", " however "), or 3+ numbers/proper nouns.

Sent to a lightweight LLM with a structured decomposition prompt that returns an array of atomic factual claims.

When no LLM is configured, all sentences use heuristic mode (no decomposition).

GroundingChecker

Verifies claims against source documents using the two-tier pipeline.

Tier 1: NLI Cross-Encoder

Model: cross-encoder/nli-deberta-v3-small (~40MB, INT8 quantized)
Input: pipeline({ text: claim, text_pair: source_text })
Output: entailment / contradiction / neutral scores

For each claim, all top N source chunks (default 5, sorted by relevance score) are compared in parallel. The best-matching source is returned as the attribution:

Claim: "The API rate limit is 1000 req/min"

Source 1 (relevance 0.91): "Premium users get 1000 requests per minute"
  -> NLI: entailment 0.92 -> SUPPORTED

Source 2 (relevance 0.85): "Free tier is limited to 500 req/min"
  -> NLI: contradiction 0.78 -> (not best match, source 1 wins)

Best match: Source 1, verdict: SUPPORTED, confidence: 0.92

Tier 2: LLM-as-Judge Escalation

When NLI scores are ambiguous (neither entailment nor contradiction above threshold), the claim is escalated to an LLM with a chain-of-thought verification prompt:

What does the claim assert?
Does the source explicitly support, contradict, or not address this?
Are there any subtle contradictions or missing context?

Returns: verdict (supported/contradicted/unverifiable), confidence, and reasoning.

If no source produces entailment or contradiction above threshold and no LLM is configured, the verdict is unverifiable.

Streaming and Final Evaluation

Streaming Phase (TEXT_DELTA)

During streaming, the guardrail buffers text at sentence boundaries:

Append TEXT_DELTA to sentence buffer
On sentence boundary: extract sentence, filter non-factual content
For factual claims: run NLI against top-5 ragSources
Contradiction > 0.7: FLAG immediately (fast first-pass)
Entailment > 0.7: pass (supported)
Neutral: defer to final phase

Final Phase (isFinal / FINAL_RESPONSE)

On stream completion, a comprehensive check runs:

Extract ALL claims from the full response text (heuristic + LLM decomposition for complex sentences)
For each claim: NLI against ragSources (top-5 per claim)
Ambiguous claims: escalate to LLM-as-judge
Aggregate results:
- Any contradicted claim: FLAG or BLOCK (per contradictionAction)
- Unverifiable ratio > maxUnverifiableRatio: FLAG (per unverifiableAction)
- All supported: pass

Verdicts

Verdict	Meaning	Trigger
`supported`	Claim is entailed by at least one source	NLI entailment > threshold
`contradicted`	Claim directly contradicts a source	NLI contradiction > threshold, or LLM confirms contradiction
`unverifiable`	Claim not found in any source	Neither entailment nor contradiction above threshold; LLM says unverifiable

Configuration

`GroundingGuardOptions`

Option	Type	Default	Description
`nliModelId`	`string`	`'cross-encoder/nli-deberta-v3-small'`	NLI cross-encoder model ID.
`entailmentThreshold`	`number`	`0.7`	NLI score threshold for entailment (SUPPORTED).
`contradictionThreshold`	`number`	`0.7`	NLI score threshold for contradiction (CONTRADICTED).
`maxUnverifiableRatio`	`number`	`0.5`	Maximum fraction of unverifiable claims before flagging.
`contradictionAction`	`'flag' \| 'block'`	`'flag'`	Action when a contradiction is detected.
`unverifiableAction`	`'flag' \| 'block'`	`'flag'`	Action when unverifiable ratio is exceeded.
`llm`	`{ provider, model, apiKey?, baseUrl? }`	—	LLM config for claim decomposition and ambiguous escalation. When omitted, heuristic-only claims + NLI-only verification.
`maxSourcesPerClaim`	`number`	`5`	Max source chunks to compare each claim against.
`enableStreamingChecks`	`boolean`	`true`	Enable streaming sentence-level NLI checks. When false, only the final comprehensive check runs.
`quantized`	`boolean`	`true`	Use INT8 quantized NLI model for lower memory.
`guardrailScope`	`'input' \| 'output' \| 'both'`	`'output'`	Grounding verification only applies to output.

Agent Tools

`check_grounding`

On-demand grounding verification. Lets agents proactively verify claims against source text before including them in responses.

Agent: Let me verify this synthesized answer is grounded in the sources.
-> check_grounding({
    text: "The API rate limit is 1000 req/min",
    sources: ["Premium users get 1000 requests per minute."]
  })
<- {
    grounded: true,
    claims: [{
      claim: "The API rate limit is 1000 req/min",
      verdict: "supported",
      confidence: 0.92,
      bestSource: { chunkId: "tool-source-0", content: "Premium users...", score: 0.92 },
      escalated: false
    }],
    totalClaims: 1,
    supportedCount: 1,
    contradictedCount: 0,
    unverifiableCount: 0,
    unverifiableRatio: 0.0,
    summary: "1/1 claims supported"
  }

The tool accepts sources: string[] (plain text) for simplicity. These are wrapped as synthetic RagRetrievedChunk objects internally.

Reason Codes

Reason Code	Trigger	Metadata
`GROUNDING_CONTRADICTION`	Claim contradicts a source	Per-claim verification results, best source
`GROUNDING_UNVERIFIABLE`	Too many claims not found in sources	Unverifiable ratio, claim details
`GROUNDING_NO_SOURCES`	Informational flag when response has no RAG sources	--

Memory Impact

Component	Memory	When Loaded
NLI model (nli-deberta-v3-small INT8)	~40MB	First grounding check
ClaimExtractor (heuristic rules)	~5KB	Pack load
Per-stream sentence buffer	~1KB per stream	First TEXT_DELTA
Total	~40MB	--

The NLI model is lazy-loaded and shared via ISharedServiceRegistry. If another extension uses the same model, zero additional memory.

Graceful Degradation

Condition	Behavior
No `ragSources` on payload	No-op -- returns null (cannot ground without sources)
NLI model fails to load	Falls back to LLM-only (if configured), else no-op
LLM not configured	NLI-only mode -- heuristic claims, no decomposition, no ambiguous escalation
Both NLI and LLM unavailable	No-op with warning logged
Empty response (no claims extracted)	Returns null
`ragSources` present but empty array	Returns null (no sources to compare against)

Overview​

Prerequisites​

Installation​

Usage​

Direct factory usage​

Manifest-based loading​

ClaimExtractor​

Tier 1: Heuristic Sentence Splitting​

Tier 2: LLM Decomposition​

GroundingChecker​

Tier 1: NLI Cross-Encoder​

Tier 2: LLM-as-Judge Escalation​

Streaming and Final Evaluation​

Streaming Phase (TEXT_DELTA)​

Final Phase (isFinal / FINAL_RESPONSE)​

Verdicts​

Configuration​

GroundingGuardOptions​

Agent Tools​

check_grounding​

Reason Codes​

Memory Impact​

Graceful Degradation​

Related Documentation​