Image Segmentation (SAM2 / GroundedSAM)
Turn an image plus a prompt (text, point, box, or "segment everything") into pixel masks, hosted through Replicate. Masks drop straight into image editing and CLIP region search.
Table of Contents
Overview
segment() is a provider-agnostic factory. It accepts an image and exactly one
prompt, runs the appropriate model, and returns one SegmentMask per detected
region. Geometric prompts (point, box, automatic) route to SAM2; open-vocabulary
text prompts route to a GroundedSAM chain.
| Capability | Description |
|---|---|
| Text prompt | "the chimney" → masks for matching regions (hosted GroundedSAM) |
| Automatic | "Segment everything" → masks for every salient region (hosted SAM2) |
| Point prompt | Foreground/background clicks (needs a coordinate-capable provider) |
| Box prompt | A bounding box → the tight mask inside it (needs a coordinate-capable provider) |
| Mask convention | White = object, black = background (drops into editImage) |
segment() API
import { segment } from '@framers/agentos';
const result = await segment({
image: imageBuffer, // Buffer | Uint8Array | file path
prompt: 'the leather sofa', // exactly one prompt mode
});
for (const m of result.masks) {
console.log(m.bbox, m.score, m.label);
}
interface SegmentOptions {
image: Buffer | Uint8Array | string;
provider?: 'replicate' | string; // default 'replicate'
model?: string;
// exactly one prompt mode per call:
prompt?: string; // text -> GroundedSAM
points?: Array<{ x: number; y: number; label?: 'foreground' | 'background' }>;
box?: { x: number; y: number; width: number; height: number };
automatic?: boolean; // "segment everything"
maxMasks?: number; // cap on returned masks (automatic/text can produce many)
minScore?: number; // confidence floor
providerOptions?: Record<string, unknown>;
userId?: string;
}
Exactly one prompt mode must be set. Setting zero or more than one throws
InvalidSegmentationPromptError.
Prompt Modes
The hosted Replicate provider supports text (GroundedSAM) and automatic
(SAM2 "segment everything"). Point and box prompts are part of the API
surface for a coordinate-capable provider (such as a future local SAM2 provider);
the Replicate provider returns SegmentationModeNotSupportedError for them.
// Text (open vocabulary) — GroundedSAM
await segment({ image, prompt: 'all the windows' });
// Automatic — every salient region, capped at 10 — SAM2
await segment({ image, automatic: true, maxMasks: 10 });
// Point / box — require a coordinate-capable provider (not hosted Replicate)
await segment({ image, points: [{ x: 320, y: 210, label: 'foreground' }] });
await segment({ image, box: { x: 40, y: 40, width: 200, height: 160 } });
Result Shape
interface SegmentMask {
mask: Buffer; // PNG; white = object, black = background
bbox: { x: number; y: number; width: number; height: number };
score: number; // 0–1
label?: string; // grounding phrase for text prompts
index: number;
}
interface SegmentationResult {
masks: SegmentMask[];
width: number; // source image dimensions
height: number;
providerId: string;
modelId: string;
promptMode: 'text' | 'points' | 'box' | 'automatic';
usage?: { totalMasks: number; totalCostUSD?: number };
durationMs: number;
}
An empty masks array is a valid (non-error) result — nothing matched the
prompt or cleared minScore.
Provider Setup
The Replicate provider reads REPLICATE_API_TOKEN from the environment.
export REPLICATE_API_TOKEN=r8_...
Defaults: text prompts use schananas/grounded_sam; automatic uses meta/sam-2.
Override per call or via provider options:
await segment({
image,
automatic: true,
model: 'meta/sam-2',
providerOptions: { replicate: { pollIntervalMs: 1000, timeoutMs: 120000, input: { points_per_side: 32 } } },
});
Custom backends implement ISegmentationProvider and register via
registerSegmentationProvider(id, provider).
Consumer Round-Trips
Masks feed two existing AgentOS surfaces.
Mask-guided editing — replace the segmented region with editImage:
import { segment, maskToEditMask, editImage } from '@framers/agentos';
const { masks } = await segment({ image, prompt: 'the floor' });
const mask = await maskToEditMask(masks, { target: 'object' });
const edited = await editImage({ image, mask, mode: 'inpaint', prompt: 'oak parquet flooring' });
maskToEditMask accepts one mask or many (unioned). target: 'background'
inverts so everything except the object is edited.
Region cutout and search — alpha-cut a sprite, then CLIP-embed it:
import { segment, cropRegion, createVisionPipeline } from '@framers/agentos';
const { masks } = await segment({ image, automatic: true });
const vision = await createVisionPipeline({ strategy: 'local-only', tier1: { enableCLIP: true } });
for (const m of masks) {
const cutout = await cropRegion(image, m); // transparent PNG of just that object
const { embedding } = await vision.embed(cutout);
// upsert `embedding` into a vector store for "find similar region" search
}
Errors
| Error | When |
|---|---|
InvalidSegmentationPromptError | Zero or more than one prompt mode supplied |
SegmentationModeNotSupportedError | The provider does not support the resolved mode |
SegmentationProviderError | Provider/network failure (code: 'provider_failed'), poll timeout (code: 'timeout'), or malformed model id (code: 'invalid_request') |
Scope
Shipped: hosted Replicate provider — text (GroundedSAM) and automatic
(SAM2) — plus the maskToEditMask and cropRegion bridges.
Not in this surface: coordinate (point/box) prompting, which needs a coordinate-capable provider; a local/offline SAM provider; in-browser WebGPU segmentation; and video / cross-frame tracking.
Related Documentation
- Image Editing — consumes segmentation masks for inpainting
- Vision Pipeline — OCR, layout, and CLIP embeddings
- Image Generation — provider-agnostic generation