Image Segmentation (SAM2 / GroundedSAM)

Turn an image plus a prompt (text, point, box, or "segment everything") into pixel masks, hosted through Replicate. Masks drop straight into image editing and CLIP region search.

Overview
segment() API
Prompt Modes
Result Shape
Provider Setup
Consumer Round-Trips
Errors
Scope

Overview

segment() is a provider-agnostic factory. It accepts an image and exactly one prompt, runs the appropriate model, and returns one SegmentMask per detected region. Geometric prompts (point, box, automatic) route to SAM2; open-vocabulary text prompts route to a GroundedSAM chain.

Capability	Description
Text prompt	"the chimney" → masks for matching regions (hosted GroundedSAM)
Automatic	"Segment everything" → masks for every salient region (hosted SAM2)
Point prompt	Foreground/background clicks (needs a coordinate-capable provider)
Box prompt	A bounding box → the tight mask inside it (needs a coordinate-capable provider)
Mask convention	White = object, black = background (drops into `editImage`)

segment() API

import { segment } from '@framers/agentos';

const result = await segment({
  image: imageBuffer,           // Buffer | Uint8Array | file path
  prompt: 'the leather sofa',   // exactly one prompt mode
});

for (const m of result.masks) {
  console.log(m.bbox, m.score, m.label);
}

SegmentOptions:

interface SegmentOptions {
  image: Buffer | Uint8Array | string;
  provider?: 'replicate' | string;   // default 'replicate'
  model?: string;

  // exactly one prompt mode per call:
  prompt?: string;                                   // text -> GroundedSAM
  points?: Array<{ x: number; y: number; label?: 'foreground' | 'background' }>;
  box?: { x: number; y: number; width: number; height: number };
  automatic?: boolean;                               // "segment everything"

  maxMasks?: number;     // cap on returned masks (automatic/text can produce many)
  minScore?: number;     // confidence floor
  providerOptions?: Record<string, unknown>;
  userId?: string;
}

Exactly one prompt mode must be set. Setting zero or more than one throws InvalidSegmentationPromptError.

Prompt Modes

The hosted Replicate provider supports text (GroundedSAM) and automatic (SAM2 "segment everything"). Point and box prompts are part of the API surface for a coordinate-capable provider (such as a future local SAM2 provider); the Replicate provider returns SegmentationModeNotSupportedError for them.

// Text (open vocabulary) — GroundedSAM
await segment({ image, prompt: 'all the windows' });

// Automatic — every salient region, capped at 10 — SAM2
await segment({ image, automatic: true, maxMasks: 10 });

// Point / box — require a coordinate-capable provider (not hosted Replicate)
await segment({ image, points: [{ x: 320, y: 210, label: 'foreground' }] });
await segment({ image, box: { x: 40, y: 40, width: 200, height: 160 } });

Result Shape

interface SegmentMask {
  mask: Buffer;        // PNG; white = object, black = background
  bbox: { x: number; y: number; width: number; height: number };
  score: number;       // 0–1
  label?: string;      // grounding phrase for text prompts
  index: number;
}

interface SegmentationResult {
  masks: SegmentMask[];
  width: number;       // source image dimensions
  height: number;
  providerId: string;
  modelId: string;
  promptMode: 'text' | 'points' | 'box' | 'automatic';
  usage?: { totalMasks: number; totalCostUSD?: number };
  durationMs: number;
}

An empty masks array is a valid (non-error) result — nothing matched the prompt or cleared minScore.

Provider Setup

The Replicate provider reads REPLICATE_API_TOKEN from the environment.

export REPLICATE_API_TOKEN=r8_...

Defaults: text prompts use schananas/grounded_sam; automatic uses meta/sam-2. Override per call or via provider options:

await segment({
  image,
  automatic: true,
  model: 'meta/sam-2',
  providerOptions: { replicate: { pollIntervalMs: 1000, timeoutMs: 120000, input: { points_per_side: 32 } } },
});

Custom backends implement ISegmentationProvider and register via registerSegmentationProvider(id, provider).

Consumer Round-Trips

Masks feed two existing AgentOS surfaces.

Mask-guided editing — replace the segmented region with editImage:

import { segment, maskToEditMask, editImage } from '@framers/agentos';

const { masks } = await segment({ image, prompt: 'the floor' });
const mask = await maskToEditMask(masks, { target: 'object' });
const edited = await editImage({ image, mask, mode: 'inpaint', prompt: 'oak parquet flooring' });

maskToEditMask accepts one mask or many (unioned). target: 'background' inverts so everything except the object is edited.

Region cutout and search — alpha-cut a sprite, then CLIP-embed it:

import { segment, cropRegion, createVisionPipeline } from '@framers/agentos';

const { masks } = await segment({ image, automatic: true });
const vision = await createVisionPipeline({ strategy: 'local-only', tier1: { enableCLIP: true } });

for (const m of masks) {
  const cutout = await cropRegion(image, m);     // transparent PNG of just that object
  const { embedding } = await vision.embed(cutout);
  // upsert `embedding` into a vector store for "find similar region" search
}

Errors

Error	When
`InvalidSegmentationPromptError`	Zero or more than one prompt mode supplied
`SegmentationModeNotSupportedError`	The provider does not support the resolved mode
`SegmentationProviderError`	Provider/network failure (`code: 'provider_failed'`), poll timeout (`code: 'timeout'`), or malformed model id (`code: 'invalid_request'`)

Scope

Shipped: hosted Replicate provider — text (GroundedSAM) and automatic (SAM2) — plus the maskToEditMask and cropRegion bridges.

Not in this surface: coordinate (point/box) prompting, which needs a coordinate-capable provider; a local/offline SAM provider; in-browser WebGPU segmentation; and video / cross-frame tracking.

Image Editing — consumes segmentation masks for inpainting
Vision Pipeline — OCR, layout, and CLIP embeddings
Image Generation — provider-agnostic generation

Table of Contents​

Overview​

segment() API​

Prompt Modes​

Result Shape​

Provider Setup​

Consumer Round-Trips​

Errors​

Scope​

Related Documentation​

Table of Contents