CaptionPass
Alpha: You're using an early release of CaptionPass. Features and limits may change; report bugs or ideas using the contact email in the footer or on Pricing.

CaptionPass roadmap

Where we're going

Today's Alpha is a focused delivery QA layer: upload timed text, pick a platform preset, get a clean file and an explainable report. Long term, CaptionPass is meant to become a caption processing platform—modular by design so observability, collaboration, and a true multi-format engine can stack without rewriting the foundation.

North star: four layers

The product vision stacks capabilities the way serious media stacks do: ingest → intelligence → engine → workflow.

  1. Ingest — uploads, jobs, optional audio/video when we add speech pipelines.
  2. Speech & language intelligence — transcription, diarization, translation-with-retiming (later).
  3. Caption engine — semantic timeline model, rules, multi-format export (“caption compiler”).
  4. Collaboration & observability — review workflows, diffing, metrics, alerts—enterprise-grade guardrails.

Evolution sequence (planned)

Rather than shipping three unrelated products, we intend to grow in layers—each phase funds the next:

  1. Observability & QA depth

     Expand deterministic checks and summary scores into ongoing caption health: readability, drift, density, confidence aggregation where models exist. Dashboards and alerts follow: ‘Datadog for captions’ as the wedge into teams that already burn time on manual review.

  2. Collaboration-lite → Caption IDE

     Structured review: draft → reviewed → approved; timeline-anchored comments; caption diffs (Git-like clarity over vague autosave). Real-time multi-editor sync is the hardest lift, so it is planned after workflow primitives prove retention.

  3. Multi-format caption compiler

     Write once, deploy everywhere: SRT, WebVTT, broadcast-oriented outputs (e.g. CEA-608/708), styled overlays where platforms allow, burned-in renders via FFmpeg where needed; style tokens and platform previews where feasible.

  4. Platform maturity

     API-first usage, batch at scale, enterprise SSO, audit logs, SLA-oriented reporting: the natural extension once the core engine and workflows are trusted.
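
As a rough illustration of what the phase 1 "caption health" numbers could look like, the Python sketch below computes a reading speed per cue and a simple density figure per file. The Cue type, the 20 characters-per-second threshold, and the density definition (time with a caption on screen divided by the span of the file) are assumptions made for this example, not CaptionPass internals or published limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Hypothetical cue shape for this sketch: times in seconds."""
    start: float
    end: float
    text: str


def chars_per_second(cue: Cue) -> float:
    """Reading speed of one cue: visible characters divided by duration."""
    duration = cue.end - cue.start
    if duration <= 0:
        return float("inf")  # a zero or negative duration cannot be read
    visible = len(cue.text.replace("\n", ""))
    return visible / duration


def health_summary(cues: list[Cue], max_cps: float = 20.0) -> dict:
    """Aggregate a few per-file numbers a dashboard could track over time.

    max_cps is an illustrative threshold, not a platform requirement.
    Assumes cues are already sorted by start time.
    """
    speeds = [chars_per_second(c) for c in cues]
    on_screen = sum(max(c.end - c.start, 0.0) for c in cues)
    span = (cues[-1].end - cues[0].start) if cues else 0.0
    return {
        "cue_count": len(cues),
        "too_fast": sum(1 for s in speeds if s > max_cps),
        "density": (on_screen / span) if span > 0 else 0.0,
    }


cues = [
    Cue(0.0, 2.0, "Hello there."),                # 12 chars over 2 s
    Cue(2.5, 3.0, "This line is far too long."),  # 26 chars over 0.5 s
]
summary = health_summary(cues)
print(summary)
```

Tracking numbers like these per upload is one way drift or regressions could surface before a human review; which metrics actually ship is still open.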

Strategic pillars (why each matters)

Pillar | Competition | Differentiator
Multi-format engine | Editors export; few treat formatting as portable. | Compiler mindset: semantic timeline in, platform-safe artifacts out.
Collaboration layer | Docs-lite collab in generic video tools. | Caption-native workflows: diff, approvals, roles, not generic comments only.
Observability | Almost nothing dedicated to caption quality metrics. | Defined KPIs, regression alerts, trust in automation, especially for accessibility and live.

What Alpha includes today

  • Multi-format ingest: SRT, WebVTT, SBV, ASS/SSA, TTML/IMSC/DFXP, and CaptionPass JSON IR.
  • Deterministic fixes + validation report (errors, warnings, named applied fixes).
  • Delivery presets: YouTube, TikTok/shorts-style, HTML5/WebVTT, LMS/TTML (IMSC1-friendly), Generic safe, and Developer JSON IR for automation.
  • Ephemeral processing—no accounts required for the free tier.
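
The "deterministic fixes + validation report" item above can be pictured with a small Python sketch: the same input always yields the same cleaned cues and the same list of named fixes. The fix names (trim_whitespace, drop_empty_cue, trim_overlap), the Cue and Report shapes, and the rules themselves are invented for illustration and are not the checks CaptionPass actually runs.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Cue:
    """Hypothetical cue shape for this sketch: times in milliseconds."""
    start_ms: int
    end_ms: int
    text: str


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    applied_fixes: list[str] = field(default_factory=list)


def fix_and_validate(cues: list[Cue]) -> tuple[list[Cue], Report]:
    """Apply a few illustrative fixes and record each one by name."""
    report = Report()
    cleaned: list[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start_ms):
        text = cue.text.strip()
        if not text:
            report.applied_fixes.append("drop_empty_cue")
            continue
        if text != cue.text:
            report.applied_fixes.append("trim_whitespace")
        if cue.end_ms <= cue.start_ms:
            # No safe automatic repair here: report it and leave the cue out.
            report.errors.append(f"non-positive duration at {cue.start_ms} ms")
            continue
        cleaned.append(replace(cue, text=text))
    for i in range(len(cleaned) - 1):
        current, following = cleaned[i], cleaned[i + 1]
        if current.end_ms > following.start_ms:
            if following.start_ms > current.start_ms:
                # End the earlier cue where the next one begins.
                cleaned[i] = replace(current, end_ms=following.start_ms)
                report.applied_fixes.append("trim_overlap")
            else:
                report.warnings.append(f"cues share start time {current.start_ms} ms")
    return cleaned, report


raw = [Cue(0, 1500, " Hi. "), Cue(1000, 2000, "Overlap."), Cue(2500, 3000, "   ")]
fixed, report = fix_and_validate(raw)
print(fixed)
print(report)
```

Because every change is recorded under a name, the report can explain exactly why the output file differs from the upload.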

Explicitly not in Alpha: transcription, broadcast SCC output, team workspaces, persistent metrics stores, or real-time co-editing. Those belong to the later phases above.

Honest constraints

Collaboration and rich observability raise infra and engineering cost (real-time sync, durable metrics). We will ship them incrementally so margins stay healthy and scope stays honest—see Pricing and internal cost docs for hosting assumptions.

Enterprise & partnerships

For roadmap input, volume licensing, or integrations, email Sharkey@captionpass.com.
