CaptionPass roadmap

Where we're going

Today's Alpha is a focused delivery QA layer: upload timed text, pick a platform preset, get a clean file and an explainable report. Long term, CaptionPass is meant to become a caption processing platform—modular by design so observability, collaboration, and a true multi-format engine can stack without rewriting the foundation.

North star: four layers

The product vision layers capabilities the way serious media stacks do (ingest → intelligence → engine → workflow):

Ingest — uploads, jobs, optional audio/video when we add speech pipelines.
Speech & language intelligence — transcription, diarization, translation-with-retiming (later).
Caption engine — semantic timeline model, rules, multi-format export (“caption compiler”).
Collaboration & observability — review workflows, diffing, metrics, alerts—enterprise-grade guardrails.

Evolution sequence (planned)

Rather than shipping three unrelated products, we intend to grow in layers—each phase funds the next:

1. Observability & QA depth
Expand deterministic checks and summary scores into ongoing caption health: readability, drift, density, and confidence aggregation where models exist. Dashboards and alerts follow, with ‘Datadog for captions’ as the wedge into teams that already burn time on manual review.

2. Collaboration-lite → Caption IDE
Structured review: draft → reviewed → approved; timeline-anchored comments; caption diffs (Git-like clarity over vague autosave). Real-time multi-editor sync is the hardest lift and is planned after workflow primitives prove retention.

3. Multi-format caption compiler
Write once, deploy everywhere: SRT, WebVTT, broadcast-oriented outputs (e.g. CEA-608/708), styled overlays where platforms allow, burned-in renders via FFmpeg where needed; style tokens and platform previews where feasible.

4. Platform maturity
API-first usage, batch at scale, enterprise SSO, audit logs, and SLA-oriented reporting: the natural extension once the core engine and workflows are trusted.
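As a rough illustration of the "write once, deploy everywhere" idea in phase 3, the Python sketch below renders one small in-memory timeline to both SRT and WebVTT. The `Cue` class and the `to_srt` / `to_webvtt` helpers are hypothetical names invented for this example, not CaptionPass's actual IR or API, and styling, positioning, and broadcast formats are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Hypothetical minimal timeline entry: times in milliseconds plus text."""
    start_ms: int
    end_ms: int
    text: str


def _timestamp(ms: int, sep: str) -> str:
    # Format milliseconds as HH:MM:SS<sep>mmm; SRT uses ',' and WebVTT uses '.'.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    # SRT: numbered blocks separated by blank lines.
    blocks = [
        f"{index}\n{_timestamp(c.start_ms, ',')} --> {_timestamp(c.end_ms, ',')}\n{c.text}"
        for index, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def to_webvtt(cues: list[Cue]) -> str:
    # WebVTT: a "WEBVTT" header line, then cue blocks separated by blank lines.
    blocks = ["WEBVTT"] + [
        f"{_timestamp(c.start_ms, '.')} --> {_timestamp(c.end_ms, '.')}\n{c.text}"
        for c in cues
    ]
    return "\n\n".join(blocks) + "\n"


timeline = [Cue(0, 1500, "Hello"), Cue(2000, 3_661_001, "World")]
srt_output = to_srt(timeline)
vtt_output = to_webvtt(timeline)
```

The point of the sketch is only that both outputs derive from the same timeline object, so a timing fix made once shows up in every exported format.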

Strategic pillars (why each matters)

Pillar	Competition	Differentiator
Multi-format engine	Editors export; few treat formatting as portable.	Compiler mindset—semantic timeline in, platform-safe artifacts out.
Collaboration layer	Docs-lite collab in generic video tools.	Caption-native workflows: diff, approvals, roles—not generic comments only.
Observability	Almost nothing dedicated to caption quality metrics.	Defined KPIs, regression alerts, trust in automation—especially for accessibility and live.
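To make "defined KPIs" a little more concrete, here is a minimal Python sketch of one caption-quality metric the roadmap names, density, expressed as characters per second. The function names, the choice to count every character except line breaks, and the 20 characters-per-second threshold are all assumptions for illustration; the roadmap does not specify how CaptionPass will define or tune this metric.

```python
def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Density of one cue: characters shown per second of display time."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("cue must have a positive duration")
    # Assumption: line breaks are layout, not reading load, so they are not counted.
    return len(text.replace("\n", "")) / duration_s


def flag_dense_cues(cues: list[tuple[int, int, str]], max_cps: float = 20.0) -> list[int]:
    """Return the indexes of cues whose density exceeds max_cps (hypothetical threshold)."""
    return [
        index
        for index, (start_ms, end_ms, text) in enumerate(cues)
        if chars_per_second(text, start_ms, end_ms) > max_cps
    ]


sample = [(0, 1000, "x" * 30), (1000, 3000, "short")]
dense = flag_dense_cues(sample)
```

A per-cue number like this is what a dashboard or regression alert could aggregate over a file or a catalog.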

What Alpha includes today

Multi-format ingest: SRT, WebVTT, SBV, ASS/SSA, TTML/IMSC/DFXP, and CaptionPass JSON IR.
Deterministic fixes + validation report (errors, warnings, named applied fixes).
Delivery presets: YouTube, TikTok/shorts-style, HTML5/WebVTT, LMS/TTML (IMSC1-friendly), Generic safe, and Developer JSON IR for automation.
Ephemeral processing—no accounts required for the free tier.

Explicitly not in Alpha: transcription, broadcast SCC output, team workspaces, persistent metrics stores, or realtime co-editing—those belong to later phases above.
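For readers wondering what "deterministic fixes + validation report" might look like in practice, the Python sketch below applies one rule-based fix (trimming a cue that overlaps the next one) and records the outcome as errors, warnings, and named applied fixes. Everything here is hypothetical: the `Cue` and `Report` classes, the `trim-overlap` fix name, and the specific checks are assumptions modeled on the wording above, not the Alpha's real rule set.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Cue:
    start_ms: int
    end_ms: int
    text: str


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    applied_fixes: list[str] = field(default_factory=list)


def fix_and_validate(cues: list[Cue]) -> tuple[list[Cue], Report]:
    """Return cleaned cues plus a report; the same input always gives the same output."""
    report = Report()
    ordered = sorted(cues, key=lambda c: c.start_ms)
    fixed: list[Cue] = []
    for i, cue in enumerate(ordered):
        if not cue.text.strip():
            report.warnings.append(f"cue {i + 1}: empty text")
        if cue.end_ms <= cue.start_ms:
            # Not safely fixable without guessing a duration, so report it as an error.
            report.errors.append(f"cue {i + 1}: end is not after start")
            fixed.append(cue)
            continue
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and cue.end_ms > nxt.start_ms > cue.start_ms:
            # Trim the earlier cue so it ends where the next one begins.
            cue = replace(cue, end_ms=nxt.start_ms)
            report.applied_fixes.append(f"trim-overlap: cue {i + 1}")
        fixed.append(cue)
    return fixed, report


cleaned, report = fix_and_validate(
    [Cue(0, 2000, "a"), Cue(1500, 3000, "b"), Cue(4000, 4000, "c")]
)
```

Because every change is named in `applied_fixes`, the report can explain exactly what was altered and why, which is what makes the output "explainable" rather than a silent rewrite.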

Honest constraints

Collaboration and rich observability raise infra and engineering cost (real-time sync, durable metrics). We will ship them incrementally so margins stay healthy and scope stays honest—see Pricing and internal cost docs for hosting assumptions.

Enterprise & partnerships

For roadmap input, volume licensing, or integrations, email Sharkey@captionpass.com.

Ready to try the current build? Back to the tool.
