CaptionPass roadmap
Where we're going
Today's Alpha is a focused delivery QA layer: upload timed text, pick a platform preset, get a clean file and an explainable report. Long term, CaptionPass is meant to become a caption processing platform—modular by design so observability, collaboration, and a true multi-format engine can stack without rewriting the foundation.
North star: four layers
The product vision layers capabilities in the same order as a serious media stack (ingest → intelligence → engine → workflow):
- Ingest — uploads, jobs, optional audio/video when we add speech pipelines.
- Speech & language intelligence — transcription, diarization, translation-with-retiming (later).
- Caption engine — semantic timeline model, rules, multi-format export (“caption compiler”).
- Collaboration & observability — review workflows, diffing, metrics, alerts—enterprise-grade guardrails.
Evolution sequence (planned)
Rather than shipping three unrelated products, we intend to grow in layers—each phase funds the next:
1. Observability & QA depth
   Expand deterministic checks and summary scores into ongoing caption health: readability, drift, density, confidence aggregation where models exist. Dashboards and alerts follow, with ‘Datadog for captions’ as the wedge into teams that already burn time on manual review.
2. Collaboration-lite → Caption IDE
   Structured review: draft → reviewed → approved; timeline-anchored comments; caption diffs (Git-like clarity over vague autosave). Real-time multi-editor sync is the hardest lift and is planned after workflow primitives prove retention.
3. Multi-format caption compiler
   Write once, deploy everywhere: SRT, WebVTT, broadcast-oriented outputs (e.g. CEA-608/708), styled overlays where platforms allow, burned-in renders via FFmpeg where needed; style tokens and platform previews where feasible.
4. Platform maturity
   API-first usage, batch at scale, enterprise SSO, audit logs, SLA-oriented reporting: the natural extension once the core engine and workflows are trusted.
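
As a rough illustration of the Phase 3 "write once, deploy everywhere" idea, the sketch below renders one in-memory cue list to both SRT and WebVTT. The `Cue` type and the `to_srt` / `to_webvtt` helpers are hypothetical names chosen for this example, not CaptionPass's actual IR or API, and the sketch ignores styling, cue settings, and text escaping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Hypothetical minimal timeline entry: times in milliseconds plus text."""
    start_ms: int
    end_ms: int
    text: str


def _timestamp(ms: int, sep: str) -> str:
    # Split milliseconds into HH:MM:SS plus a three-digit fraction.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    # SRT: numbered blocks, comma before the milliseconds.
    blocks = [
        f"{i}\n{_timestamp(c.start_ms, ',')} --> {_timestamp(c.end_ms, ',')}\n{c.text}"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def to_webvtt(cues: list[Cue]) -> str:
    # WebVTT: "WEBVTT" header line, period before the milliseconds.
    blocks = ["WEBVTT"] + [
        f"{_timestamp(c.start_ms, '.')} --> {_timestamp(c.end_ms, '.')}\n{c.text}"
        for c in cues
    ]
    return "\n\n".join(blocks) + "\n"
```

The point of the design is that both writers consume the same cue list, so a fix applied to the timeline once shows up in every output format.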
Strategic pillars (why each matters)
| Pillar | Competition | Differentiator |
|---|---|---|
| Multi-format engine | Editors export; few treat formatting as portable. | Compiler mindset—semantic timeline in, platform-safe artifacts out. |
| Collaboration layer | Docs-lite collab in generic video tools. | Caption-native workflows: diff, approvals, roles—not generic comments only. |
| Observability | Almost nothing dedicated to caption quality metrics. | Defined KPIs, regression alerts, trust in automation—especially for accessibility and live. |
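
To make "defined KPIs" concrete, here is a minimal sketch of one possible caption health metric: reading speed in characters per second, with cues flagged above a threshold. The function names, the report shape, and the default threshold of 17 are illustrative assumptions, not CaptionPass definitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Hypothetical minimal timeline entry: times in milliseconds plus text."""
    start_ms: int
    end_ms: int
    text: str


def chars_per_second(cue: Cue) -> float:
    # Reading speed of one cue; line breaks are not counted as characters.
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        return math.inf  # a zero or negative duration is unreadable
    return len(cue.text.replace("\n", "")) / duration_s


def density_report(cues: list[Cue], max_cps: float = 17.0) -> dict:
    # max_cps is an illustrative default, not a platform requirement.
    rates = [chars_per_second(c) for c in cues]
    finite = [r for r in rates if math.isfinite(r)]
    return {
        "cue_count": len(cues),
        "mean_cps": sum(finite) / len(finite) if finite else 0.0,
        "flagged": [i for i, r in enumerate(rates) if r > max_cps],
    }
```

Tracked per file over time, a number like `mean_cps` is the kind of signal a regression alert could watch.
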
What Alpha includes today
- Multi-format ingest: SRT, WebVTT, SBV, ASS/SSA, TTML/IMSC/DFXP, and CaptionPass JSON IR.
- Deterministic fixes + validation report (errors, warnings, named applied fixes).
- Delivery presets: YouTube, TikTok/shorts-style, HTML5/WebVTT, LMS/TTML (IMSC1-friendly), Generic safe, and Developer JSON IR for automation.
- Ephemeral processing—no accounts required for the free tier.
Explicitly not in Alpha: transcription, broadcast SCC output, team workspaces, persistent metrics stores, or real-time co-editing. Those belong to the later phases described above.
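
For a sense of what "deterministic fixes + validation report" can mean in practice, the sketch below sorts cues, trims overlaps, and records what it did under errors, warnings, and named applied fixes. It is an assumption-laden example: the fix names (`sort_by_start`, `trim_overlap`), the report keys, and the `Cue` type are hypothetical and do not describe CaptionPass's real rules or output schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    """Hypothetical minimal timeline entry: times in milliseconds plus text."""
    start_ms: int
    end_ms: int
    text: str


def fix_overlaps(cues: list[Cue]) -> tuple[list[Cue], dict]:
    report = {"errors": [], "warnings": [], "applied_fixes": []}
    ordered = sorted(cues, key=lambda c: c.start_ms)  # stable sort
    if ordered != list(cues):
        report["applied_fixes"].append("sort_by_start")

    fixed = []
    for i, cue in enumerate(ordered):
        label = f"cue {i + 1}"
        if cue.end_ms <= cue.start_ms:
            report["errors"].append(f"{label}: end is not after start")
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and cue.end_ms > nxt.start_ms:
            if nxt.start_ms > cue.start_ms:
                # Trim the earlier cue so it ends where the next one starts.
                report["applied_fixes"].append(
                    f"trim_overlap: {label} end {cue.end_ms} -> {nxt.start_ms} ms"
                )
                cue = replace(cue, end_ms=nxt.start_ms)
            else:
                report["warnings"].append(
                    f"{label}: shares a start time with the next cue; left unchanged"
                )
        fixed.append(cue)
    return fixed, report
```

Because the function has no randomness or model calls, the same input always yields the same cues and the same report, which is what makes each change explainable.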
Honest constraints
Collaboration and rich observability raise infra and engineering cost (real-time sync, durable metrics). We will ship them incrementally so margins stay healthy and scope stays honest—see Pricing and internal cost docs for hosting assumptions.
Enterprise & partnerships
For roadmap input, volume licensing, or integrations, email Sharkey@captionpass.com.
Ready to try the current build? Back to the tool.