WCAG-minded captions — reading speed, sound tags, and burned-in contrast
WCAG success criteria around captions (for example 1.2.2 Captions (Prerecorded) and 1.2.4 Captions (Live), plus 1.2.5 Audio Description (Prerecorded) where relevant) describe what users need in the player, not which file extension you upload. Still, a straight line runs from those requirements to measurable properties of your SRT, WebVTT, or TTML file: reading rate, line breaks, speaker identification, and non-speech information.
Reading speed and line length
WCAG does not mandate a single characters-per-second (CPS) number for every language, but if viewers cannot finish a line before it disappears, the caption fails its purpose. Use the platform targets in the guide "Reading speed for captions" as a practical baseline, then tighten them for dense dialogue or educational content.
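As a rough illustration of how reading rate becomes a measurable property, the sketch below computes CPS per cue and lists the cues that exceed a ceiling. The `Cue` type, the function names, and the `MAX_CPS` value of 17 are assumptions made for this example, not WCAG or platform requirements; substitute the target your receiver documents. Counting conventions also vary: this version counts spaces and ignores line breaks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical ceiling for illustration only; use your platform's target.
MAX_CPS = 17.0


def chars_per_second(cue: Cue) -> float:
    """Characters shown per second, counting spaces but not line breaks."""
    duration = cue.end - cue.start
    if duration <= 0:
        # A zero or negative duration can never be read.
        return float("inf")
    visible_chars = len(cue.text.replace("\n", ""))
    return visible_chars / duration


def too_fast(cues: List[Cue], max_cps: float = MAX_CPS) -> List[Cue]:
    """Return the cues whose reading rate exceeds max_cps."""
    return [cue for cue in cues if chars_per_second(cue) > max_cps]
```

A cue holding "Hello" and "there" on two lines for two seconds comes out at 5.0 CPS under these counting rules, well under the assumed ceiling.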
Sound tags and SDH habits
Sound cues in square brackets, such as [door slams] or [music playing], give deaf and hard-of-hearing audiences the non-speech information they would otherwise miss, and they matter even more when the mix is muddy. Some social hosts strip bracketed text or reject non-dialogue lines, so know your receiver before you bake SDH conventions into a deliverable that must pass a strict uploader.
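If a receiver is known to reject bracketed cues, one option is to keep the SDH master intact and derive a stripped copy for that upload only. The sketch below finds square-bracket tags and removes them; the function names are hypothetical, and it assumes tags are not nested and that no dialogue legitimately uses square brackets.

```python
import re
from typing import List

# Matches one non-nested square-bracket tag such as "[door slams]".
SOUND_TAG = re.compile(r"\[[^\[\]]+\]")


def find_sound_tags(text: str) -> List[str]:
    """Return every bracketed tag found in the cue text."""
    return SOUND_TAG.findall(text)


def strip_sound_tags(text: str) -> str:
    """Remove bracketed tags, tidy whitespace, and drop lines left empty."""
    without_tags = SOUND_TAG.sub("", text)
    lines = [" ".join(line.split()) for line in without_tags.splitlines()]
    return "\n".join(line for line in lines if line)
```

A cue that contains only a tag becomes an empty string after stripping, so the calling code would need to drop that cue entirely rather than upload a blank one.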
Burned-in (open) captions and contrast
When captions are part of the pixels (see the guide "Burned-in vs soft subtitles"), WCAG contrast guidance for text still applies to what viewers see: success criterion 1.4.3 asks for a contrast ratio of at least 4.5:1 for normal-size text and 3:1 for large text. Small white-on-yellow type over a busy scene technically counts as captions and is practically unreadable. If you must burn in, favor high-contrast outlines or backgrounds and avoid ultra-thin weights.
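The contrast check can be made concrete with the relative-luminance formula from the WCAG definition of contrast ratio. The sketch below is a minimal version for two solid 8-bit sRGB colors; the function names are illustrative. For burned-in captions over moving video it is only meaningful against a caption box color or a sampled background pixel, since the picture behind the text keeps changing.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    # Older WCAG text prints 0.03928 as the threshold; for 8-bit
    # channel values the result is the same.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance as defined by WCAG (0 = black, 1 = white)."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1 (identical colors) up to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

White text on pure yellow works out to roughly 1.07:1 with this formula, far below the 4.5:1 threshold, while white on black reaches the maximum of 21:1.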
Automation without losing intent
QA tools (including CaptionPass) can flag overlaps, impossible durations, and aggressive line length, but they cannot judge artistic intent. Treat automated scores as a safety net, then add a human pass for speaker changes, music lyric rights, and tone.
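To show the kind of mechanical check a tool can do without any judgment, here is a minimal sketch of overlap and duration detection. It is not CaptionPass's implementation; the `Cue` type, the `check_timing` name, and the 0.3 second minimum duration are assumptions for illustration. Cue numbers in the messages refer to position after sorting by start time.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def check_timing(cues: List[Cue], min_duration: float = 0.3) -> List[str]:
    """Report impossible or very short durations and overlapping cues."""
    problems: List[str] = []
    latest_end = float("-inf")  # furthest end time seen so far
    ordered = sorted(cues, key=lambda cue: cue.start)
    for index, cue in enumerate(ordered, start=1):
        if cue.end <= cue.start:
            problems.append(f"cue {index}: end is not after start")
        elif cue.end - cue.start < min_duration:
            problems.append(f"cue {index}: shorter than {min_duration}s")
        # Comparing against the running maximum also catches a long cue
        # that overlaps several later ones, not just its neighbor.
        if cue.start < latest_end:
            problems.append(f"cue {index}: overlaps an earlier cue")
        latest_end = max(latest_end, cue.end)
    return problems
```

Whether an overlap is a defect or a deliberate choice, for example two speakers talking at once, is exactly the call this kind of check cannot make.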
Automate technical QA via the HTTP API when you need structured report output alongside normalized text.
More guides
- SRT vs VTT: when each format silently fails. Comma vs dot timestamps, WEBVTT headers, and where YouTube, TikTok, and HTML5 bite.
- Caption file encoding: UTF-8, BOM, and garbled text. Why uploads show mojibake or blank cues: UTF-8 vs legacy encodings and quick fixes.
- Burned-in vs soft subtitles: what to deliver when. Open captions burned into the picture vs separate SRT/VTT tracks, with tradeoffs for editors and clients.
- Reading speed for captions: CPS, line length, and platforms. Characters per second, lines per cue, and where YouTube, TikTok, and HTML5 push back.
- Why your captions are not showing: a triage guide. HTML5, YouTube, and TikTok checks when subtitles vanish after upload.
- Fix overlapping subtitles. What overlap means and why some players drop overlapping cues.
- TTML and DFXP: broadcast-style timed text on the web. Namespaces, timing, styling stripped in practice, and when TTML is the right interchange vs SRT or WebVTT.
- CaptionPass JSON IR and the developer-json preset. Lossless-ish cue interchange for tooling: when to use JSON IR, version tag, and how it pairs with the HTTP API.
- Timecode, frame rate, and caption sync. Why captions drift or jump: drop-frame vs non-drop, fractional frame rates, and export settings that survive upload.