SRT vs VTT, and where each one quietly breaks

SRT and VTT look similar enough that many editors treat them as interchangeable. They are not. Here is the short, opinionated version of when each format wins, and where the differences cause silent failures during delivery.

The format-level differences

Timestamp punctuation. SRT uses a comma before the milliseconds: 00:01:23,456. VTT uses a dot: 00:01:23.456. Use the wrong separator and a strict parser treats the cue as invalid.
Mandatory header. A WebVTT file must begin with WEBVTT on the first line. SRT has no header.
Styling. VTT supports cue tags, regions, and STYLE blocks. SRT supports only a narrow subset of HTML-ish tags (such as <i>, <b>, and <font>) that many players ignore or render inconsistently.
Identifiers. SRT cues are numbered sequentially. VTT cue identifiers are optional and freeform.
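To make the differences above concrete, here is a minimal Python sketch of an SRT-to-VTT conversion. It adds the WEBVTT header and swaps the comma for a dot on timing lines only, so commas in dialogue are left alone. The function name srt_to_vtt is made up for illustration, and the sketch assumes well-formed SRT input with no styling tags or positioning data after the timestamps.

```python
import re

# Matches an SRT timing line such as "00:01:23,456 --> 00:01:25,000".
SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2}),(\d{3})$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Hypothetical helper: convert well-formed SRT text to WebVTT text."""
    # Drop a UTF-8 byte order mark and normalise line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")

    # WebVTT requires the header on the first line, followed by a blank line.
    out = ["WEBVTT", ""]

    for line in text.split("\n"):
        match = SRT_TIMING.match(line.strip())
        if match:
            start_hms, start_ms, end_hms, end_ms = match.groups()
            # Only timing lines are rewritten: comma becomes dot.
            line = f"{start_hms}.{start_ms} --> {end_hms}.{end_ms}"
        # Sequence numbers pass through unchanged; VTT accepts them
        # as freeform cue identifiers.
        out.append(line)

    return "\n".join(out).rstrip("\n") + "\n"


if __name__ == "__main__":
    sample = "1\n00:01:23,456 --> 00:01:25,000\nHello, world\n"
    print(srt_to_vtt(sample))
```

The SRT sequence numbers are kept on purpose: they are harmless in VTT and make it easier to compare the two files cue by cue.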

Where this bites in delivery

YouTube accepts both formats but is unforgiving on timestamps: a single comma in a VTT timestamp usually causes that cue to be dropped silently. HTML5 video players reject any VTT file that is missing the WEBVTT header, without surfacing a useful error. TikTok's caption upload is the strictest of the three; it rejects any file with styling tags or overlapping cues.
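The failure modes above can be caught before upload with a small preflight check. The Python sketch below is one way to do it, not any platform's actual validator: the function name preflight_vtt and the wording of its messages are assumptions, and it flags only the four problems named in this section (missing header, comma in a timestamp, styling tags, overlapping cues).

```python
import re

# Timing line with either separator; hours are optional in WebVTT.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})([.,])(\d{3})\s+-->\s+"
    r"((?:\d{2,}:)?\d{2}:\d{2})([.,])(\d{3})"
)
# Anything that looks like an opening or closing markup tag, e.g. <i> or </b>.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def to_ms(clock: str, millis: str) -> int:
    """Convert 'hh:mm:ss' or 'mm:ss' plus a millisecond field to milliseconds."""
    total_seconds = 0
    for part in clock.split(":"):
        total_seconds = total_seconds * 60 + int(part)
    return total_seconds * 1000 + int(millis)


def preflight_vtt(text: str) -> list[str]:
    """Hypothetical checker: return a list of problems found in VTT text."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()

    if not lines or not re.match(r"^WEBVTT(?:[ \t].*)?$", lines[0]):
        problems.append("line 1: missing WEBVTT header")

    latest_end = None  # latest end time seen so far, in milliseconds
    for number, line in enumerate(lines, start=1):
        match = TIMING.match(line.strip())
        if match:
            start_clock, sep1, start_ms, end_clock, sep2, end_ms = match.groups()
            if "," in (sep1, sep2):
                problems.append(f"line {number}: comma in timestamp")
            start = to_ms(start_clock, start_ms)
            end = to_ms(end_clock, end_ms)
            if latest_end is not None and start < latest_end:
                problems.append(f"line {number}: cue overlaps an earlier cue")
            latest_end = end if latest_end is None else max(latest_end, end)
        elif TAG.search(line):
            problems.append(f"line {number}: styling tag in cue text")

    return problems


if __name__ == "__main__":
    broken = "00:00:01,000 --> 00:00:03.000\n<i>Hi</i>\n"
    for problem in preflight_vtt(broken):
        print(problem)
```

Whether a styling tag or an overlap is actually a problem depends on the destination, so in practice the list is best treated as warnings to review against the target platform rather than hard errors.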

Quick rules

HTML5 player on the open web: ship VTT.
YouTube: either format works, but SRT has fewer foot-guns.
TikTok or social: SRT, plain text, no styling tags.
Anything else: CaptionPass's “Generic safe” preset.

Want a one-shot fix and a list of what was broken in your file? Run it through CaptionPass.