Caption file encoding — UTF-8, BOM, and garbled text
Subtitle files are plain text. If the bytes on disk do not match what the uploader expects, you get replacement characters, mojibake (text rendered as the wrong symbols), or cues that parse as empty. The fix is almost always the same: normalize to UTF-8, and know when a BOM matters.
UTF-8 is the default that actually works
Modern browsers, YouTube, and most pro tools assume UTF-8 for SRT and WebVTT. If your editor exported Latin-1, Windows-1252, or UTF-16, characters outside ASCII can display wrong after upload even though they looked fine in your timeline.
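A small sketch of what that mismatch looks like at the byte level. The caption line is hypothetical; the point is that a Windows-1252 export fails outright under a strict UTF-8 reader and turns into replacement characters under a lenient one:

```python
# A caption line with curly quotes and an em dash — characters outside ASCII.
line = "Café — “quoted”"
raw = line.encode("cp1252")  # what a legacy Windows editor might write to disk

# A strict UTF-8 consumer rejects these bytes outright...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decode fails:", exc.reason)

# ...and a lenient one substitutes U+FFFD replacement characters.
print(raw.decode("utf-8", errors="replace"))
```

The ASCII characters survive either way, which is why a file can look mostly fine with only the quotes, accents, and dashes garbled.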
The UTF-8 BOM (byte order mark)
Some Windows tools write an invisible BOM at the start of a UTF-8 file. A few parsers treat that as part of the first line — which breaks the mandatory WEBVTT header on line one. If WebVTT mysteriously fails in HTML5 or a host rejects the file, open in a hex-friendly editor and check for a BOM before the first visible characters.
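To see why the BOM breaks the header check: a UTF-8 BOM decodes to U+FEFF, so a parser that compares the first line to the literal string `WEBVTT` never matches. A minimal sketch:

```python
# The UTF-8 BOM is the three bytes EF BB BF.
BOM = b"\xef\xbb\xbf"

# Decoding keeps the BOM as U+FEFF, so a naive first-line check fails.
first_line = (BOM + b"WEBVTT\n").decode("utf-8").splitlines()[0]
print(first_line == "WEBVTT")  # False: the line actually starts with U+FEFF

def strip_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM removed, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data

fixed_line = strip_bom(BOM + b"WEBVTT\n").decode("utf-8").splitlines()[0]
print(fixed_line == "WEBVTT")  # True
```

Decoding with Python's `utf-8-sig` codec has the same effect as `strip_bom`: it transparently drops a leading BOM if one is present.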
Quick checks before you ship
- Re-export as UTF-8 without BOM when your tool offers the choice.
- Open the raw .srt/.vtt in a conservative text editor — if curly quotes or em dashes already look wrong there, fix encoding before fixing captions.
- After upload, if only some languages break, suspect a mixed-encoding merge (two files concatenated with different byte rules).
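When your tool does not offer a re-export option, the normalize step can be scripted. This is a heuristic sketch, not real encoding detection: the fallback list is an assumption about typical exports, and Windows-1252 must come last because it accepts almost any byte stream:

```python
def normalize_to_utf8(path: str) -> None:
    """Rewrite a subtitle file as UTF-8 without a BOM (heuristic sketch)."""
    data = open(path, "rb").read()
    # "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
    # cp1252 is last: it rarely rejects bytes, so it is a catch-all, not a check.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError(f"could not decode {path}")
    # Plain "utf-8" (not "utf-8-sig") writes no BOM; newline="" keeps
    # the file's existing line endings untouched.
    open(path, "w", encoding="utf-8", newline="").write(text)
```

If you know the source encoding (your editor's export settings usually say), decode with that name directly instead of guessing.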
CaptionPass reads your upload as text and reports structural issues; running a problem file through the tool often catches encoding-related parse failures early. Try it on the home page.
Related: Why captions are not showing, SRT vs VTT.