SRTLab 1.0

by Alexander Thomas (aka Dr. Lex)

What is it?

This is a Perl script that can perform certain operations on SubRip (.srt) subtitle files. For instance, it can scale and offset the timestamps of all subtitles based on pairs of current and expected time values. It can also check files for subtitles that are shown for too long or too briefly, and attempt to fix the latter (which is of course not always possible).
It also offers an option to strip ‘hearing aid’ annotations from SDH subtitles (the typical lines like “[CLEARS THROAT]” for people with hearing problems). This is useful for converting SDH subtitles into regular subtitles. It only works well if the annotations follow a standard format, though.
Errors introduced by the OCR process that converts DVD or Blu-ray subtitles to text often make it harder to remove hearing-impaired annotations automatically, and they also make subtitles more annoying to read. SRTLab has a feature that corrects many of the common OCR errors, if the subtitles are in English or a similar language.
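
To give an idea of why a standard annotation format matters, here is a minimal Python sketch of the general technique. It is not SRTLab's actual code (which is Perl): the pattern and the function name strip_annotations are assumptions made up for this illustration. It only removes bracketed or parenthesized text that contains no lowercase letters.

```python
import re

# Hypothetical pattern, not taken from SRTLab: an opening bracket or
# parenthesis, any run of characters that are neither lowercase letters
# nor closing delimiters, then a closing bracket or parenthesis.
ANNOTATION = re.compile(r"[\[(][^\]\)a-z]*[\])]")


def strip_annotations(line: str) -> str:
    """Remove all-caps annotations such as [CLEARS THROAT] from one line."""
    cleaned = ANNOTATION.sub("", line)
    # A removed annotation can leave double spaces or stray edge whitespace.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    for text in ("[CLEARS THROAT] Well, hello.", "(sighs) Fine.", "[DOOR SLAMS]"):
        print(repr(strip_annotations(text)))
```

A line that consisted only of an annotation ends up empty, and lowercase annotations like “(sighs)” are left untouched by this conservative pattern. An OCR error that turns a bracket into another character would also make the pattern miss the annotation.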

After being stuck in 0.9x versions for years, I finally released this as version 1.0, because I consider it usable enough.

Download and Usage

The script is hosted on GitHub.
For your convenience, here is a direct link to the file itself. Make sure to set execute permissions on the downloaded file.

SRTLab can be run under any environment that has a Perl interpreter, like Linux, Mac OS X, or Windows with Cygwin or a native Perl implementation.
Check the GitHub project page for additional information, or if you are a developer and want to contribute.

The script itself prints the following usage information when invoked with ‘-h’:

srtlab [options] file1.srt [file2.srt ...] > output.srt
SRT file editing tool.
  Multiple input files are joined sequentially. Make sure that the first
    timestamp of each file comes after the last stamp of the previous.
Options:
  Time values must be in the format [-]HH:MM:SS.sss, or a floating-point number
    representing seconds.
  -e: in-place editing: overwrite first file instead of printing to stdout
    (BE CAREFUL!)
  -c: remove empty subtitles (empty = really empty, no whitespace characters).
  -s S: scale all timestamps.
    S can be a floating-point number or any of these shortcuts:
    NTSCPAL:  0.95904    = 23.976/25 (subs for NTSC framerate to PAL video)
    PALNTSC:  1.04270938 = 25/23.976 (PAL framerate to NTSC)
    NTSCFILM: 0.999      = 23.976/24 (NTSC framerate to film)
    PALFILM:  1.04166667 = 25/24     (PAL framerate to film)
    FILMNTSC: 1.001001   = 24/23.976 (film framerate to NTSC)
    FILMPAL:  0.96       = 24/25     (film framerate to PAL)
  -o O: offset all timestamps by time O.  Offset is added after scaling, i.e.
    new times are calculated as S*t+O.
  -a Ta1 Ta2 Tb1 Tb2: automatically calculate S and O from two pairs of times.
    Ta1 is the time at which a subtitle appears in the current SRT file, Ta2 is
    where it should appear in the output. The same for Tb1 and Tb2, for another
    subtitle.  For best accuracy, use the earliest and latest subtitles.
  -b Ta1 Ta2: like -a, but only calculate the offset O.
  -A F: automatically calculate S and O through a least-squares fit on multiple
    pairs of timestamps from a text file F. Each line must be a pair of stamps,
    separated by a space. The first stamp indicates when a subtitle currently
    appears and the second one when it should appear.
  -B F: like -A, but only calculate average offset from the pairs in file F.
  -i I: insert a new subtitle at index I (in the original file). This command
    can be repeated, e.g., to insert two subs at index 3, use -ii 3 3.
  -j J: insert a new subtitle at original time J (can be repeated as well).
  -J file.srt: insert subtitles from the given SRT file, using their timestamps
    relative to the original times of the other input files.
  -f: try to fix common OCR errors (tuned for English only). This may help to
    obtain a better result with -H.
  -H: attempt to remove typical non-verbal annotations in subs for the hearing
    impaired, e.g., (CLEARS THROAT).  You should combine this with -c.
    Repeat -H to try to remove non-capitalized annotations (mind that this has
    a higher risk to mess things up, so only use when necessary).
  -k K: extend the duration of each subtitle by K (at most, if no overlap).
  -l: report subtitles that appear too briefly or overly long, or overlap.
  -L: report and attempt to repair subtitles that appear too briefly or overlap.
  -d D: use custom seconds/characters ratio for minimum subtitle length in -l
    and -L (default: 0.034).
  -x: report subtitles with bad style, like too many lines.
  -m: add BOM character to output file if it is Unicode.
  -M: do not add BOM character to output file (default is same as input).
  -r: maintain Redmond-style compatibility with typewriters (CRLF). If this
    option is not enabled, any existing CR will be obliterated.
  -u: save output in UTF-8.
  -U: erase all subtitles that have a URL in them (should combine with -c).
  -w: Strip whitespace from beginning and end of lines
  -t: strip all SRT formatting and only output the text.
  -v: verbose mode.
  -V: print version and exit.
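
The time arithmetic behind -s, -o, -a and -A can be sketched in a few lines of Python. This is only an illustration of the formula S*t+O from the usage text, not the script's own implementation. All function names here (parse_time, two_point_fit, least_squares_fit, retime) are hypothetical, and the least-squares part assumes an ordinary straight-line fit.

```python
import re


def parse_time(value: str) -> float:
    """Convert '[-]HH:MM:SS.sss' or a plain number of seconds to seconds."""
    match = re.fullmatch(r"(-?)(\d+):(\d+):(\d+(?:\.\d+)?)", value.strip())
    if not match:
        return float(value)  # plain floating-point seconds
    sign, hours, minutes, seconds = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return -total if sign else total


def two_point_fit(ta1, ta2, tb1, tb2):
    """Solve ta2 = S*ta1 + O and tb2 = S*tb1 + O for S and O (as in -a)."""
    scale = (tb2 - ta2) / (tb1 - ta1)
    offset = ta2 - scale * ta1
    return scale, offset


def least_squares_fit(pairs):
    """Fit a line through (current, expected) pairs (assumed method for -A)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    scale = sxy / sxx
    return scale, mean_y - scale * mean_x


def retime(t, scale=1.0, offset=0.0):
    """Apply the scaling first, then the offset: S*t + O."""
    return scale * t + offset


if __name__ == "__main__":
    # A subtitle at 00:01:40 should appear at 00:01:38,
    # and one at 01:23:20 should appear at 01:20:02.
    s, o = two_point_fit(parse_time("00:01:40.000"), parse_time("00:01:38.000"),
                         parse_time("01:23:20.000"), parse_time("01:20:02.000"))
    print(f"S = {s:.5f}, O = {o:.3f}")
    print(f"new time for t = 3600 s: {retime(3600.0, s, o):.3f}")
```

The two-point version shows why the usage text recommends the earliest and latest subtitles: the further apart Ta1 and Tb1 are, the less a small error in either measurement affects S. With more than two pairs, a least-squares fit averages such errors out.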

Version History

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.