Friday, May 28, 2010

Subtitle manipulation tools for Linux

Subtitles may not mean much for the English-speaking part of the world, but for the rest of us, they are the difference between truly enjoying a movie and just watching the screen, trying to decipher the events. While Windows has a nice variety of tools to manipulate subtitles, Linux applications can accomplish such tasks too. From editing to ripping to converting, here is a list of some useful tools.

Gnome Subtitles is probably the best subtitle editing application in Linux. Not only does it offer an easy way to quickly translate subtitles into your own language, but it also provides a video player so that you can sync the text of the file with the words from the movie. Just pause, translate, and move on. You can mark the text as bold, italic, or underlined with the click of a button. The application has a find and replace function that allows you to jump to known movie locations, and supports the different video framerates common to today's movies: 23.976, 24, 25, 29.97, and 30.

On the left of the subtitle editing part of the window there is a column showing the line number, and two other columns labeled From and To. A text line appears on the screen according to the values of these two columns. You can choose between frames and times; times shows the position of the subtitle line in the movie by hour, minute, second, and millisecond and is easier to work with.
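To make the relation between the two display modes concrete, here is a minimal Python sketch that turns a frame number into an hour, minute, second, and millisecond position. It is only an illustration of the arithmetic, not code from Gnome Subtitles; the function name frame_to_timestamp is made up for this example.

```python
def frame_to_timestamp(frame: int, fps: float) -> str:
    """Convert a frame number to an HH:MM:SS,mmm position string."""
    # Position in milliseconds = frame index divided by frames per second.
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


if __name__ == "__main__":
    # Frame 1500 of a 25 fps movie is exactly one minute in.
    print(frame_to_timestamp(1500, 25))
```

The same frame number lands at a different time for each framerate, which is why an editor needs to know the framerate of the movie before it can switch between the two views.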

KSubtile is a KDE application that does the same thing as Gnome Subtitles, though it's not as intuitive and makes use of MPlayer to display the video. As soon as you load a subtitle, you'll see the number of lines in the file and the total length in minutes in the info tab. With KSubtile you can cut and copy subtitle lines from one part of the movie and paste them elsewhere. One downside of KSubtile is that it doesn't support UTF-8. One nice touch is a zoom feature that allows you to quickly find specified text. The subtitle text is represented as vertical lines in the Navigator part of the window. Wider lines mean frequent dialog. The editor offers two text boxes in the middle and two left and right arrows on the sides to help you navigate to the point you wish to change. Alternatively, you can press Select and jump directly to the place you want to edit the text.

While KSubtile makes use of MPlayer, the two components don't stay in sync, making the association pretty useless. As you scroll through the movie, the subtitles in the editing area remain the same, so you can't edit them based on the position in the video file sequence.

SubtitleEditor is a lightweight application, but it might help you translate faster than any of the other tools. In its main window you have the subtitle text, the start and end time sequence, the line number, and duration, among other columns. Just click on a line and you can already edit the text. It supports both ISO-8859-15 and UTF-8 and can play the associated movie file with MPlayer, VLC, or a video player of your choice. It even shows you the audio part of the movie so you'll know where to edit.

The application provides several view modes in the main window. Advanced view, for example, shows all the necessary columns, plus a style column and one that shows the number of characters per line -- useful to approximate the text length as it appears on the screen with a certain font size. The Translation mode features the original text and an empty line at the right, ready to be filled. When you finish translating a file you can save it as SRT, SUB, TXT, ASS, or SSA.

A subtitling primer

A subtitle is a text representation of the dialog, narration, music, or sound effects in a video file. Subtitles are available in multiple formats, grouped in four main categories: text-based (SUB, SRT, SSA), HTML-based (such as SMI), XML (USF files), and image-based (Vobsub SUB/IDX files). SUB and SRT are the two most commonly used; both support line breaks. MPSub and SubViewer SUB files also support metadata info, in contrast with MicroDVD SUB files and SubRip SRT files.

Subtitles may also differ by the timing they use. MicroDVD SUB files and SAMI SMI files use frame IDs to keep the text in sync with the video, while MPSub SUB files use sequential time to do the same. The rest of the formats use the elapsed time of the video file to make marks that can be used as reference points for the text.

Other subtitle editing apps you might want to try are Gaupol, GSubEdit and Jubler.

Subtitle tools in Linux are not limited to editing applications. If you want to convert a SUB file into SRT format, you can use the sub2srt Perl script. It converts MicroDVD and SUBRipper files to SubViewer format. You can even change the framerate with the -f switch.
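For readers curious what such a conversion involves, the following Python sketch turns frame-based MicroDVD lines of the form {start}{end}text (with | as the line break) into time-based SRT blocks. It is not how sub2srt itself is written, just an illustration under the assumption of a known framerate; the names microdvd_to_srt and ms_to_srt are made up for this example.

```python
import re

# A MicroDVD line looks like: {start_frame}{end_frame}Text|second line
MICRODVD_LINE = re.compile(r"\{(\d+)\}\{(\d+)\}(.*)")


def ms_to_srt(ms: int) -> str:
    """Format milliseconds as the HH:MM:SS,mmm form used in SRT files."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def microdvd_to_srt(lines, fps=25.0):
    """Convert MicroDVD lines to SRT text, assuming the given framerate."""
    blocks = []
    for line in lines:
        match = MICRODVD_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a subtitle line
        start, end, text = match.groups()
        start_ms = round(int(start) * 1000 / fps)
        end_ms = round(int(end) * 1000 / fps)
        body = text.replace("|", "\n")  # MicroDVD marks line breaks with |
        index = len(blocks) + 1
        blocks.append(
            f"{index}\n{ms_to_srt(start_ms)} --> {ms_to_srt(end_ms)}\n{body}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(microdvd_to_srt(["{25}{75}Hello|world"], fps=25))
```

Because the source format only stores frame numbers, the result is only as good as the framerate you feed in, which is where a switch like -f comes in.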

Subtitles is an all-purpose subtitle Swiss Army knife. It can reload, retime, and convert subtitle files in a lot of ways. For example, say you have a subtitle file that displays the text three seconds late compared to the video file. All you have to do is type into a terminal subs -i -b 3 subtitle_file.sub and it's fixed.
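What a fixed-offset correction like this amounts to can be sketched in a few lines of Python. The example below works on SRT-style timing lines and is only an illustration, not the way the subs tool is implemented; shift_srt is a made-up name, and a negative offset makes the text appear earlier.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text: str, offset_seconds: float) -> str:
    """Shift every timestamp on SRT timing lines by offset_seconds."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match):
        h, m, s, ms = (int(group) for group in match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # never produce a negative position
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in text.split("\n"):
        # Only touch timing lines, so timestamps quoted in dialog survive.
        if " --> " in line:
            line = TIMESTAMP.sub(shift, line)
        shifted.append(line)
    return "\n".join(shifted)


if __name__ == "__main__":
    sample = "1\n00:00:05,500 --> 00:00:07,000\nHi\n"
    # Text shows up three seconds late, so pull it three seconds earlier.
    print(shift_srt(sample, -3))
```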

Or suppose you downloaded subtitles for a movie encoded at 24fps, but the file itself is at 25fps. You can convert it with subs -i -a 24/25 subtitle_file.sub. You can even merge two subtitle files in such a way that the final file has the timing information from the first subtitle file and the text from the second one. Just use subs -z subtitle_file1.sub subtitle_file2.sub.
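The 24/25 factor follows from simple arithmetic: the same frames played at 25fps instead of 24fps go by faster, so every time in the subtitle file has to shrink by that ratio. Here is a small Python sketch of the calculation, assuming times are held in milliseconds; rescale_ms is a hypothetical helper, not part of the subs tool.

```python
from fractions import Fraction


def rescale_ms(time_ms: int, source_fps: Fraction, target_fps: Fraction) -> int:
    """Rescale a subtitle time from one framerate to another.

    A frame shown at time t in the source appears at t * source/target
    when the same frames are played back at the target framerate.
    """
    return round(time_ms * source_fps / target_fps)


if __name__ == "__main__":
    one_hour = 3_600_000
    # A line timed at 1:00:00 for the 24 fps encode lands at 0:57:36 at 25 fps.
    print(rescale_ms(one_hour, Fraction(24), Fraction(25)))
```

Using Fraction keeps the ratio exact, which also helps with rates such as 24000/1001 that have no short decimal form.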

You know those comments that appear in some movie subtitles for the hearing impaired? Things like [Laughter] or [Distant roar]. You can strip those out by using subs -e 's/[\s-]*\[.*\]\s*\n*//gs' subtitle_file.sub. Subtitles can also split a file after a set period of time, separate overlapping lines, and join files into a single subtitle.
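The same idea can be tried out in Python on the text of a single subtitle. This is a loose adaptation of the expression above rather than an exact port: it matches each bracketed cue separately instead of greedily, and the function name strip_sound_cues is made up for this sketch.

```python
import re

# Optional leading spaces or dashes, then one [bracketed] cue.
BRACKETED = re.compile(r"[ \t-]*\[[^\]]*\]")


def strip_sound_cues(text: str) -> str:
    """Remove [bracketed] cues from one subtitle and drop emptied lines."""
    cleaned = BRACKETED.sub("", text)
    lines = [line.strip() for line in cleaned.split("\n")]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(strip_sound_cues("[Laughter]\nThat was close."))
    print(strip_sound_cues("- [Distant roar]\n- Run!"))
```

A subtitle that consisted only of a cue comes back as an empty string, so a real tool would also have to drop that entry and renumber the rest.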

How about ripping? That is, getting the image subtitles from a DVD into an SRT text file. You can rip the DVD with any DVD ripping tool, creating a /vob directory on your hard drive. Then install the transcode package and run tccat -i /home/user/rip/vob | tcextract -x ps1 -t vob -a 0x22 > subs-en. In this particular case, the 0x22 stands for English. To rip the subtitles for other languages, check out which ones are available with mplayer -dvd-device /space/st-tng/disc1/ -dvd 1 -vo null -ao null -frames 0 -v 2>&1 | grep sid and add the corresponding output number to the hexadecimal 0x20. If the Italian subtitle file has an SID of 7, you would run tccat -i /home/user/rip/vob | tcextract -x ps1 -t vob -a 0x27 > subs-it to get them.
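The hexadecimal addition is easy to get wrong once the SID goes past 9, so here is a tiny Python sketch of it. The helper name subtitle_track_id is made up, and the upper limit of 31 is an assumption based on DVDs allowing up to 32 subtitle streams.

```python
def subtitle_track_id(sid: int) -> str:
    """Return the value for tcextract's -a option for a given subtitle SID."""
    # Assumption: DVD subtitle streams are numbered 0 to 31.
    if not 0 <= sid <= 31:
        raise ValueError(f"unexpected subtitle SID: {sid}")
    # The track ID is the SID added to the base value 0x20.
    return f"0x{0x20 + sid:02x}"


if __name__ == "__main__":
    for sid in (2, 7, 10):
        print(sid, subtitle_track_id(sid))
```

An SID of 10, for instance, becomes 0x2a rather than 0x30.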

Since the DVD subtitles are embedded images, you can extract them with subtitle2pgm -o english -c 255,0,255,255 <>. You need an external OCR application to detect the text in the image files. gocr does the job nicely: pgm2txt english. Finally, use srttool to create the SRT file: srttool -s -w <> english.srt. Check the file afterwards to be sure all characters were detected correctly and all the words make sense. You might want to use OpenOffice.org to spell check the text.

If you'd rather rip subtitles the easy way, you can use KSubtitleRipper, a front end to the subtitleripper package. It uses the same extracting and converting tools as the CLI examples above, but unifies their use into a comprehensive GUI. If gocr can't recognize some characters, KSubtitleRipper asks you to replace them manually. It can work with both .vob and .sub files and provides internal spell checking.

So there you have it: you can now rip, convert, translate or manipulate subtitle files in whatever way you like, thanks to these simple yet effective utilities. Source: linux.com