Last modified on August 23, 2002
For this process you must have transcode and its sources. You need
tccat and tcextract from transcode itself and the files in
transcode/contrib/subrip from the transcode sources.
The procedure has four steps: list the available subtitles with mplayer,
extract the raw subtitle stream for the chosen sid (subtitle ID) with the
transcode tools, convert that stream into text files, and let srttool merge
the text into the final subtitle file. You can then watch the result with
e.g. mplayer.
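Before starting it can save time to check which of the programs named in this chapter are already installed. The following is only a small sketch; the helper name missing_tools is made up for this example, and srttool, subtitle2pgm and pgm2txt will be reported as missing until they have been compiled and copied as described in 5.1.1.

```shell
# Print the name of every given command that cannot be found in PATH.
# "missing_tools" is a hypothetical helper, not part of transcode.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Programs used in this chapter; the last three come from contrib/subrip.
missing_tools tccat tcextract mplayer gocr ispell srttool subtitle2pgm pgm2txt
```

An empty result means everything was found.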
5.1.1. Compiling the tools
Unfortunately no binary package (RPM, deb) that I know of includes
subrip so we have to compile and install it ourselves. But this is
rather easy.
1. Change to the subrip directory (cd transcode/contrib/subrip) and invoke make.
2. Copy srttool, subtitle2pgm and pgm2txt to a directory in your PATH.
3. Edit pgm2txt if your gocr does not support the -p option: at the end there are two lines containing -p ${DBPATH}. Simply remove it from both lines (after consulting gocr's manpage).
5.1.2. Extracting the subtitle stream
Here I assume that you've copied your DVD with vobcopy -m, meaning
that it has been completely mirrored including the .IFO files.
If not then you'll have to adjust the sources.
First let's see which subtitles are available. We can use mplayer
for this task:
mplayer -dvd-device /space/st-tng/disc1/ -dvd 1 -vo null -ao null -frames 0 -v 2>&1 | grep sid
This causes
mplayer to just print a lot of information about the source and
not to play anything at all. It should give you a list of subtitles:
[open] subtitle ( sid ): 0 language: da
[open] subtitle ( sid ): 1 language: de
[open] subtitle ( sid ): 2 language: en
[open] subtitle ( sid ): 3 language: es
[open] subtitle ( sid ): 4 language: fr
[open] subtitle ( sid ): 5 language: it
[open] subtitle ( sid ): 6 language: nl
[open] subtitle ( sid ): 7 language: no
[open] subtitle ( sid ): 8 language: sv
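[open] subtitle ( sid ): 9 language: en
Now that we have the sid (subtitle ID) for the language that we want we can
fire up the transcode tools and let them extract the raw subtitle
stream:
tccat -i /space/st-tng/disc1/ -T 1 -L | tcextract -x ps1 -t vob -a 0x22 > subs-en
The -a 0x22 is the subtitle
stream's hexadecimal number: 0x20 + sid. Here I use the English subtitles
(sid 2, so 0x20 + 2 = 0x22).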
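Looking up the sid and adding 0x20 by hand is easy to get wrong, so here is a small sketch that does both. The function names sids_for_language and sid_to_stream are invented for this example, and the field positions assume exactly the "[open] subtitle ( sid ): N language: xx" format shown above; other mplayer versions may print something different. Note that a language can show up more than once (en is both sid 2 and sid 9 in the listing), so the first function may print several numbers.

```shell
# Print the sid of every subtitle line whose language matches $1.
# Reads the output of "mplayer ... | grep sid" on stdin.
# Assumes the line format "[open] subtitle ( sid ): N language: xx".
sids_for_language() {
    awk -v lang="$1" '$2 == "subtitle" && $8 == lang { print $6 }'
}

# Convert a sid into the stream number for tcextract -a (0x20 + sid).
sid_to_stream() {
    printf '0x%x\n' $((0x20 + $1))
}

sid_to_stream 2    # prints 0x22
```

To use the first function, pipe the mplayer command from above into it, e.g. by replacing the trailing "grep sid" with "sids_for_language en".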
5.1.3. Converting the raw stream
Ok, we have a raw subtitle stream - but what can we do with it? First we have
to convert each subtitle entry into a picture. This can be easily done
with subtitle2pgm:
subtitle2pgm -o english -c 255,255,0,255 < subs-en
Here's a catch however. With -c you can specify the grey levels
used in the conversion. The idea is to make the job for gocr as
easy as possible. Therefore you might have to experiment with the parameters -
but this is easy, too. I've taken the following samples from my Star Trek -
The Next Generation DVD:
-c 0,255,255,255 - this is obviously wrong.
-c 255,0,255,255 - this looks good.
-c 255,255,0,255 - we don't want this one.
-c 255,255,255,0 - we don't want this either.
As you can see you need a picture that does not contain outlined characters.
subtitle2pgm creates a lot of images - one for each subtitle -
and a control file, called english.srtx in my case, that contains
the duration for each subtitle. The next step is to let gocr recognize
the text:
pgm2txt english
Be warned - gocr will often ask you about characters that it can't
recognize. This is normal. Once you're done you should run ispell
over all the newly created text files:
ispell -d american english*txt
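Adjust the language to your needs, of course.
The last step is to let srttool merge the actual text with the timing
information from the .srtx file:
srttool -s -w < english.srtx > english.srt
Voila, you have a working subtitle file. You can watch it with e.g.
mplayer:
mplayer -sub english.srt mymovie.avi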
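For another disc or language only a few values in the commands change. The sketch below prints the commands from this chapter with those values filled in, without executing anything, so they can be reviewed first. The function name print_subtitle_commands and its argument order are assumptions made for this example; the -c values and the ispell dictionary are simply the ones used above and may need adjusting.

```shell
# Print (do not run) the commands from this chapter.
# Arguments: DVD directory, title number, sid, base name for the output files.
# "print_subtitle_commands" is a hypothetical helper, not part of transcode.
print_subtitle_commands() {
    dvd=$1 title=$2 sid=$3 base=$4
    stream=$(printf '0x%x' $((0x20 + $sid)))   # stream number is 0x20 + sid
    cat <<EOF
tccat -i $dvd -T $title -L | tcextract -x ps1 -t vob -a $stream > subs-$base
subtitle2pgm -o $base -c 255,255,0,255 < subs-$base
pgm2txt $base
ispell -d american $base*txt
srttool -s -w < $base.srtx > $base.srt
EOF
}

print_subtitle_commands /space/st-tng/disc1/ 1 2 english
```

With these arguments the printed commands are the same as the ones used in this chapter. pgm2txt and ispell are interactive, so it is better to run the printed lines one at a time than to pipe them straight into a shell.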
This guide was written by Moritz Bunkus