How to convert a .txt subtitle file to .srt format?
Solution 1
This is very similar to @goldilocks' approach but, IMO, simpler. It can deal with empty lines in the file and replaces | with a line break:
#!/usr/bin/env perl
my ($time, $text, $next_time, $next_text);
## $c flags whether the first subtitle has been seen, $i is the subtitle index
my ($c, $i) = (0, 0);
while (<>) {
    ## skip bad lines
    next unless /^\s*([:\d]+)\s*:(.+)/;
    ## If this is the first line. I could have used $. but this is
    ## safer in case the file contains an empty line at the beginning.
    if ($c == 0) {
        $time = $1;
        $text = $2;
        $c++;
    }
    else {
        ## This is the counter for the subtitle index
        $i++;
        ## Save the current values
        $next_time = $1;
        $next_text = $2;
        ## I am assuming that the | should be interpreted
        ## as a newline, remove this if I'm wrong.
        $text =~ s/\|/\n/g;
        ## Print the previous subtitle
        print "$i\n$time,100 --> $next_time,000\n$text\n\n";
        ## Save the current one for the next line
        $time = $next_time; $text = $next_text;
    }
}
## Print the last subtitle. It will be displayed for a minute
## 'cause I'm lazy.
$i++;
## Apply the same | replacement to the last subtitle
$text =~ s/\|/\n/g;
$time =~ /(\d+:)(\d+)(:\d+)/;
my $newtime = $1 . (sprintf "%02d", $2+1) . $3;
print "$i\n$time,100 --> $newtime,000\n$text\n\n";
Save the script as a file and make it executable, then run:
./script.pl subfile > good_subs.srt
The output I get on your sample was:
1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".
Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu
zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć
tej straszliwej prawdy.

7
00:01:19,100 --> 00:02:19,000
Teraz już wiem...
Solution 2
What Thorsten meant is something like this:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);

my $END = '!!ZZ_END';
my $LastTitleDuration = 5;

my $count = 1;
my $line = <STDIN>;
chomp $line;
my $next = <STDIN>;
while ($line) {
    $next = lastSubtitle($line) if !$next;
    last if !$next;
    chomp $next;

    if (!($next =~ m/^\d\d:\d\d:\d\d:.+/)) {
        print STDERR 'Skipping bad data at line '.($count+1).":\n$line\n";
        $next = <STDIN>;
        next;
    }

    printf STDOUT
        "%d\r\n%s,100 --> %s,000\r\n%s\r\n\r\n",
        $count++,
        substr($line, 0, 8),
        substr($next, 0, 8),
        substr($line, 9)
    ;
} continue {
    $line = $next;
    $next = <STDIN>;
}

sub lastSubtitle {
    my $line = shift;
    $line =~ /^(\d\d:\d\d:)(\d\d):(.+)/;
    return 0 if $3 eq $END;
    ## %02d keeps the seconds zero-padded (e.g. 01 + 5 gives 06, not " 6")
    return sprintf("%s%02d:%s", $1, $2 + $LastTitleDuration, $END);
}
When I feed your sample data into this, I get:
1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".|Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu|zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć|tej straszliwej prawdy.

7
00:01:19,100 --> 00:01:24,000
Teraz już wiem...
Couple of points:
The subtitles actually start 1/10th of a second late so that they do not overlap, and because I was too lazy to add in some math involving the second timestamp. They then stay on until 1/10th of a second before the next title appears.
The last title stays up for $LastTitleDuration (5 seconds).
I used CRLF line endings as per the SubRip Wikipedia article, although that may not be necessary.
It presumes the first line of input is not malformed. Beyond that, lines are checked, and errors are reported to stderr, so:
readAlongToSRT.pl < readAlong.txt > whatever.srt
should create the file but still print errors to the screen.
Processing will stop at a blank line.
See terdon's comment below re: the possible significance of | in the subtitle content. You may want to insert
$line =~ s/\|/\r\n/g;
before the printf STDOUT line. Note that the | has to be escaped, since a bare | in a regex means alternation.
This took me 20 minutes and the only test data I had was those 7 lines, so don't count on it being perfect. If there are ever line breaks in the subtitles, that will cause a problem. I presumed there aren't; if that is the case I suggest you remove them from the input first rather than trying to deal with them here.
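One more caveat: both scripts extend the last subtitle by bumping a single field (Solution 1 adds one to the minutes, this one adds $LastTitleDuration to the seconds), so nothing carries over; 00:01:57 plus 5 seconds would come out as 00:01:62. A minimal sketch of carrying arithmetic, assuming timestamps are always HH:MM:SS, is below. The helper names to_seconds, to_stamp and add_seconds are made up for illustration and are not part of either script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Convert "HH:MM:SS" to a plain number of seconds.
sub to_seconds {
    my ($stamp) = @_;
    my ($h, $m, $s) = split /:/, $stamp;
    return $h * 3600 + $m * 60 + $s;
}

# Convert a number of seconds back to zero-padded "HH:MM:SS".
sub to_stamp {
    my ($total) = @_;
    return sprintf "%02d:%02d:%02d",
        int($total / 3600), int(($total % 3600) / 60), $total % 60;
}

# Add $delta seconds to a timestamp, carrying into minutes and hours.
sub add_seconds {
    my ($stamp, $delta) = @_;
    return to_stamp(to_seconds($stamp) + $delta);
}

print add_seconds("00:01:59", 20), "\n";   # prints 00:02:19
print add_seconds("00:59:57", 5), "\n";    # prints 01:00:02
```

In this script, lastSubtitle could then build its end time with something like add_seconds(substr($line, 0, 8), $LastTitleDuration) instead of adding to the seconds field directly.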
Updated on September 18, 2022
Comments
-
VinoPravin over 1 year
I have a subtitle file, it looks like this:
00:00:44:" Myślę, więc jestem".|Kartezjusz, 1596-1650
00:01:01:Trzynaste Pietro
00:01:06:Podobno niewiedza uszczęśliwia.
00:01:10:Po raz pierwszy w życiu|zgadzam się z tym.
00:01:13:Wolałbym...
00:01:15:nigdy nie odkryć|tej straszliwej prawdy.
00:01:19:Teraz już wiem...
I'm not sure what format this is, but I wanted to convert the subtitles to .srt. Unfortunately, gnome-subtitles and subtitleeditor can't recognize this kind of format.
gnome-subtitles says: Unable to detect the subtitle format. Please check that the file type is supported.
subtitleeditor says: Please check that the file contains subtitles in a supported format.
file output: UTF-8 Unicode text
Is there a way to convert this file to .srt format?
-
goldilocks over 10 years
<joke>This must be "read along using a stopwatch" format.</joke>
-
VinoPravin over 10 years
So, there's nothing I can do about it?
-
Thorsten Staerk over 10 years
You can find the srt format here: en.wikipedia.org/wiki/SubRip. It should be obvious how to convert.
-
terdon over 10 years
Damn, beat me to it and using the same approach! Nice one, +1. I think that the | in the original format should be changed to \n but that's just a guess.
-
goldilocks over 10 years
@terdon Hmmm, yeah that might make sense.
-
goldilocks over 10 years
The last subtitle in your output ends 0.1 seconds before it starts!
-
VinoPravin over 10 years
This works pretty well. I just need to customize some entries because they're displayed for a little bit too long. Maybe there's a way to put that in the script, let's say 5-8 secs max. If you want to experiment more with the subtitles, I uploaded it to pastebin: pastebin.com/vZP419eG
-
terdon over 10 years
@goldilocks damn, sorry, forgot use Time::Machine :). Thanks, fixed.
-
terdon over 10 years
@MikhailMorfikov it's possible but increases the complexity because that means that we need to manipulate times, so that 1:59 + 20 = 2:19. This means either complex code or using external modules and seemed beyond the scope of the question.
-
goldilocks over 10 years
+1 Nice job. For the time you could use an algorithm based on the length of the text string, say 1/2 second per character but not exceeding the start of the next title.
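A rough sketch of that length-based idea, combined with the 5-8 second cap asked for above, might look like this. It assumes start times have already been converted to plain seconds; display_seconds, $per_char and $max are hypothetical names and defaults, not something from either answer. Note that length counts bytes unless the input has been decoded as UTF-8, so accented characters would count double here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: how many seconds a subtitle should stay on screen.
#   $text       subtitle text
#   $start      start time in seconds
#   $next_start start of the next subtitle in seconds (undef for the last one)
#   $per_char   seconds per character (assumed default: 0.5)
#   $max        hard cap in seconds (assumed default: 8)
sub display_seconds {
    my ($text, $start, $next_start, $per_char, $max) = @_;
    $per_char //= 0.5;
    $max      //= 8;
    my $wanted = length($text) * $per_char;
    # Never run past the cap or into the next subtitle.
    my $limit = defined $next_start ? min($max, $next_start - $start) : $max;
    return min($wanted, $limit);
}

print display_seconds("Trzynaste Pietro", 61, 66), "\n";   # prints 5
print display_seconds("Hi", 0, 10), "\n";                  # prints 1
```

The result would be added to the start time to get the end timestamp, which again needs arithmetic that carries seconds into minutes.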