How to convert a .txt subtitle file to .srt format?

Solution 1

This is very similar to @goldilocks' approach but, IMO, simpler. It can also deal with empty lines in the file, and it replaces | with a line break:

#!/usr/bin/env perl
my ($time, $text, $next_time, $next_text);
## Initialize both counters explicitly; "my ($c,$i)=0" would only set $c.
my ($c, $i) = (0, 0);
while (<>) {
    ## skip bad lines
    next unless /^\s*([:\d]+)\s*:(.+)/;
    ## If this is the first line. I could have used $. but this is
    ## safer in case the file contains an empty line at the beginning.
    if ($c == 0) {
      $time=$1; 
      $text=$2;
      $c++;
    }
    else {
      ## This is the counter for the subtitle index
      $i++;
      ## Save the current values
      $next_time=$1; 
      $next_text=$2;     
      ## I am assuming that the | should be interpreted
      ## as a newline, remove this if I'm wrong.
      $text=~s/\|/\n/g;     
      ## Print the previous subtitle
      print "$i\n$time,100 --> $next_time,000\n$text\n\n";        
      ## Save the current one for the next line
      $time=$next_time; $text=$next_text;
    }
}     
## Print the last subtitle. It will be displayed for a minute
## 'cause I'm lazy.
$i++;
$time=~/(\d+:)(\d+)(:\d+)/;
my $newtime=$1 . (sprintf "%02d", $2+1) . $3;
print "$i\n$time,100 --> $newtime,000\n$text\n\n";    

Save the script as a file and make it executable, then run:

./script.pl subfile > good_subs.srt

The output I got for your sample was:

1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".
Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu
zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć
tej straszliwej prawdy.

7
00:01:19,100 --> 00:02:19,000
Teraz już wiem...

Solution 2

What Thorsten meant is something like this:

#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);

my $END = '!!ZZ_END';
my $LastTitleDuration = 5;

my $count = 1;
my $line = <STDIN>;
chomp $line;
my $next = <STDIN>;
while ($line) {
    $next = lastSubtitle($line) if !$next;
    last if !$next;
    chomp $next;
    if (!($next =~ m/^\d\d:\d\d:\d\d:.+/)) { 
        print STDERR 'Skipping bad data at line '.($count+1).":\n$line\n";
        $next = <STDIN>;
        next;
    }
    printf STDOUT
        "%d\r\n%s,100 --> %s,000\r\n%s\r\n\r\n",
        $count++,
        substr($line, 0, 8),
        substr($next, 0, 8),
        substr($line, 9)
    ;
} continue {
    $line = $next;
    $next = <STDIN>;
}

sub lastSubtitle {
    my $line = shift;
    $line =~ /^(\d\d:\d\d:)(\d\d):(.+)/;
    return 0 if $3 eq $END;
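    # %02d keeps the seconds zero-padded (%2d would pad with a space).
    # Note that a result past 59 is not carried into the minutes.
    return sprintf("$1%02d:$END", $2 + $LastTitleDuration);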
} 

When I feed your sample data into this, I get:

1
00:00:44,100 --> 00:01:01,000
" Myślę, więc jestem".|Kartezjusz, 1596-1650

2
00:01:01,100 --> 00:01:06,000
Trzynaste Pietro

3
00:01:06,100 --> 00:01:10,000
Podobno niewiedza uszczęśliwia.

4
00:01:10,100 --> 00:01:13,000
Po raz pierwszy w życiu|zgadzam się z tym.

5
00:01:13,100 --> 00:01:15,000
Wolałbym...

6
00:01:15,100 --> 00:01:19,000
nigdy nie odkryć|tej straszliwej prawdy.

7
00:01:19,100 --> 00:01:24,000
Teraz już wiem...

A couple of points:

  • Each subtitle starts 1/10 second late so that consecutive subtitles do not overlap (and because I was too lazy to add in some math involving the seconds field). Each one then stays on until 1/10 second before the next title appears.

  • The last title stays up for $LastTitleDuration (5 seconds).

  • I used CRLF line endings as per the SubRip Wikipedia article, although that may not be necessary.

  • It presumes the first line of input is not malformed. Every later line is checked, and errors are reported to stderr, so:

    readAlongToSRT.pl < readAlong.txt > whatever.srt
    

    Should create the file but still print errors to the screen.

  • Processing will stop at a blank line.

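  • See terdon's comment below re: the possible significance of | in the subtitle content. You may want to insert $line =~ s/\|/\r\n/g; before the printf STDOUT line. The | must be escaped, because a bare | in a regex is alternation and matches the empty string at every position.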
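
To illustrate the pipe substitution from the last point, here is a minimal sketch. The helper name split_pipes is hypothetical and not part of the script above; it only assumes that | marks a line break, which is a guess about the input format.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): turn each literal
# pipe in a subtitle's text into a CRLF line break.
sub split_pipes {
    my ($text) = @_;
    # \| matches a literal pipe. A bare | would mean "empty pattern OR
    # empty pattern", which matches at every position in the string.
    $text =~ s/\|/\r\n/g;
    return $text;
}

print split_pipes('Po raz pierwszy w życiu|zgadzam się z tym.'), "\r\n";
```

Text without a pipe passes through unchanged, so the call is safe to apply to every subtitle.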

This took me 20 minutes and the only test data I had was those 7 lines, so don't count on it being perfect. If there are ever line breaks in the subtitles, that will cause a problem. I presumed there aren't; if that is the case I suggest you remove them from the input first rather than trying to deal with them here.
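
If the input does contain line breaks inside a subtitle, one way to remove them first might look like the sketch below. It is an assumption-laden example, not a tested tool: the helper name join_continuations is made up, and it folds every line that lacks an HH:MM:SS: prefix into the preceding subtitle using |, on the guess that | is this format's line-break marker.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical pre-processing helper: fold any line that does not start
# with an HH:MM:SS: timestamp into the preceding subtitle, joined by |.
sub join_continuations {
    my @out;
    for my $line (@_) {
        chomp(my $copy = $line);          # work on a copy of the line
        next if $copy =~ /^\s*$/;         # drop blank lines
        if ($copy =~ /^\d\d:\d\d:\d\d:/ or !@out) {
            push @out, $copy;             # a new subtitle starts here
        } else {
            $out[-1] .= "|$copy";         # continuation of the previous one
        }
    }
    return @out;
}

print "$_\n" for join_continuations(
    "00:01:10:Po raz pierwszy w życiu\n",
    "zgadzam się z tym.\n",
    "00:01:13:Wolałbym...\n",
);
```

To use it as a filter, pass it the lines read from STDIN and pipe the printed result into the conversion script.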

Author: VinoPravin

Updated on September 18, 2022

Comments

  • VinoPravin
    VinoPravin over 1 year

    I have a subtitle file, it looks like this:

    00:00:44:" Myślę, więc jestem".|Kartezjusz, 1596-1650
    00:01:01:Trzynaste Pietro
    00:01:06:Podobno niewiedza uszczęśliwia.
    00:01:10:Po raz pierwszy w życiu|zgadzam się z tym.
    00:01:13:Wolałbym...
    00:01:15:nigdy nie odkryć|tej straszliwej prawdy.
    00:01:19:Teraz już wiem...
    

    I'm not sure what format this is, but I wanted to convert the subtitles to .srt. Unfortunately gnome-subtitles and subtitleeditor can't recognize this kind of format.

    gnome-subtitles says:

    Unable to detect the subtitle format. Please check that the file type is supported.

    subtitleeditor says:

    Please check that the file contains subtitles in a supported format.

    file output:

    UTF-8 Unicode text
    

    Is there a way to convert this file to .srt format?

    • goldilocks
      goldilocks over 10 years
      <joke>This must be "read along using a stopwatch" format.</joke>
    • VinoPravin
      VinoPravin over 10 years
      So, there's nothing I can do about it?
    • Thorsten Staerk
      Thorsten Staerk over 10 years
      You can find the SRT format described at en.wikipedia.org/wiki/SubRip; it should be obvious how to convert.
  • terdon
    terdon over 10 years
    Damn, beat me to it and using the same approach! Nice one, +1. I think that the | in the original format should be changed to \n but that's just a guess.
  • goldilocks
    goldilocks over 10 years
    @terdon Hmmm, yeah that might make sense.
  • goldilocks
    goldilocks over 10 years
    The last subtitle in your output ends 0.1 seconds before it starts!
  • VinoPravin
    VinoPravin over 10 years
    This works pretty well. I just need to adjust some entries because they're displayed for a little too long. Maybe there's a way to put that in the script, say 5-8 seconds max. If you want to experiment more with the subtitles, I uploaded them to pastebin: pastebin.com/vZP419eG
  • terdon
    terdon over 10 years
    @goldilocks damn, sorry, forgot use Time::Machine :). Thanks, fixed.
  • terdon
    terdon over 10 years
    @MikhailMorfikov it's possible but increases the complexity because that means that we need to manipulate times, so that 1:59 + 20 = 2:19. This means either complex code or using external modules and seemed beyond the scope of the question.
  • goldilocks
    goldilocks over 10 years
    +1 Nice job. For the time you could use an algorithm based on the length of the text string, say 1/2 second per character but not exceeding the start of the next title.
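
The duration ideas raised in these comments (a cap of a few seconds, time arithmetic such as 1:59 + 20 = 2:19, and a length-based display time) can be sketched without external modules by converting timestamps to seconds and back. This is only an illustration under stated assumptions: the names to_seconds, to_timestamp and end_time are hypothetical, the 8-second default cap is picked from the 5-8 second range mentioned above, and the duration is rounded up to whole seconds.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min);

# Convert "HH:MM:SS" to a number of seconds.
sub to_seconds {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Convert seconds back to "HH:MM:SS", carrying into minutes and hours.
sub to_timestamp {
    my $t = shift;
    return sprintf '%02d:%02d:%02d', int($t / 3600), int($t / 60) % 60, $t % 60;
}

# End time for a subtitle: half a second per character (rounded up),
# capped at $max seconds, and never later than the start of the next
# subtitle, if there is one. Note that length() counts bytes unless the
# input has been decoded as UTF-8.
sub end_time {
    my ($start, $text, $next_start, $max) = @_;
    $max //= 8;                                   # assumed default cap
    my $duration = int((length($text) + 1) / 2);
    my $end = to_seconds($start) + min($max, $duration);
    $end = min($end, to_seconds($next_start)) if defined $next_start;
    return to_timestamp($end);
}

print to_timestamp(to_seconds('00:01:59') + 20), "\n";   # 00:02:19
print end_time('00:01:13', 'Wolalbym...', '00:01:15'), "\n";
print end_time('00:01:19', 'Teraz juz wiem...'), "\n";
```

Working in plain seconds keeps the carry logic in one place, so the same two conversion helpers serve both the "last subtitle" case and the per-subtitle cap.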