How can I convert an Amazon Transcribe JSON response to a caption format (SRT, WebVTT, etc.)?

Solution 1

You have probably already found a way to do this or written your own script. I also looked for a ready-made solution without success, so I ended up writing some JavaScript code to generate SRT from the JSON output of Amazon Transcribe.

https://www.yash.info/aws-srt-creator.htm

It breaks sentences at each period (.). It's a standalone HTML file. Feel free to download and modify it as required.
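
For readers who would rather do this in a script, here is a minimal Python sketch of the same idea (it is not the code behind the page linked above). It assumes the usual Amazon Transcribe output layout, where `results.items` holds one entry per word with string `start_time` and `end_time` values, and punctuation entries carry no timestamps. The names `srt_timestamp`, `transcribe_to_srt`, and the `max_chars` limit are made up for this illustration. Besides breaking at sentence-ending punctuation, it also starts a new caption when a sentence grows too long.

```python
SENTENCE_END = {".", "?", "!"}


def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcribe_to_srt(transcript, max_chars=80):
    """Build SRT text from a parsed Transcribe result (assumed layout)."""
    cues = []  # list of (start, end, text) tuples
    words, start, end = [], None, None

    def flush():
        # Close the caption being built, if any, and reset the buffer.
        nonlocal words, start, end
        if words:
            cues.append((start, end, "".join(words).strip()))
        words, start, end = [], None, None

    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps; attach it to the previous word.
            if words:
                words.append(content)
                if content in SENTENCE_END:
                    flush()
            continue
        # Start a new caption if this word would push it past max_chars.
        if words and len("".join(words)) + 1 + len(content) > max_chars:
            flush()
        if start is None:
            start = item["start_time"]
        end = item["end_time"]
        words.append(" " + content)
    flush()  # emit any trailing words that had no closing punctuation

    blocks = [
        f"{index}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}"
        for index, (s, e, text) in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

To use it, parse the Transcribe output with `json.load` and pass the resulting dictionary to `transcribe_to_srt`. Breaking at punctuation rather than every N words keeps caption boundaries aligned with natural pauses, and the length cap only kicks in for unusually long sentences.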

Solution 2

I've used this Python script from GitHub, and it formats the transcript really nicely into docx format. The output even includes scatter plots of the word confidence levels and changes the color of lower-confidence words.

https://github.com/kibaffo33/aws_transcribe_to_docx

This worked really well for me, and I think you could fairly simply have it output HTML instead if you wanted to alter the Python script.
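
As a rough idea of what an HTML version could look like, here is a small standalone Python sketch. It does not use or reproduce the script linked above. It assumes each word entry in `results.items` carries a `confidence` value stored as a string, and the function name `transcript_to_html` and the 0.9 threshold are hypothetical choices for this example. Low-confidence words are wrapped in a `<mark>` element.

```python
from html import escape


def transcript_to_html(transcript, threshold=0.9):
    """Render a parsed Transcribe result as one HTML paragraph.

    Words whose confidence is below `threshold` are highlighted.
    """
    parts = []
    for item in transcript["results"]["items"]:
        best = item["alternatives"][0]
        text = escape(best["content"])
        if item["type"] == "punctuation":
            # Punctuation attaches directly to the preceding word.
            parts.append(text)
            continue
        # Confidence is assumed to be a numeric string such as "0.42".
        confidence = float(best.get("confidence", 1.0))
        if confidence < threshold:
            text = f'<mark title="confidence {confidence:.2f}">{text}</mark>'
        parts.append(" " + text)
    return "<p>" + "".join(parts).strip() + "</p>"
```

The `title` attribute shows the confidence score as a tooltip, which is a lightweight stand-in for the color coding in the docx output.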

Author: Daniel Angel

Updated on November 01, 2022

Comments

  • Daniel Angel over 1 year

    I'm trying to find a package that converts the JSON response from the Amazon Transcribe service to a caption format, with no luck so far.

    You can see an example of the JSON in the JavaScript part of the Fiddle.

    I'd rather not take the naive approach and just "bundle" every 10 words or so together, as that would space the captions in a weird way.

    I'd even accept a programmatic way of doing it using the Google Speech service or Speechmatics. They all return a JSON file broken down by word.

    Has anyone worked with this before?

    Thanks!

    • Daniel Angel over 4 years
      @nick I just posted an answer