Reading metadata from the <track> of an HTML5 <video> using Captionator

10,415

Solution 1

The way you're accessing the cue is correct, so no problems there. Note, though, that Captionator 0.6 renames the .tracks property to .textTracks to be more in line with the specification. If you can bear the occasional bug, I would recommend using 0.6 for its greater standards compliance. I've written the code below to use .textTracks; substitute .tracks if you'd like to continue using the stable branch.

The issue relates to the loading of the text tracks themselves. At the moment, you're not actually telling Captionator to load the track. Loading happens asynchronously and on request, so there is inevitably a delay during which the track's content isn't available. You'll need to write your code in a way that accommodates the loading time and a potential load error.

You're also not waiting for Captionator itself to load - potentially a user could unknowingly click the button before this had occurred - triggering a nasty JavaScript error. This won't be such a problem when testing on your local box, but as soon as you deploy to the internet you'll be seeing all sorts of race conditions and other nasties. Consider disabling the button until both the page and the caption data have loaded.
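
One way to express "enable the button only once both the page and the caption data are ready" is a small gate that fires a callback after every named condition has been reported. This is only a sketch: createReadyGate and the condition names are made up for illustration and are not part of Captionator.

```javascript
// Hypothetical helper (not part of Captionator): calls onReady exactly once,
// after every name in `names` has been reported as done.
function createReadyGate(names, onReady) {
    var pending = names.slice();   // copy, so the caller's array is untouched
    var fired = false;

    return function markDone(name) {
        var i = pending.indexOf(name);
        if (i !== -1) {
            pending.splice(i, 1);  // this condition is now satisfied
        }
        if (pending.length === 0 && !fired) {
            fired = true;
            onReady();
        }
    };
}
```

You could call the returned function with "page" from the page's load handler and with "track" from the track's load handler, and remove the button's disabled attribute inside onReady.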


I've tried to make the Captionator API as close as possible to the actual JS API which will be landing in browsers very soon - so in future this will be the same way you'll interact with the native browser functionality. As soon as the functionality is available natively, Captionator will bow out of the way, and your code should (assuming they don't change the API again!) just work with the native API.

First of all, you need to actually request that Captionator load the content. This is done by setting the 'display mode' of the track to SHOWING, or 2.

var video = document.getElementById("myVideo");
video.textTracks[0].mode = 2; // SHOWING

Alternatively, you can set the track's mode to HIDDEN (1), which still triggers a load and still fires cueChange events, but won't paint cues to the screen. In Captionator, I don't paint metadata tracks to screen at all, but the (buggy) WebKit API in development will.

video.textTracks[0].mode = 1; // HIDDEN

Then you need to listen for when the cues are loaded and available:

video.textTracks[0].onload = function() { /* Your Code Here... */ }

Or when something goes wrong:

video.textTracks[0].onerror = function() { /* Whoah, something went wrong... */ }
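
If you prefer promises to assigning the two handlers by hand, the steps above can be wrapped up roughly like this. It is a sketch, not part of Captionator: loadTextTrack and TRACK_MODE are names invented here, and the code assumes the track behaves as described in this answer (setting the mode triggers the load, onload/onerror fire afterwards, and cues is null until loading completes).

```javascript
// Mode values as given in this answer (HIDDEN = 1, SHOWING = 2).
var TRACK_MODE = { HIDDEN: 1, SHOWING: 2 };

// Hypothetical helper: resolves with the track once its cues are available,
// rejects if the load fails.
function loadTextTrack(track, mode) {
    if (mode === undefined) {
        mode = TRACK_MODE.HIDDEN;  // load without painting cues
    }
    return new Promise(function (resolve, reject) {
        if (track.cues) {
            resolve(track);        // already loaded, nothing to wait for
            return;
        }
        track.onload = function () {
            resolve(track);
        };
        track.onerror = function () {
            reject(new Error("Text track failed to load"));
        };
        track.mode = mode;         // assigning the mode requests the load
    });
}
```

With a real track this would be used as loadTextTrack(video.textTracks[0]).then(...), with the failure case handled in the promise's rejection callback.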

Once the content is loaded, you can access the TextTrack.cues array (well, technically a TextTrackCueList.) Before the load has occurred, the TextTrack.cues property will be null.

var myCueText = video.textTracks[0].cues[0].text;
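
Because cues is null until the track has loaded, a direct cues[0].text lookup will throw if it runs too early. A defensive accessor might look like the following; getCueText is a hypothetical helper name, not a Captionator function.

```javascript
// Hypothetical helper: returns the text of the cue at `index`, or null when
// the track has not loaded yet (cues is null) or has no cue at that index.
function getCueText(track, index) {
    var cues = track && track.cues;
    if (!cues || index < 0 || index >= cues.length) {
        return null;
    }
    return cues[index].text;
}
```

A null result then tells the caller to wait for the load (or report an empty track) instead of failing with a JavaScript error.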

Be aware that Captionator parses the cue text of every cue, except when the track kind is metadata - so ensure you assign the correct track kind. Otherwise, you might end up with data or tags Captionator thinks are 'invalid' being thrown out. You can turn this check off for regular cues as well, by setting the processCueHTML option to false.


With that in mind, here's how I'd rewrite your code:

<div>
    <p id="metadataText">Metadata text should appear here</p>
    <input type='button' onclick='changeText()' value='Click here to display the metadata text' id="changetext" disabled />
</div>

<video controls autobuffer id="videoTest" width="512" height="288">
    <!-- Your video sources etc... -->

    <!-- The metadata track -->
    <track label="Metadata Track" kind="metadata" src="metadata.vtt" type="text/webvtt" srclang="en" />
</video>

<!-- Include Captionator -->
<script type="text/javascript" src="captionator.js"></script>
<script type="text/javascript">
    document.addEventListener("readystatechange",function(event) {
        if (document.readyState === "complete") {
            captionator.captionify();

            document.querySelectorAll("#changetext")[0].removeAttribute("disabled");
        }
    },false);

    function changeText() {
        // Get the metadataText paragraph
        var textOutput = document.querySelectorAll("#metadataText")[0];

        // Get the metadata text track
        var metadataTrack = document.querySelectorAll("video")[0].textTracks[0];

        if (metadataTrack.readyState === captionator.TextTrack.LOADED) {
            // The cue is already ready to be displayed!
            textOutput.innerHTML = metadataTrack.cues[0].text;

        } else {
            // We check to see whether we haven't already assigned the mode.
            if (metadataTrack.mode !== captionator.TextTrack.SHOWING) {
                textOutput.innerHTML = "Caption loading...";

                // The file isn't loaded yet. Load it in!
                metadataTrack.mode = captionator.TextTrack.SHOWING; // You can use captionator.TextTrack.HIDDEN too.

                metadataTrack.onload = function() {
                    textOutput.innerHTML = metadataTrack.cues[0].text;
                }

                metadataTrack.onerror = function() {
                    textOutput.innerHTML = "Error loading caption!";
                }
            }
        }
    }

</script>

Here, the button starts out disabled, which prevents users on slow connections (or just somebody with very quick reflexes!) from hitting it before Captionator is ready. Once the page has finished loading, we initialise Captionator and re-enable the button. The click handler then loads the metadata track on demand, listens for its load event, and retrieves the cue text as normal.

Solution 2

You may need to load your metadata VTT file via Ajax and parse and display it yourself.

I looked at the example from the HTML5 Doctor article on video subtitling. They're using Playr, so I checked out its source code, and they're definitely requesting the VTT file asynchronously and parsing the content once it's loaded.

I was able to load the contents of the VTT file and dump it into the specified element with the following code:

function changeText() {
    var track = document.getElementById("videoTest").querySelector("track");
    var req_track = new XMLHttpRequest();
    req_track.open('GET', track.getAttribute("src"));
    req_track.onreadystatechange = function(){
        if(req_track.readyState == 4 && (req_track.status == 200 || req_track.status == 0)){
            if(req_track.responseText != ''){
              document.getElementById("metadataText").innerHTML = req_track.responseText;
            }
        }
    }
    req_track.send(null);
}

I'm not familiar with Captionator, but it looks like it has some capabilities for parsing VTT files into some sort of data structure, even if it doesn't necessarily support the metadata track type. Maybe you can use a combination of this code and Captionator's existing VTT parser?
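
If you do end up parsing the response text yourself, a very small parser is enough for simple files like the one in the question. The sketch below is not Captionator's parser and is far from a complete WebVTT implementation (it ignores cue settings, notes, styles and regions); parseTimestamp and parseVtt are names chosen here for illustration.

```javascript
// Convert "mm:ss.ttt" or "hh:mm:ss.ttt" into seconds.
function parseTimestamp(stamp) {
    var parts = stamp.trim().split(":");
    var seconds = 0;
    for (var i = 0; i < parts.length; i++) {
        seconds = seconds * 60 + parseFloat(parts[i]);
    }
    return seconds;
}

// Split a WebVTT source string into { id, startTime, endTime, text } objects.
function parseVtt(source) {
    // Normalise line endings, then split into blocks separated by blank lines.
    var blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n\s*\n/);
    var cues = [];

    blocks.forEach(function (block) {
        var lines = block.split("\n");
        var timingIndex = lines.findIndex(function (line) {
            return line.indexOf("-->") !== -1;
        });
        if (timingIndex === -1) {
            return;  // no timing line: the "WEBVTT" header, a note, etc.
        }
        var timing = lines[timingIndex].split("-->");
        cues.push({
            id: timingIndex > 0 ? lines[timingIndex - 1].trim() : "",
            startTime: parseTimestamp(timing[0]),
            // Anything after the end timestamp is cue settings; drop it.
            endTime: parseTimestamp(timing[1].trim().split(/\s+/)[0]),
            text: lines.slice(timingIndex + 1).join("\n")
        });
    });

    return cues;
}
```

Inside the onreadystatechange handler above, parseVtt(req_track.responseText) would then return an array of cue objects whose text property can be written into the page in place of the raw file contents.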

Author by

Steph

Working to become a better programmer, one day at a time.

Updated on June 04, 2022

Comments

  • Steph
    Steph almost 2 years

    I am having trouble getting a working example that reads metadata from a WebVTT file, which was specified by the <track> element of a <video> in an HTML5 page. To be clear, I am not talking about reading the metadata out of the video file itself (as you would with an MPEG Transport Stream, for instance). What I'm talking about is the <track> element that is used for captioning videos. One of the attributes of a <track> is kind, which can be specified as any of the following values:

    • Subtitles
    • Descriptions
    • Captions
    • Navigation
    • Chapters
    • Metadata

    I am trying to use the metadata type to access text stored in the corresponding WebVTT file, which I intend to manipulate using JavaScript. I know this is possible, as it is mentioned by Silvia Pfeiffer as well as by the maker of Captionator, which is the JavaScript polyfill that I am using to implement the functionality of interpreting the <track> tags. However, I just can't get it to work.

    My code is based on the Captionator documentation's captions example. I added a button to retrieve the metadata and display it when I click the button. Unfortunately it keeps displaying "undefined" instead of the metadata. Any ideas what I might be doing incorrectly? Alternatively, does anyone know where a working example is that I could take a look at? I can't find one anywhere.

    If you care to take a look at my code, I've included it below:

    <!DOCTYPE html>
    <html>
        <head>
            <title>HTML5 Video Closed Captioning Example</title>
            <meta charset="utf-8">
            <link rel="stylesheet" type="text/css" media="screen" href="js/Captionator-v0.5-12/css/captions.css"/>
        </head>
        <body>
            <h1>HTML5 Video Closed Captioning Example</h1>
            <div>
                <p id="metadataText">Metadata text should appear here</p>
                <input type='button' onclick='changeText()' value='Click here to display the metadata text'/>
            </div>
    
            <video controls autobuffer id="videoTest" width="1010" height="464">
                <source src="http://localhost:8080/Videos/testVideo.webm" type="video/webm" />
                <source src="http://localhost:8080/Videos/testVideo.mp4" type="video/mp4" />
    
                <!-- WebVTT Track Metadata Example -->
                <track label="Metadata Track" kind="metadata" src="http://localhost:8080/Videos/Timed_Text_Tracks/testVideo_metadata.vtt" type="text/webvtt" srclang="en" />
            </video>
    
            <!-- Include Captionator -->
            <script type="text/javascript" src="js/Captionator-v0.5-12/js/captionator.js"></script>
    
            <!-- Example Usage -->
            <script type="text/javascript" src="js/Captionator-v0.5-12/js/captionator-example-api.js"></script>
            <script type="text/javascript">
                window.addEventListener("load",function() {
                    captionator.captionify(null,null,{
                        debugMode: !!window.location.search.match(/debug/i),
                        sizeCuesByTextBoundingBox: !!window.location.search.match(/boundingBox/i),
                        enableHighResolution: !!window.location.search.match(/highres/i),
                    });
    
                    var videoObject = document.getElementsByTagName("video")[0];
                    videoObject.volume = 0;
                    document.body.appendChild(generateMediaControls(videoObject));  
                },false);
    
                function changeText() {
                    document.getElementById('metadataText').innerHTML = testVar;
                    var cueText = document.getElementById("video").tracks[0].activeCues[0].getCueAsSource();
                    document.getElementById('metadataText').innerHTML = cueText;
                }
            </script>
        </body>
    </html>
    

    My WebVTT file looks like this:

    WEBVTT
    
    0
    00:00.000 --> 00:04.000
    Testing 1 2 3 . . .
    
  • Steph
    Steph about 12 years
    Thanks for the help, your code worked for me. I feel like it kind of defeats the purpose of using WebVTT and a polyfill, but I suppose there really isn't a good solution at this point in time since no one seems to have implemented support for the metadata <track> yet.
  • Brandan
    Brandan about 12 years
    It seems that way. If you come up with something good, you could always contribute it to Captionator.
  • Christopher
    Christopher about 12 years
    @Steph I can assure you that Captionator does indeed support the metadata track type. :)
  • Steph
    Steph about 12 years
    Thank you so much for the detailed response, Christopher. It was a big help!