Low latency (< 2s) live video streaming HTML5 solutions?

Technologies and Requirements

The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.

However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather on your requirements, low latency for everyone will hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable higher quality video.

How to Set it Up

[Diagram: robot live streaming setup]

In-Control Users to Robot Connection

Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
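
For illustration, a minimal browser-side sketch of creating an RTCPeerConnection with STUN and TURN entries (the TURN URL and credentials are placeholders, and signaling still has to travel over your own channel, e.g. Socket.IO):

```js
// Sketch: RTCPeerConnection configured with a public STUN server and a
// placeholder TURN server. Offer/answer and ICE candidates are exchanged
// over your own signaling channel.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:turn.example.com:3478', // placeholder TURN server
      username: 'example-user',           // placeholder credentials
      credential: 'example-secret'
    }
  ]
});
```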

The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built into Chromium, while still giving you the flexibility to do whatever you'd like with Node.js.

The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.

Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.
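
As a rough sketch of that split (pc is the RTCPeerConnection from the snippet above; the element id and the command format are just placeholders):

```js
// Sketch: a data channel for robot commands plus the incoming low latency
// video track attached to a <video> element on the in-control user's page.
const commands = pc.createDataChannel('robot-commands');
commands.onopen = () => {
  commands.send(JSON.stringify({ cmd: 'move', direction: 'left' })); // example command
};

// The cam/control server adds its camera track on its side; the browser
// receives it here once the connection is negotiated.
pc.ontrack = (event) => {
  document.getElementById('low-latency-view').srcObject = event.streams[0];
};
```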

Video for Viewing Users

Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.
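
As one hedged example of the viewer side, playback with hls.js (dash.js works similarly for DASH); the manifest URL is a placeholder for whatever your CDN/transcoding service gives you:

```js
// Sketch: viewer-side HLS playback with hls.js.
import Hls from 'hls.js';

const video = document.getElementById('viewer-player');
const manifestUrl = 'https://cdn.example.com/live/stream.m3u8'; // placeholder

if (Hls.isSupported()) {
  const hls = new Hls();            // defaults keep a few segments buffered
  hls.loadSource(manifestUrl);
  hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  video.src = manifestUrl;          // Safari/iOS play HLS natively
}
```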

You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.

Metadata (chat, control signals, etc.)

It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a WebSocket library such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribute messages across the servers.
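
A rough sketch of that fan-out with current Socket.IO and its Redis adapter (package names, port, and Redis URL are assumptions for illustration):

```js
// Sketch: broadcast metadata from any Node.js instance to all connected
// clients across instances, using Redis pub/sub as the relay.
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

const io = new Server(3000);
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  // Any instance can now emit; the adapter relays it to the others.
  io.emit('metadata', { time: new Date().toISOString(), text: 'score +10' });
});
```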

How you synchronize the metadata to the video depends a bit on what's in that metadata and what, specifically, the synchronization requirement is. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind from when the first request was made.

Detecting when playback actually begins client-side is possible. Since the server knows the timestamp of the video it sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS and you need to use MSE with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning of the segment. The client can then synchronize itself. Let me break this down step-by-step (a rough client-side sketch follows the list):

  1. Client starts receiving metadata messages from application server.
  2. Client requests the first video segment from the CDN.
  3. CDN server replies with video segment. In the response headers, the Date: header can indicate the exact date/time for the start of the segment.
  4. Client reads the response Date: header (let's say 2016-06-01 20:31:00). Client continues buffering the segments.
  5. Client starts buffering/playback as normal.
  6. Playback starts. Client can detect this state change on the player and knows that 00:00:00 on the video player is actually 2016-06-01 20:31:00.
  7. Client displays metadata synchronized with the video, dropping any messages from previous times and buffering any for future times.
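
Here is that rough client-side sketch for steps 2 through 7, assuming a <video> element fed via MSE and metadata messages stamped with wall-clock times (the element id and the displayMessage helper are placeholders):

```js
// Sketch: derive the wall-clock time of playback position 0 from the Date:
// header of the first segment, then schedule metadata against it.
const video = document.getElementById('player');
let wallClockAtZero = null; // wall-clock time corresponding to currentTime === 0

async function fetchFirstSegment(url) {
  const response = await fetch(url);
  // Steps 3-4: the Date: header (or a custom header) marks the start of this
  // segment. Note that Date: only has one-second precision.
  wallClockAtZero = new Date(response.headers.get('Date'));
  return response.arrayBuffer(); // append to your SourceBuffer as usual
}

// Steps 6-7: map the player's position back to wall-clock time.
function currentWallClock() {
  return new Date(wallClockAtZero.getTime() + video.currentTime * 1000);
}

// Show a metadata message only when playback reaches its timestamp; drop
// anything already in the past relative to playback.
function scheduleMessage(msg) {                   // msg = { time: ISO string, ... }
  const delayMs = new Date(msg.time) - currentWallClock();
  if (delayMs <= 0) return;
  setTimeout(() => displayMessage(msg), delayMs); // displayMessage: your UI hook
}
```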

This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.

Why not [magic-technology-here]?

  • When you choose low latency, you lose quality. Quality comes from available bandwidth. Bandwidth efficiency comes from being able to buffer and optimize entire sequences of images when encoding. If you wanted perfect quality (lossless for each image), you would need a ton of bandwidth (gigabits per second, per viewer). That's why we have these lossy codecs to begin with.
  • Since you don't actually need low latency for most of your viewers, it's better to optimize for quality for them.
  • For the 2 users out of 15,000 that do need low latency, we can optimize for low latency for them. They will get substandard video quality, but will be able to actively control a robot, which is awesome!
  • Always remember that the internet is a hostile place where nothing works quite as well as it should. System resources and bandwidth are constantly variable. That's actually why WebRTC auto-adjusts (as best as reasonable) to changing conditions.
  • Not all connections can keep up with low latency requirements. That's why every single low latency connection will experience drop-outs. The internet is packet-switched, not circuit-switched. There is no real dedicated bandwidth available.
  • Having a large buffer (a couple seconds) allows clients to survive momentary losses of connections. It's why CD players with anti-skip buffers were created, and sold very well. It's a far better user experience for those 15,000 users if the video works correctly. They don't have to know that they are 5-10 seconds behind the main stream, but they will definitely know if the video drops out every other second.

There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.

Comments

  • Titan
    Titan almost 2 years

    With Chrome disabling Flash by default very soon, I need to start looking into Flash/RTMP HTML5 replacement solutions.

    Currently with Flash + RTMP I have a live video stream with < 1-2 second delay.

    I've experimented with MPEG-DASH, which seems to be the new industry standard for streaming, but that came up short, with a 5-second delay being the best I could squeeze from it.

    For context, I am trying to allow users to control physical objects they can see on the stream, so anything above a couple of seconds of delay leads to a frustrating experience.

    Are there any other techniques, or are there really no low-latency HTML5 solutions for live streaming yet?

    • Titan
      Titan over 6 years
      While the project didn't come off in the end, we had settled on wowza.com/products/capabilities/webrtc-streaming-software
    • A.J.Bauer
      A.J.Bauer over 6 years
      With MPEG-DASH being 5 seconds behind, Phoboslab's MPEG-1 at about 1 second but making handsets run warm, and WebRTC being a pain to do server-side on your own, this was probably a smart and time-saving decision - thank you Titan.
    • andreymal
      andreymal about 6 years
      MPEG-DASH + H.264 + 0.5s GOP length = 2-3s delay
    • faraway
      faraway about 5 years
      Another WebRTC low-latency streaming solution is Ant Media Server. Check it out: antmedia.io
  • Titan
    Titan almost 8 years
    Sorry if that wasn't clear, 1k-10k users could be watching simultaneously
  • Brad
    Brad almost 8 years
    @GreenGiant You're going to have 1k-10k users simultaneously controlling physical objects in the same stream? That sounds unreasonable. Can you better describe what it is specifically you're trying to do? Are you sure you don't have a small number of people controlling physical objects (the people that need the low latency) and a large number of people watching (which could have higher latency)? 1-2 second delay with a stream that goes to 10k users is nearly impossible. You'll need to do custom everything basically, utilizing WebRTC on the clients but with the source being server-side.
  • Titan
    Titan almost 8 years
    Only 1-2 will have control simultaneously, but there could be thousands watching. Having variable delay in the feed for those in control vs those watching wouldn't be acceptable because the nodejs server relaying events to the client's browser to show feedback would be out of sync. Here is an example of what we've done in the past with flash (scroll to bottom for details) sidigital.co/sid
  • Brad
    Brad almost 8 years
    @GreenGiant Could you elaborate on what that feedback is? If all you need to do is sync out-of-band events to the video, you can simply delay those events. Depending on your video encoding, you may even be able to use the timestamp, but I haven't experimented to check browser compatibility on pulling the timestamp from the actual video. Server-side, you should be able to determine how far the client is behind (as far as the data you sent them), and client-side you can start displaying buffered events as soon as the video has buffered and begins to play.
  • Titan
    Titan almost 8 years
    With the robot arm previously, when the person controlling the robot dropped the ball through the hole, I'd create a points bubble on the DOM that shows for all users. Visitors not in control experience the exact same thing as the person in control, except they don't actually have control. This is why it's important that all users experience the same/similar low latency; otherwise changes to the UI for score, event feedback, etc. will make no sense. I can't delay the events for non-players because I wouldn't know their latency, so I strive for the lowest latency for all. It worked well with Flash + RTMP.
  • Titan
    Titan almost 8 years
    I've been looking into Janus janus.conf.meetecho.com/docs to replace the webcam > nginx RTMP > Flash clients setup I currently have: webcam > Janus server > WebRTC clients. I'm already using WebSockets for all communication with the robot etc., so I don't need any direction there; it's just the video feed latency for all users that I'm struggling with in an HTML5 setup :)
  • Titan
    Titan almost 8 years
    I'm not after super high quality video by the way, 720 × 576 without audio for example would be acceptable for the size I need it in.
  • Brad
    Brad almost 8 years
    @GreenGiant You can know the delay. Please see the section about calibrating the timing by checking a response header. You will definitely not get low latency for all users. Sure, you can distribute to everyone with WebRTC, but where will you host such infrastructure? As far as I know, there is no CDN today that distributes via WebRTC, so you're stuck doing it on your own. Are you prepared to invest in cages at geographically distributed datacenters? The operational and financial overhead, I suspect, will consume your entire project. And for what? Fewer users.
  • Brad
    Brad almost 8 years
    @GreenGiant Remember, that by distributing video via normal means, you will have better compatibility, and more flexibility when it comes to transcoding your video to a variety of bandwidth and codec settings.
  • Titan
    Titan almost 8 years
    I appreciate that but if you are even 5 seconds behind, you are going to be pulled into real time before the robot on your version of the stream has even finished playing (unless you have long periods between turns). With the robot example we had a queue of 700+ people waiting to play at one point, so we couldn't afford to have long periods between turns to account for other viewers being so far behind real time.
  • Brad
    Brad almost 8 years
    @GreenGiant I don't know how long your turns are (and, you really should put the full description of your problem in the question as others may have other creative solutions), but even if you have 700 people queued to play, you only have a couple on deck to play next. Switch 'em to WebRTC-mode before their turn.
  • Brad
    Brad almost 8 years
    @GreenGiant You mentioned in another comment that IE 9-11 are the norm for your corporate networks. One thing to keep in mind is that my answer allows you to support those users. They can watch the video on anything, since you have full flexibility if you split out video to have a low latency stream vs. a regular stream. To save on cost, I'd just fire up your WebRTC server, and then let YouTube handle the view-only stream. Switch them with JavaScript when they're next up to control the robot, so they are ready to go when control is switched to them.