Can you tell by the network traffic whether a video was watched or downloaded from YouTube?


Yes, it's possible to differentiate between these two use cases when looking at network traffic. The simple explanation is:

  • When you download the raw video file with youtube-dl, you load the complete file in one continuous transfer.
  • When you watch a YouTube video in the browser, the Flash client downloads the video in chunks. The chunks fill a buffer, and once that buffer is about to run out, the player fetches the next chunks.

Both can be done through HTTP these days. You can observe the client behavior when you load a video: it is never downloaded completely at once. The buffer is played out, then the next part is loaded. This is visible in the network traffic as multiple requests sent to YouTube for one resource over time.
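To make the difference concrete, here is a minimal sketch of how one might classify a captured flow. It assumes you have already extracted per-request start time, end time, and size from a capture (for example with Wireshark). The `Request` type, the function names, and both thresholds are hypothetical and only illustrate the idea: playback leaves long idle gaps between chunk requests, while a bulk download does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    start: float  # seconds since the first observed request
    end: float    # time at which the response finished
    size: int     # response size in bytes


def idle_fraction(requests: List[Request]) -> float:
    """Fraction of the observation window in which no transfer was running."""
    if not requests:
        return 0.0
    ordered = sorted(requests, key=lambda r: r.start)
    window = max(r.end for r in ordered) - ordered[0].start
    if window <= 0:
        return 0.0
    idle = 0.0
    busy_until = ordered[0].end
    for r in ordered[1:]:
        if r.start > busy_until:
            idle += r.start - busy_until  # gap with no transfer in progress
        busy_until = max(busy_until, r.end)
    return idle / window


def looks_like_playback(requests: List[Request],
                        min_requests: int = 3,
                        min_idle: float = 0.5) -> bool:
    """Heuristic only: several requests separated by long idle periods."""
    return len(requests) >= min_requests and idle_fraction(requests) >= min_idle


# One long transfer (download-like) versus periodic chunk fetches (player-like).
bulk = [Request(0.0, 40.0, 50_000_000)]
playback = [Request(0.0, 2.0, 2_000_000),
            Request(10.0, 12.0, 2_000_000),
            Request(20.0, 22.0, 2_000_000)]
print(looks_like_playback(bulk), looks_like_playback(playback))
```

A real detector would need more than this, since a downloader can deliberately pace its requests.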

To cite Kuschnig et al. (see below):

"A video segment is split into chunks of size l_ch, which are served by a standard HTTP server. The download of the video chunks is coordinated by the client. For that purpose, the client maintains n_c HTTP-based request-response streams and schedules the downloads of the different chunks by using a separate queue for each stream."
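The scheme in the quote can be sketched roughly as follows. This is not the authors' implementation: the quote only says that each stream has its own queue, so the round-robin assignment, the byte-range representation of a chunk, and the function names are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

ByteRange = Tuple[int, int]  # inclusive (first_byte, last_byte); assumed representation


def split_into_chunks(segment_size: int, l_ch: int) -> List[ByteRange]:
    """Split a segment of segment_size bytes into chunks of at most l_ch bytes."""
    if l_ch <= 0:
        raise ValueError("l_ch must be positive")
    return [(start, min(start + l_ch, segment_size) - 1)
            for start in range(0, segment_size, l_ch)]


def schedule(chunks: List[ByteRange], n_c: int) -> List[Deque[ByteRange]]:
    """Distribute chunks over n_c per-stream queues (round-robin is an assumption)."""
    if n_c <= 0:
        raise ValueError("n_c must be positive")
    queues: List[Deque[ByteRange]] = [deque() for _ in range(n_c)]
    for i, chunk in enumerate(chunks):
        queues[i % n_c].append(chunk)
    return queues


# Example: a 10-byte segment, 4-byte chunks, 2 request-response streams.
chunks = split_into_chunks(10, 4)
queues = schedule(chunks, 2)
for stream_id, queue in enumerate(queues):
    print(stream_id, list(queue))
```

Each queue would then be drained by one HTTP request-response stream, which is why a single video shows up as many separate requests on the wire.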

If you want more specifics about YouTube streaming traffic, I can explain more. We are currently conducting simulated experiments on optimizing YouTube buffering and analyzing various video streaming scenarios.

Further reading:

  • Kuschnig, Robert, Ingo Kofler, and Hermann Hellwagner. "Evaluation of HTTP-based request-response streams for Internet video streaming." Proceedings of the second annual ACM conference on Multimedia systems. ACM, 2011.

  • Stockhammer, Thomas. "Dynamic adaptive streaming over HTTP: standards and design principles." Proceedings of the second annual ACM conference on Multimedia systems. ACM, 2011.


humanityANDpeace
Author by

humanityANDpeace

Updated on September 18, 2022

Comments

  • humanityANDpeace
    humanityANDpeace over 1 year

    My question is about popular YouTube downloaders like youtube-dl (a command line program) or VideoDownloadHelper (a Firefox-browser extension).

    Comparing two cases:

    1. Watching a video on YouTube
    2. Downloading the video using a downloader (to be specific, let's assume youtube-dl)

    Is it possible to tell – for instance by inspecting the network traffic – that the video was downloaded and not "only watched" on YouTube?

    Maybe one could compare network traffic using programs like Wireshark? I cannot do that myself, but maybe this will help somebody to answer the question.

    • Karan
      Karan over 11 years
      Unless it's some kind of special player+stream combo using anti-copying measures, when online videos are played they're also downloaded to your local machine, and you can copy them from your browser cache.
    • humanityANDpeace
      humanityANDpeace over 11 years
      I have to give this browser cache some background research. Maybe this way I can find a way / software which when downloading does not generate any difference to the simple watching of a video.
  • Karan
    Karan over 11 years
    Can't youtube-dl be made to use an https connection? Does YouTube always use https?
  • Karan
    Karan over 11 years
    So is it not entirely possible for a downloader to mimic data requests in a manner similar to that of the Flash client? Along with the correct user agent string and what not, would it still be possible to differentiate?
  • slhck
    slhck over 11 years
    Well, then of course it's splitting hairs between what's a proper video client or merely a downloader acting as such :) You're right of course: You could definitely mimic video player requests, and changing user agent strings would be another way to obscure traffic. I'm sure if you're clever enough you could fool any detection algorithm.
  • Karan
    Karan over 11 years
    True. Referring to the original question, Google and the music industry are not so stupid as to be ignorant of the fact that content can be, and is, downloaded (often with multiple connections to the server using download accelerators). I guess they either don't care as long as it's for personal use, or don't want to reduce their popularity or spark an arms race by introducing some form of DRM. In any case, I doubt there'd be much left if all copyrighted content not uploaded by the copyright owners themselves were removed from YouTube. :)
  • humanityANDpeace
    humanityANDpeace over 11 years
    I see no reason why it should be impossible to make youtube-dl use https connections. Still, handling https is a little more tricky and, it seems, not required to achieve the goal (providing a mechanism to download the resource). As it stands, it would still not achieve the side goal of downloading the data in a way that mimics watching the video. Even with https connections that goal would not be achieved, since I doubt the elaborate behaviour of the browser is imitated. I think youtube-dl is more like a small Python app.
  • humanityANDpeace
    humanityANDpeace over 11 years
    Why the downvoting? It answers the question by showing an example of a case where it is different. At least it thereby partially responds to the question. It took some work to use Wireshark and investigate this. I feel unappreciated for this work.
  • Karan
    Karan over 11 years
    Although I don't know who downvoted, don't take it so seriously. It's just how the site works.
  • humanityANDpeace
    humanityANDpeace over 11 years
    @Karan: Thanks for the consolation. Still, I am confused to see a downvote on a "not-wrong", even partly helpful, answer of mine. Instead of downvoting, I would rather see better answers voted up. I am confused, since I thought the site works in such a way that wrong answers are downvoted.
  • Karan
    Karan over 11 years
    Voting is done by people, and since when have people been known to always do what's sane or "right"? :)
  • Glenn Slayden
    Glenn Slayden about 7 years
    This answer was written in 2012 and the points it makes are somewhat misleading in today's environment. YouTube in particular is being quite aggressive with its DASH deployment, in fact requiring the use of that fragmentation protocol if you want to obtain the highest-quality content rendition. Meanwhile, as regards the OP's question about distinguishing automated access, the fact that youtube-dl now supports DASH seems to obscure the premise of this answer.
  • slhck
    slhck about 7 years
    @GlennSlayden You're right in saying that DASH is now predominant as a streaming technology in YouTube, and that youtube-dl is using this protocol to fetch content. However, the traffic itself should look different, as a regular player download would fill the buffer and then enter an oscillating state where it depletes the buffer to a certain extent, then fills it up again. I am assuming that youtube-dl would do a best-effort download at full rates. (Of course, this remains to be verified…)
  • Glenn Slayden
    Glenn Slayden about 7 years
    @slhck Good points. Accurately simulating the oscillation pattern of a real-time download would quickly devolve to simply having the unattended download proceed in real time. If some automatic process does need to maintain that particular fiction, it could still try to obtain the same "net" (pun) bandwidth by pulling multiple feeds in parallel. The client would present an IP address just as suspiciously over-ravenous as before, barring specific subterfuge in that regard.