How does Google Analytics collect its data?

Solution 1

When an HTML page requests the ga.js file, the HTTP request itself carries a good deal of data: the IP address, referrer, browser, language, and operating system. There is no need to use Ajax.

Some data can't be obtained this way, though, so the GA script inserts an image into the HTML with additional parameters in its URL. Take a look at this example:

http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1464271798&utmhn=www.example.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=10.0%20r22&utmdt=Page title&utmhid=1805038256&utmr=0&utmp=/&utmac=cookie value

This is a blank image, sometimes called a tracking pixel, that GA puts into HTML.
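A minimal sketch of how a script could assemble and fire such a request. The helper names `buildPixelUrl` and `sendPixel` are hypothetical and are not ga.js's actual code; the parameter names are copied from the example URL.

```javascript
// Hypothetical helpers for illustration; not the real ga.js implementation.

// Build a pixel URL by URL-encoding each key/value pair into the query string.
function buildPixelUrl(baseUrl, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  });
  return baseUrl + '?' + pairs.join('&');
}

// Fire the request by pointing an off-page image at the URL.
// Returns false outside a browser, where Image does not exist.
function sendPixel(url) {
  if (typeof Image === 'undefined') {
    return false;
  }
  var img = new Image(1, 1);
  img.src = url; // assigning src makes the browser issue the GET request
  return true;
}

var url = buildPixelUrl('http://www.google-analytics.com/__utm.gif', {
  utmhn: 'www.example.com',
  utmsr: '1920x1200',
  utmdt: 'Page title'
});
```

In a page you would then call `sendPixel(url)`. Because this is an ordinary image load, the same-origin restrictions that apply to Ajax do not get in the way, and the response body is never read.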

Solution 2

Some good answers here which individually tend to hit on one method or another for sending the data. There's a valuable reference which I feel is missing from the above answers, though, and covers all the methods.

Google refers to the different methods of sending data as 'transport mechanisms'.

In the analytics.js documentation, Google describes the three main transport mechanisms it uses to send data:

This specifies the transport mechanism with which hits will be sent. The options are 'beacon', 'xhr', or 'image'. By default, analytics.js will try to figure out the best method based on the hit size and browser capabilities. If you specify 'beacon' and the user's browser does not support the navigator.sendBeacon method, it will fall back to 'image' or 'xhr' depending on hit size.

  1. One of the common and standard ways to send some of the data to Google (which is shown in Thinker's answer) is by adding the data as GET parameters to a tracking pixel. This would fall under the category which Google calls an 'image' transport.
  2. Secondly, Google can use the 'beacon' transport method if the client's browser supports it. This is often my preferred method because it will attempt to send the information immediately. Or in Google's words:

This is useful in cases where you wish to track an event just before a user navigates away from your site, without delaying the navigation.

  3. The 'xhr' transport mechanism is the third way that Google Analytics can send data back home. Which transport mechanism is actually used can depend on things such as the size of the hit. (I'm not sure what other factors go into GA deciding the optimal transport mechanism to use.)
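To make the fallback rule from the quoted documentation concrete, here is a rough sketch of the selection logic. It is my own simplification, not analytics.js source: the function name `chooseTransport` is hypothetical, and the size threshold is a placeholder I picked for illustration, not a documented GA value.

```javascript
// Hypothetical sketch of transport selection; NOT the analytics.js source.
// ASSUMED_MAX_IMAGE_LENGTH is a placeholder threshold chosen for illustration.
var ASSUMED_MAX_IMAGE_LENGTH = 2000;

function chooseTransport(requested, payloadLength, hasSendBeacon) {
  // 'beacon' is only usable when the browser supports navigator.sendBeacon.
  if (requested === 'beacon' && hasSendBeacon) {
    return 'beacon';
  }
  // Simplification: an explicit 'xhr' or 'image' request is honored as-is.
  if (requested === 'xhr' || requested === 'image') {
    return requested;
  }
  // Otherwise (no preference, or 'beacon' unsupported) decide by hit size:
  // small hits fit in an image URL, larger ones need a POST via XHR.
  return payloadLength <= ASSUMED_MAX_IMAGE_LENGTH ? 'image' : 'xhr';
}
```

For example, `chooseTransport('beacon', 5000, false)` returns `'xhr'` in this sketch, because the browser lacks `sendBeacon` and the hit is too long for the assumed image limit.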

In case you are curious how to force GA into using a specific transport mechanism, here is a sample code snippet which forces this event hit to be sent as a 'beacon':

    ga('send', 'event', 'click', 'download-me', {transport: 'beacon'});
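Under the hood, the 'beacon' transport relies on the browser's `navigator.sendBeacon(url, data)`, which returns `true` when the browser has queued the request. A minimal sketch of the "try beacon, otherwise fall back" idea follows; `sendViaBeacon` and the fallback callback are hypothetical names of mine, not part of analytics.js.

```javascript
// Hypothetical helper; illustrates the fallback idea, not GA's actual code.
// nav is passed in so the function can be exercised outside a browser;
// in a page you would pass window.navigator.
function sendViaBeacon(nav, url, payload, fallback) {
  var queued = !!nav &&
      typeof nav.sendBeacon === 'function' &&
      nav.sendBeacon(url, payload); // true if the browser queued the request
  if (!queued) {
    // No sendBeacon support, or the browser refused to queue the data.
    fallback(url, payload);
  }
  return queued;
}

// Browser usage (the endpoint is a made-up example):
// sendViaBeacon(window.navigator, 'https://example.com/collect', 'v=1&t=event',
//   function (url, payload) { new Image().src = url + '?' + payload; });
```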

Hope this helps.


Also, if you are curious about this topic because you'd like to capture and send this data to your own site too, I recommend creating a binding to Google Analytics' send, which allows you to grab the payload and AJAX it to your own server.

    ga(function(tracker) {

       // Grab a reference to the default sendHitTask function.
       var originalSendHitTask = tracker.get('sendHitTask');

       // Modifies sendHitTask to send a copy of the request to a local server after
       // sending the normal request to www.google-analytics.com/collect.
       tracker.set('sendHitTask', function(model) {
         var payload = model.get('hitPayload');
         originalSendHitTask(model);

         var xhr = new XMLHttpRequest();
         xhr.open('POST', '/index.php?task=mycollect', true);
         xhr.send(payload);
       });
    });
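As far as I know, the `hitPayload` string is a URL-encoded query string, so whatever receives the copy can decode it with standard tools. A minimal sketch in JavaScript (for example on a Node.js backend), assuming a payload shaped like the made-up sample below; `parseHitPayload` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn a URL-encoded hit payload into a plain object.
function parseHitPayload(payload) {
  var result = {};
  new URLSearchParams(payload).forEach(function (value, key) {
    result[key] = value; // later duplicates of a key overwrite earlier ones
  });
  return result;
}

// Made-up sample payload, for illustration only.
var samplePayload = 'v=1&t=pageview&dp=%2Fhome&dt=Page%20title';
var hit = parseHitPayload(samplePayload);
console.log(hit.t, hit.dp, hit.dt);
```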

Solution 3

Without looking at the code, I assume their data is collected from the HTTP headers they receive in the asynchronous request.

Remember that most browsers send data such as the OS, platform, browser, version, and locale. They also have your IP address, so they can estimate your location. And I assume they have some sort of clever algorithm to decide whether you are a unique visitor or not.
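To illustrate how much a server can learn from the request alone, here is a small sketch that summarizes a request's headers. `describeRequest` is a hypothetical helper and the sample values are made up; the header names are the standard HTTP ones, lowercased the way Node.js exposes them.

```javascript
// Hypothetical helper: summarize what a server sees without any client-side script.
// headers is an object with lowercased header names; remoteAddress is the peer IP.
function describeRequest(headers, remoteAddress) {
  var languages = (headers['accept-language'] || '')
    .split(',')
    .map(function (part) { return part.split(';')[0].trim(); }) // drop ";q=..." weights
    .filter(Boolean);
  return {
    ip: remoteAddress,
    userAgent: headers['user-agent'] || null, // browser, version, OS/platform
    languages: languages,                     // locale preferences
    referrer: headers['referer'] || null      // the header really is spelled "referer"
  };
}

// Made-up sample request, for illustration only.
var info = describeRequest({
  'user-agent': 'ExampleBrowser/1.0 (X11; Linux x86_64)',
  'accept-language': 'en-US,en;q=0.8',
  'referer': 'http://www.example.com/'
}, '203.0.113.7');
console.log(info);
```

Things like screen resolution or plugin versions are not in any header, which is why a script has to collect them on the client.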

Time on the site is probably calculated by using an onUnload() event.

Solution 4

The Google Analytics documentation explains in detail how the Google Analytics server collects data: http://code.google.com/apis/analytics/docs/concepts/gaConceptsOverview.html

All Google Analytics data is collected, packed into the request URL's query string, and sent to the Google Analytics server. The HTTP request is made by loading a GIF image (http://www.google-analytics.com/__utm.gif), which is triggered by the Google Analytics JavaScript.

Solution 5

It's easy enough to tell by using something like Firebug's Net tab.

Ajax isn't needed - since data isn't being fetched from Google. They just encode the information in a query string, and then load a transparent gif using it.

Author: echox

Updated on November 18, 2020

Comments

  • echox
    echox over 3 years

    Yes, I know you have to embed the google analytics javascript into your page.

    But how is the collected information submitted to the google analytics server?

    For example, an AJAX request will not be possible because of the browser's security settings (cross-domain scripting).

    Maybe someone has already had a look at the confusing Google JavaScript code?

  • echox
    echox almost 15 years
    But google-analytics collects a lot more data, e.g. Flash version, etc. That is not sent with the HTTP headers.
  • echox
    echox almost 15 years
    That's not true, the query string is too short to contain that amount of information. There are only some unique IDs and keywords encoded.
  • epascarello
    epascarello almost 15 years
    Just because the file is sitting on their servers does not magically give it the power to make an XMLHttpRequest to their servers.
  • Thinker
    Thinker almost 15 years
    Yes, but it is done in a different way than Ajax; I added an explanation to the post.
  • echox
    echox almost 15 years
    OK, I overlooked utmfl=10.0 for the Flash version. Thanks for the explanation.
  • tpk
    tpk over 14 years
    regarding the onUnload() event, this seems to prove GA doesn't do that: groups.google.com/group/analytics-help-troubleshoot/… also, go to your GA and check the average time for visits with 1 pageview - it's 0s.
  • xOneca
    xOneca over 10 years
    It now uses http(s)://www.google-analytics.com/collect?... (with other parameter names) to track visits. I can't find documentation about new parameter names.
  • darkace
    darkace over 8 years
    What about event trigger-based data. How would GA be sent that information?
  • Jake Wilson
    Jake Wilson about 8 years
    Why can some data not be sent via AJAX? The tracking parameters are obviously put together using JavaScript... Why can't the same parameters be sent via AJAX? Did Google choose the pixel route for efficiency?